1. Introduction
Fluoride is a naturally occurring groundwater constituent that becomes a significant public-health issue at elevated concentrations [1,2]. Prolonged consumption of drinking water with a high fluoride content has been linked to dental fluorosis and, in the long term, skeletal fluorosis. Accordingly, the World Health Organization (WHO) provides a guideline value of 1.5 mg/L for drinking water, while acknowledging that local standards may be adjusted for climate, volume of water consumed, and cumulative exposure pathways [3]. Groundwater dependence and hydrogeochemical conditions that favor fluoride mobilization cause persistent exceedance hotspots in many semi-arid to arid regions, making fluoride one of the most widespread geogenic contaminants worldwide [4,5].
In groundwater-dependent communities, untreated groundwater is a major fluoride exposure pathway. Long-term consumption of water exceeding the WHO guideline of 1.5 mg/L may increase the risk of dental and skeletal fluorosis, particularly in semi-arid regions where water intake is relatively high. Therefore, identifying high-fluoride wells is essential for safe drinking-water management and public-health protection [6]. South Asia faces severe groundwater-quality constraints because of high population density, intensive irrigation, and reliance on shallow to shallow–intermediate alluvial aquifers [7]. In the Indus Basin and related doab systems, fluoride behavior is largely governed by hydrogeochemical evolution, particularly under conditions of low calcium activity, high bicarbonate and sodium, and active ion exchange. These conditions enhance fluoride solubility and inhibit fluorite precipitation, which favors fluoride enrichment in groundwater [8,9]. Large-scale hydrochemical monitoring of the Punjab plains has recorded spatially organized fluoride concentrations and attributed them to fluorite equilibrium, carbonate alkalinity, and progressive water–rock interaction [8]. At smaller scales, isotope-hydrochemical measurements in the Rechna Doab have also shown salinity evolution and groundwater residence time to be controlling factors of fluoride occurrence [10].
Pakistan’s Punjab Province is especially susceptible because groundwater is the primary source of domestic supply and irrigation, and the aquifer systems are increasingly strained by agricultural expansion, urbanization, and industry. Several studies conducted in Punjab have documented high fluoride concentrations in drinking-water wells and emphasized the related health problems [11,12,13,14]. These analyses showed that fluoride contamination occurs in spatially heterogeneous pockets rather than uniformly across aquifers, implying localized hydrogeochemical controls. Despite the mounting evidence, significant knowledge gaps remain at the sub-district level, especially in emerging agricultural–industrial corridors where groundwater chemistry reflects both geogenic and anthropogenic impacts [15,16]. Alongside advances in hydrogeochemistry, data-driven methods have been increasingly used for the prediction and mapping of groundwater quality.
Machine-learning (ML) techniques are particularly suited to hydrochemical systems because they can capture nonlinear relationships and higher-order interactions between environmental predictors that challenge traditional statistical approaches [17]. Nevertheless, numerous groundwater ML studies [18,19,20] lack rigorous validation, model-reliability assessment, and interpretability, which makes their results less trustworthy and harder to relate to hydrogeochemical processes. Explainable machine-learning techniques, such as Shapley Additive Explanations (SHAP), explicitly quantify the contribution of each predictor to model output and link model predictions with environmental mechanisms, thereby improving interpretability and scientific validity [21]. Recent fluoride prediction studies [22,23] have demonstrated that boosting algorithms with SHAP-based interpretation can clarify geochemical factors and improve risk communication.
Two methodological gaps are crucial for advancing groundwater fluoride modeling. First, some predictive studies emphasize discrimination metrics (ROC-AUC) while insufficiently assessing probability calibration, which is essential for risk-based screening and decision-making [24]. Second, deep learning methodologies for tabular hydrochemical datasets are still poorly explored. Feature-tokenizer transformers (FT-Transformer) present a promising approach for learning complex feature interactions in structured environmental data; however, their performance, robustness, and interpretability in hydrogeochemical applications in South Asian alluvial aquifers remain largely unproven [21,24].
To address these gaps, this study examines groundwater fluoride dynamics in Tehsil Jaranwala (Faisalabad District, Punjab, Pakistan) within the Rechna Doab alluvial aquifer system, where fluoride exceedance is a significant issue. An integrated predictive framework is developed that (i) models fluoride concentration as a regression task and (ii) classifies exceedance risk relative to the 1.5 mg/L guideline threshold. ML models including linear regression, random forest, and gradient boosting are compared with a deep tabular learning model and validated using nested cross-validation and independent test analysis. In addition to predictive accuracy, the study assesses probability calibration and model robustness, and uses SHAP and permutation importance to translate model behavior into hydrogeochemical insight.
This study introduces a unified framework for the predictive modeling of groundwater quality that combines hydrogeochemical analysis with explainable machine-learning algorithms. A dual-task modeling strategy is developed to predict fluoride concentration and exceedance risk against the WHO drinking-water guideline, supporting both quantitative estimation and risk-oriented groundwater screening. The study uses nested cross-validation and independent testing for rigorous validation, reducing optimistic bias in the reported predictive performance. Additionally, explainable AI methods such as SHAP and permutation importance are used to connect model predictions with the hydrogeochemical processes affecting fluoride mobilization. By linking data-driven insights with geochemical mechanisms, the proposed framework provides interpretable and reliable predictions that support targeted groundwater monitoring, risk assessment, and an improved understanding of fluoride contamination in intensively used alluvial aquifers in Punjab.
2. Materials and Methods
This section describes the study design, data preparation, and modeling approach for groundwater fluoride concentration and exceedance risk. It covers model construction, optimization, evaluation, interpretability, calibration, and robustness analysis.
2.1. Study Area
The research was conducted in Tehsil Jaranwala, Faisalabad District, one of the largest sub-districts in Punjab, Pakistan, located within the Rechna Doab, the interfluvial tract between the Ravi and Chenab rivers, as shown in Figure 1. The area is part of the highly fertile Indus Basin irrigation system, where groundwater is the main source of domestic and agricultural water. The climate is semi-arid, marked by hot summers and mild winters with highly seasonal precipitation; most of the rain falls during the summer monsoon (June–September), supplying episodic groundwater recharge [25,26]. The aquifer system consists of thick unconsolidated Quaternary alluvial deposits of interbedded sand, silt, and clay. The sand units are laterally extensive and form a multi-layered, highly permeable aquifer, while a few clayey interbeds locally restrict vertical flow and thereby influence solute transport and hydrochemical evolution. The major sources of recharge are rainfall infiltration and canal seepage from the extensive irrigation system, but groundwater abstraction for irrigation and household use has lowered the water table and caused localized deterioration of water quality [27,28].
Hydrogeologically, Tehsil Jaranwala belongs to the Rechna Doab aquifer system within the broader Indus Basin alluvial aquifer. Groundwater flow and residence time are controlled by regional hydraulic gradients, pumping intensity, recharge from the canal-irrigation network, and local lithological variability. These basin-scale hydrogeological conditions influence groundwater mineralization, salinity evolution, water–rock interaction, and the mobilization of fluoride from fluoride-bearing minerals [27,28]. Land use is dominated by irrigated agriculture (wheat, rice, sugarcane, and fodder crops), while growing urban and industrial expansion around Faisalabad intensifies the pressure on groundwater through fertilizer application, irrigation return flow, and wastewater inputs. Fluoride contamination has become a growing concern in Punjab’s alluvial aquifers, where concentrations may exceed the WHO drinking-water guideline of 1.5 mg/L in spatially heterogeneous pockets. Fluoride enrichment is predominantly controlled by geogenic and hydrochemical processes, including dissolution of fluoride-bearing minerals, prolonged water–rock interaction, alkaline conditions, ion exchange, carbonate equilibrium, reduced Ca²⁺ activity, and evolution toward Ca-depleted Na-HCO₃-type groundwater [29]. The dataset used in this work comprises groundwater samples analyzed for fluoride together with a suite of physicochemical and major-ion parameters, which form the basis for quantitative modeling of fluoride concentration and exceedance risk in an intensively used alluvial aquifer system.
Table 1 summarizes the descriptive statistics of 350 groundwater samples collected once from wells across Tehsil Jaranwala during February–September. The dataset represents a single sampling campaign covering part of the agricultural season, rather than continuous monitoring of the full agrarian cycle. Sampling sites were selected to cover major agricultural, settlement, and groundwater-use areas and to represent spatial variability in groundwater quality. Based on the mapped study-area extent of approximately 1770 km², the estimated sampling density was about 0.20 samples/km². Detailed well-depth information was not available for all sampling locations; therefore, vertical variability in fluoride occurrence could not be evaluated in this study. High variability in EC, TDS, Na⁺, Cl⁻, and SO₄²⁻ indicates substantial hydrochemical heterogeneity, likely related to variable recharge, groundwater residence time, and water–rock interaction. Fluoride shows a positively skewed distribution with localized exceedances above the WHO guideline, indicating spatially heterogeneous contamination rather than uniform aquifer-wide enrichment. The measured parameters included pH, EC, turbidity, TDS, Ca²⁺, Mg²⁺, TH, HCO₃⁻, alkalinity, Cl⁻, K⁺, Na⁺, SO₄²⁻, NO₃⁻, and F⁻.
2.2. Sampling, Analytical Methods, and Quality Control
Groundwater samples were collected from selected wells and hand pumps across Tehsil Jaranwala and analyzed for physicochemical and major-ion parameters. Field parameters, including pH, electrical conductivity (EC), total dissolved solids (TDS), and turbidity, were measured using calibrated portable meters. Major cations and anions were analyzed using standard laboratory procedures: Na⁺ and K⁺ by flame photometry; Ca²⁺ and Mg²⁺ by EDTA titration; alkalinity and HCO₃⁻ by acid titration; Cl⁻ by argentometric titration; and SO₄²⁻, NO₃⁻, and F⁻ by UV–visible spectrophotometry. Analytical-grade reagents, including EDTA, standard acid solution, silver nitrate, barium chloride, nitrate reagents, fluoride standards, and SPADNS reagent, were used with deionized water. Quality assurance and quality control were maintained through instrument calibration with standard solutions, calibration records, reagent blanks, duplicate sample analysis, and analytical precision checks. Major-ion reliability was further evaluated using the ionic balance error between total cations and total anions expressed in milliequivalents per liter. Samples with unacceptable analytical deviation were rechecked before statistical analysis and model development. Overall, all measurements and analyses followed standard groundwater-quality analytical procedures to ensure data reliability.
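The ionic balance check described above can be sketched as follows. This is a minimal illustration, assuming the common charge-balance formulation, 100 × (Σcations − Σanions) / (Σcations + Σanions), with concentrations converted from mg/L to meq/L through equivalent weights. The function names, the example sample, and the assumption that nitrate is reported as NO₃⁻ are illustrative; the acceptance limit applied in the study is not stated in the text.

```python
# Equivalent weights (mg per meq) = molar mass / |ionic charge|.
EQUIVALENT_WEIGHT = {
    "Na": 22.99, "K": 39.10, "Ca": 40.08 / 2, "Mg": 24.305 / 2,
    "HCO3": 61.02, "Cl": 35.45, "SO4": 96.06 / 2, "NO3": 62.00, "F": 19.00,
}
CATIONS = ("Na", "K", "Ca", "Mg")
ANIONS = ("HCO3", "Cl", "SO4", "NO3", "F")


def to_meq(sample_mg_l, ions):
    """Sum the concentrations of the listed ions in meq/L (missing ions count as 0)."""
    return sum(sample_mg_l.get(ion, 0.0) / EQUIVALENT_WEIGHT[ion] for ion in ions)


def ionic_balance_error(sample_mg_l):
    """Charge-balance error in percent; positive values mean excess cations."""
    cations = to_meq(sample_mg_l, CATIONS)
    anions = to_meq(sample_mg_l, ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)


# Hypothetical sample in mg/L, for illustration only (not study data).
sample = {"Na": 120.0, "K": 6.0, "Ca": 40.0, "Mg": 25.0,
          "HCO3": 300.0, "Cl": 90.0, "SO4": 80.0, "NO3": 10.0, "F": 1.2}
print(f"ionic balance error: {ionic_balance_error(sample):.1f} %")
```

Samples whose absolute error exceeds a chosen tolerance would be flagged for rechecking; the tolerance itself is a laboratory decision and is left out of the sketch.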
2.3. Study Design and Analytical Framework
This study proposes and empirically validates a unified predictive framework for groundwater fluoride concentration by integrating hydrogeochemical predictors with machine-learning and deep tabular learning models. The overall methodological workflow is illustrated in Figure 2. The workflow was executed as a sequential pipeline: data-quality screening and preprocessing to minimize possible bias, followed by construction of the target variables for regression and classification. The models were trained with systematic hyperparameter optimization and finally evaluated on a single independent hold-out test set. Model reliability for the classification task was assessed through discrimination and calibration diagnostics. Interpretability was assessed through feature-attribution and importance analyses that relate prediction patterns to potential hydrogeochemical controls. Noise-sensitivity experiments were used to test the stability of predictions under measurement uncertainty and hydrochemical variation.
2.4. Data Preparation, Predictors, and Target Definition
The dataset consisted of 350 groundwater samples collected from Tehsil Jaranwala, Rechna Doab aquifer, Punjab, Pakistan. Each sample included measured physicochemical and major-ion parameters. A total of 15 hydrochemical variables were available: pH, EC, turbidity, TDS, Ca²⁺, Mg²⁺, TH, HCO₃⁻, alkalinity, Cl⁻, K⁺, Na⁺, SO₄²⁻, NO₃⁻, and F⁻. For machine-learning modeling, fluoride concentration was treated as the target variable for the regression task, whereas the remaining 14 physicochemical and hydrochemical parameters were used as predictor features. The data matrix was structured as $X \in \mathbb{R}^{n \times p}$, where $n$ represents the number of groundwater samples and $p = 14$ represents the number of predictor variables. The regression response variable was the measured fluoride concentration in mg/L. For classification, a binary exceedance-risk label was generated using the WHO drinking-water guideline of 1.5 mg/L. Samples with F⁻ > 1.5 mg/L were classified as high-risk groundwater samples (Risk = 1), whereas samples with F⁻ ≤ 1.5 mg/L were classified as low-risk groundwater samples (Risk = 0).
Before modeling, the dataset was screened to ensure data quality and consistency. Duplicate records were removed, parameter names and units were standardized, and all variables were checked for missing values, non-physical values, and abnormal entries. Descriptive statistics were calculated to examine data ranges, central tendency, dispersion, and coefficients of variation. Highly skewed variables were inspected, and preprocessing was applied only within the training pipeline to avoid information leakage. The dataset was then stratified according to the fluoride-risk class and divided into training, validation, and independent test subsets using a 64:16:20 ratio. The independent test set was reserved strictly for final model evaluation. Early stopping and learning-curve monitoring for the FT-Transformer were conducted only on the validation set, while classical machine-learning models were optimized using nested cross-validation.
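The labeling rule and the stratified 64:16:20 partition can be sketched as below. This is a minimal illustration, assuming scikit-learn's `train_test_split` applied in two stages; the synthetic fluoride values, the random predictors, and the random seeds are placeholders and not the study data or settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split

WHO_GUIDELINE_MG_L = 1.5

# Synthetic stand-in for the 350 samples (not the study data):
# 280 low-fluoride values, 70 high-fluoride values, 14 random predictors.
rng = np.random.default_rng(0)
fluoride = np.concatenate([np.full(280, 0.7), np.full(70, 2.0)])
X = rng.normal(size=(fluoride.size, 14))

# Risk = 1 if F- > 1.5 mg/L, else 0 (a value of exactly 1.5 is low-risk).
risk = (fluoride > WHO_GUIDELINE_MG_L).astype(int)

n = fluoride.size
n_test = round(0.20 * n)  # 70 samples
n_val = round(0.16 * n)   # 56 samples

# Stage 1: hold out the independent test set, stratified by risk class.
X_dev, X_test, y_dev, y_test, r_dev, r_test = train_test_split(
    X, fluoride, risk, test_size=n_test, stratify=risk, random_state=42)

# Stage 2: split the remainder into training and validation subsets.
X_train, X_val, y_train, y_val, r_train, r_val = train_test_split(
    X_dev, y_dev, r_dev, test_size=n_val, stratify=r_dev, random_state=42)

print(len(r_train), len(r_val), len(r_test))
```

Passing integer subset sizes avoids rounding surprises from fractional sizes. Any scaling or transformation would then be fitted on the training subset only, in line with the leakage precaution described above.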
2.5. Model Development and Hyperparameter Tuning
The regression analysis compared three model families: linear regression (baseline), XGBoost regressor, and random forest regressor. The classification analysis compared logistic regression (baseline), XGBoost classifier, and random forest classifier. Class imbalance was addressed using class weighting for logistic regression and random forest, and the scale_pos_weight parameter for XGBoost. Nested cross-validation was used for model selection and unbiased performance estimation, minimizing optimistic bias in tuned models [30]. Performance was estimated in a five-fold outer loop, and hyperparameter optimization was carried out in a five-fold inner loop. The regression models were optimized for negative RMSE, and the classification models for ROC-AUC. The final regression model was selected based on the minimum mean outer-fold RMSE. For the final classification model, mean outer-fold recall (sensitivity) was the primary selection criterion and ROC-AUC the secondary criterion, in line with the screening goal of minimizing false negatives. The selected models were chosen to represent different levels of model complexity and interpretability. Linear and logistic regression served as transparent baseline models, while Random Forest and XGBoost were used to capture nonlinear relationships and interactions among hydrochemical variables. The FT-Transformer was included to evaluate the potential of attention-based deep tabular learning for modeling higher-order feature interactions. This model selection enabled a systematic comparison of predictive accuracy, interpretability, and complexity for fluoride concentration prediction and exceedance-risk classification. All statistical analyses and machine-learning models were implemented in Python using the scikit-learn, XGBoost, SHAP, PyTorch, and PyTorch Tabular packages.
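The nested cross-validation scheme (five-fold inner loop for tuning, five-fold outer loop for performance estimation) can be sketched with scikit-learn as follows. This is an illustrative example on synthetic data: the use of grid search, the search grid, the random forest settings, and the seeds are assumptions, since the text does not specify the search strategy or the hyperparameter ranges.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic nonlinear regression data standing in for the hydrochemical table.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = np.abs(X[:, 0]) + 0.5 * X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=120)

inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # hyperparameter tuning
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)  # performance estimation

# Hypothetical, deliberately small search grid.
param_grid = {"max_depth": [3, None], "min_samples_leaf": [1, 5]}

tuned_model = GridSearchCV(
    RandomForestRegressor(n_estimators=25, random_state=0),
    param_grid=param_grid,
    scoring="neg_root_mean_squared_error",
    cv=inner_cv,
)

# Each outer fold repeats the full inner search on its own training part,
# so the outer score is computed on data never used for tuning.
outer_scores = cross_val_score(
    tuned_model, X, y, scoring="neg_root_mean_squared_error", cv=outer_cv)
outer_rmse = -outer_scores
print(f"outer-fold RMSE: {outer_rmse.mean():.3f} +/- {outer_rmse.std():.3f}")
```

For the classification task, the same structure would apply with stratified folds and `scoring="roc_auc"` in the inner loop, with recall computed on the outer folds for model selection.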
2.6. FT-Transformer Implementation and Learning Behavior
The FT-Transformer was implemented to evaluate whether attention-based deep tabular learning can capture nonlinear hydrogeochemical interactions better than tree-based models. A tabular deep-learning model based on the FT-Transformer architecture was trained for both regression and classification to represent complex nonlinear interactions between hydrogeochemical predictors. The FT-Transformer, tailored to tabular data, treats each input feature as a single token, allowing the model to learn higher-order interactions using self-attention [31]. Given a predictor vector $\mathbf{x} = (x_1, \dots, x_p)$, the $j$-th hydrogeochemical feature is represented in a continuous embedding space by a learnable feature tokenizer:

$$\mathbf{e}_j = \mathrm{Embed}(x_j) \in \mathbb{R}^{d}$$

where $\mathrm{Embed}(\cdot)$ is implemented as a learnable projection layer, and $d$ represents the embedding dimension. This allows each hydrochemical variable to be treated as a token in a transformer sequence. Multi-head self-attention between feature tokens, using the scaled dot-product attention mechanism, captures contextual dependencies between hydrochemical variables, e.g., interactions between salinity indicators and major ions:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $Q$ is the query, $K$ is the key, and $V$ is the value projection of the embedded tokens, and $d_k$ is the dimension of the key vectors. Feed-forward layers and stacked attention blocks learn the complex nonlinear relationships that govern fluoride behavior. For classification, a pooled feature representation $\mathbf{h}$ is passed to a prediction head to estimate the exceedance probability:

$$\hat{p} = \sigma\!\left(\mathbf{w}^{\top}\mathbf{h} + b\right)$$

where $\sigma$ refers to the sigmoid activation function, and $\mathbf{w}$ and $b$ are the weights and bias of the prediction head. Model parameters are optimized with the binary cross-entropy loss:

$$\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log \hat{p}_i + (1 - y_i)\log\left(1 - \hat{p}_i\right)\right]$$

The regression model minimizes the mean squared error between the measured and predicted fluoride concentrations. The FT-Transformer was trained with the pytorch-tabular framework, consisting of four attention blocks, eight attention heads, a learning rate of , and 0.1 dropout. Training ran for up to 200 epochs with early stopping (patience = 15) on the validation set to avoid overfitting. Learning dynamics were studied through training and validation loss curves to assess convergence and generalization stability. The final model was evaluated on the independent test set using the same regression and classification metrics applied to the traditional ML models, enabling direct comparison of accuracy and reliability.
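The tokenization and attention steps can be illustrated with a small NumPy sketch. It is a simplified, single-head forward pass with untrained random weights, intended only to show the shapes and operations involved; it is not the pytorch-tabular implementation used in the study. The embedding size, the weight scale, and the mean pooling step are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, d = 14, 8  # 14 predictors as in the text; d = 8 is a hypothetical embedding size

x = rng.normal(size=p)  # one standardized sample (synthetic)

# Feature tokenizer: each scalar feature has its own weight and bias vector.
W_tok = rng.normal(scale=0.3, size=(p, d))
b_tok = rng.normal(scale=0.3, size=(p, d))
tokens = x[:, None] * W_tok + b_tok  # shape (p, d): one token per feature

# Single-head scaled dot-product self-attention over the feature tokens.
W_q, W_k, W_v = (rng.normal(scale=0.3, size=(d, d)) for _ in range(3))
Q, K, V = tokens @ W_q, tokens @ W_k, tokens @ W_v

scores = Q @ K.T / np.sqrt(d)                  # (p, p) feature-to-feature scores
scores -= scores.max(axis=1, keepdims=True)    # subtract row max for stability
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
attended = weights @ V                         # (p, d) context-aware tokens

# Mean pooling and a sigmoid head give an exceedance probability.
# The weights are untrained, so the value itself is meaningless.
h = attended.mean(axis=0)
w_head, b_head = rng.normal(scale=0.3, size=d), 0.0
p_hat = 1.0 / (1.0 + np.exp(-(w_head @ h + b_head)))
print(weights.shape, attended.shape, float(p_hat))
```

Each row of `weights` shows how strongly one feature token attends to every other feature, which is the mechanism by which interactions such as those between salinity indicators and major ions can be represented.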
2.7. Explainable AI Analysis
To increase model transparency and to relate predictive behavior to hydrogeochemical processes, explainable artificial intelligence was implemented using two complementary approaches, i.e., SHAP and permutation-based feature importance. SHAP provides a game-theoretic attribution model that decomposes the model output into a sum of additive contributions from each predictor [32,33]. For a trained model $f$, the prediction for a sample $\mathbf{x}$ is given as:

$$f(\mathbf{x}) = \phi_0 + \sum_{j=1}^{p}\phi_j$$

where $\phi_j$ is the Shapley value representing the marginal contribution of feature $j$ to the prediction relative to the expected model output ($\phi_0$). TreeExplainer was used to compute SHAP values for the highest-performing tree-based model (Random Forest or XGBoost) on the fully preprocessed feature space, ensuring compatibility with the trained prediction pipeline. The mean absolute SHAP magnitude provided global feature importance, whereas dependence analysis examined nonlinear response patterns and feature interactions relevant to fluoride mobilization. Permutation importance, a model-agnostic and independent measure of feature relevance, was obtained by randomly permuting each predictor and assessing the resulting degradation in predictive performance. The importance of feature $j$ is measured as:

$$I_j = \mathcal{L}_{\mathrm{perm}(j)} - \mathcal{L}$$

where $\mathcal{L}$ refers to the evaluation loss (RMSE in regression) and $\mathcal{L}_{\mathrm{perm}(j)}$ is the loss evaluated on the dataset with randomly permuted feature $j$. A larger performance degradation indicates greater feature influence. To verify that SHAP attributions reflect hydrogeochemical drivers rather than model-specific artefacts, the consistency between SHAP attribution and permutation importance was examined. Together, the XAI analyses enable feature-level and interaction-level interpretation, relating machine-learning predictions to well-known hydrogeochemical controls of fluoride behavior.
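The permutation-importance measure can be sketched in a few lines of NumPy. This is an illustrative example, assuming RMSE as the loss and averaging over repeated shuffles; the helper names, the number of repeats, and the toy model are hypothetical. scikit-learn also provides `sklearn.inspection.permutation_importance` for fitted estimators.

```python
import numpy as np


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def permutation_importance_rmse(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE after shuffling each column of X in turn."""
    rng = np.random.default_rng(seed)
    baseline = rmse(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link to y
            increases.append(rmse(y, predict(X_perm)) - baseline)
        importances[j] = np.mean(increases)
    return importances


# Synthetic check: the target depends on columns 0 and 1 only.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1]


def toy_model(X):
    # Stand-in for the predict method of a fitted model.
    return 3.0 * X[:, 0] + 1.0 * X[:, 1]


importances = permutation_importance_rmse(toy_model, X, y)
ranking = np.argsort(importances)[::-1]  # most influential feature first
print(importances, ranking)
```

In this toy case the unused third column receives zero importance, while the column with the larger coefficient ranks first, which is the behavior the importance measure above is meant to capture.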
2.8. Performance Metrics and Diagnostic Evaluation
Model performance was evaluated using complementary metrics covering prediction accuracy, discrimination, robustness to class imbalance, and probability reliability.
Regression metrics: The coefficient of determination ($R^2$), mean absolute error (MAE), and root-mean-squared error (RMSE) were used to measure predictive performance for fluoride concentration:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where $y_i$, $\hat{y}_i$, and $\bar{y}$ are the observed, predicted, and mean observed fluoride concentrations of the samples, respectively, and $n$ is the number of samples. RMSE penalizes large errors more severely than MAE [34].
Classification metrics: For exceedance-risk prediction, ROC-AUC [35] measured the discriminatory performance of a model independently of any decision threshold. The threshold-dependent metrics were calculated after converting the predicted probabilities $\hat{p}_i$ into class labels with a default threshold of 0.5. Sensitivity (recall), precision, F1-score, and balanced accuracy were calculated as:

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}}$$

$$\mathrm{Balanced\ accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right)$$

where $TP$, $TN$, $FP$, and $FN$ are the true positives, true negatives, false positives, and false negatives, respectively. Balanced accuracy addresses class imbalance in exceedance-risk prediction by averaging sensitivity and specificity.
Probability calibration and reliability: Since exceedance risk is interpreted as a probability, calibration quality was measured using the Brier score [36]:

$$\mathrm{BS} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{p}_i - y_i\right)^2$$

where $y_i \in \{0, 1\}$ is the true risk label and $\hat{p}_i$ denotes the predicted exceedance probability. A smaller Brier score indicates better probability calibration. To assess the correspondence between predicted probability and observed event frequency, both 10-bin and 20-bin probability discretizations were used to construct calibration (reliability) curves. Further diagnostic visualizations, such as ROC curves, precision–recall curves [37], confusion matrices, and calibration curves, provided supplementary information on discrimination, threshold behavior, and reliability.
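The regression metrics, the Brier score, and the binned reliability curve can be written out directly from their definitions. The following standard-library sketch is illustrative only; the function names and the toy inputs are hypothetical, and equal-width probability bins are assumed for the calibration curve.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, and RMSE for paired observed and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(y - yh) for y, yh in zip(y_true, y_pred)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def reliability_bins(labels, probs, n_bins=10):
    """Per non-empty bin: (mean predicted probability, observed frequency, count)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p = 1.0 falls in the last bin
        bins[idx].append((p, y))
    return [
        (sum(p for p, _ in b) / len(b), sum(y for _, y in b) / len(b), len(b))
        for b in bins if b
    ]


# Toy inputs, for illustration only.
reg = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
bs = brier_score([1, 0], [0.9, 0.2])
curve = reliability_bins([0, 0, 1, 1], [0.05, 0.15, 0.85, 0.95], n_bins=10)
print(reg, bs, curve)
```

A well-calibrated model produces bins in which the mean predicted probability is close to the observed exceedance frequency; with a small number of high-fluoride samples, the 20-bin curve will contain sparsely populated bins and should be read with that in mind.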
4. Discussion
This study shows that fluoride contamination in the Rechna Doab alluvial aquifer is mainly associated with hydrogeochemical evolution rather than a single-solute anomaly. Fluoride concentrations ranged from 0.00 to 3.39 mg/L, with a mean of 0.88 mg/L, and 70 samples (20%) exceeded the WHO drinking-water guideline of 1.5 mg/L. The positively skewed distribution indicates localized geogenic enrichment rather than uniform aquifer-wide contamination. Similar spatially heterogeneous fluoride patterns have been reported in Punjab aquifers, where fluoride enrichment is commonly linked to water–rock interaction, aquifer heterogeneity, and chemically evolved groundwater [11,39,40]. Since groundwater is widely used for domestic drinking purposes in the study area, wells exceeding 1.5 mg/L fluoride should be prioritized for monitoring, public awareness, and suitable treatment or alternative water-supply options. The strong difference between linear and nonlinear models confirms that fluoride behavior is controlled by interacting hydrogeochemical processes. The linear baseline explained less than half of the fluoride variability, whereas Random Forest and XGBoost achieved substantially higher accuracy. On the independent test set, XGBoost showed strong predictive performance with R² = 0.877, RMSE = 0.189 mg/L, and MAE = 0.084 mg/L. This indicates that the predictor set captured a clear hydrochemical signal of fluoride mobilization. Such behavior is hydrogeochemically reasonable because fluoride enrichment in alluvial aquifers is often controlled by progressive mineral dissolution, ion exchange, alkalinity buffering, salinity evolution, and groundwater residence time.
The interpretability results further support the hydrogeochemical relevance of the models. SHAP and permutation-importance analyses identified EC, Mg²⁺, and Ca²⁺ as key predictors. EC reflects groundwater mineralization and may increase with longer residence time, evaporation, irrigation return flow, and water–rock interaction. Calcium has a direct regulatory role because higher Ca²⁺ activity can limit dissolved fluoride through fluorite precipitation, whereas Ca-depleted Na-HCO₃-type groundwater favors fluoride enrichment. Magnesium may indicate carbonate and silicate weathering, dolomite/calcite interaction, and ion-exchange processes under alkaline conditions. Agricultural inputs, atmospheric deposition, soil weathering, groundwater flow path, and residence time may also influence the Ca–Mg–Na balance and indirectly affect fluoride mobility. The fluoride occurrence observed in this study is comparable with previous groundwater studies in Pakistan and nearby Punjab aquifer systems. For example, previous studies in South Punjab and the wider Punjab plains reported fluoride exceedances above the WHO guideline and linked fluoride enrichment to water–rock interaction, fluoride-bearing mineral dissolution, ion exchange, low Ca²⁺ activity, and alkaline Na-HCO₃-type groundwater [12,14]. Machine-learning-based studies in Pakistan and Punjab, India, also showed that Random Forest and XGBoost-type models are effective for predicting groundwater fluoride because they capture nonlinear relationships among hydrochemical predictors [41,42]. Compared with previous studies, the present work adds calibration analysis, FT-Transformer modeling, and SHAP-based hydrogeochemical interpretation.
The exceedance-classification task is important for groundwater management because intervention is usually threshold-based. Cross-validation showed strong discrimination among classifiers, confirming that hydrochemical signatures can distinguish high- and low-fluoride groundwater. On the independent test set, the FT-Transformer achieved ROC-AUC = 0.948, recall = 0.875, precision = 1.000, F1-score = 0.933, and Brier score = 0.021, indicating reliable classification and probability calibration. The confusion matrix showed only one false negative, which is important because missed high-fluoride wells may increase public-health risk. Robustness analysis also showed stable RMSE under noise perturbation, suggesting that the model predictions were driven mainly by consistent hydrogeochemical signals rather than random variation. Overall, the results demonstrate that combining hydrogeochemical indicators with explainable and calibration-aware machine-learning models can provide reliable support for groundwater fluoride screening. The framework is useful for sub-district-scale risk ranking, targeted monitoring, and prioritization of wells requiring treatment or alternative drinking-water supply.
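The reported classification metrics can be checked for internal consistency with a short calculation. The sketch below assumes the confusion-matrix counts implied by the text: one false negative with recall = 0.875 gives TP = 7, and precision = 1.000 gives FP = 0. The true-negative count is not reported here, so the value used is a hypothetical placeholder; with FP = 0 it does not change any of the computed metrics.

```python
def classification_metrics(tp, tn, fp, fn):
    """Threshold-dependent metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "balanced_accuracy": 0.5 * (recall + specificity),
    }


# TP = 7, FN = 1, FP = 0 are inferred from the reported recall and precision.
# TN = 50 is a hypothetical placeholder, not a value taken from the study.
metrics = classification_metrics(tp=7, tn=50, fp=0, fn=1)
print({name: round(value, 3) for name, value in metrics.items()})
```

Under these assumed counts, the F1-score evaluates to 14/15 ≈ 0.933, matching the reported value, and the implied balanced accuracy would be 0.9375.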
Limitations
Although the proposed framework achieved accurate and interpretable predictions of groundwater fluoride contamination, some limitations should be noted. First, the dataset was limited to Tehsil Jaranwala in the Rechna Doab aquifer; therefore, model transferability to other aquifers requires external validation or retraining with local data. Second, the study mainly used physicochemical and major-ion parameters, while additional variables such as well depth, aquifer lithology, groundwater age, pumping rate, land use, seasonal water-table variation, and mineral saturation indices could improve interpretation and prediction. Third, machine-learning models identify statistical relationships but do not directly prove causality; therefore, future studies should combine ML with hydrogeochemical modeling, isotope analysis, and long-term monitoring. Finally, the relatively small number of high-fluoride samples and the lack of multi-season sampling may limit classification stability under changing recharge, evaporation, and irrigation-return-flow conditions.
5. Conclusions
This study demonstrates that integrating hydrogeochemical indicators with explainable machine-learning models can support the reliable identification of fluoride contamination risk in the Rechna Doab alluvial aquifer. The results show that fluoride enrichment is spatially heterogeneous and mainly associated with groundwater mineralization, reduced Ca²⁺ activity, Mg²⁺ variation, ion exchange, carbonate equilibrium, and water–rock interaction. Nonlinear and deep tabular models provided strong predictive performance, while SHAP and permutation-importance analyses helped link model outputs with hydrogeochemical processes. These findings support the use of data-driven tools for targeted groundwater monitoring, risk-based screening, and drinking-water management in areas affected by fluoride. Future studies should include multi-season sampling, well-depth information, hydrogeological layers, mineral saturation indices, isotope data, and external validation datasets to improve model transferability and mechanistic interpretation.