Fluoride Risk Prognostication: A Pioneering Ensemble Machine Learning Approach for Groundwater Contamination Prediction in Parts of the East Coast of India

Alok Kumar Pati; Alok Ranjan Tripathy; Debabrata Nandi; Rakesh Ranjan Thakur; Bojan Ðurin; Dragana Dogančić; Osman Fetoshi

doi:10.3390/w17060909

¹ Department of Computer Science, Ravenshaw University, Cuttack 753003, Odisha, India

² Department of Computer Science, Institute of Management and Information Technology, Cuttack 753008, Odisha, India

³ Department of Remote Sensing & Geographic Information System, Maharaja Sriram Chandra Bhanjadeo University, Baripada 757003, Odisha, India

⁴ Odisha Space Applications Centre, Science and Technology Department, Bhubaneswar 751023, Odisha, India

Water 2025, 17(6), 909; https://doi.org/10.3390/w17060909

This article belongs to the Special Issue The Qualitative and Quantitative Management of Groundwater Resources in Urban Areas

Abstract

Groundwater fluoride levels have become a global concern, posing significant challenges to the safe use of water resources and to the mitigation of potential impacts on human health. Chronic exposure to elevated levels of naturally occurring fluoride in groundwater affects millions worldwide and can lead to health issues such as dental and skeletal fluorosis. The World Health Organization (WHO) has established a maximum fluoride concentration guideline of 1.5 mg/L for drinking water. However, groundwater quality is not regularly tested in many regions, leaving communities unaware of whether water sources, such as wells and springs, contain harmful fluoride levels. In the Balasore area, Odisha, India, rising fluoride concentrations and spatial variability necessitate accurate predictions for effective groundwater management. This article proposes four predictive models, Random Forest (RF), Support Vector Regression (SVR), Gradient Boosting (XGBoost), and Stacking Regressor (SR), to estimate fluoride concentrations using physicochemical parameters and sampling depth as predictor variables. The performance of these models is assessed using the coefficient of determination (R², reported as accuracy), mean squared error (MSE), and mean absolute error (MAE). The fluoride concentrations predicted by the SR, RF, XGBoost, and SVR models for groundwater in Balasore, Odisha, are compared. Based on predictive performance, the SR model yielded the lowest MSE and MAE scores, at 0.01817 and 0.10327, respectively. These findings underscore the superiority of ensemble learning approaches in addressing complex datasets and provide a robust framework for effective groundwater fluoride management. This article highlights the potential of advanced machine learning in improving public health outcomes in fluoride-affected regions. Finally, recommendations for decreasing fluoride concentrations and guidelines for future research are proposed.

Keywords: groundwater fluoride; RF; SVR; XGBoost; SR; water quality

1. Introduction

Groundwater contamination by fluoride (F⁻) is a significant global environmental concern, particularly in countries like India, where millions depend on groundwater for drinking and agriculture [1,2,3,4,5]. Fluoride levels between 0.5 and 1.5 mg/L are essential for human health, aiding bone and teeth development; however, concentrations above 1.5 mg/L pose severe health risks, including dental and skeletal fluorosis, hypertension, renal failure, and cancer [6,7,8,9,10]. With over 200 million people worldwide affected by high fluoride levels, the issue is critical in India, where 66 million, including 6 million children, are at risk [11,12,13,14]. The district of Balasore in Odisha, India, known for its agriculture, faces elevated fluoride contamination in groundwater due to the natural leaching from fluoride-bearing minerals like fluorite, mica, and apatite, which has been exacerbated by high alkalinity, low calcium levels, and sodium bicarbonate-type water [15,16,17]. Environmental factors such as the tropical climate, high evaporation, slow groundwater recharge, and human activities like phosphate fertilizer use, industrial waste disposal, and over-extraction for irrigation further worsen the problem [18,19,20]. Despite the health risks, comprehensive fluoride studies in Balasore are limited. Recent research demonstrates the potential of machine learning in predicting groundwater fluoride, as shown in studies across different regions [21,22]. For instance, [23] used a Random Forest for prediction in Pakistan, suggesting further accuracy improvements by including hydrological parameters [24]. At the same time, [25] applied a Random Forest model in the U.S., relying on regional variables like evapotranspiration. In India, [26] utilized various classifiers, including KNN, SVM, and XGBoost, to emphasize the importance of relevant feature selection. 
In [27], an Extreme Learning Machine (ELM) outperformed other models in Punjab, India, and the authors highlighted the role of data pre-treatment in achieving better results. These studies underscore machine learning’s promise for groundwater quality prediction, with room for enhanced accuracy through additional predictor variables and data refinement techniques [28,29,30,31].

Traditionally, groundwater quality monitoring relies on field-based surveys and laboratory analysis, which are time-consuming and resource-intensive [32,33,34]. This makes developing effective groundwater management strategies for regions like Balasore challenging. To address this gap, there is growing interest in using machine learning (ML) techniques to predict groundwater contamination [35,36,37,38]. These models can handle complex, nonlinear relationships between environmental parameters. In particular, Random Forest (RF), Gradient Boosting (XGBoost), and Stacking Regressor (SR) have demonstrated robust performance in predicting fluoride levels in groundwater [39,40,41,42]. Recent studies have shown that machine learning models outperform classical geospatial and numerical models due to their flexibility, efficiency, and ability to handle large datasets [43,44]. These models can offer insights into the spatial distribution of fluoride contamination by incorporating hydrogeochemical variables like pH, electrical conductivity, and ion concentrations [45,46]. This study aims to employ various machine learning models to predict fluoride concentrations in the groundwater of Balasore, Odisha [47,48,49]. By comparing the performance of different models and identifying key predictor variables, the research seeks to develop an efficient, data-driven framework for managing fluoride contamination [50,51]. This work provides a scientific basis for groundwater quality assessment in Balasore and offers valuable insights for other fluoride-affected regions in India and beyond [52,53,54].

2. Study Area

The study area is located in Balasore District, in the northeastern part of Odisha, India, as portrayed in Figure 1 (geographic coordinates: 21°30′ N, 86°54′ E). This region spans 3634 km² and borders the Bay of Bengal to the east [55]. The topography is predominantly flat, with an average elevation of 15 m above sea level [56]. The area is characterized by a tropical climate, with an annual rainfall of about 1500 mm, most of which occurs during the monsoon season (June–September) [57]. Summers are hot, reaching 40 °C, while winters are mild [58,59]. The region is primarily agricultural, with paddy being the main crop [60,61]. Groundwater is a critical resource for irrigation and drinking water, but fluoride contamination has been a growing concern, affecting local health [62,63]. The study area was selected due to its ongoing issues with groundwater quality, especially the high fluoride content that impacts the rural population [64].

Figure 1. Index map: (a) Country map showing the national context. (b) State map illustrating the regional location. (c) The study area map provides detailed boundaries and sample locations for the area under investigation.

2.1. Hydrogeological Characteristics

The study area, in Balasore District, Odisha, India, covers 3634 km² and has a flat topography, with an average elevation of 15 m above sea level. The region has a tropical climate, with annual rainfall of around 1500 mm, falling primarily during the monsoon season, and temperatures ranging from mild in winter to upwards of 40 °C on hot summer days. Groundwater, obtained from wells and boreholes in the area, is an important source of water for drinking and irrigation. The aquifer system is influenced by the geological presence of fluoride-bearing minerals such as fluorite, mica, and apatite, which, coupled with high alkalinity, low calcium levels, and sodium bicarbonate-type water, contribute to fluoride leaching into groundwater. Sampling depth was included as a predictor variable in this study so that both shallow and potentially deeper aquifers are assessed and groundwater quality is analyzed comprehensively.

2.2. Methodology and Model Specification

2.2.1. Datasets Description

The dataset from the Central Ground Water Board (CGWB), Bhubaneswar, Odisha, provides groundwater quality data for Balasore. It encompasses geographic details such as blocks, panchayats, villages, habitations, and locations, so that sampling sites can be identified precisely, and it records the groundwater source type (wells, boreholes) and the laboratory where samples were analyzed. Sample receipt dates and laboratory testing dates are recorded to support temporal accuracy. The dataset features key water quality indicators, including pH, conductivity, turbidity, chloride, hardness, alkalinity, fluoride, iron, and total dissolved solids (TDS), as presented in Table 1. The spatial distribution of fluoride is portrayed as an isoline map in Figure 2. Together, these data offer insights into the groundwater’s chemical and physical properties, which are vital for assessing fluoride contamination and health-related risks.

Table 1. Statistical summary of the dataset parameters based on descriptive statistics.

Figure 2. Isoline map of fluoride distribution.

2.2.2. Model Description and Development

The workflow diagram in Figure 3 illustrates the process for predicting groundwater fluoride levels using machine learning models. It begins with collecting groundwater fluoride data from the Central Ground Water Board (CGWB), consisting of 2151 data points, whose sampling locations are portrayed in Figure 1. The data undergo standardization to ensure consistency. The dataset is then split into training and testing sets in an 80:20 ratio. The target variable is fluoride concentration, while predictor variables include water quality indicators such as pH, conductivity, turbidity, chloride, total hardness, total alkalinity, iron, and total dissolved solids (TDS). Machine learning models, including Support Vector Regression (SVR), Random Forest (RF), Gradient Boosting (XGBoost), and Stacking Regressor (SR), are then trained to predict fluoride levels [65,66]. Model performance is evaluated using metrics like Mean Squared Error (MSE), Mean Absolute Error (MAE), and accuracy (R²), with the best-performing model selected based on these criteria [67]. This process aims to identify the most accurate groundwater fluoride prediction model.

Figure 3. Workflow diagram for demonstrating the methodology.
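The standardization and 80:20 split steps of the workflow can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the random seed, and the pH values are assumptions made for the example, and z-score scaling is assumed as the standardization method.

```python
import random
import statistics


def standardize(column):
    """Rescale a list of numbers to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle row indices reproducibly, then split rows into train and test subsets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


# Hypothetical pH readings (illustrative values only, not from the CGWB dataset).
ph = [6.8, 7.1, 7.4, 7.9, 8.2, 7.0, 7.6, 8.0, 7.3, 7.7]
ph_scaled = standardize(ph)
train, test = train_test_split(ph_scaled)
print(len(train), len(test))  # 8 2
```

Applied to all 2151 data points, the same split would give 1721 training and 430 test samples. In practice the scaling statistics are often computed on the training subset only, so that no information from the test set reaches the model.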

Figure 4 and Figure 5 show box plots of the groundwater quality parameters (pH, conductivity, turbidity, chloride, total hardness, total alkalinity, iron, TDS, and fluoride) before and after outliers were identified and removed from the multivariate data. Figure 4 shows the distribution of the raw data: the line inside each box marks the median of the parameter, and the box itself spans the interquartile range (IQR), from the 25th to the 75th percentile. Data points lying more than 1.5 times the IQR beyond the box edges fall outside the whiskers, are regarded as outliers, and are shown as circles. Parameters such as conductivity and TDS contain many outliers, which suggests high variability or the presence of extreme values. These outliers, which can distort statistical analyses and impair model performance, can result from sampling mistakes, natural variability, or inconsistent data entry. Figure 5 depicts the distribution of the data after the outliers were excluded, giving a cleaner dataset for further study. Outlier removal enhances data integrity by reducing skewness and better capturing the central tendency and spread of the parameters. For example, the distributions of conductivity and TDS are more compact and homogeneous after removal. By reducing the impact of extreme values on statistical computations like the mean and standard deviation, this step produces more accurate and objective findings. Depending on the dataset’s characteristics and the study’s goals, common techniques for removing outliers include a threshold of 1.5 times the IQR, Z-score filtering, or domain-specific criteria. Removing outliers is an important preprocessing step in groundwater quality evaluations because it improves the accuracy and robustness of subsequent analysis, including correlation analysis, trend analysis, and machine learning modeling. It makes the sample more representative of typical groundwater conditions and ensures that extreme values do not distort forecasts and insights. Ultimately, this method facilitates better groundwater quality assessments, thus assisting environmental management and planning authorities in decision-making.

Figure 4. Outlier detection.

Figure 5. Outlier removed.
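The 1.5 × IQR rule described above can be sketched for a single parameter as follows. This is an illustrative example rather than the authors' implementation: the function names and TDS values are hypothetical, and the quartiles use the standard library's "inclusive" method, which may differ slightly from the convention behind Figures 4 and 5.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences located k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that lie within the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [value for value in values if low <= value <= high]


# Hypothetical TDS readings in mg/L (illustrative only, not from the dataset).
tds = [210, 250, 265, 280, 300, 310, 330, 350, 1900]
print(iqr_bounds(tds))       # (167.5, 427.5)
print(remove_outliers(tds))  # [210, 250, 265, 280, 300, 310, 330, 350]
```

For multivariate data such as the nine parameters here, one common choice is to drop a whole sample when any of its parameters falls outside its fences; whether the study did this per parameter or per sample is not stated.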

Figure 6 illustrates the correlation matrix heat map, highlighting the relationships among water quality parameters, including fluoride, pH, conductivity, and total dissolved solids (TDS). The color-coded cells indicate the strength and direction of these correlations, with darker shades signifying stronger relationships. This visualization aids in quickly identifying significant associations and understanding how these parameters interact, which is essential for assessing groundwater quality and pinpointing factors influencing fluoride levels.

Figure 6. Correlation matrix heatmap.
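A correlation matrix of this kind can be computed as in the sketch below. The Pearson coefficient is assumed here because the article does not name the coefficient used, and the sample values are hypothetical (the TDS column is constructed as 0.65 times conductivity purely to show a strong positive correlation).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Build a nested dict of pairwise correlations from a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical samples (illustrative only, not from the CGWB dataset).
samples = {
    "conductivity": [400.0, 520.0, 610.0, 750.0, 900.0],
    "tds": [260.0, 338.0, 396.5, 487.5, 585.0],
    "fluoride": [0.9, 0.4, 1.1, 0.6, 0.8],
}
matrix = correlation_matrix(samples)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Each cell of a heat map such as Figure 6 corresponds to one entry of this matrix; the diagonal is always 1 and the matrix is symmetric.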

2.2.3. Machine Learning Models

Machine learning models are algorithms that learn from data to make predictions or decisions. They include supervised models, such as linear regression and decision trees, and unsupervised models, such as K-means. Deep learning, based on neural networks, excels at complex tasks. The choice of model depends on the problem and the characteristics of the data [68,69].

Support Vector Regression (SVR)

It is applied to predict fluoride concentration in groundwater. It models the relationship as follows:

f(x) = ω^{T} Φ(x) + b    (1)

In Equation (1), Φ(x) maps input features (water quality indicators) to a higher dimensional space, ω represents the weight vector, and b is the bias. SVR optimizes a cost function using an ε-insensitive loss, allowing flexibility in prediction while maintaining a trade-off between accuracy and model complexity for groundwater fluoride estimation [70].
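The ε-insensitive loss and the form of Equation (1) can be illustrated with a small sketch. To keep it short, Φ is taken to be the identity (a linear model), which is a simplifying assumption; the weights, bias, features, and ε value below are hypothetical and are not fitted to any data.

```python
def svr_predict(weights, bias, features):
    """Evaluate f(x) = w^T x + b, i.e. Equation (1) with the identity as feature map."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors smaller than epsilon cost nothing; larger errors cost the excess."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Hypothetical model parameters and one standardized sample (illustrative only).
weights, bias = [0.5, -0.25], 1.0
features = [1.0, 2.0]
prediction = svr_predict(weights, bias, features)

print(prediction)                                         # 1.0
print(epsilon_insensitive_loss(1.125, prediction, 0.25))  # 0.0
print(epsilon_insensitive_loss(2.0, prediction, 0.25))    # 0.75
```

A prediction within ε of the observed value contributes no loss, which is what gives SVR its tolerance to small deviations; only the excess beyond ε is penalized.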

Random Forest (RF)

It is used to predict fluoride concentration in groundwater by combining multiple decision trees [25,71]. Each tree outputs a prediction; the final result is the average prediction across all trees. The model aggregates individual tree predictions as follows:

g(x) = \frac{1}{N} \sum_{i=1}^{N} g_{i}(x)    (2)

where, in Equation (2), N is the total number of trees and g_{i}(x) represents the prediction from the i-th tree. This ensemble approach improves prediction accuracy and reduces overfitting for fluoride estimation [72,73].
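The averaging in Equation (2) can be sketched with a few hand-written one-split trees. The trees, thresholds, and sample values are hypothetical and are not trained on any data; they only show how individual predictions are aggregated.

```python
def make_stump(feature, threshold, left_value, right_value):
    """Build a one-split tree: predict left_value if x[feature] <= threshold, else right_value."""
    def stump(x):
        return left_value if x[feature] <= threshold else right_value
    return stump


def forest_predict(trees, x):
    """Equation (2): average the predictions g_i(x) of all N trees."""
    return sum(tree(x) for tree in trees) / len(trees)


# Three hypothetical trees predicting fluoride in mg/L (illustrative only).
trees = [
    make_stump("ph", 7.5, 0.5, 1.0),
    make_stump("total_alkalinity", 250.0, 0.25, 0.75),
    make_stump("ph", 8.5, 0.5, 1.25),
]
sample = {"ph": 8.0, "total_alkalinity": 300.0}
print([tree(sample) for tree in trees])  # [1.0, 0.75, 0.5]
print(forest_predict(trees, sample))     # 0.75
```

In a trained Random Forest, each tree is grown on a bootstrap sample of the training data with random feature subsets at each split, which is what makes the averaged prediction less prone to overfitting than a single tree.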

Gradient Boosting (XGBoost)

It is utilized to predict fluoride concentration in groundwater by iteratively combining weak learners (decision trees) [74,75]. Each tree corrects the errors of the previous ones, with predictions updated as follows:

k(x) = \sum_{m=1}^{M} γ_{m} h_{m}(x)    (3)

In Equation (3), M is the number of trees, h_{m}(x) is the prediction from the m-th tree, and γ_{m} is the weight (learning rate) applied to that tree. This approach optimizes prediction accuracy by minimizing the loss function, making it practical for fluoride concentration estimation.
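The additive update in Equation (3) can be sketched with a minimal gradient boosting loop for squared error, in which each new one-split tree is fitted to the residuals left by the previous ones. This is a plain illustration and not the XGBoost algorithm, which adds regularization and second-order terms; the constant weight γ_m = 0.3, the single feature, and the data values are assumptions made for the example.

```python
def fit_stump(xs, residuals):
    """Find the single-feature threshold split that minimizes squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_trees=20, learning_rate=0.3):
    """Fit trees sequentially, each one to the residuals of the current ensemble."""
    trees, predictions = [], [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x) for p, x in zip(predictions, xs)]
    return trees


def boosting_predict(trees, x, learning_rate=0.3):
    """Equation (3) with a constant weight: k(x) = sum over m of gamma * h_m(x)."""
    return sum(learning_rate * tree(x) for tree in trees)


def training_mse(trees, xs, ys):
    """Mean squared error of the ensemble on the data it was fitted to."""
    return sum((y - boosting_predict(trees, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical single-feature data (illustrative only).
xs = [1, 2, 3, 4, 5, 6]
ys = [0.2, 0.3, 0.5, 0.9, 1.1, 1.2]
trees = fit_boosting(xs, ys)
print(training_mse(trees[:1], xs, ys), training_mse(trees, xs, ys))
```

The training error after 20 trees is lower than after the first tree, which reflects the statement that each tree corrects the errors of the previous ones. A smaller learning rate slows this decrease but usually generalizes better.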

Stacking Regressor (SR)

It combines Random Forest (RF) and Gradient Boosting (XGBoost) to predict fluoride concentration in groundwater [76,77]. The SR model leverages the complementary strengths of RF and XGBoost to enhance prediction accuracy [78,79]. The final prediction is obtained by training a meta-learner on the outputs of these base models:

s(x) = l(g(x), k(x))    (4)

where, in Equation (4), g(x) and k(x) represent the predictions from the Random Forest and Gradient Boosting models, respectively, and l is the meta-learner.
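Equation (4) can be illustrated with a simple meta-learner. The article does not specify the form of l, so a linear combination s = a·g + b·k fitted by least squares is assumed here; the base-model predictions and targets are hypothetical values.

```python
def fit_linear_meta_learner(g_preds, k_preds, targets):
    """Least-squares weights (a, b) for s = a*g + b*k, solved from the 2x2 normal equations."""
    gg = sum(g * g for g in g_preds)
    kk = sum(k * k for k in k_preds)
    gk = sum(g * k for g, k in zip(g_preds, k_preds))
    gy = sum(g * y for g, y in zip(g_preds, targets))
    ky = sum(k * y for k, y in zip(k_preds, targets))
    det = gg * kk - gk * gk
    if abs(det) < 1e-12:
        raise ValueError("base predictions are collinear; the weights are not unique")
    return (gy * kk - ky * gk) / det, (ky * gg - gy * gk) / det


def mse(targets, predictions):
    """Mean squared error between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(targets, predictions)) / len(targets)


# Hypothetical base-model predictions and observed fluoride values in mg/L.
g_preds = [0.4, 0.8, 1.2, 0.6]   # stand-in for Random Forest outputs g(x)
k_preds = [0.6, 0.6, 1.0, 1.0]   # stand-in for Gradient Boosting outputs k(x)
targets = [0.5, 0.75, 1.1, 0.85]

a, b = fit_linear_meta_learner(g_preds, k_preds, targets)
stacked = [a * g + b * k for g, k in zip(g_preds, k_preds)]
print(round(a, 3), round(b, 3))
print(mse(targets, g_preds), mse(targets, k_preds), mse(targets, stacked))
```

On the data used to fit it, this linear meta-learner can never do worse than either base model alone, because using one base model by itself is a special case (a = 1, b = 0 or a = 0, b = 1). To avoid overfitting, stacking implementations typically fit the meta-learner on out-of-fold predictions of the base models rather than on their training-set outputs.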

3. Results

The target variable was predicted using the SVR model with the Radial Basis Function (RBF) kernel. To check that performance was robust, five-fold cross-validation was performed with the K-fold technique, which assesses the model’s consistency over several subsets of the training data. During cross-validation, evaluation metrics such as MAE, MSE, and R² scores were computed to evaluate the correctness and stability of the model. After training, the same metrics were calculated on the test set to confirm the model’s capacity to generalize to unseen data. This procedure supports the model’s dependability for future applications by thoroughly evaluating its prediction performance.
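The five-fold scheme can be sketched as an index generator: each fold serves once as the validation subset while the remaining folds are used for training. The helper name and the sample count of 10 are illustrative assumptions, and contiguous folds are used for brevity (the data are usually shuffled first).

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) pairs for k contiguous folds."""
    # Distribute any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


folds = list(k_fold_indices(10, k=5))
for train_idx, validation_idx in folds:
    print(validation_idx, len(train_idx))
```

The MAE, MSE, and R² values from the five validation subsets are typically averaged, and their spread indicates how stable the model is across subsets of the training data.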

The results demonstrate the predictive capabilities of the regression models for estimating fluoride concentration in groundwater, measured by the MSE and MAE defined in Equation (5) and Equation (6); pseudocode for the best model is given in Table 2. The SVR exhibited a Mean Squared Error (MSE) of 0.02299 and a Mean Absolute Error (MAE) of 0.11562, reflecting a moderate level of predictive performance [80,81]. In contrast, the Stacking Regressor (SR), which integrates the predictions of both Random Forest and Gradient Boosting models, achieved a lower MSE of 0.01817 and an MAE of 0.10327, highlighting the advantages of using ensemble methods in complex datasets [82]. The best-performing Random Forest configuration resulted in an MSE of 0.01924 and an MAE of 0.10616, while XGBoost produced an MSE of 0.02146. These findings underscore the potential of these models yet indicate room for improvement in predictive accuracy [82].

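MSE = \frac{1}{m} \sum_{k=1}^{m} (x_{k} - \hat{x}_{k})^{2}    (5)

MAE = \frac{1}{m} \sum_{k=1}^{m} | x_{k} - \hat{x}_{k} |    (6)

where m is the number of samples, x_{k} is the observed fluoride concentration of the k-th sample, and \hat{x}_{k} is the corresponding predicted value.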
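Equations (5) and (6), together with the coefficient of determination R² that the article reports as accuracy, can be computed as in the following sketch. The observed and predicted values are hypothetical and chosen so the arithmetic is easy to check by hand.

```python
def mean_squared_error(actual, predicted):
    """Equation (5): mean of the squared differences."""
    return sum((x - x_hat) ** 2 for x, x_hat in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Equation (6): mean of the absolute differences."""
    return sum(abs(x - x_hat) for x, x_hat in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((x - x_hat) ** 2 for x, x_hat in zip(actual, predicted))
    ss_tot = sum((x - mean_actual) ** 2 for x in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted fluoride values in mg/L (illustrative only).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(mean_squared_error(actual, predicted))   # 0.125
print(mean_absolute_error(actual, predicted))  # 0.25
print(round(r_squared(actual, predicted), 3))  # 0.9
```

MSE penalizes large errors more heavily than MAE because the differences are squared, so reporting both gives a fuller picture of the error distribution.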

Table 2. Pseudocode for the best model.

Table 3 compares the performance of four machine learning models (SVR, Stacking Regressor, Random Forest, and XGBoost), and the results show that the Stacking Regressor performs best. With the highest accuracy of 0.896 and the lowest Mean Absolute Error (MAE) of 0.103 and Mean Squared Error (MSE) of 0.018, it has the strongest prediction ability. With a somewhat lower accuracy of 0.8411, an MSE of 0.019, and an MAE of 0.106, the Random Forest model comes in second. With an accuracy of 0.8334, an MSE of 0.021, and an MAE of 0.111, XGBoost exhibits a moderate level of performance. With the lowest accuracy (0.826) and the largest MAE (0.115) and MSE (0.022), the Support Vector Regressor (SVR) performs the worst.

Table 3. Performance measures in the testing stages of proposed models.
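To put the reported errors on the same scale as the fluoride concentrations, the MSE values quoted in the text can be converted to root mean squared errors and compared against the SVR baseline. The sketch below uses only the four MSE values reported above; the variable names are illustrative.

```python
import math

# MSE values as reported in the text for the test stage.
reported_mse = {"SR": 0.01817, "RF": 0.01924, "XGBoost": 0.02146, "SVR": 0.02299}

# Root mean squared error, expressed in the units of the target variable.
rmse = {name: math.sqrt(value) for name, value in reported_mse.items()}

# Relative reduction in MSE with respect to the SVR model, in percent.
improvement_vs_svr = {
    name: 100.0 * (reported_mse["SVR"] - value) / reported_mse["SVR"]
    for name, value in reported_mse.items()
}

best_model = min(reported_mse, key=reported_mse.get)
for name in reported_mse:
    print(f"{name}: RMSE = {rmse[name]:.4f}, MSE reduction vs SVR = {improvement_vs_svr[name]:.1f}%")
print("Lowest MSE:", best_model)
```

The Stacking Regressor's MSE is about 21% lower than that of SVR, and its root mean squared error is roughly 0.13 in the units of the target variable.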

Figure 7 compares actual and predicted fluoride levels using different models, highlighting the models’ performance in capturing the variations in fluoride concentration across the test dataset.

Figure 7. Comparison of model accuracy for fluoride prediction (* indicating the Stacking regressor gives the better result among all models).

Figure 8 compares actual and predicted fluoride levels for three models: Stacking Regressor, Random Forest, and XGBoost with regularization. The scatter plot helps visualize how well each model predicts fluoride concentration, with the points plotted against the actual values. The closer the points lie to the diagonal line, the better the model’s accuracy in predicting fluoride levels in groundwater samples.

Figure 8. Actual vs. predicted fluoride levels for different models.

We recommend employing advanced feature engineering techniques to improve performance, such as creating interaction terms and applying transformations like logarithmic or polynomial features. Additionally, optimization can refine model accuracy. Implementing these strategies could lead to more effective predictions and better-informed decisions regarding fluoride contamination in groundwater [83].

Figure 9 shows the feature importance plot for the Stacking Regressor model, illustrating the relative contributions of the water quality metrics to predicting the target variable. With an importance score close to 0.2, iron stood out as the most impactful parameter, underscoring its crucial function in the prediction task. Total alkalinity and total hardness trailed closely behind, demonstrating their significant influence. Turbidity, chloride, and conductivity made comparable contributions and also played substantial roles in explaining changes in water quality. Among the chosen predictors, TDS showed the least importance, whereas pH had a substantial impact on the predictive model. This analysis emphasizes how several physical and chemical characteristics, especially hardness and iron, dominate when assessing the quality of groundwater. These results offer valuable insights for developing efficient machine learning models to evaluate water quality and guide water management practices.

Figure 9. Feature importance plot illustrating the relative contributions of different water quality metrics.
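The article does not state how the importance scores in Figure 9 were derived for the stacked model. One model-agnostic option is permutation importance, sketched below: the values of one feature are permuted across samples and the resulting increase in MSE is taken as that feature's importance. The model, its coefficients, and the data are hypothetical, and a deterministic rotation is used in place of random shuffling so that the example is reproducible.

```python
def mse(targets, predictions):
    """Mean squared error between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(targets, predictions)) / len(targets)


def permutation_importance(model, rows, targets, feature):
    """Increase in MSE after permuting one feature column (here: rotating it by one row)."""
    baseline = mse(targets, [model(row) for row in rows])
    column = [row[feature] for row in rows]
    rotated = column[-1:] + column[:-1]
    permuted_rows = [dict(row, **{feature: value}) for row, value in zip(rows, rotated)]
    return mse(targets, [model(row) for row in permuted_rows]) - baseline


def toy_model(row):
    """Hypothetical fitted model: depends on iron and total alkalinity, ignores TDS."""
    return 0.5 * row["iron"] + 0.001 * row["total_alkalinity"]


# Hypothetical samples (illustrative only, not from the CGWB dataset).
rows = [
    {"iron": 0.1, "total_alkalinity": 200.0, "tds": 300.0},
    {"iron": 0.4, "total_alkalinity": 250.0, "tds": 280.0},
    {"iron": 0.2, "total_alkalinity": 300.0, "tds": 350.0},
    {"iron": 0.8, "total_alkalinity": 350.0, "tds": 310.0},
    {"iron": 0.5, "total_alkalinity": 400.0, "tds": 330.0},
    {"iron": 0.3, "total_alkalinity": 450.0, "tds": 290.0},
]
targets = [toy_model(row) for row in rows]

importance = {
    feature: permutation_importance(toy_model, rows, targets, feature)
    for feature in ("iron", "total_alkalinity", "tds")
}
print(importance)
```

A feature the model does not use, such as TDS in this toy example, receives an importance of zero, while features the model relies on produce a measurable increase in error when permuted.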

4. Discussion

4.1. Summary

This study assessed various regression models for predicting groundwater fluoride levels, including a Support Vector Regressor (SVR) model, a Random Forest (RF) model, an XGBoost model, and a Stacking Regressor (SR) model combining RF and Gradient Boosting. The SR model achieved the best results, with the lowest Mean Squared Error (MSE) of 0.01817 and Mean Absolute Error (MAE) of 0.10327, highlighting the advantage of ensemble approaches. The findings suggest that machine learning can significantly improve groundwater management, particularly in regions like Balasore, by providing more accurate predictions and a better understanding of fluoride contamination. This approach offers a promising alternative to traditional, labor-intensive monitoring methods.

4.2. Interpretations

The results indicate that the Stacking Regressor model, which integrates Random Forest and Gradient Boosting, accurately predicted groundwater fluoride levels in Balasore, Odisha. This performance surpasses traditional models used nationally, such as ELM in Punjab, and internationally, like Random Forest applications in Pakistan and the U.S., suggesting that ensemble methods can significantly enhance prediction accuracy. By outperforming other approaches, this study confirms that using multiple environmental predictors in machine learning models offers a more effective strategy for identifying and managing fluoride contamination, potentially guiding policy improvements in India and globally. This study reveals alarming exceedances of national (BIS: 1.5 mg/L) and international (WHO: 1.5 mg/L) standards. The results inform evidence-based policy decisions, targeted interventions, and public awareness campaigns. Comparison with national and international guidelines underscores the need for effective mitigation strategies, such as defluorination technologies and alternative water sources, to safeguard public health and environmental sustainability.

4.3. Local Environmental and Social Factors

Fluoride leaches from minerals, including fluorite, mica, and apatite, in groundwater due to local environmental conditions like high alkalinity, low calcium levels, and sodium bicarbonate-type water.

The tropical climate, with high temperatures, significant evaporation, and slow groundwater recharge, exacerbates fluoride concentration. Human activities, including extensive use of phosphate fertilizers, industrial waste disposal, and over-extraction for irrigation, further elevate fluoride levels. Social factors such as reliance on agriculture and lack of regular groundwater quality monitoring expose the rural population to health risks like dental and skeletal fluorosis, impacting community well-being and economic productivity.

Factors such as geology, soil type, hydrology, agriculture, population density, water usage, economic status, and public awareness influence contamination and therefore affect the predictive mapping of fluoride-contaminated groundwater zones. The mapping also reveals correlations between high-fluoride areas, intensive agriculture, alkaline soils, and rural communities with limited access to safe water. Integrating these factors enhances the model’s predictive power, informing targeted interventions and policy decisions to mitigate fluoride-related health risks, particularly in vulnerable populations.

4.4. Comparison with Other Studies

This study found that the Stacking Regressor model, combining Random Forest and Gradient Boosting, achieved the highest accuracy in predicting groundwater fluoride in Balasore, Odisha (MSE: 0.01817), outperforming similar models applied in China and the U.S. Unique local challenges in Balasore, such as natural mineral leaching, agricultural activities, high alkalinity, and low calcium levels, necessitate accounting for complex hydrogeochemical interactions. Comparatively, studies in Punjab, India, and Pakistan highlighted improvements using an Extreme Learning Machine (ELM) and additional hydrological parameters, suggesting tailored data-driven approaches are crucial to enhance predictive accuracy in regions with distinct groundwater characteristics.

4.5. Implications

The findings can significantly influence policy-making by providing a data-driven approach to managing groundwater fluoride contamination. Accurate predictions from the Stacking Regressor model can help identify high-risk areas in Balasore, enabling targeted mitigation efforts, such as community water treatment initiatives and safe drinking water access. It can inform evidence-based policy decisions, targeting high-risk areas for mitigation strategies, improved water resource management, and public health programs. These insights can inform policies on groundwater extraction, agricultural practices, and fertilizer use to reduce fluoride leaching. Additionally, the approach can be extended to other regions, guiding national and international strategies for groundwater quality management and encouraging the adoption of machine learning techniques to enhance water safety standards and resource allocation. It also helps researchers, communities, and stakeholders by refining predictive models, raising awareness, and guiding water management initiatives. By addressing fluoride contamination, our research contributes to improved public health, environmental sustainability, and evidence-driven policy-making, ultimately enhancing the quality of life for vulnerable populations.

4.6. Limitations

This study’s results have limitations that affect their scope and applicability. Firstly, the predictions rely on the available physicochemical parameters and sampling depth, potentially missing other essential factors like land use, precipitation, or groundwater flow that could influence fluoride levels. Secondly, the models used do not provide insight into the underlying geochemical processes causing fluoride contamination, such as specific mineral interactions or sources of leaching. Additionally, while the models show good accuracy for the studied area, their performance might not generalize well to regions with different hydrogeological conditions without recalibration. Also, this study does not address long-term temporal changes in groundwater fluoride. Future research should address these limitations by integrating diverse data sources, refining model resolution, and incorporating dynamic monitoring to enhance accuracy and inform effective fluoride mitigation strategies.

4.7. Recommendations and Future Work

While demonstrating the potential of machine learning models for predicting groundwater fluoride levels, this study has limitations. The accuracy of predictions depends heavily on the quality and completeness of the dataset, and this study is limited to the available physicochemical parameters in Balasore. Additional factors such as land use, geological conditions, and hydrological parameters could improve model performance. Moreover, the spatial variability of fluoride across different regions may limit the generalizability of the findings.

Future research should focus on integrating multi-source data, including satellite-based remote sensing and hydrogeochemical data, to improve prediction accuracy. Additionally, exploring deep learning models and optimizing hyperparameters with advanced techniques like Bayesian optimization could further enhance the robustness of fluoride prediction models.

5. Conclusions

According to the data, the Stacking Regressor fared better than the other models with the lowest MSE (0.01817) and MAE (0.10327), demonstrating its greater accuracy in forecasting the fluoride levels in groundwater. Random Forest and XGBoost showed encouraging results, with MSEs of 0.01924 and 0.02146, respectively. The SVR demonstrated a moderate capacity for prediction. These results demonstrate the advantages of ensemble approaches and point to additional model improvements via feature engineering and optimization.

This study addresses the critical issue of groundwater fluoride pollution in Balasore, Odisha, where natural leaching, industrial operations, and agricultural practices compound the situation. The research uses four machine learning algorithms (SVR, RF, XGBoost, and Stacking Regressor) to forecast fluoride levels, emphasizing the need to combine hydrogeochemical parameters and environmental aspects for more accurate predictions. The comparison of model performances showed the Stacking Regressor to be the most accurate, in line with the research goal of offering a data-driven method for regulating fluoride contamination.

The results demonstrate how intricately local environmental and socioeconomic factors affect fluoride distribution. According to this study, the region’s tropical climate, human activity, high alkalinity, and low calcium levels contribute significantly to the heightened fluoride levels. These findings highlight the necessity of focused mitigation techniques, such as defluorination technology, to lower health risks and ensure that affected communities can access safe drinking water.

This study informs broader applications in fluoride-affected areas and advances groundwater quality assessment in Balasore. The promise of machine learning models in groundwater prediction supports future research areas that could integrate data from several sources and investigate deep learning techniques for even more reliable prediction frameworks.

Author Contributions

Conceptualization, A.K.P., A.R.T. and R.R.T.; methodology, D.N.; software, A.K.P.; validation, A.K.P., A.R.T. and D.N.; formal analysis, B.Đ.; investigation, D.D.; resources D.D.; data curation, D.N. and A.R.T.; writing—original draft preparation, A.R.T. and D.N.; writing—review and editing, B.Đ. and O.F.; visualization, A.R.T.; supervision, B.Đ. and R.R.T.; project administration, B.Đ.; funding acquisition, B.Đ. All authors have read and agreed to the published version of the manuscript.

Funding

In 2024, the University North, Croatia, funded this research under the “Hydrological and Geodetic Analysis of the Watercourse” project.

Data Availability Statement

All data are available from the corresponding author upon reasonable request.

Acknowledgments

The authors are thankful for the support of the University North, Croatia, within the scientific project.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Index map: (a) Country map showing the national context. (b) State map illustrating the regional location. (c) The study area map provides detailed boundaries and sample locations for the area under investigation.

Figure 2. Isoline map of fluoride distribution.

Figure 3. Workflow diagram for demonstrating the methodology.

Figure 4. Outlier detection.

Figure 5. Outlier removed.

Figure 6. Correlation matrix heatmap.

Figure 7. Comparison of model accuracy for fluoride prediction (* indicates that the Stacking Regressor gives the best result among all models).

Figure 8. Actual vs. predicted fluoride levels for different models.

Figure 9. Feature importance plot illustrating the relative contributions of different water quality metrics.

Table 1. Statistical summary of the dataset parameters based on descriptive statistics.

Statistic	pH	Conductivity (μS/cm)	Turbidity (NTU)	Chloride (mg/L)	Total Hardness (mg/L)	Total Alkalinity (mg/L)	Iron (mg/L)	TDS (mg/L)	Fluoride (mg/L)
Count	1853	1853	1853	1853	1853	1853	1853	1853	1853
Mean	7.40	572.86	1.74	87.32	226.16	249.20	0.55	372.49	0.67
Std	0.30	87.89	1.26	39.85	41.81	39.38	0.22	57.22	0.14
Min	6.51	332.0	0.02	14.0	90.0	144.0	0.002	216.0	0.248
25%	7.18	524.0	0.66	58.0	198.0	226.0	0.461	341.0	0.581
50%	7.37	588.0	1.7	82.0	232.0	252.0	0.593	382.0	0.682
75%	7.59	632.0	2.7	116.0	256.0	276.0	0.726	411.0	0.778
Max	8.27	814.0	5.27	206.0	340.0	358.0	1.2	529.0	1.09

Table 2. Pseudocode for the best model.

Pseudocode for Stacking Regressor:
INPUT: X_train, y_train, X_test, y_test
INITIALIZE base_models = [RF, GB]
INITIALIZE meta_model = GradientBoostingRegressor
CREATE stacking_model = StackingRegressor(base_models, meta_model)
TRAIN stacking_model.fit(X_train, y_train)
PREDICT y_pred_stack = stacking_model.predict(X_test)
EVALUATE MSE, MAE, R2, Accuracy
OUTPUT: MSE, MAE, Accuracy
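The pseudocode in Table 2 can be sketched in Python with scikit-learn, whose StackingRegressor and GradientBoostingRegressor classes match the names used above. This is a minimal illustration, not the authors' implementation. Several things are assumptions: the synthetic data (the study's dataset is not reproduced here), the 80/20 train/test split, all hyperparameters, the five-fold internal cross-validation, and the reading of "RF" and "GB" as RandomForestRegressor and GradientBoostingRegressor. The helper names make_synthetic_data and fit_and_evaluate are hypothetical.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def make_synthetic_data(n_samples=300, seed=0):
    """Hypothetical stand-in for the eight predictors and the fluoride target."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, 8))
    # Arbitrary linear signal plus noise; NOT derived from the study's data.
    y = 0.67 + 0.05 * X[:, 0] - 0.04 * X[:, 4] + rng.normal(scale=0.05, size=n_samples)
    return X, y


def fit_and_evaluate(X, y, seed=0):
    """Follow the Table 2 steps: split, stack RF and GB, fit, predict, score."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed  # split ratio is an assumption
    )
    base_models = [
        ("rf", RandomForestRegressor(n_estimators=50, random_state=seed)),
        ("gb", GradientBoostingRegressor(random_state=seed)),
    ]
    stacking_model = StackingRegressor(
        estimators=base_models,
        final_estimator=GradientBoostingRegressor(random_state=seed),  # meta-model
        cv=5,  # assumed; base-model predictions for the meta-model come from CV folds
    )
    stacking_model.fit(X_train, y_train)
    y_pred_stack = stacking_model.predict(X_test)
    return {
        "MSE": mean_squared_error(y_test, y_pred_stack),
        "MAE": mean_absolute_error(y_test, y_pred_stack),
        "R2": r2_score(y_test, y_pred_stack),
    }


X, y = make_synthetic_data()
scores = fit_and_evaluate(X, y)
print(scores)
```

Because the data are synthetic, the printed scores are not comparable with the values the study reports; the sketch only shows how the steps fit together.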

Table 3. Performance measures in the testing stages of proposed models.

Models	MAE (mg/L)	MSE	Accuracy (R²)
SVR	0.115	0.022	0.826
Stacking Regressor	0.103	0.018	0.896
Random Forest	0.106	0.019	0.841
XGBoost	0.111	0.021	0.833
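For reference, the three measures in Table 3 can be computed as in the short sketch below, which uses only the Python standard library. It takes "accuracy" to mean the coefficient of determination, as the abstract does. The function names and the three-point toy example are illustrative assumptions; the dictionary simply restates the Table 3 values to show how the models rank.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy values (hypothetical, in mg/L) to exercise the functions.
toy_true = [0.5, 0.7, 0.9]
toy_pred = [0.6, 0.7, 0.8]
toy_scores = (mse(toy_true, toy_pred), mae(toy_true, toy_pred), r_squared(toy_true, toy_pred))

# Values restated from Table 3.
table3 = {
    "SVR": {"MAE": 0.115, "MSE": 0.022, "R2": 0.826},
    "Stacking Regressor": {"MAE": 0.103, "MSE": 0.018, "R2": 0.896},
    "Random Forest": {"MAE": 0.106, "MSE": 0.019, "R2": 0.841},
    "XGBoost": {"MAE": 0.111, "MSE": 0.021, "R2": 0.833},
}
best_by_mse = min(table3, key=lambda m: table3[m]["MSE"])
best_by_mae = min(table3, key=lambda m: table3[m]["MAE"])
best_by_r2 = max(table3, key=lambda m: table3[m]["R2"])
```

All three rankings select the same model, which is consistent with the comparison reported in Table 3.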

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
