Machine Learning Models for Water Quality Prediction: A Comprehensive Analysis and Uncertainty Assessment in Mirpurkhas, Sindh, Pakistan

Abstract: Groundwater is a pivotal resource for potable consumption, irrigation, and diverse industrial uses. Nevertheless, human activities tied to industry and agriculture contribute significantly to groundwater contamination, which makes appraising water quality critical for safe drinking and effective irrigation. This research primarily focused on employing the Water Quality Index (WQI) to gauge the suitability of water for these purposes. However, generating an accurate WQI can be time-intensive owing to potential errors in sub-index calculations. In response to this challenge, an artificial intelligence (AI) forecasting model was devised to streamline the process while mitigating errors. The study collected 422 data samples from Mirpurkhas, a city in the province of Sindh, for a comprehensive exploration of the region's WQI attributes. Furthermore, the study examined the interdependencies among variables in the physicochemical analysis of water. Diverse machine learning classifiers were employed for WQI prediction, with findings revealing that Random Forest and Gradient Boosting lead with 95% and 96% accuracy, followed closely by SVM at 92%. KNN exhibits an accuracy rate of 84%, and Decision Trees achieve 77%. Traditional water quality assessment methods are time-consuming and error-prone; an approach based on artificial intelligence and machine learning addresses these limitations. In addition to WQI prediction, the study conducted an uncertainty analysis of the models using the R-factor, providing insights into the reliability and consistency of predictions. This dual approach, combining accurate WQI prediction with uncertainty assessment, contributes to a more comprehensive understanding of water quality in Mirpurkhas and enhances the reliability of decision-making processes related to groundwater utilization.


Introduction
Water, as an indispensable resource, plays a fundamental role in sustaining life and supporting various human activities. Among its many sources, groundwater stands as a crucial reservoir essential for drinking, agriculture, and industrial processes in Pakistan.
Water 2024, 16 However, the escalating impact of human interventions, particularly in industrial and agricultural sectors, poses a substantial threat to the quality of this invaluable resource [1][2][3][4].
The condition of groundwater in Pakistan, including areas like Mirpurkhas in the province of Sindh, has faced mounting challenges due to extensive usage, urbanization, and agricultural runoff, leading to contamination concerns and a decline in overall quality. The region's reliance on groundwater for daily consumption and agricultural needs amplifies the urgency for effective water quality assessment measures [2,5-7]. Contamination of groundwater due to these anthropogenic activities has heightened concerns regarding its suitability for consumption and irrigation purposes, necessitating robust methods for accurate evaluation and monitoring [8][9][10][11].
Traditionally, assessing water quality, especially the determination of the Water Quality Index (WQI), relied heavily on manual calculations and established formulas based on a set of parameters [9,12-15]. These methods are often time-consuming and prone to human error, particularly in complex calculations involving multiple interdependent factors [16][17][18]. In recent years, the integration of artificial intelligence and machine learning techniques, implemented in programming languages like Python (version 3.10) with specialized libraries such as scikit-learn (version 0.24), XGBoost (version 1.5), and pandas (version 1.3), has emerged as a transformative approach to overcome the limitations of traditional methods. Machine learning models, such as Random Forest, Gradient Boosting, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Decision Trees, were developed and trained using these tools. They offer the advantage of learning patterns and relationships from large datasets, enabling more accurate and efficient prediction of WQI [19][20][21].
This study, conducted in Mirpurkhas, Sindh, collected a dataset of 422 samples to characterize the region's water quality comprehensively. Python and machine learning libraries such as scikit-learn, XGBoost, and pandas were used to preprocess the data and to build, train, and evaluate classifiers for predicting WQI [22,23]. The results indicate that Random Forest and Gradient Boosting outperformed the other algorithms, achieving an accuracy rate of 99%. SVM and XGBoost followed closely, scoring approximately 95% and 93% accuracy, respectively, while KNN and Decision Trees demonstrated accuracy rates of 88% and 87%, respectively. These findings underscore the efficacy of Python-based machine learning techniques in accurately predicting WQI and their potential for advancing water quality assessment methods, particularly in groundwater evaluation [24][25][26].
In a similar vein, Reza Mohammadpour's article [27] employed Support Vector Machine (SVM) and two artificial neural network (ANN) methods, feed forward back propagation (FFBP) and radial basis function (RBF), for Water Quality Index (WQI) prediction in a constructed wetland. The SVM model performed best, achieving a high coefficient of correlation (R²) of 0.9984 and a low mean absolute error (MAE) of 0.0052, demonstrating its effectiveness in streamlining WQI calculations and reducing computational effort in free surface constructed wetland environments. Afaq Juna's article [28] supports our experimental findings: there, Random Forest (RF) and XGBoost both attain 80% accuracy for WQI. RF demonstrates precision, recall, and an F1 score of 80%, while XGBoost achieves 80% precision and recall, with an F1 score slightly lower at 79%. In contrast, KNN and SGDC exhibit the lowest WQI accuracy at 59%. Mehedi Hassan's article [19] on WQI prediction demonstrates outstanding accuracy, with Kappa, Accuracy Lower, and Accuracy Upper scores reaching 99.83, 99.17, and 99.07, respectively. These results underscore the crucial role of machine learning in precisely categorizing water quality, highlighting its significance for effective water management and corroborating the high accuracy of our machine learning models for WQI prediction.
Through an interdisciplinary approach integrating environmental science and machine learning, this research aims to contribute to the advancement of accurate and efficient water quality assessment methods, utilizing the potential of artificial intelligence and predictive modeling implemented through Python-based tools and specialized libraries. Beyond WQI prediction, this study integrates an uncertainty analysis using the R-factor, providing a nuanced perspective on the reliability and consistency of our predictive models. The combined approach of accurate WQI prediction and uncertainty assessment contributes to a more holistic understanding of water quality dynamics in Mirpurkhas. Ultimately, this research aims to inform robust decision-making processes regarding groundwater utilization, considering both the accuracy of predictions and the inherent uncertainties associated with them. The integration of artificial intelligence (AI) forecasting models, specifically machine learning classifiers such as Random Forest, Gradient Boosting, SVM, XGBoost, KNN, and Decision Trees, has proven to be instrumental in predicting the Water Quality Index (WQI) with remarkable accuracy [19,29-32]. However, the accuracy of predictions alone does not provide a complete picture, and understanding the structure of these models is essential for a comprehensive assessment of uncertainty.
This research paper is structured into several key sections. The Introduction highlights the significance of groundwater, particularly in the context of Mirpurkhas in Sindh, and emphasizes the challenges of water quality and the need for advanced assessment methods. It also summarizes previous studies on groundwater quality, traditional Water Quality Index (WQI) determination methods and their limitations, and existing research on applying machine learning in water quality assessment. Section 2.2 outlines the steps undertaken, including data collection of 422 samples, data preprocessing, feature selection, the use of machine learning algorithms such as Random Forest, Gradient Boosting, SVM, XGBoost, KNN, and Decision Trees, and the evaluation of uncertainty in these algorithms. Section 3 presents the performance metrics of these models in predicting WQI. Following this, the Discussion interprets the outcomes, compares model performances, addresses limitations, and suggests further research avenues. Finally, the Conclusion summarizes the key findings, reinforces the significance of employing machine learning in water quality assessment, and suggests future implications.

Study Area
Mirpurkhas, situated in the Sindh province of Pakistan, experiences an arid to semi-arid climate characterized by scorching summers, with temperatures often exceeding 40 degrees Celsius (104 degrees Fahrenheit) from April to September (Figure 1). Monsoons, occurring between July and September, bring moderate to heavy rainfall, providing relief from the intense heat. Winters are relatively mild, ranging from around 10 to 20 degrees Celsius (50 to 68 degrees Fahrenheit). Geographically, Mirpurkhas is located near the Indus River in the southern part of Pakistan and is renowned for its agricultural activities (Figure 1) [7,33]. Wells play a crucial role in providing groundwater for various purposes, including drinking water supply and agricultural irrigation, supporting local livelihoods within this semi-arid region.
Mirpurkhas, a town located in the Sindh province of Pakistan, relies heavily on well water for various purposes. The inhabitants of Mirpurkhas primarily utilize well water for drinking, agricultural irrigation, and domestic needs [34,35]. Wells in the region serve as a primary source of groundwater, supplying water to the local community. Well water in Mirpurkhas is crucial for sustaining daily activities and agricultural practices. However, as in many areas reliant on groundwater, the water quality in wells can be susceptible to contamination from various sources such as agricultural runoff, industrial activities, and natural factors [36][37][38].

Methodology
The research methodology involved the collection of 422 water samples from multiple sites across Mirpurkhas, Sindh, Pakistan, during the period from April to May 2022, covering various locations deemed significant for groundwater extraction and consumption. Parameters, including pH levels, temperature, dissolved oxygen, turbidity, nitrates, and other physicochemical characteristics, were measured using standardized water testing procedures and equipment [39][40][41]. The samples were filtered to 0.45 µm for further analysis, and their locations were recorded using a global positioning system (GPS). Standard methods outlined by the American Public Health Association [42] were followed for analysis. The well depths vary widely, ranging from 5.7 m to 590 m, indicating a diverse dataset that includes samples from both shallow and deep aquifers. This variation in well depth is essential to consider, as groundwater characteristics may be influenced by geological and hydrological factors associated with different depth ranges.
Following data collection, a rigorous preprocessing phase was conducted to ensure data accuracy and suitability for machine learning analysis (Figure 2). This stage encompassed handling missing values through imputation methods, outlier removal, and normalization or scaling to ensure uniformity across parameters. Feature engineering was performed to extract pertinent features and reduce dimensionality for enhanced model performance. Feature selection techniques, including Variance Inflation Factor (VIF) and Information Gain (IG), were employed to identify influential parameters affecting water quality. These methods aimed to reduce redundancy and select the most informative features for modeling [43,44].
The evaluation of groundwater suitability for human consumption involved the computation of the Water Quality Index (WQI) based on the standards established by the World Health Organization (WHO). The WQI calculation comprised a three-step procedure. Initially, an individual weight (w_i) was assigned to each parameter, encompassing TDS, Sodium, Calcium, Magnesium, Bicarbonate, Sulfate, Chloride, pH, EC, Nitrate (NO₃⁻), Well Depth, and Potassium. Subsequently, the relative weight (W_i) for each parameter was determined. Lastly, quality-rating scales (q_i) and sub-indices (SI_i) were computed for each parameter, and the overall WQI was derived by summing the sub-indices. The resulting classification into five groups, ranging from Group 1 (0-25), indicating Excellent water quality, to Group 5 (above 100), signifying Very Poor to Unacceptable water quality, was employed in conjunction with machine learning models for a more comprehensive analysis.
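The three-step procedure can be sketched in a few lines of Python. This is a minimal illustration and not the study's implementation: it assumes the commonly used weighted arithmetic form (W_i = w_i / Σw_i, q_i = 100 · C_i / S_i, SI_i = W_i · q_i), and the weights, permissible limits, and intermediate class boundaries (50 and 75) are hypothetical placeholders. Only the 0-25 and above-100 boundaries come from the text.

```python
# Hypothetical weights w_i and permissible limits S_i for a few parameters.
# These numbers are placeholders for illustration, not the study's values.
PARAMS = {
    # name: (assigned weight w_i, permissible limit S_i in mg/L)
    "TDS": (5, 1000.0),
    "Chloride": (3, 250.0),
    "Nitrate": (5, 50.0),
    "Sodium": (4, 200.0),
}


def compute_wqi(sample):
    """Return the WQI of one sample (dict of parameter -> concentration)."""
    total_weight = sum(w for w, _ in PARAMS.values())
    wqi = 0.0
    for name, (w, limit) in PARAMS.items():
        relative_weight = w / total_weight       # step 2: W_i
        quality_rating = sample[name] / limit * 100.0  # step 3: q_i
        wqi += relative_weight * quality_rating  # sub-index SI_i, summed
    return wqi


def classify_wqi(wqi):
    """Map a WQI value to Groups 1-5.

    Group 1 (0-25) and Group 5 (>100) follow the text; the cut-offs at
    50 and 75 are assumed for illustration.
    """
    for group, upper in enumerate([25, 50, 75, 100], start=1):
        if wqi <= upper:
            return group
    return 5


sample = {"TDS": 500.0, "Chloride": 125.0, "Nitrate": 25.0, "Sodium": 100.0}
wqi_value = compute_wqi(sample)  # every parameter at half its limit
wqi_group = classify_wqi(wqi_value)
```

Because the relative weights sum to one, a sample whose concentrations all sit at a fixed fraction of their limits receives a WQI equal to that fraction times 100.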
For model development, the Python programming language was utilized, along with machine learning libraries such as scikit-learn, XGBoost, pandas, and numpy. Supervised learning algorithms, including Random Forest, Gradient Boosting, Support Vector Machine (SVM), XGBoost, K-Nearest Neighbors (KNN), and Decision Trees, were implemented and trained using the preprocessed dataset [45,46]. The models were optimized through hyperparameter tuning with techniques like grid search and cross-validation. The performance of the developed models was evaluated using accuracy, the confusion matrix, the Friedman test, and the Nemenyi test [47,48]. The uncertainty of model predictions was evaluated using the R-factor and bootstrapping. The data underwent resampling using a cross-validation technique to assess model robustness and generalizability. The test-set proportions, as evidenced by our confusion matrices across all classifiers, fall within the range of 20% to 33%. XGB, Random Forest, and SVC are generally regarded as robust and less susceptible to overfitting, enabling them to perform effectively with a smaller testing set (20%). In contrast, KNN, Gradient Boosting, and Decision Tree models may exhibit greater sensitivity to the nuances of the training data, suggesting potential benefits from a larger testing set (30%) for a more thorough evaluation [49]. Results interpretation involved comparing and analyzing the outcomes of the various machine learning classifiers to identify the most accurate models for predicting the Water Quality Index (WQI). Models demonstrating the highest accuracy rates were further analyzed to understand the impact of different parameters on WQI prediction and water quality assessment.
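A workflow of this kind (hold-out split, grid search with cross-validation, then accuracy and a confusion matrix on the test set) might look as follows in scikit-learn. This is a sketch under stated assumptions: the data are synthetic stand-ins generated with make_classification, and the parameter grid, the 5-fold setting, and the 20% test size are illustrative choices rather than the study's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the dataset: 422 samples, 12 features, 5 WQI classes.
X, y = make_classification(
    n_samples=422, n_features=12, n_informative=8, n_redundant=2,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# Hold out 20% of the samples for testing, preserving class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Small illustrative grid; the grid actually used in the study is not given.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid, cv=5, scoring="accuracy",
)
search.fit(X_train, y_train)  # cross-validated search on the training split

# Evaluate the refitted best model on the untouched test split.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
```

Keeping the test split outside the grid search means the reported accuracy is not influenced by the hyperparameter selection.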
The variables that have been used in our research to determine the Water Quality Index are shown in Figure 3.
The VIF analysis (Table 1) highlights varying degrees of multicollinearity among the features considered for water quality assessment in Mirpurkhas, Sindh, Pakistan. Certain parameters, such as 'TDS', 'Sodium', 'Calcium', and 'Magnesium', exhibited notably high VIF values, indicative of strong multicollinearity among these variables. Conversely, 'Potassium', 'Well Depth', and 'Nitrate (NO₃⁻)' demonstrated relatively lower VIF values, suggesting lower levels of multicollinearity in comparison [50].
The strong interdependencies among 'TDS', 'Sodium', 'Calcium', and 'Magnesium' potentially impact the accuracy of predictive models developed for water quality assessment [51]. Parameters with lower VIF values, including 'Potassium', 'Well Depth', and 'Nitrate (NO₃⁻)', are more weakly correlated with the other features and therefore contribute less to multicollinearity issues within predictive models. Addressing high multicollinearity, particularly among variables with elevated VIF values, becomes crucial in enhancing the reliability and precision of predictive models for more accurate water quality assessment in the Mirpurkhas region.
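For readers who want to reproduce this kind of check, the VIF of feature j is 1 / (1 - R_j²), where R_j² comes from regressing feature j on all the others; equivalently, it is the j-th diagonal entry of the inverse correlation matrix. The sketch below uses that identity with NumPy on synthetic columns. The column roles are hypothetical and the numbers are not the study's.

```python
import numpy as np


def vif(X):
    """VIF of each column of X, via the diagonal of the inverse
    correlation matrix (equal to 1 / (1 - R_j^2) for each feature j)."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


# Synthetic illustration: column c is almost a linear combination of a and b
# (as a total such as TDS might be of individual ions); d is independent.
rng = np.random.default_rng(0)
n = 422
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + b + 0.05 * rng.normal(size=n)
d = rng.normal(size=n)

X = np.column_stack([a, b, c, d])
vifs = vif(X)  # large for a, b, c; close to 1 for d
```

A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic collinearity, although the appropriate cut-off depends on the model being fitted.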
Tree-based models (Decision Trees, Random Forest, Gradient Boosting, XGBoost) and K-Nearest Neighbors (KNN) are generally less sensitive to multicollinearity compared to linear models like linear regression or logistic regression. Support Vector Machine (SVM) can be sensitive to multicollinearity to some extent, depending on the kernel used; therefore, we use a linear kernel with SVM [52]. The linear kernel computes the dot product between two observations. It is less sensitive to multicollinearity because it effectively works in the original feature space without introducing non-linear transformations.
Although elevated VIF values may signal multicollinearity and potential challenges in linear models, opting to include all variables based on Information Gain remains a viable strategy, particularly when employing tree-based models (Random Forest, Gradient Boosting, XGBoost, and Decision Trees) and KNN. In addition, to make SVM less sensitive to multicollinearity, we use a linear kernel in our research. Nevertheless, it is crucial to empirically validate this decision by evaluating the model's performance on independent datasets or employing robust cross-validation techniques.
The Information Gain (IG) analysis (Table 2) highlights the relevance of various features in predicting the Water Quality Index (WQI) in Mirpurkhas, Sindh, Pakistan. Features such as 'Nitrate (NO₃-N)', 'Calcium', 'Sodium', 'Sulfate', 'Chloride', 'Potassium', and 'Magnesium' exhibit higher IG values, indicating their considerable relevance in predicting WQI. Conversely, 'pH', 'Bicarbonate', 'Well Depth', 'EC', and 'TDS' present relatively lower IG values, suggesting a comparatively smaller contribution to predicting the WQI. Understanding the relevance of these features assists in selecting the most influential variables for the development of accurate predictive models for water quality assessment. However, while IG values help identify influential features, the absolute value of IG alone does not necessarily determine the direct impact or importance of a feature in predicting the WQI [53]. Other factors, such as domain knowledge, the nature of the dataset, and the specific context of the water quality assessment, should also be considered. The selection of the most influential variables should therefore rest on an analysis that integrates multiple factors beyond IG values alone.
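As a reminder of what the IG score measures, it is the reduction in the entropy of the class labels once a feature is known. The sketch below computes it for a single binary split of a continuous feature; the threshold-based discretization, the toy values, and the feature names are assumptions for illustration, and the study's exact estimator is not specified in the text.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels, threshold):
    """Entropy reduction from splitting the samples at `threshold`."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part)
                      for part in (left, right) if part)
    return entropy(labels) - conditional


# Toy example with made-up values: nitrate separates the classes perfectly,
# while pH carries no information about them.
labels = ["good", "good", "poor", "poor"]
nitrate = [5.0, 8.0, 40.0, 60.0]
ph = [7.1, 7.9, 7.2, 8.0]

ig_nitrate = information_gain(nitrate, labels, threshold=20.0)
ig_ph = information_gain(ph, labels, threshold=7.5)
```

In this toy case the nitrate split removes all label uncertainty (one bit), whereas the pH split leaves the class mix unchanged on both sides and therefore gains nothing.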

Uncertainty Analysis

R-Factor
While various factors contribute to the uncertainty in predicting Water Quality Index (WQI), including modeling, sampling errors, data preparation, and pre-processing, this study specifically addresses the uncertainty linked to individual model structures and input parameter selection. To assess model structure uncertainty, the analysis involves examining a set of three predicted WQI values during the testing phase for each observed WQI. These predictions are generated by the aforementioned predictive models.
The mean and standard deviation are computed for each predicted set and serve as the parameters of a normal distribution. Using the Monte Carlo simulation method, 1000 WQI values are generated for each observed value based on this distribution. While other methods like Latin Hypercube [54], Lagged Average [55], and Multimodal Nesting [56] are also used for sample generation, the Monte Carlo technique has demonstrated greater applicability, especially in hydrology and water-related sciences [57]. To quantify the uncertainty associated with WQI prediction, the 95% prediction confidence interval (i.e., the interval between the 2.5% and 97.5% quantiles), known as the prediction uncertainty of 95% (95PPI), is determined from the generated WQI values for each observed WQI. Specifically, the uncertainty is computed using the R-factor (Equation (1)).
The R-factor is calculated as:

$$\text{R-factor} = \frac{s_p}{s_x} \quad (1)$$

Here, s_x represents the standard deviation of the observed values, and s_p is determined using Equation (2):

$$s_p = \frac{1}{J} \sum_{i=1}^{J} \left( U_{Li} - L_{Li} \right) \quad (2)$$

In this equation, J denotes the number of observed data points, while U_{Li} and L_{Li} correspond to the i-th values of the upper quantile (97.5%) and lower quantile (2.5%) of the 95% prediction confidence interval band (95PPI).
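The procedure described in this subsection can be sketched with the Python standard library. This is an illustrative reading rather than the study's code: it takes s_p to be the mean width of the 95PPI band over the J observations (the usual definition, assumed here), uses the sample standard deviation for s_x, and runs on made-up observed and predicted WQI values.

```python
import random
import statistics


def r_factor(observed, predictions_per_obs, n_draws=1000, seed=0):
    """R-factor = mean 95PPI width / standard deviation of observations.

    predictions_per_obs[i] holds the model predictions for observed[i].
    """
    rng = random.Random(seed)
    widths = []
    for preds in predictions_per_obs:
        mu = statistics.mean(preds)
        sigma = statistics.stdev(preds)
        # Monte Carlo draws from the normal distribution fitted to the set.
        draws = sorted(rng.gauss(mu, sigma) for _ in range(n_draws))
        lower = draws[round(0.025 * (n_draws - 1))]  # 2.5% quantile
        upper = draws[round(0.975 * (n_draws - 1))]  # 97.5% quantile
        widths.append(upper - lower)
    s_p = statistics.mean(widths)     # average band width
    s_x = statistics.stdev(observed)  # spread of the observations
    return s_p / s_x


# Made-up example: five observed WQI values, three model predictions each.
observed = [20.0, 45.0, 70.0, 95.0, 120.0]
predictions = [
    [19.0, 20.0, 21.0],
    [44.0, 45.0, 46.0],
    [69.0, 70.0, 71.0],
    [94.0, 95.0, 96.0],
    [119.0, 120.0, 121.0],
]
rf = r_factor(observed, predictions)
```

Smaller values indicate a narrower uncertainty band relative to the natural spread of the observations; if all models agree exactly, the band collapses and the R-factor is zero.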
Other approaches, such as the Coefficient of Variation (CV), Prediction Interval Coverage Probability (PICP), and Prediction Interval Normalized Root-mean-square Width (PINRW), have been proposed as substitutes for the R-factor method [58]. Nevertheless, these alternative methods rely solely on either observed or predicted data. In contrast, the R-factor method takes into account both observed and predicted data, making it a more comprehensive metric for characterizing prediction uncertainty [59,60]. The inherent uncertainty in predictive models arises from various sources, including the complexity of the underlying data and the dynamic nature of water quality parameters. The structure of machine learning models contributes significantly to this uncertainty, and exploring their characteristics sheds light on the reliability of predictions.

Bootstrapping
In the uncertainty analysis of predictive models for the Water Quality Index, generating prediction intervals is crucial for understanding the range of possible values for each prediction. This step uses bootstrapping, a resampling technique that provides a measure of the uncertainty associated with the model's predictions. Bootstrapping creates multiple bootstrap samples by randomly drawing observations with replacement from the original dataset. For each bootstrap sample, the model is trained, and predictions are made on the test set. This process is repeated numerous times (in our case, 1000 iterations), resulting in a distribution of predicted values for each data point (Figure 4).
The Mean Squared Error (MSE) on the test set (0.108) is the average squared difference between the actual Water Quality Index values and the predicted values. A lower MSE generally suggests better model performance, demonstrating that the model's predictions are, on average, close to the true values. However, the MSE alone may not provide a complete picture, as it does not account for the uncertainty in the predictions. This is where prediction intervals come into play. The prediction intervals generated using bootstrapping offer insights into the variability and uncertainty associated with the model's predictions. The lower and upper bounds of the intervals (calculated at the 2.5th and 97.5th percentiles, respectively) represent the plausible range within which the true Water Quality Index values are likely to fall. The scatter plot (Figure 4) of actual versus predicted values, along with the shaded gray area representing the prediction intervals, provides a clear visualization of the model's performance and the associated uncertainty.
A narrow prediction interval suggests that the model has a high degree of certainty in its predictions. A wider prediction interval indicates higher uncertainty, emphasizing the need for caution when relying on specific predictions in these regions. By incorporating bootstrapping to generate prediction intervals, we not only assess the model's accuracy through MSE but also gain a comprehensive understanding of the uncertainty inherent in the Water Quality Index predictions. This holistic approach enhances the reliability and robustness of the predictive modeling process, making it more applicable and informative for water quality management and decision-making.
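The bootstrapping loop described in this subsection can be sketched with NumPy. The example is a minimal illustration under stated assumptions: the data are synthetic, and an ordinary least-squares fit stands in for the study's models so that the block stays self-contained. Only the 1000 iterations and the 2.5th/97.5th percentiles are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (not the study's measurements).
n_train, n_test, n_feat = 300, 50, 4
true_coef = np.array([1.5, -2.0, 0.5, 3.0])
X_train = rng.normal(size=(n_train, n_feat))
y_train = X_train @ true_coef + rng.normal(scale=0.3, size=n_train)
X_test = rng.normal(size=(n_test, n_feat))
y_test = X_test @ true_coef + rng.normal(scale=0.3, size=n_test)


def fit_predict(X_fit, y_fit, X_new):
    """Fit least squares with an intercept and predict for X_new."""
    A = np.column_stack([np.ones(len(X_fit)), X_fit])
    coef, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef


n_boot = 1000
boot_preds = np.empty((n_boot, n_test))
for b in range(n_boot):
    # Draw training rows with replacement, refit, predict the test set.
    idx = rng.integers(0, n_train, size=n_train)
    boot_preds[b] = fit_predict(X_train[idx], y_train[idx], X_test)

# Percentile bounds of the bootstrap predictions for each test point.
lower = np.percentile(boot_preds, 2.5, axis=0)
upper = np.percentile(boot_preds, 97.5, axis=0)

# Point predictions and MSE from the model fitted on the full training set.
point = fit_predict(X_train, y_train, X_test)
mse = float(np.mean((y_test - point) ** 2))
```

Intervals built this way reflect how much the fitted model changes when the training data are resampled; they do not by themselves include the irreducible noise in individual observations, which is worth keeping in mind when reading their width.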

Random Forest and Gradient Boosting
These ensemble methods aggregate predictions from multiple decision trees, which individually capture different patterns in the data. The robustness of Random Forest and Gradient Boosting lies in their ability to mitigate overfitting and enhance predictive accuracy. However, the ensemble nature introduces uncertainty due to the variability in individual tree predictions [61,62].

Support Vector Machine (SVM) and XGBoost
SVM focuses on finding the hyperplane that best separates data into classes, while XGBoost optimizes the performance of weak learners through boosting. The structural complexity of SVM and the iterative refinement process of XGBoost contribute to their predictive power but also introduce uncertainty, particularly in capturing non-linear relationships and intricate patterns [63].

K-Nearest Neighbors (KNN) and Decision Trees
KNN relies on proximity-based classification, and Decision Trees partition the data based on feature splits. These models are interpretable and less complex, but their simplicity can lead to uncertainty when faced with intricate relationships in the data. KNN's reliance on neighbors introduces variability, while Decision Trees' sensitivity to data changes may affect stability [64].
Understanding the interplay between model structure and uncertainty is crucial for reliable water quality assessments. The ensemble nature of Random Forest and Gradient Boosting, along with the iterative optimization in SVM and XGBoost, contributes to their robust performance but introduces variability. Simpler models like KNN and Decision Trees may be more interpretable but can exhibit uncertainty in capturing complex relationships. The uncertainty associated with each model's structure emphasizes the importance of a nuanced approach to water quality prediction. Integrating uncertainty analysis, such as the R-factor, alongside accurate predictions allows for a more informed and cautious interpretation of water quality assessments, as shown in Table 3.
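This section does not restate how the R-factor is computed. The sketch below assumes the definition commonly used in hydrological uncertainty analysis: the mean width of the uncertainty band divided by the standard deviation of the observed values. The function name `r_factor`, the use of the sample standard deviation, and the numbers are illustrative assumptions, not values from Table 3.

```python
import statistics

def r_factor(lower, upper, observed):
    """Mean width of the uncertainty band divided by the std of observations.

    Assumed definition; the sample standard deviation is used here.
    """
    if not (len(lower) == len(upper) == len(observed)):
        raise ValueError("all inputs must have the same length")
    mean_width = sum(u - l for l, u in zip(lower, upper)) / len(observed)
    return mean_width / statistics.stdev(observed)

# Hypothetical WQI observations with lower/upper interval bounds.
observed = [40.0, 55.0, 70.0, 85.0, 100.0]
lower = [35.0, 50.0, 66.0, 80.0, 94.0]
upper = [45.0, 60.0, 74.0, 90.0, 106.0]

r_value = r_factor(lower, upper, observed)
print(round(r_value, 3))
```

Under this definition, smaller values correspond to tighter bands relative to the spread of the data.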

AUC-Based Performance Evaluation
The AUC values presented in Table 4 and Figure 5 offer valuable insights into the performance of the machine learning models in predicting the Water Quality Index. Decision Trees (DTs) exhibit reasonable discriminatory power with an AUC of 0.77, while the Random Forest (RF) and XGBoost models outperform them, with high AUC values of 0.95 and 0.96, respectively. These results underscore their robust performance in accurately categorizing water quality. The Gradient Boosting model also demonstrates excellent discriminatory power, with an AUC of 0.95. The Support Vector Machine (SVM) performs well with an AUC of 0.92, indicating effective classification. K-Nearest Neighbors (KNN) exhibits good discriminatory power, though slightly lower than some other models, with an AUC of 0.84. These varying AUC values emphasize the importance of selecting models with superior discriminatory capabilities when predicting the Water Quality Index, contributing to informed decision-making in environmental management. The ROC curve for Class 5 closely resembles that of Class 1 and is positioned near the ideal top-left corner (Figure 6). The AUC score of 0.99 highlights exceptional performance in distinguishing Class 5 from the other classes. The ROC curves provide a visual representation of the trade-off between sensitivity and specificity for each class, indicating that all the classifiers predicted the WQI classes well in our experiment.
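A per-class AUC such as the one quoted for Class 5 can be obtained with a one-vs-rest computation. The sketch below is a minimal illustration with hypothetical labels and scores. It uses the rank interpretation of AUC: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The function names are assumptions for illustration; they are not taken from the study.

```python
def binary_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(y_true, class_scores, target):
    """Treat `target` as the positive class and all other classes as negative."""
    labels = [1 if y == target else 0 for y in y_true]
    return binary_auc(labels, class_scores)

# Hypothetical true classes and predicted probabilities for Class 5.
y_true = [5, 5, 1, 5, 3, 5, 1, 5]
score_class5 = [0.9, 0.8, 0.2, 0.7, 0.75, 0.95, 0.1, 0.6]

auc_class5 = one_vs_rest_auc(y_true, score_class5, target=5)
print(round(auc_class5, 4))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a few hundred samples; library implementations use sorting instead.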

Statistical Analysis Using Friedman Test
The Friedman Test was employed to assess the overall performance variation among the machine learning algorithms used to predict the Water Quality Index (WQI) in the Mirpurkhas region of Sindh, Pakistan (Table 5). The computed Friedman Test statistic yielded an F-value of 5.0 with a corresponding p-value of 0.4159 [65]. This analysis examines whether there is a statistically significant difference in the performance of the machine learning models employed for water quality assessment. The obtained p-value of 0.4159 exceeds the conventional significance level of 0.05, indicating insufficient evidence to reject the null hypothesis. Therefore, based on this statistical test, there appears to be no significant difference in the predictive performance of the machine learning algorithms used for WQI prediction in the Mirpurkhas region [66].
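As a consistency check on the reported figures, the sketch below computes a Friedman statistic from a table of per-block scores and converts it to a p-value using a chi-square distribution with k - 1 degrees of freedom. It assumes six algorithms (the six classifiers whose confusion matrices are reported), so 5 degrees of freedom; under that assumption a statistic of 5.0 gives p ≈ 0.416, in line with the reported 0.4159. The function names and the small score table are hypothetical, and no tie correction is applied.

```python
import math

def chi2_sf(x, df):
    """Survival function of the chi-square distribution for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series of correction terms.
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    total = 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(half)) + total

def rank_row(values):
    """Rank one block's scores (1 = lowest), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def friedman_statistic(table):
    """Friedman chi-square; rows are blocks (e.g. folds), columns are algorithms."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for col, r in enumerate(rank_row(row)):
            rank_sums[col] += r
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)

# Reported statistic with the assumed 5 degrees of freedom.
p_reported = chi2_sf(5.0, 5)
print(round(p_reported, 4))

# Hypothetical accuracy table: 3 folds x 3 algorithms.
toy_stat = friedman_statistic([[0.80, 0.85, 0.90], [0.78, 0.84, 0.91], [0.79, 0.86, 0.92]])
print(toy_stat, round(chi2_sf(toy_stat, 2), 4))
```

In practice a statistics library would normally be used for this test; the hand-rolled version is shown only to keep the example dependency-free.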

The Water Quality Index (WQI) ranges in Table 13 were calculated based on a general classification scheme. These ranges are commonly used in water quality assessments, and the specific values may vary depending on the guidelines or standards adopted by environmental agencies.

Discussion
This study focused on predicting the Water Quality Index (WQI) in the Mirpurkhas region, Sindh, Pakistan, utilizing various machine learning algorithms and exploring the significance of model structures and variable importance. The extensive analysis encompassed data collection, preprocessing, model development, performance evaluation, statistical tests, and uncertainty analysis, providing a comprehensive understanding of the water quality assessment process.
The AUC-based performance evaluation shed light on the efficacy of machine learning models in discriminating between different water quality classes. Notably, Random Forest and XGBoost demonstrated high AUC values of 0.99 and 0.95, respectively, indicating robust discriminatory power. Gradient Boosting and SVM also exhibited excellent performance, with AUC values of 0.95 and 0.93. Decision Trees showed reasonable discriminatory power (AUC of 0.87) and remain a viable model. In terms of accuracy, XGBoost and Gradient Boosting reached 95%, outperforming the other models, with Random Forest following closely. These outcomes align with the growing body of literature emphasizing the potential of machine learning in water quality assessment. The high accuracy of the XGBoost, Gradient Boosting, and Random Forest models suggests that they capture the intricate relationships among water quality parameters, which resonates with studies from various regions in which ensemble and tree-based models have shown superiority in water quality prediction [70]. The ability of these models to handle non-linear relationships and complex patterns in water quality data enhances their utility in environmental monitoring. The commendable performance of SVM aligns with studies that have highlighted its versatility in handling diverse datasets and its efficacy in water quality modeling [71,72].
However, it is essential to acknowledge the variations in model sensitivity to multicollinearity, as indicated by the Variance Inflation Factor (VIF) analysis. Features such as 'TDS', 'Sodium', 'Calcium', and 'Magnesium' exhibited high VIF values, suggesting strong interdependencies among these variables. While tree-based models are generally less sensitive to multicollinearity, addressing high VIF values remains important for the reliability of predictive models, especially for non-tree models such as SVM. The Information Gain (IG) analysis provided insights into the relevance of different features in predicting WQI. Variables such as 'Nitrate (NO3-N)', 'Calcium', 'Sodium', 'Sulfate', 'Chloride', 'Potassium', and 'Magnesium' exhibited higher IG values, underscoring their considerable impact on water quality assessment. This finding aligns with existing literature emphasizing the importance of specific physiochemical variables in influencing water quality [73,74]. Future research should address the limitations of this study, including dataset size and variable scope. Advanced strategies such as feature engineering, ensemble methods, and integration of remote sensing data are proposed to enhance predictive accuracy and provide a more nuanced understanding of water quality dynamics.
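To illustrate how an Information Gain value ranks features, the sketch below computes the reduction in class entropy obtained by splitting on a discretized feature. The feature bins, class labels, and function names are hypothetical; the study's actual discretization and IG implementation are not described in this section, so this is only a sketch of the general technique.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_bins, labels):
    """Entropy of the labels minus the weighted entropy within each feature bin."""
    n = len(labels)
    groups = {}
    for b, y in zip(feature_bins, labels):
        groups.setdefault(b, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical samples: a binned feature that separates the classes perfectly
# and one that separates them only partially.
classes = ["poor", "poor", "good", "good", "poor", "good", "poor", "good"]
nitrate_bin = ["high", "high", "low", "low", "high", "low", "high", "low"]
sodium_bin = ["high", "low", "high", "low", "high", "low", "high", "low"]

ig_nitrate = information_gain(nitrate_bin, classes)
ig_sodium = information_gain(sodium_bin, classes)
print(round(ig_nitrate, 4), round(ig_sodium, 4))
```

Continuous measurements such as concentrations must be binned before this calculation, and the chosen bin edges affect the resulting ranking.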
The Friedman Test, employed to assess overall performance variation, yielded an F-value of 5.0 with a p-value of 0.4159. The non-significant p-value means that the test did not detect a difference in predictive accuracy among the machine learning algorithms for WQI in the Mirpurkhas region, which is consistent with the models performing comparably. The Nemenyi Test for pairwise comparisons provided critical distance values, offering insights into statistically significant differences in algorithm performance. The absence of significant differences between certain pairs of algorithms highlighted their comparable performance. While the critical distance values can guide algorithm ranking, the overall consistency observed in the Friedman Test aligns with the notion that the algorithms perform similarly in WQI prediction.
Confusion matrices detailed the classification results for each machine learning algorithm, presenting true positives, false positives, false negatives, and true negatives for each water quality class. These matrices provide a granular view of model errors and successes, aiding in the interpretation of classification performance. The consistently high true-positive rates and low false-positive rates across classifiers reflect the models' ability to accurately predict water quality classes. Defined water quality ranges and corresponding classes facilitated the interpretation of model predictions. Approximately 88.63% of WQI values fell into Class 5, representing very poor to unacceptable water quality. This distribution underscores the predominance of deteriorated water quality in the Mirpurkhas region, emphasizing the urgency of effective water resource management.
The uncertainty analysis, incorporating the R-factor and bootstrapping, added a crucial layer of insight into the reliability of model predictions. The R-factor addressed structural uncertainty, while bootstrapping provided prediction intervals, aiding in understanding the range of possible values for each prediction. The approach acknowledged and quantified uncertainty, contributing to a more informed interpretation of water quality assessments. The Monte Carlo simulation method provided a robust approach to assessing the uncertainty associated with WQI predictions. Bootstrapping was used alongside the Mean Squared Error (MSE) as a measure of model accuracy, and the resulting prediction intervals offer a more comprehensive understanding of uncertainty in the models.
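The per-class true-positive and false-positive rates mentioned in the discussion of the confusion matrices can be derived from a multi-class matrix as sketched below. The matrix is hypothetical (it is not one of Tables 7 to 12), the function name is illustrative, and rows are assumed to hold actual classes while columns hold predicted classes.

```python
def per_class_rates(matrix):
    """Return {class_index: (TPR, FPR)} from a square confusion matrix.

    Assumes rows are actual classes and columns are predicted classes.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    rates = {}
    for c in range(k):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # actual c, predicted other
        fp = sum(matrix[r][c] for r in range(k)) - tp   # predicted c, actually other
        tn = total - tp - fn - fp
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        rates[c] = (tpr, fpr)
    return rates

# Hypothetical 3-class confusion matrix dominated by the last class.
matrix = [
    [8, 1, 1],
    [0, 5, 0],
    [2, 0, 33],
]

rates = per_class_rates(matrix)
for cls, (tpr, fpr) in rates.items():
    print(f"class {cls}: TPR={tpr:.3f}, FPR={fpr:.3f}")
```

With a class distribution as skewed as the one reported here, per-class rates are more informative than overall accuracy, because a dominant class can mask errors on the rare classes.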

Conclusions
In conclusion, this study demonstrates the efficacy of machine learning models, particularly XGBoost, Gradient Boosting, Random Forest, and SVM, in predicting the Water Quality Index for the Mirpurkhas region. The high accuracy rates of these models underscore their potential for precise water quality assessment. Feature importance analysis highlights the critical role of specific variables, emphasizing the need for targeted monitoring and management. This study's findings contribute to the broader discourse on machine learning applications in environmental science. The identified variables and models can serve as valuable tools for water resource management, aiding in informed decision-making. Despite the promising results, it is crucial to acknowledge the study's limitations, including dataset size and variable scope. Future research should explore advanced strategies and incorporate additional parameters for a more comprehensive understanding of water quality dynamics in the region. Overall, this study not only showcases the capabilities of machine learning in water quality prediction but also underscores the importance of considering uncertainties for robust environmental assessments.

Figure 1. Study area and groundwater sampling points.

Figure 2. Methodology used for predicting Water Quality Index for given input variables.

Figure 3. Input parameters used for Water Quality Index prediction and assessment.

Figure 4. Predicted values, along with the prediction intervals, using bootstrapping.

Figure 5. The AUC score for different classifiers used for Water Quality Index prediction and assessment.

Figure 6. The ROC curves for different classifiers used for Water Quality Index prediction and assessment.

Table 1. Variance Inflation Factor (VIF) values indicating multicollinearity among water quality assessment features in Mirpurkhas, Sindh, Pakistan.

Table 2. Information Gain (IG) values for each water quality assessment feature in Mirpurkhas, Sindh, Pakistan.

Table 3. R-factor obtained for all the machine learning algorithms in WQI prediction.

Table 4. Performance evaluation of machine learning algorithms in WQI prediction.

Table 7. Confusion matrix detailing the multi-class classification results for the XGB classifier.

Table 8. Confusion matrix detailing the multi-class classification results for the Random Forest classifier.

Table 9. Confusion matrix detailing the multi-class classification results for SVC (Support Vector Classifier with probability = True).

Table 10. Confusion matrix detailing the multi-class classification results for the KNN classifier.

Table 11. Confusion matrix detailing the multi-class classification results for the Gradient Boosting classifier.

Table 12. Confusion matrix detailing the multi-class classification results for the Decision Tree classifier.

Table 13. Water quality ranges and their corresponding classes.

Table 14. Testing sample results for RF classifier.