From Indices to Algorithms: A Hybrid Framework of Water Quality Assessment Using WQI and Machine Learning Under WHO and FAO Standards

Güneş Şen, Senem

doi:10.3390/w17213050


by

Senem Güneş Şen

Department of Forest Engineering, Faculty of Forestry, Kastamonu University, 37150 Kastamonu, Türkiye

Water 2025, 17(21), 3050; https://doi.org/10.3390/w17213050

Submission received: 8 October 2025 / Revised: 18 October 2025 / Accepted: 23 October 2025 / Published: 24 October 2025

(This article belongs to the Special Issue Water Modeling Using Combined Machine Learning and Fieldwork Investigation)


Abstract

Assessing water quality is essential for the sustainable use of freshwater resources, especially under increasing climatic and agricultural pressures. Small irrigation ponds are particularly sensitive to pollution due to their limited buffering capacity. This study evaluates the water quality of the Taşçılar and Yumurtacılar ponds in Kastamonu, Türkiye, by combining conventional Water Quality Indices (WQI) with machine-learning-based interpretation. Physicochemical parameters were measured monthly for one year, and water quality was classified according to WHO and FAO thresholds using the CCME-WQI and weighted arithmetic methods. The integrated approach identified significant differences among standards and highlighted the parameters most responsible for water quality degradation. Machine-learning models improved the interpretation of these indices and supported consistent classification across datasets. The findings emphasize that coupling index-based and data-driven methods can enhance routine monitoring and provide actionable insights for sustainable irrigation-water management, thereby contributing to achieving the SDGs 6, 13, and 15.

Keywords:

water quality index; WA-WQI; CCME-WQI; logistic regression; machine learning; decision tree; random forest; XGBoost

1. Introduction

Water resources globally face mounting stress from climate change, population growth, urbanization, industrial discharge, and intensified agricultural inputs. Small reservoirs and irrigation ponds are particularly vulnerable due to their limited buffering capacity and low self-renewal ability. Issues such as eutrophication, nutrient enrichment, increased suspended solids, and chemical contamination can lead to rapid deterioration in the quality of small-scale water bodies [1,2].

Water quality (WQ) reflects the combined influence of multiple physicochemical and biological parameters that determine a water body’s suitability for drinking, irrigation, and supporting ecosystems. Indicators such as pH, electrical conductivity (EC), dissolved oxygen (DO), total phosphorus (TP), biochemical oxygen demand (BOD), and suspended solids (SS) are widely used to characterize water quality [3]. Because interpreting all these parameters simultaneously is challenging for decision-makers, holistic indices such as the Water Quality Index (WQI) have been developed to facilitate this process. The WQI condenses multiple indicators into a single score, allowing for classification from “very good” to “poor” [4,5]. However, the choice of parameter thresholds, weighting schemes, and aggregation functions can introduce significant uncertainty and subjectivity into WQI assessments [6,7]. Recent efforts propose that machine learning can help optimize weights and aggregation rules, thereby reducing model uncertainty [5,8].

Traditional laboratory-based analyses provide high precision, but are often costly, labor-intensive, and may fail to capture abrupt events. These limitations have motivated a shift toward data-driven approaches, especially machine learning (ML) and deep learning (DL) techniques [9,10,11,12]. Data-driven modeling offers predictive capabilities by learning complex, nonlinear relationships within observational datasets [13]. Such approaches can either supplement or replace process-based models, offering more flexible, scalable solutions [14,15,16].

ML algorithms, such as Random Forest (RF), Extreme Gradient Boosting (XGBoost), Support Vector Machines (SVM), and k-Nearest Neighbors (k-NN), have demonstrated strong capabilities in predicting WQI classes by capturing hidden nonlinear interactions [17,18]. DL architectures, such as LSTM and CNN, can uncover spatial–temporal patterns in water quality dynamics [19,20]. Explainable AI (XAI) tools, such as SHAP and LIME, are increasingly being adopted to interpret model decisions and enhance policy relevance [21]. In particular, optimizing WQI models with machine learning has been demonstrated to reduce uncertainties stemming from subjective choices in index construction [5,8].

The objective of this study is to evaluate water quality in the Taşçılar and Yumurtacılar ponds in Kastamonu, Türkiye, based on multiple physicochemical parameters; compare classification results using CCME-WQI and WA-WQI under both WHO and FAO thresholds; and assess the predictability of these results using ML algorithms (Logistic Regression, Decision Tree, Random Forest, XGBoost). By integrating index-based and data-driven approaches, this work proposes a scalable hybrid framework for small irrigation reservoirs in similar hydroclimatic settings. The proposed methodology aims not only for regional applicability but also for transferability to small water bodies under analogous environmental conditions.

2. Materials and Methods

2.1. Study Area

The study area consists of the Taşçılar (41°28′14″ N, 33°24′01″ E) and Yumurtacılar (41°28′51″ N, 33°26′29″ E) reservoirs, located within the administrative boundaries of Daday district, Kastamonu Province, in the Western Black Sea Region of Türkiye. Both reservoirs are located in the upper Kızılırmak Basin, characterized by complex topography and steep slopes. Figure 1 illustrates the precise geographic location of the reservoirs, including basin boundaries, elevation gradients, and the surrounding drainage network.

The Daday district has a continental climate, with snowy winters, cool and rainy springs and autumns, and hot, dry summers [22]. According to long-term meteorological data from the General Directorate of Meteorology [23], the average annual temperature is 9.9 °C and the mean annual precipitation is 552.7 mm. The region’s geological formation dates back to the Triassic and Lower Jurassic periods [24]. Soils are generally classified as chestnut-colored, medium-depth (50–90 cm) with moderate slopes (20–30%) and a medium risk of water erosion [25].

The Taşçılar Reservoir was constructed for irrigation purposes, with a total capacity of 1,020,000 m³, an irrigation area of 126 ha, and an annual water withdrawal of 280,000 m³. It is an earthfill dam, and its basin is dominated by south-facing slopes, 83% of which are categorized as very steep and rugged, ranging in elevation from 671 m to 1652 m [26].

The Yumurtacılar Reservoir also serves irrigation purposes, with a total capacity of 930,000 m³, an irrigation area of 124 ha, and an annual water withdrawal of 230,000 m³. Similar to Taşçılar, it is an earthfill dam, and 53% of its basin area consists of steep to very steep slopes, with elevations ranging from 860 m to 1243 m [26].

2.2. Dataset and Data Analysis

Seven different sampling points were identified at each of the Taşçılar and Yumurtacılar Reservoirs located in Kastamonu. Water sampling was conducted over 12 consecutive months (May 2024–April 2025) to capture seasonal variability in hydrochemical conditions across the irrigation season, autumn rainfall period, winter low-temperature and low-flow conditions, and spring snowmelt transition. Previous studies on small ponds and reservoirs have shown that sampling in representative months (e.g., January, April, July, and October) can provide a general overview of seasonal variation [27,28,29,30,31]. However, monthly sampling throughout an entire hydrological year allows for a more reliable evaluation of temporal dynamics and inter-seasonal transitions [32,33,34,35], which is consistent with recent international approaches that emphasize continuous temporal coverage for detecting hydrochemical responses to climatic and anthropogenic drivers [36,37,38].

Before sampling, amber-colored glass bottles were thoroughly rinsed with distilled water to ensure that the chemical characteristics of the water samples remained unchanged and were not affected by external factors. To minimize temperature variations and diurnal effects, all samples were collected between 09:00 and 12:00 in each season under similar meteorological conditions. Immediately after collection, the samples were transported to the laboratory in insulated coolers at low temperature to prevent any chemical or biological alteration. All field measurements were performed using portable instruments that were calibrated before each sampling event in accordance with the manufacturers’ specifications.

A total of 336 water samples were collected twice per month from 14 different sampling points, and various measurements and analyses were performed.

The pH, electrical conductivity (EC), temperature, salinity, total dissolved solids (TDS), and dissolved oxygen (DO) of the water samples were determined using the AZ Instrument 86031 Combo multi-parameter measurement device (AZ Instrument Corp., Taichung, Taiwan). Turbidity was measured using the Milwaukee Mi415 turbidity meter (Milwaukee Instruments, Rocky Mount, NC, USA). Calcium, magnesium, sodium, potassium, copper, lead, phosphorus, chromium, nickel, and silver were analyzed using an ICP-OES device (Spectro Blue, Spectro Analytical Instruments GmbH, Kleve, Germany). Total suspended solids (TSS) were determined using the gravimetric method [39]. Nitrite and nitrate analyses were performed using the PhotoLab® 7600 UV-VIS spectrophotometer with special photometric test kits (WTW, Xylem Analytics Germany GmbH, Weilheim, Germany). The Matriks Nitrate test kit used in nitrate analysis has a measurement range of 0.4–111 mg/L (Cat. No: 2.187.025), while the Matriks Nitrite test kit used in nitrite analysis has a measurement range of 0.03–2.3 mg/L (Cat. No: 1.214.323) (Matriks Kimya Analitik Sistemler Ltd. Şti., Ankara, Türkiye) [40].

All analyses were performed in the laboratories of Kastamonu University, following the standard methods outlined in the “Water Pollution Control Regulation, Sampling and Analysis Methods Bulletin” [41]. To ensure the accuracy and consistency of the analyses, all analytical instruments were calibrated according to the manufacturers’ instructions prior to each sampling session, and procedural blanks were used where applicable. All field and laboratory analyses were conducted by the same researcher, ensuring methodological consistency and minimizing external variability. Each measurement result was evaluated in conjunction with device logs and field observations. When verification was required, the retained water sample was re-analyzed to confirm the consistency of the measurement results. As a result, no missing or erroneous data were obtained during the study period.

2.3. Determination of Water Quality Classes

The measurements obtained from the reservoir samples were evaluated using two water quality indices commonly used in the literature, and water quality classes were determined from them. These indices are the Canadian Council of Ministers of the Environment Water Quality Index (CCME WQI) and the Weighted Arithmetic Water Quality Index (WA-WQI). The relevant water quality guidelines of the WHO and FAO supplied the classification threshold values required for each parameter when applying these indices [42,43].

To facilitate a direct comparison between drinking and irrigation water standards, the threshold limits adopted from the World Health Organization (WHO) [42] and the Food and Agriculture Organization (FAO) [43] guidelines are summarized in Table 1.

These indices were chosen because they enable reservoirs to be examined in a multifaceted manner, both in terms of drinking water quality and agricultural irrigation [44,45,46,47].

2.3.1. Canadian Council of Ministers of the Environment Water Quality Index (CCME WQI)

This method is a comprehensive index that evaluates water quality in five classes based on comparing measured parameters with national or international standards. Three key factors are considered in calculating the index: coverage (F1), frequency (F2), and deviation (F3). These components are combined to produce a score ranging from 0 to 100, and the results are categorized as “Excellent,” “Good,” “Fair,” “Marginal,” and “Poor” [45].

F1 (Coverage): Percentage of parameters that do not meet the standards,
F2 (Frequency): Percentage of individual measurements that exceed the standards,
F3 (Deviation): Magnitude by which the failed measurements deviate from the standard values.

The score is obtained from these three factors using the formula below [48].

$$\mathrm{CCME\text{-}WQI} = 100 - \left( \frac{\sqrt{F_{1}^{2} + F_{2}^{2} + F_{3}^{2}}}{1.732} \right)$$

In this study, the WHO-based version of the index (CCME-WQI-WHO) was calculated using the limit values set by the World Health Organization for drinking water, and the suitability of the ponds for human consumption was tested [49].

The same methodological approach was also applied with the widely accepted criteria of the Food and Agriculture Organization (FAO) to determine the suitability of the water for irrigation [43].
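The three factors and the aggregation formula above can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: it assumes the standard CCME definitions (F3 derived from the normalized sum of excursions), handles only "must not exceed" limits, and uses the standard CCME class boundaries. The function names and the sample values are hypothetical.

```python
import math


def ccme_wqi(measurements, upper_limits):
    """Return a CCME-WQI score between 0 and 100.

    measurements: dict mapping parameter name -> list of measured values
    upper_limits: dict mapping parameter name -> maximum allowed value
    Hypothetical helper; only upper-limit objectives are handled.
    """
    total_tests = 0
    failed_tests = 0
    failed_params = 0
    excursion_sum = 0.0

    for param, values in measurements.items():
        limit = upper_limits[param]
        param_failed = False
        for value in values:
            total_tests += 1
            if value > limit:
                failed_tests += 1
                param_failed = True
                # Excursion: how far the failed value lies above its limit.
                excursion_sum += value / limit - 1.0
        if param_failed:
            failed_params += 1

    f1 = 100.0 * failed_params / len(measurements)  # share of failed parameters
    f2 = 100.0 * failed_tests / total_tests         # share of failed measurements
    nse = excursion_sum / total_tests               # normalized sum of excursions
    f3 = nse / (0.01 * nse + 0.01)                  # deviation scaled to 0-100
    return 100.0 - math.sqrt(f1**2 + f2**2 + f3**2) / 1.732


def ccme_class(score):
    """Map a score to the standard CCME category (assumed boundaries)."""
    if score >= 95:
        return "Excellent"
    if score >= 80:
        return "Good"
    if score >= 65:
        return "Fair"
    if score >= 45:
        return "Marginal"
    return "Poor"


# Hypothetical values: one of four measurements exceeds its limit.
demo_score = ccme_wqi(
    {"param_a": [4.0, 1.0], "param_b": [1.0, 1.0]},
    {"param_a": 2.0, "param_b": 2.0},
)
clean_score = ccme_wqi({"param_a": [1.0, 1.5]}, {"param_a": 2.0})
```

In the hypothetical example, a single exceedance in one of two parameters already lowers the score into the Fair range, which shows how strongly F1 and F2 react in small datasets.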

2.3.2. Weighted Arithmetic Water Quality Index (WA-WQI)

This method converts the measured concentration of each parameter into a quality rating relative to its standard value, multiplies the rating by a predefined weight coefficient, and obtains the index from the weighted sum of these values [50]. In this way, the relative importance of the parameters is taken into account, giving a more balanced evaluation. The general formula is given below:

$$\mathrm{WQI} = \frac{\sum W_{i} Q_{i}}{\sum W_{i}}$$

In the formula:

$W_{i}$: the weight coefficient of the i-th parameter, indicating its relative importance in the water quality assessment.

$Q_{i}$: the quality rating of the i-th parameter, calculated using the measured value ($V_{i}$), the ideal value ($V_{0}$), and the standard value ($S_{i}$) [50].

In this study, the index was calculated according to both WHO standards (WA-WQI-WHO) and FAO irrigation criteria (WA-WQI-FAO).
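The weighted arithmetic aggregation can be sketched as below. The source does not state the expression for the quality rating, so this sketch assumes the commonly used form Q_i = 100 × (V_i − V_0) / (S_i − V_0), with an ideal value of 0 unless one is supplied. The function names, weights, and sample values are hypothetical and are not taken from the study.

```python
def quality_rating(measured, standard, ideal=0.0):
    """Quality rating Q_i (assumed form): 100 * (V_i - V_0) / (S_i - V_0)."""
    return 100.0 * (measured - ideal) / (standard - ideal)


def wa_wqi(measured, standards, weights, ideals=None):
    """Weighted arithmetic WQI: sum(W_i * Q_i) / sum(W_i).

    All arguments are dicts keyed by parameter name. Parameters missing
    from `ideals` use an ideal value of 0.
    """
    ideals = ideals or {}
    weighted_sum = 0.0
    weight_total = 0.0
    for param, value in measured.items():
        q_i = quality_rating(value, standards[param], ideals.get(param, 0.0))
        weighted_sum += weights[param] * q_i
        weight_total += weights[param]
    return weighted_sum / weight_total


# Hypothetical sample: both parameters sit at half of their standard value.
demo_wqi = wa_wqi(
    measured={"EC": 500.0, "NO3": 25.0},
    standards={"EC": 1000.0, "NO3": 50.0},
    weights={"EC": 1.0, "NO3": 3.0},
)
```

Because the index is a weighted mean of the ratings, changing the standards (for example, from WHO to FAO limits) changes every Q_i and can shift a sample into a different class even when the measurements are identical.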

The Friedman Test was used to evaluate whether the results obtained according to water quality indices differed from each other. The Nemenyi post hoc test was used to determine which methods differed significantly among groups [51].

2.4. Machine Learning Approach for Determining Water Quality

Four algorithms were employed in the machine learning stage: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). Their comparative prediction performance was analyzed. The analyses were performed using the Python programming language (version 3.11.11). Specifically, the scikit-learn library was used for preprocessing, logistic regression, and tree-based models; the XGBoost package was used for gradient boosting; and supporting libraries, such as NumPy and Pandas, were used for data handling. All algorithms were implemented as supervised learning methods, trained on input variables and their corresponding target classes.

Logistic Regression (LR): Examines the probability of a specific outcome occurring based on explanatory variables. It models the log odds of occurrence using a linear combination of predictors [52]. Detailed mathematical expressions are provided in the Supplementary Information (Equation (S1)).

Decision Tree (DT): A rule-based method that recursively divides data into non-linear subregions. The tree structure consists of root, decision, and leaf nodes representing data splits and predicted classes [53]. The outputs obtained from leaf nodes generate classification or prediction results [54].

Random Forest (RF): An ensemble algorithm that combines multiple decision trees, where predictions are aggregated through majority voting to enhance accuracy and reduce overfitting [55,56].

Extreme Gradient Boosting (XGBoost): An ensemble learning technique based on the gradient boosting approach and decision trees, offering high scalability and computational efficiency. It incrementally minimizes the loss function while regularizing tree complexity through gamma (γ) and lambda (λ) parameters to prevent overfitting [57]. Detailed formulations for the objective and regularization functions are presented in the Supplementary Information (Equations (S2) and (S3)).
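The log-odds formulation described for Logistic Regression can be illustrated with a short sketch. It shows the binary case only, whereas the study predicts several water quality classes; the coefficients and feature values are hypothetical and do not come from the fitted models.

```python
import math


def predict_probability(features, coefficients, intercept):
    """Binary logistic regression: convert linear log odds to a probability."""
    log_odds = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical standardized inputs and coefficients.
p_neutral = predict_probability([0.0, 0.0], [1.2, -0.7], intercept=0.0)
p_shifted = predict_probability([1.0, 0.0], [1.0, -0.7], intercept=0.0)
```

Log odds of zero correspond to a probability of 0.5, and each unit increase in the linear predictor multiplies the odds by e.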

2.5. Model Optimization

To minimize the effect of scale differences between variables, all features were standardized using the StandardScaler method in scikit-learn. This technique transforms variables to have a mean of zero and a standard deviation of one, ensuring consistent scaling before model training [58]. The mathematical expression for this transformation is provided in the Supplementary Information (Equation (S4)).
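The transformation is the usual z-score, z = (x − mean) / standard deviation, with the population standard deviation that StandardScaler uses. The pure-Python sketch below reproduces it for one variable; the conductivity values are hypothetical.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical EC readings, used only to illustrate the scaling.
ec_values = [210.0, 250.0, 290.0, 330.0]
ec_scaled = standardize(ec_values)
```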

The tested hyperparameter ranges and their optimal values are summarized in Table 2 and Table 3. The ranges were selected based on values commonly reported in the literature to maintain an optimal balance between model complexity and generalization capability.

In logistic regression, the best results were obtained with an L2 penalty, the LBFGS solver, balanced class weights, and a high iteration limit. For the decision tree, the optimal configuration was mainly influenced by maximum depth and minimum sample thresholds. In the random forest, the number of trees and feature subset selection were key determinants of performance. The XGBoost algorithm achieved the highest predictive accuracy by jointly optimizing learning rate, depth, and regularization parameters.

Hyperparameters were optimized using the GridSearchCV method with 5-fold cross-validation. This approach systematically evaluates all possible parameter combinations, measures performance, and selects the configuration that yields the highest accuracy and lowest validation error [59].
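A grid search of this kind might look like the sketch below for the logistic regression case. It runs on synthetic data from `make_classification`, and the grid over `C` is a hypothetical example rather than the range reported in Table 2. The solver, class weighting, and iteration limit follow the settings named in the text; the L2 penalty is the scikit-learn default and is therefore not set explicitly.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class data standing in for the water quality dataset.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=5, n_classes=3, random_state=0
)

# Scaling is placed inside the pipeline so it is refit on each training fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(solver="lbfgs", class_weight="balanced",
                               max_iter=5000)),
])

# Hypothetical grid: inverse regularization strength only.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

# Exhaustive search over the grid with 5-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_c = search.best_params_["clf__C"]
best_cv_accuracy = search.best_score_
```

Wrapping the scaler and the classifier in one pipeline is a design choice of this sketch: it keeps the validation folds out of the scaling statistics. The source does not state whether the study arranged its preprocessing this way.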

2.6. Model Evaluation Methods

The predictive performance of all classification models was assessed using multiple evaluation metrics to ensure a comprehensive and reliable comparison. These metrics included Accuracy, F1-macro score, Receiver Operating Characteristic–Area Under Curve (ROC–AUC), Balanced Accuracy, Cohen’s Kappa, Matthews Correlation Coefficient (MCC), Expected Calibration Error (ECE), and Brier Score.

This multi-metric approach enabled a detailed assessment of model performance in terms of classification success, robustness against class imbalance, and calibration quality.

Accuracy is the ratio of correctly classified examples to all examples. It is used to measure overall model performance [60].

F1-macro represents the harmonic mean of precision and recall averaged across all classes, providing a balanced evaluation under imbalanced data conditions [61].

Receiver Operating Characteristic–Area Under Curve (ROC–AUC) quantifies the model’s ability to distinguish between classes under varying thresholds; a higher AUC value indicates stronger discriminative power [62].

Balanced Accuracy corrects for unequal class distributions by averaging recall scores across all classes [63].

Cohen’s Kappa evaluates classification agreement while accounting for random chance [64].

Matthews Correlation Coefficient (MCC) jointly considers true and false classifications, making it suitable for imbalanced datasets [60].

Expected Calibration Error (ECE) assesses how closely predicted probabilities align with observed outcomes, reflecting the model’s calibration reliability [65].

Finally, the Brier Score measures the mean squared difference between predicted probabilities and actual class labels, where lower scores indicate better-calibrated models [66].

All mathematical formulations of these metrics are provided in the Supplementary Information (Equations (S5)–(S12)) for clarity and reproducibility.
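As a compact illustration of several of the metrics above, the sketch below computes Balanced Accuracy, Cohen's Kappa, MCC, the Brier Score, and ECE for a binary problem. The study itself is multi-class and its exact formulations are in the Supplementary Information, so this is a simplified stand-in; the ECE variant shown bins the predicted positive-class probability into equal-width bins, and all counts and probabilities are hypothetical.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Agreement metrics from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Balanced accuracy: mean recall over the two classes.
    balanced_accuracy = (tp / (tp + fn) + tn / (tn + fp)) / 2
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (accuracy - expected) / (1 - expected)
    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"accuracy": accuracy, "balanced_accuracy": balanced_accuracy,
            "kappa": kappa, "mcc": mcc}


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean gap between predicted and observed frequency per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        ece += len(members) / len(labels) * abs(observed - mean_prob)
    return ece


# Hypothetical confusion counts and predicted probabilities.
scores = binary_metrics(tp=40, fp=5, fn=10, tn=45)
demo_probs = [0.9, 0.8, 0.2, 0.1]
demo_labels = [1, 1, 0, 0]
demo_brier = brier_score(demo_probs, demo_labels)
demo_ece = expected_calibration_error(demo_probs, demo_labels)
```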

2.7. Relative Ranking Method

Comparing the performance of machine learning models is challenging because different metrics have varying scales and directions. In this study, the relative ranking method proposed by Poudel and Cao [67] was applied to provide a holistic evaluation of the models [68,69,70].

In this method, each performance metric (e.g., Accuracy, Macro-F1, ROC-AUC, Balanced Accuracy, Kappa, MCC, ECE, Brier) was considered, and models were ranked accordingly. For metrics in the “higher is better” category (e.g., Accuracy, F1, ROC-AUC), the model with the highest value was assigned rank 1, while the model with the lowest value was assigned rank k (where k = number of models). For metrics in the “lower is better” category (e.g., ECE, Brier), the ranking was reversed: the model with the lowest value received rank 1, and the one with the highest value received rank k.

The ranking values obtained for each model were then averaged using equal weighting, resulting in an average rank value for each model. This average rank represents the overall performance of the model, where a lower average indicates higher performance. Through this method, diverse performance indicators were consolidated under a single scale, enabling an objective ranking of model performance.
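The ranking procedure described above can be sketched as follows. The handling of ties (tied models share the average of their positions) is an assumption, since the source does not specify it, and the model names and metric values are hypothetical.

```python
def average_ranks(scores, lower_is_better=("ece", "brier")):
    """Average each model's rank over all metrics (rank 1 = best).

    scores: dict mapping model name -> dict of metric name -> value.
    Metrics listed in lower_is_better are ranked in ascending order;
    all others are ranked in descending order.
    """
    models = list(scores)
    metrics = list(next(iter(scores.values())))
    totals = {model: 0.0 for model in models}

    for metric in metrics:
        descending = metric not in lower_is_better
        ordered = sorted(models, key=lambda m: scores[m][metric],
                         reverse=descending)
        i = 0
        while i < len(ordered):
            # Extend j over the run of models tied with position i.
            j = i
            while (j + 1 < len(ordered)
                   and scores[ordered[j + 1]][metric]
                   == scores[ordered[i]][metric]):
                j += 1
            shared_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
            for model in ordered[i:j + 1]:
                totals[model] += shared_rank
            i = j + 1

    return {model: totals[model] / len(metrics) for model in models}


# Hypothetical scores for three models on two metrics.
demo_ranks = average_ranks({
    "model_a": {"accuracy": 0.90, "brier": 0.05},
    "model_b": {"accuracy": 0.95, "brier": 0.10},
    "model_c": {"accuracy": 0.80, "brier": 0.20},
})
```

In this example, model_b leads on accuracy and model_a on the Brier Score, so both end with an average rank of 1.5, which is the kind of near-tie where secondary criteria become decisive.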

3. Results

Within the scope of the study, the descriptive statistical parameters of the dataset were first analyzed. For each variable, the number of observations, mean, minimum and maximum values, the 25th, 50th (median), and 75th percentiles, as well as the standard deviation, were calculated. The results are summarized in Table 4.

Upon examining Table 4, different distribution characteristics among the variables become evident. The pH values exhibit a narrow range (6.28–7.91) and are generally close to the neutral level. EC and TDS, on the other hand, display a wide range and high standard deviations, indicating variability in terms of conductivity and dissolved solids. While NO₂ remains at low levels, NO₃ shows a broader distribution. Turbidity and DO also demonstrate wide variability. Metal concentrations (e.g., Na, Ca, Pb, Ag) vary considerably, with Pb and Ag showing high mean values and large variances. These findings indicate heterogeneity, suggesting the presence of potential outliers in some cases.

To ensure data integrity and comparability before modeling, all physicochemical variables were subjected to a structured preprocessing and normalization procedure.

Boxplots and z-score analysis were employed to identify potential outliers; no extreme values requiring removal were found. Normality was evaluated using the Shapiro–Wilk test, and variables showing minor deviations from normality were standardized using the StandardScaler method (zero mean, unit variance).

These preprocessing steps enhanced numerical stability and ensured that all predictors contributed equally during model training, thereby improving the robustness and interpretability of subsequent analyses.

3.1. Water Quality Classification Results

The categories, counts, and percentages corresponding to the CCME-WQI-WHO, WA-WQI-WHO, and WA-WQI-FAO variables are presented in Table 5.

Table 5 shows that the distribution of water quality classes varies significantly depending on the index and standard applied. In the CCME-WQI-WHO classification, most observations fall into the Fair (47.62%) and Marginal (46.73%) categories, suggesting that water quality is generally at acceptable levels. In contrast, under the WA-WQI-WHO classification, 80.6% of the samples fall into the Good category, indicating that the weighted arithmetic method rates the same water sources as higher quality under the same WHO criteria. On the other hand, in the WA-WQI-FAO classification, more than half of the samples fall into the Poor category (51.79%), with only 41.37% classified as Good. This finding suggests that, within the weighted arithmetic method, the FAO criteria provide a stricter assessment of water quality. Overall, these results demonstrate that different classification systems yield varying quality levels for the same dataset, highlighting that the chosen index and standard play a decisive role in water quality interpretation.

To statistically validate these differences among the three water quality classification schemes (CCME-WQI-WHO, WA-WQI-WHO, and WA-WQI-FAO), a non-parametric Kruskal–Wallis H test was conducted. The analysis revealed a statistically significant difference among the schemes (H = 261.24, df = 2, p < 0.001). The calculated effect size (ε² = 0.26) indicated a medium-to-strong magnitude of difference. Subsequently, a Bonferroni-adjusted Dunn post hoc test confirmed statistically significant pairwise differences across all comparisons (p_adj < 0.001). Examination of the mean rank values showed that the WA-WQI-WHO scheme yielded the highest mean rank, reflecting a more optimistic assessment of water quality, whereas the CCME-WQI-WHO scheme produced the lowest mean rank, suggesting a more conservative classification trend. The WA-WQI-FAO scheme fell between these two extremes, reflecting the influence of FAO’s stricter irrigation water quality thresholds. These findings indicate that, although all three indices rely on similar physicochemical parameters, variations in threshold values and weighting systems lead to statistically significant divergences in classification outcomes. Therefore, the selection of the reference standard (WHO or FAO) is crucial for the accurate interpretation of water quality indices and has significant implications for environmental management and policy decisions.
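The reported effect size is consistent with the usual epsilon-squared estimate for the Kruskal–Wallis test, ε² = H / (n − 1). The check below assumes that each of the 336 samples contributes one class value per scheme, so that n = 3 × 336 = 1008; this total is an inference from the sample count, not a figure stated in the text.

```python
def epsilon_squared(h_statistic, n_observations):
    """Epsilon-squared effect size for a Kruskal-Wallis H statistic."""
    return h_statistic / (n_observations - 1)


# Assumed total: 336 samples classified under each of the three schemes.
n_total = 3 * 336
effect_size = epsilon_squared(261.24, n_total)
```

Under that assumption the estimate rounds to 0.26, matching the value given above.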

Since FAO standards define wider tolerance ranges compared to drinking water quality criteria, all observations were evaluated in the highest quality class (“Excellent”). This prevented the FAO-based CCME (CCME-WQI-FAO) results from capturing variation within the dataset; hence, the index was not suitable as a discriminating variable in the modeling phase.

3.2. Model Implementation and Optimal Model Selection

To predict water quality in the ponds, the designed models were tested using 6720 data records obtained from 19 parameters measured at 14 sampling points. Table 6, Table 7 and Table 8 present the Accuracy, Macro-F1, ROC-AUC, Balanced Accuracy, Kappa, MCC, ECE, and Brier results for the four models across the three water quality classification schemes, before and after hyperparameter tuning. Although no independent external dataset was available for validation, model generalization was assessed through a 5-fold cross-validation scheme and an independent test set evaluation, supported by multiple performance metrics (Balanced Accuracy, Macro-F1, ROC-AUC, ECE, and Brier scores).

3.2.1. Comparative Overview of Model Performance Across WQI Schemes

Across the three water quality index (WQI) classification schemes—CCME-WQI-WHO, WA-WQI-WHO, and WA-WQI-FAO—all machine learning models demonstrated high predictive capacity, with test accuracies generally exceeding 90%. The comparative rankings revealed that XGBoost achieved the highest overall predictive performance, leading in two out of three schemes (WA-WQI-WHO and WA-WQI-FAO). Meanwhile, Random Forest (RF) provided the most reliable calibration results (lowest ECE and Brier scores) and was identified as the top-performing model in the CCME-WQI-WHO classification. Logistic Regression (LR) consistently exhibited stable generalization, reflected by the smallest gap between training and test metrics, while Decision Tree (DT) models tended to overfit, as indicated by their near-perfect training scores (≈1.00) and slightly lower test performance.

Although all models maintained high ROC-AUC values (≥0.93), minor differences were observed in class-level sensitivities. Balanced Accuracy and Macro-F1 scores indicated that tree-based models were more sensitive to majority classes, whereas LR preserved better performance across minority classes. This outcome highlights the influence of class imbalance in the dataset, particularly for the “Poor” and “Excellent” water quality categories. However, no oversampling or resampling techniques (e.g., SMOTE or under-sampling) were applied in this study to preserve the natural class distribution. Instead, the imbalance effect was acknowledged and discussed through complementary metrics such as Balanced Accuracy and Macro-F1, which provided a fair assessment of model robustness under imbalanced conditions.

The variance analysis across k-fold validation folds confirmed that the models achieved stable generalization performance, despite overfitting tendencies observed in DT and RF. Among the models, XGBoost and RF consistently ranked highest across all evaluation metrics, while LR provided the most interpretable and computationally efficient alternative.

3.2.2. Performance Results of Machine Learning Models Based on the CCME-WQI-WHO Classification

The predictive performance of the four machine learning models was evaluated using the CCME-WQI-WHO classification scheme. The combined results, both pre- and post-hyperparameter optimization, are presented in Table 6, which reports only test performance metrics for clarity and conciseness.

Before optimization, all models achieved high predictive accuracy (≥0.95), with tree-based methods—particularly Random Forest (RF) and XGBoost—demonstrating strong discriminative ability (ROC-AUC ≥ 0.99). However, the near-perfect training scores (≈1.00) observed for DT, RF, and XGBoost indicated mild overfitting, whereas Logistic Regression (LR) provided more stable generalization, showing smaller gaps between training and test performance.

After hyperparameter optimization, all models maintained high predictive power. Ensemble-based approaches preserved their dominance: RF achieved the lowest calibration errors (ECE = 0.0095; Brier = 0.0139), and XGBoost retained the highest ROC-AUC (≈1.00). Despite these improvements, the Decision Tree (DT) still exhibited overfitting tendencies, while LR remained the most balanced and interpretable alternative across metrics.

The relative ranking analysis (Figure 2) confirmed RF as the best-performing model (average rank = 1.83), followed closely by XGBoost (1.97). Although both models performed comparably in accuracy and discrimination, RF’s superior calibration quality made it the preferred model under the CCME-WQI-WHO classification.

The Friedman test (χ² = 24.019, p = 0.001) revealed significant differences among the models, but the subsequent Nemenyi post hoc test found no statistically significant pairwise differences (p ≥ 0.05). Consequently, the final model selection was guided by secondary criteria (calibration reliability, sensitivity to class imbalance, and interpretability), which pointed to RF as the most robust model for this classification.

3.2.3. Performance Results of Machine Learning Models Based on the WA-WQI-WHO Classification

The predictive performance of the four machine learning models was evaluated under the WA-WQI-WHO classification scheme. The pre- and post-optimization results are summarized in Table 7, while the relative ranking of models is presented in Figure 3.

Before optimization, all models achieved satisfactory accuracy levels (≥0.80) with ensemble-based approaches—particularly Random Forest (RF) and XGBoost—outperforming the others across most evaluation metrics. RF and XGBoost demonstrated the highest ROC-AUC values (≈1.00), reflecting strong discriminative ability. However, the tree-based methods exhibited nearly perfect training scores, suggesting a degree of overfitting. Logistic Regression (LR) showed the smallest discrepancy between training and test performance, indicating better generalization stability.

After hyperparameter optimization, the performance of all models slightly improved or remained stable across accuracy and calibration metrics. Among them, RF and XGBoost retained their superiority: RF achieved the lowest calibration errors (ECE = 0.1952; Brier = 0.1128), while XGBoost attained the highest ROC-AUC (≈0.9999). DT continued to display mild overfitting, and LR remained the most balanced yet comparatively less accurate model.
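To make the two calibration metrics concrete, the following sketch computes a multiclass Brier score and an Expected Calibration Error (ECE) in plain Python. It assumes one common formulation (one-hot Brier score summed over classes; ECE with ten equal-width confidence bins); the study's own definitions are given in its Supplementary Materials and may differ in normalization or binning. The probabilities are hypothetical.

```python
def brier_score(probs, labels):
    """Multiclass Brier score: mean squared distance between the predicted
    probability vector and the one-hot encoded true class."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((p_c - (1.0 if c == y else 0.0)) ** 2 for c, p_c in enumerate(p))
    return total / len(labels)


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins: weighted mean of
    |accuracy - mean confidence| over the non-empty bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        confidence = max(p)
        predicted = p.index(confidence)
        idx = min(int(confidence * n_bins), n_bins - 1)  # confidence 1.0 goes to the last bin
        bins[idx].append((confidence, predicted == y))
    ece = 0.0
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(hit for _, hit in members) / len(members)
            ece += len(members) / len(labels) * abs(accuracy - mean_conf)
    return ece


# Hypothetical predicted probabilities for three classes and the true class indices.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
labels = [0, 1, 2, 1]
print(brier_score(probs, labels))
print(expected_calibration_error(probs, labels))
```

Lower values are better for both metrics. With few test samples, ECE depends strongly on the number of bins, so small differences between models should be read cautiously.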

The comparative results highlight that although tree-based models achieved higher discriminative scores, they were more prone to overfitting than LR. This trade-off between interpretability and predictive strength underscores the importance of both calibration and class-level performance metrics when evaluating water-quality prediction models.

As shown in Figure 3, the XGBoost model ranked first (average rank = 1.49), followed by RF (2.24), DT (2.61), and LR (3.44). Although RF showed slightly better calibration, XGBoost outperformed others in overall predictive ranking due to its superior ROC-AUC and Macro-F1 values. Hence, XGBoost was identified as the best model under the WA-WQI-WHO classification, offering both high accuracy and robust class discrimination.

3.2.4. Performance Results of Machine Learning Models Based on the WA-WQI-FAO Classification

The predictive performance of the four machine learning models was assessed under the WA-WQI-FAO classification scheme. The consolidated results before (Pre) and after (Post) hyperparameter optimization, along with the test metrics, are summarized in Table 8. The relative ranking of the models is presented in Figure 4.

Before optimization, all models achieved high predictive accuracy (≥0.88), with the ensemble-based models, particularly Random Forest (RF) and XGBoost, demonstrating strong discriminative ability (ROC-AUC ≈ 0.93–0.95). However, the nearly perfect training metrics (≈1.00) observed for the tree-based models indicated mild overfitting. Logistic Regression (LR) achieved lower overall accuracy but exhibited more stable generalization across folds.

The Macro-F1 results revealed notable class-imbalance effects: RF and XGBoost produced lower Macro-F1 scores (≈0.47–0.48), whereas LR achieved a more balanced performance (0.74). Balanced Accuracy followed a similar trend, showing that LR and XGBoost handled minority classes more consistently than RF or DT. All models maintained high ROC-AUC scores, confirming reliable discrimination between water-quality categories. In calibration metrics, XGBoost achieved the lowest Brier Score (0.068), while DT displayed the lowest ECE, though this value may reflect artificial improvement due to its discrete probability outputs.
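The gap between accuracy and class-sensitive metrics can be reproduced with a small, hypothetical example. The sketch below implements Macro-F1 and Balanced Accuracy in plain Python and applies them to a classifier that always predicts the majority class; the labels and class proportions are illustrative and are not taken from the study's dataset.

```python
def per_class_counts(y_true, y_pred, label):
    """True positives, false positives and false negatives for one class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    return tp, fp, fn


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (classes taken from y_true)."""
    f1_scores = []
    for label in sorted(set(y_true)):
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = []
    for label in sorted(set(y_true)):
        tp, _, fn = per_class_counts(y_true, y_pred, label)
        recalls.append(tp / (tp + fn))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced test set: nine "Good" samples and one "Poor" sample,
# scored by a classifier that always answers "Good".
y_true = ["Good"] * 9 + ["Poor"]
y_pred = ["Good"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, macro_f1(y_true, y_pred), balanced_accuracy(y_true, y_pred))
```

In this toy case accuracy is 0.90, while Macro-F1 is about 0.47 and Balanced Accuracy is 0.50, because the minority class is never detected. High accuracy paired with low Macro-F1 is therefore a typical signature of class imbalance.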

After hyperparameter optimization, the general performance of all models improved or remained stable (Table 8). RF achieved the highest test accuracy (0.94) and maintained strong discriminative power (ROC-AUC ≈ 0.97). Despite its slightly lower accuracy, LR remained the most balanced model in terms of class sensitivity (Balanced Accuracy ≈ 0.83) and calibration quality (Brier ≈ 0.076). DT continued to exhibit minor overfitting, while XGBoost demonstrated robust classification capability but showed a weaker balance across classes.

Overall, RF stood out for its combination of accuracy, discrimination, and calibration, whereas LR offered a more consistent and interpretable alternative. XGBoost achieved comparably high predictive strength but was more affected by class imbalance.

As shown in Figure 4, the XGBoost model ranked first (average rank = 2.17), followed by Logistic Regression (2.53) and Random Forest (2.53), which performed at similar levels. Decision Tree (DT) ranked last (3.25). XGBoost excelled in key classification metrics such as Macro-F1, Kappa, and MCC, while RF delivered stronger calibration performance but fell slightly behind overall. Although LR provided the most balanced results, its lower accuracy placed it below the ensemble models.

According to the Friedman test (p = 0.266 ≥ 0.05), no statistically significant difference was found among the median model ranks. Consequently, final model selection relied on secondary criteria—calibration quality (ECE/Brier), ROC-AUC, Balanced Accuracy, MCC/Kappa, and interpretability—leading to the identification of RF as the most reliable model for practical deployment, while LR was favored in cases emphasizing class-balance sensitivity.

3.3. Feature Importance and Model Explainability

To enhance the interpretability of ensemble-based models and identify the contribution of individual variables to the classification process, a parameter importance analysis was conducted using the SHAP (Shapley Additive exPlanations) method. SHAP values quantify the influence of each physicochemical variable on the model’s prediction for a specific water quality class, providing a transparent understanding of model behavior.
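The idea behind Shapley-based attribution can be sketched without any external library. The example below computes exact Shapley values for a single sample by averaging each feature's marginal contribution over all feature orderings. It is not the SHAP library's algorithm: that library uses efficient approximations and a background dataset, whereas this sketch uses a single baseline vector. The scoring function and input values are hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample. Features not yet 'revealed' are
    held at their baseline value. The cost grows factorially with the number
    of features, so this is feasible only for a handful of them."""
    n = len(x)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for feature in order:
            current[feature] = x[feature]
            output = model(current)
            contributions[feature] += output - previous_output
            previous_output = output
    return [c / len(orderings) for c in contributions]


# Hypothetical toy scoring function of (DO, pH, Ca); not a model from the study.
def toy_model(features):
    do, ph, ca = features
    return 0.5 * do - 0.2 * abs(ph - 7.0) + 0.01 * ca + 0.0005 * do * ca


sample = [8.0, 8.2, 60.0]      # hypothetical measurement
reference = [6.0, 7.0, 40.0]   # hypothetical baseline, e.g. dataset averages
values = shapley_values(toy_model, sample, reference)
print(dict(zip(["DO", "pH", "Ca"], values)))
```

The attributions always sum to the difference between the model output for the sample and for the baseline, which is what allows SHAP plots to be read as an additive decomposition of a prediction.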

Under the CCME-WQI-WHO classification, dissolved oxygen (DO) was found to be the most influential variable (Figure 5), followed by pH and calcium (Ca). These parameters indicate that oxygen concentration, the balance of acidity and alkalinity, and hardness are key determinants of water quality. Ionic and metal parameters such as Ag, P, Ni, Na, and Mg contributed to a lesser extent, reflecting secondary hydrochemical variations among classes.

For the WA-WQI-WHO classification (Figure 6), electrical conductivity (EC) and total dissolved solids (TDS) were identified as the most impactful predictors. This highlights the central role of ionic strength and dissolved matter in differentiating water quality classes, while DO, salinity, and NO₃⁻ exhibited moderate importance consistent with their known effects on aquatic chemistry.

Under the WA-WQI-FAO scheme (Figure 7), sodium (Na) emerged as the dominant predictor, followed by EC and TDS, confirming the strong link between sodium ion concentration, salinity, and water quality categorization. Parameters such as NO₂, Pb, and salinity exhibited intermediate effects, indicating supportive yet less dominant roles compared to the major ionic constituents. Other variables, including turbidity, temperature, pH, and trace metals, had lower SHAP values but still contributed to refining the classification boundaries.

Overall, the SHAP-based interpretability analysis revealed that the models successfully captured the dominant hydrochemical processes controlling water quality in the studied reservoirs. These results enhance the transparency and explainability of machine learning models, aligning them with best practices in environmental AI research.

4. Discussion

The findings of this study provide important insights into both the sensitivity of water quality indices (WQI) and the capacity of machine learning (ML) models to deliver reliable predictions. Notably, when CCME-WQI was applied using FAO threshold values, all samples were classified within a single Excellent category, indicating that the approach lost its discriminative power. By contrast, under the CCME-WQI-WHO classification, 64.3% of the samples were categorized as Fair or Marginal, while the WA-WQI-WHO classification placed 72.1% of the samples in the Good category. Meanwhile, the WA-WQI-FAO classification assigned 51.7% of the samples to the Poor category. These discrepancies clearly demonstrate the sensitivity of water quality interpretations to the choice of standards, particularly when considering irrigation and drinking water purposes. Previous studies similarly emphasize that WQI results vary depending on the threshold values used for water quality parameters [46,47].
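The sensitivity of an index to its reference limits can be shown with a short sketch of the weighted arithmetic WQI. It assumes the commonly used formulation, in which each quality rating is q_i = 100 × (V_i − V_ideal) / (S_i − V_ideal) and each weight is inversely proportional to the limit S_i; the study's own equations may differ in detail. The sample and both sets of limits are hypothetical and are not the official WHO or FAO values.

```python
def wa_wqi(measured, standards, ideal=None):
    """Weighted arithmetic WQI: sum(w_i * q_i) / sum(w_i), with
    q_i = 100 * (V_i - V_ideal) / (S_i - V_ideal) and w_i proportional to 1 / S_i."""
    ideal = ideal or {}
    weighted_sum = 0.0
    weight_total = 0.0
    for name, value in measured.items():
        limit = standards[name]
        v_ideal = ideal.get(name, 0.0)  # 0 for most parameters; e.g. 7.0 is often used for pH
        quality_rating = 100.0 * (value - v_ideal) / (limit - v_ideal)
        weight = 1.0 / limit  # the proportionality constant K cancels in the ratio
        weighted_sum += weight * quality_rating
        weight_total += weight
    return weighted_sum / weight_total


# One hypothetical sample and two hypothetical sets of limits. These are NOT
# the official WHO or FAO values; they only show the effect of changing limits.
sample = {"EC": 600.0, "Na": 50.0}
lenient_limits = {"EC": 1500.0, "Na": 200.0}
strict_limits = {"EC": 700.0, "Na": 60.0}
print(wa_wqi(sample, lenient_limits))
print(wa_wqi(sample, strict_limits))
```

The same measurements yield a much higher (worse) index under the stricter limits, both because each quality rating rises and because parameters with lower limits receive larger weights. This is the mechanism by which a single dataset can fall into different classes under different standards.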

The machine learning analyses further highlighted these differences. Prior to hyperparameter optimization, ensemble-based models such as Random Forest (RF) and Extreme Gradient Boosting (XGBoost) achieved accuracies above 95% and ROC-AUC scores in the range of 0.96–0.97. However, their Macro-F1 scores remained around 0.72, pointing to persistent class imbalance issues. Logistic Regression (LR), while yielding a lower accuracy (0.88), presented a more balanced error distribution across classes. After optimization, the Macro-F1 score of RF improved to 0.83, while XGBoost reached a ROC-AUC of approximately 0.99, indicating a marked improvement in the generalization capacity of the models. Although no resampling techniques, such as SMOTE or under-sampling, were applied, the evaluation using Balanced Accuracy and Macro-F1 metrics revealed that the models provided fair performance under imbalanced class distributions. This approach ensured the preservation of the natural class structure of the dataset while still maintaining a reliable assessment of model robustness.

These results underscore the importance of considering not only overall accuracy but also secondary metrics, such as calibration measures (e.g., Brier score) and performance on minority classes, in model selection. This aligns with prior literature, which highlights the significance of calibration and balanced classification performance for robust ML applications in environmental contexts [53,60,65]. Moreover, the variance analysis across k-fold validation folds confirmed that the models exhibited stable generalization performance, despite minor overfitting tendencies observed in tree-based algorithms, which supports the reliability of their predictions.

Tree-based ensemble models maintained their superiority across different index classifications. For instance, under the CCME-WQI-WHO classification, RF achieved the highest performance. In contrast, under the WA-WQI-WHO and WA-WQI-FAO classifications, XGBoost outperformed other models, achieving an accuracy of 95% and ROC-AUC values of approximately 0.98–0.99. By contrast, Logistic Regression consistently showed the lowest performance, particularly under WHO-based classifications. This result indicates that linear models struggle to capture complex environmental relationships, whereas ensemble methods provide more generalizable and stable outcomes. Similar findings have been reported by Patel et al. [50] in India and Hidayat et al. [59] in Indonesia, where RF and XGBoost achieved ROC-AUC values ranging from 0.95 to 0.97. Likewise, Zhi et al. [20] and Sajib et al. [21] demonstrated that XGBoost outperforms linear and conventional algorithms in capturing nonlinear hydro-chemical interactions. The high ROC-AUC values obtained in this study (≈0.99) confirm that the models can capture the complex physicochemical dynamics of ponds in Kastamonu with remarkable accuracy.

Comparable ML-based WQI prediction studies from South Asia, West Asia, and the Mediterranean further validate these results. Nishat et al. [71] analyzed fourteen ML algorithms for WQI prediction in the polluted rivers of Dhaka (Bangladesh) and reported that ensemble-based models such as XGBoost and Random Forest achieved R² ≈ 0.97, emphasizing the failure of linear models under tropical variability. Similarly, Shaheed et al. [72] applied XGBoost, SVM, and Naïve Bayes to classify river water quality across Southeast, South, and West Asia and found XGBoost to be the most robust under diverse hydro-climatic pressures. Uddin et al. [2] successfully predicted coastal WQI in the Bay of Bengal using ensemble learning (RF, XGBoost) with R² > 0.95, while Uddin et al. [73] confirmed their superior interpretability in estuarine systems. In the Middle Eastern context, Mohammadpour et al. [74] integrated ML-based WQI prediction with probabilistic health-risk analysis in southern Iran, demonstrating the method’s ability to identify contamination sources such as Pb, Cd, and Ni with high precision. In Egypt, Elshaarawy and Eltarabily [75] optimized ML algorithms for groundwater quality prediction in El Moghra and found that Random Forest and Gradient Boosting achieved the highest performance. Collectively, these studies reveal that ensemble-based approaches consistently outperform linear and shallow models across diverse regions and environmental conditions, reinforcing the generalizability of the current study’s findings.

From a statistical perspective, the Friedman and Nemenyi tests confirmed that RF and XGBoost ranked highest overall. However, pairwise comparisons in the Nemenyi test did not yield statistically significant differences (p ≥ 0.05). This suggests that model superiority may vary depending on the weighting of performance metrics. Muhammad et al. [76] also emphasized that relying on a single performance indicator can be misleading, and that a multi-metric approach provides a more holistic assessment of model performance.
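A significant Friedman test without significant pairwise differences is less surprising once the Nemenyi critical difference is considered. The sketch below uses the standard formula CD = q_α · sqrt(k(k + 1) / (6N)) with the α = 0.05 critical values tabulated by Demšar (2006). The number of ranked metrics N is not stated in this section, so the value used here is an assumption for illustration only.

```python
import math

# Critical values q_alpha (alpha = 0.05) for the Nemenyi test, as tabulated by
# Demsar (2006), indexed by the number of compared models k.
Q_ALPHA_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}


def critical_difference(k, n):
    """Smallest average-rank gap that is significant at alpha = 0.05
    when k models are ranked over n metrics (or datasets)."""
    return Q_ALPHA_05[k] * math.sqrt(k * (k + 1) / (6.0 * n))


rank_rf, rank_xgb = 1.83, 1.97  # average ranks reported for CCME-WQI-WHO
n_metrics = 9                   # assumption: N is not given in this section
cd = critical_difference(4, n_metrics)
print(cd, abs(rank_rf - rank_xgb) > cd)
```

Under this assumption the critical difference is about 1.56, far larger than the 0.14 gap between RF and XGBoost, so the two models cannot be separated statistically even though the omnibus test detects differences across the full set of four.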

Although the physicochemical data used in this study were obtained through standardized field and laboratory protocols, minor uncertainties may still arise from temporal and environmental fluctuations inherent to hydrological systems. Throughout the irrigation year, variations in temperature, precipitation, and evaporation—particularly during transitional periods such as late summer and early spring—can influence the ionic composition and dissolved oxygen dynamics of small reservoirs. These seasonal differences may cause short-term fluctuations in parameters such as EC, NO₃⁻, and TDS, reflecting the natural hydrochemical response to climatic variability. Nevertheless, the consistent sampling design, standardized measurement procedures, and full-year measurement coverage ensured that these temporal variations were clearly represented in the dataset, thereby strengthening its hydrological validity. Consequently, the model predictions reflect not only the physicochemical characteristics of the reservoirs but also their seasonal hydrodynamic behavior, providing a realistic representation of water quality dynamics under varying climatic conditions.

From a practical perspective, the inconsistencies between WQI classifications and ML predictions present both opportunities and risks for decision-makers. For example, while 72.1% of the water was classified as Good under the WA-WQI-WHO standard, nearly half of the same samples fell into the Poor category according to WA-WQI-FAO, highlighting the critical importance of standard selection in irrigation management. On the other hand, advanced models such as RF and XGBoost can account for such variability, offering more balanced and reliable predictions. This dual approach (index + ML) provides a more coherent decision-support framework for managing multipurpose ponds, bridging international standards with local conditions [20,47,59]. In this context, the high concentrations of Pb (lead) and Ag (silver) observed in several samples (Table 4) are of environmental concern. Pb, a persistent toxic metal, can accumulate in sediments and biota, impairing aquatic biodiversity and posing chronic health risks through bioaccumulation [74]. Elevated Ag levels, although less common, can disrupt microbial activity and photosynthetic processes in aquatic systems, reducing self-purification capacity and ecosystem stability. These findings underscore the need for continuous monitoring and stricter control of trace metal inputs, particularly in small lentic systems used for irrigation and recreation.

From an operational standpoint, the combined WQI–ML workflow can be directly embedded into water-quality planning for small reservoirs. Routine, low-cost WQI screening can maintain continuity, while weekly or bi-weekly ML inference can flag degradation risk and trigger targeted sampling at specific sites or seasons (adaptive sampling). Threshold-aware reporting enables predictions to be mapped to FAO/WHO standards, facilitating actionable decisions in irrigation and drinking water use. Confusion matrices and class-specific metrics enable risk-tolerant operating points (e.g., prioritizing high recall for the “Poor” class to minimize false-safe decisions). Feature-importance/SHAP analyses identify controllable drivers (e.g., EC/TDS, NO₃⁻), informing interventions such as aeration, optimized fertilizer timing, or blending strategies. The workflow is lightweight to maintain: models can be retrained quarterly as new data arrive, and calibration indicators (ECE/Brier) can be monitored as drift alarms to schedule re-training. In practice, this supports early warning for exceedances, reduces blanket sampling effort in favor of event-based campaigns, and prioritizes remedial actions across ponds with the highest predicted risk.
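Two of the operational ideas above, a recall-oriented operating point for the "Poor" class and a calibration-based drift alarm, can be sketched as follows. Function names, the tolerance margin, and all numbers are hypothetical; the sketch assumes that the model outputs a probability for the "Poor" class and that at least one "Poor" sample is present in the validation data.

```python
def threshold_for_recall(poor_probs, is_poor, target_recall):
    """Return the highest probability cut-off whose recall on the 'Poor' class
    reaches target_recall. Samples with probability >= cut-off are flagged."""
    n_poor = sum(is_poor)
    for cutoff in sorted(set(poor_probs), reverse=True):
        caught = sum(1 for p, poor in zip(poor_probs, is_poor) if poor and p >= cutoff)
        if caught / n_poor >= target_recall:
            return cutoff
    return 0.0  # flag everything if no observed cut-off is sufficient


def calibration_drift(baseline_brier, recent_probs, recent_outcomes, tolerance=0.05):
    """Flag drift when the binary Brier score on recent samples exceeds the
    validation-time baseline by more than `tolerance` (a hypothetical margin)."""
    recent_brier = sum(
        (p - y) ** 2 for p, y in zip(recent_probs, recent_outcomes)
    ) / len(recent_probs)
    return recent_brier > baseline_brier + tolerance


# Hypothetical validation data: predicted P("Poor") and whether the sample was Poor.
val_probs = [0.9, 0.7, 0.4, 0.2, 0.1]
val_is_poor = [True, True, True, False, False]
print(threshold_for_recall(val_probs, val_is_poor, target_recall=0.95))

# Hypothetical recent predictions compared against lab-confirmed outcomes (1 = Poor).
print(calibration_drift(0.08, [0.9, 0.8, 0.2], [0, 0, 1]))
```

Lowering the cut-off trades more false alarms for fewer false-safe decisions, which suits a screening role in which flagged ponds are re-sampled rather than acted on directly. The drift check requires periodically confirmed labels, so it fits naturally with the targeted sampling campaigns described above.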

Conceptually, the hybrid framework proposed in this study also aligns with the hydrodynamic perspective presented by Piazza et al. [77], which emphasizes that diffusion and dispersion processes play a crucial role in interpreting spatial and temporal variability in water quality data. Integrating these physical mechanisms with data-driven models, such as WQI and ML, can enhance both interpretability and predictive robustness, thereby bridging the gap between physical understanding and computational modeling in aquatic systems.

The SHAP-based feature importance analysis further confirmed that key physicochemical drivers such as dissolved oxygen, electrical conductivity, and sodium concentration play a decisive role in the classification process, aligning model predictions with established hydrochemical principles.

Overall, this study contributes to the growing body of literature advocating for the combined use of classical WQI-based approaches and advanced ML algorithms in water quality management. The findings demonstrate that WQI-based classifications are highly sensitive to threshold values, whereas models such as RF and XGBoost, with discriminative power reaching 0.99, provide more stable and generalizable predictions. When interpreted in conjunction with the referenced regional case studies, the present results confirm that ensemble-based ML models have high transferability and can support sustainable water governance in Mediterranean-type and semi-arid environments. Future research should expand datasets to capture seasonal and inter-annual variability, integrate satellite-derived indicators, and apply explainable AI (XAI) methods (e.g., SHAP, LIME) to enhance model interpretability and reliability for decision-makers [21].

These findings align directly with the United Nations Sustainable Development Goals (SDGs). By enabling reliable monitoring of water resources (SDG 6: Clean Water and Sanitation), supporting climate-resilient water management systems (SDG 13: Climate Action), and protecting freshwater-dependent ecosystems (SDG 15: Life on Land), the combined use of index-based and machine-learning approaches provides applicable, reliable, and scalable tools for sustainable pond and reservoir management. Moreover, the quantitative outputs of ML models can be linked to global sustainability indicators, such as SDG 6.4.2 (Level of Water Stress) and SDG 6.5.1 (Implementation of Integrated Water Resources Management), facilitating international comparability of local water management practices. In this context, previous studies have demonstrated how ML-based modeling can support the monitoring of these indicators by improving freshwater withdrawal ratios, enhancing resilience to climate variability, and strengthening integrated management capacity [69,78,79]. Consequently, the present study contributes not only methodologically but also strategically to aligning local-scale water governance in Türkiye with the global SDG framework.

5. Conclusions

This study integrated widely used water quality indices (CCME-WQI and WA-WQI) with supervised machine learning algorithms to comprehensively assess the water quality of two irrigation reservoirs (Taşçılar and Yumurtacılar) in Kastamonu. It represents one of the first comparative applications of WQI–ML integration for small inland reservoirs in Türkiye, bridging classical water assessment indices with data-driven modeling.

The findings revealed that the selection of the classification standard (WHO vs. FAO) strongly influences water quality interpretation, underscoring the need for context-specific index calibration in local management practices. Ensemble-based models (RF and XGBoost) consistently provided the highest predictive performance (ROC-AUC ≈ 0.99), while Logistic Regression demonstrated greater stability and interpretability under class imbalance conditions.

These results collectively suggest that combining WQI-based indices with machine learning improves the reliability and interpretability of water quality assessments, offering practical value for irrigation and ecosystem management.

However, this study has several limitations. The analysis was conducted using data from a single hydrological period and limited sampling points, which may constrain the temporal generalization of the models. Additionally, no external validation was performed across different basins, and the dataset did not include certain biological and heavy metal parameters that could influence long-term water quality trends.

Future research should expand its temporal coverage to include multi-seasonal data, incorporate remote sensing indicators (e.g., NDVI, LST) for dynamic monitoring, and employ explainable AI (XAI) frameworks, such as SHAP and LIME, to enhance interpretability. Cross-basin validation and integration with real-time monitoring systems will further increase the operational applicability of the proposed approach in sustainable water resource management.

In conclusion, this study highlights the importance of integrating classical WQI methods with modern ML tools to achieve transparent, scalable, and data-driven water management strategies under changing climatic and anthropogenic pressures.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/w17213050/s1, Equation (S1): Logistic Regression (LR) model; Equations (S2) and (S3): XGBoost objective function; Equation (S4): Standardization transformation; Equation (S5): Accuracy; Equation (S6): F1 macro; Equation (S7): ROC–AUC (Receiver Operating Characteristic–Area Under Curve); Equation (S8): Balanced Accuracy; Equation (S9): Cohen’s Kappa (κ); Equation (S10): Matthews Correlation Coefficient (MCC); Equation (S11): Expected Calibration Error (ECE); Equation (S12): Brier score.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.

References

Ahmed, U.; Mumtaz, R.; Anwar, H.; Shah, A.A.; Irfan, R.; García-Nieto, J. Efficient water quality prediction using supervised machine learning. Water 2019, 11, 2210.
Uddin, M.G.; Nash, S.; Diganta, M.T.M.; Rahman, A.; Olbert, A.I. Robust machine learning algorithms for predicting coastal water quality index. J. Environ. Manag. 2022, 321, 115923.
Gidey, A. Geospatial distribution modeling and determining suitability of groundwater quality for irrigation purpose using geospatial methods and water quality index (WQI) in Northern Ethiopia. Appl. Water Sci. 2018, 8, 82.
Abdessamed, D.; Jodar-Abellan, A.; Ghoneim, S.S.; Almaliki, A.; Hussein, E.E.; Pardo, M.Á. Groundwater quality assessment for sustainable human consumption in arid areas based on GIS and water quality index in the watershed of Ain Sefra (SW of Algeria). Environ. Earth Sci. 2023, 82, 510.
Uddin, M.G.; Nash, S.; Rahman, A.; Olbert, A.I. A novel approach for estimating and predicting uncertainty in water quality index model using machine learning approaches. Water Res. 2023, 229, 119422.
Kouadri, S.; Elbeltagi, A.; Islam, A.R.M.T.; Kateb, S. Performance of machine learning methods in predicting water quality index based on irregular data set: Application on Illizi region (Algerian southeast). Appl. Water Sci. 2021, 11, 190.
Seifi, A.; Dehghani, M.; Singh, V.P. Uncertainty analysis of water quality index (WQI) for groundwater quality evaluation: Application of Monte-Carlo method for weight allocation. Ecol. Indic. 2020, 117, 106653.
Yuan, P.; Li, H.; Yi, X.; Wang, J.; Ning, C.; Xu, X.; Nong, X. Optimizing water quality index using machine learning: A six-year comparative study in riverine and reservoir systems. Sci. Rep. 2025, 15, 33919.
Ahmed, A.N.; Othman, F.B.; Afan, H.A.; Ibrahim, R.K.; Fai, C.M.; Hossain, M.S.; Mohammad, E.; Elshafie, A. Machine learning methods for better water quality prediction. J. Hydrol. 2019, 578, 124084.
Zhu, M.; Wang, J.; Yang, X.; Zhang, Y.; Zhang, L.; Ren, H.; Wu, B.; Ye, L. A review of the application of machine learning in water quality evaluation. Eco-Environ. Health 2022, 1, 107–116.
Nasir, N.; Kansal, A.; Alshaltone, O.; Barneih, F.; Sameer, M.; Shanableh, A.; Al-Shamma’a, A. Water quality classification using machine learning algorithms. J. Water Process Eng. 2022, 48, 102920.
Hussein, E.E.; Jat Baloch, M.Y.; Nigar, A.; Abualkhair, H.F.; Aldawood, F.K.; Tageldin, E. Machine learning algorithms for predicting the water quality index. Water 2023, 15, 3540.
García-Feal, O.; González-Cao, J.; Fernández-Nóvoa, D.; Astray Dopazo, G.; Gómez-Gesteira, M. Comparison of machine learning techniques for reservoir outflow forecasting. Nat. Hazards Earth Syst. Sci. Discuss. 2022, 22, 3859–3874.
Solomatine, D.P.; Ostfeld, A. Data-driven modelling: Some past experiences and new approaches. J. Hydroinformatics 2008, 10, 3–22.
Şen, G.; Ergün Karak, B. Applications of Artificial Intelligence in Predicting Timber Prices. In Turkish Forestry Researches in the Perspective of Climate Change, 1st ed.; Duvar Publishing: Izmir, Turkey, 2024; Chapter 4; pp. 69–93. ISBN 978-625-5530-40-0.
Demirkıran Ada, E. Artificial intelligence applications in taxation processes: The digital transformation of the revenue administration. Int. J. Adv. Nat. Sci. Eng. Res. 2025, 9, 212–217. Available online: https://as-proceeding.com/index.php/ijanser/article/view/2766 (accessed on 15 May 2025).
Rashidi, H.H.; Tran, N.K.; Betts, E.V.; Howell, L.P.; Green, R. Artificial intelligence and machine learning in pathology: The present landscape of supervised methods. Acad. Pathol. 2019, 6, 2374289519873088.
Almubaidin, M.A.A.; Ahmed, A.N.; Sidek, L.B.M.; Elshafie, A. Using metaheuristics algorithms (MHAs) to optimize water supply operation in reservoirs: A review. Arch. Comput. Methods Eng. 2022, 29, 3677–3711.
Derdour, A.; Jodar-Abellan, A.; Pardo, M.Á.; Ghoneim, S.S.M.; Hussein, E.E. Designing efficient and sustainable predictions of water quality indexes at the regional scale using machine learning algorithms. Water 2022, 14, 2801.
Zhi, W.; Appling, A.P.; Golden, H.E.; Podgorski, J.; Li, L. Deep learning for water quality. Nat. Water 2024, 2, 228–241.
Sajib, A.M.; Diganta, M.T.M.; Moniruzzaman, M.; Rahman, A.; Dabrowski, T.; Uddin, M.G.; Olbert, A.I. Assessing water quality of an ecologically critical urban canal incorporating machine learning approaches. Ecol. Inform. 2024, 80, 102514.
Aközlü, A.; Şen, G. Perceptions of log market actors on revisions to the regulations of the sale of standing tree. Turk. J. For. 2023, 24, 378–389.
MGM; Çevre, T.C. Şehircilik ve İklim Değişikliği Bakanlığı, Meteoroloji Genel Müdürlüğü: Kastamonu ili Uzun Yıllar Iklim Verileri [Long-Term Climate Data of Kastamonu Province]. 2025. Available online: https://www.mgm.gov.tr/Veridegerlendirme/il-ve-ilceler-istatistik.aspx?k=&m=KASTAMONU (accessed on 6 August 2025).
MTA, T.C. Enerji ve Tabii Kaynaklar Bakanlığı, Maden Tetkik ve Arama Genel Müdürlüğü [General Directorate of Mineral Research and Exploration]. 2025. Available online: http://yerbilimleri.mta.gov.tr/anasayfa.aspx (accessed on 15 May 2025).
Anonymous. Kastamonu İli Arazi Varlığı Haritası [Soil Resources Map of Kastamonu Province]. Baskı İşleri Şube Müdürlüğü, Baskı No:189, Ankara. 1993. Available online: https://kutuphane.tarimorman.gov.tr/vufind/Record/10028 (accessed on 20 August 2025).
Anonymous. Kastamonu İli 2014 Yılı Çevre Durum Raporu [2014 Environmental Status Report of Kastamonu Province]. Kastamonu Çevre ve Şehircilik İl Müdürlüğü, Kastamonu. 2014. Available online: https://webdosya.csb.gov.tr/db/ced/editordosya/Kastamonu%202014.pdf (accessed on 15 May 2025).
Gökkaya, Z.; Pulatsu, S. Seasonal Changes of Some Water Quality Parameters of Sakaryabaşı East Pond. J. Agric. Sci. 2001, 7, 20–26.
Uysal, E. Determination of Water Quality Indices of Irrigation Ponds in Eskişehir and Their Ecological Evaluation. Master’s Thesis, Department of Environmental Engineering, Graduate School of Natural and Applied Sciences, Anadolu University, Eskişehir, Turkey, 2015.
Çiçek, A.; Uysal, E.; Köse, E.; Tokatlı, C. Eskişehir’de yer alan bazı sulama göletlerinin su kalitesinin değerlendirilmesi. Nevşehir Bilim Teknol. Derg. 2017, 6, 440–446.
Selvi, K.; Tepeli, S.Ö.; Kaya, B. Evaluation of seasonal changes in terms of irrigation water quality of Terzialan Pond (Çan, Çanakkale). Ege J. Fish. Aquat. Sci. 2021, 38, 317–328.
Kaya, N.; Şen, F. Kabaklı Göleti (Diyarbakır) Suyunun Su Kalitesi Özellikleri. J. Kırşehir Ahi Evran Univ. Fac. Agric. 2022, 2, 174–184.
Mutlu, E.; Tepe, A.Y. Evaluation of Some of Physical and Chemical Characteristics of Yayladağı Irrigation Pond (Hatay). Alınteri Zirai Bilim. Derg. 2014, 27, 18–23.
Mutlu, E.; Paruğ, Ş.Ş. Investigation of Some Water Quality Parameters of Dereköy Pond (Kilimli-Zonguldak). J. Kastamonu Univ. Fac. Fish. 2018, 4, 20–28.
Kahriman, A. Evaluation of the Water Quality of Bezirgan Hazım Kılıç Pond (Daday–Kastamonu). Master’s Thesis, Graduate School of Natural and Applied Sciences, Kastamonu University, Kastamonu, Turkey, 2019.
Özay, G. Investigation of the Effects of Multiple Run-of-River Type Hydroelectric Power Plants (HPPs) on the Water Quantity, Water Quality, and Suspended Sediment Values of Kabaca Stream. Master’s Thesis, Department of Forest Engineering, Graduate School of Natural and Applied Sciences, Artvin Çoruh University, Artvin, Turkey, 2019.
Yang, S.; Liang, M.; Qin, Z.; Qian, Y.; Li, M.; Cao, Y. A novel assessment considering spatial and temporal variations of water quality to identify pollution sources in urban rivers. Sci. Rep. 2021, 11, 8714.
Moon, Y.E.; Kim, H.S. Inter-Annual and seasonal variations of water quality and trophic status of a reservoir with fluctuating monsoon precipitation. Int. J. Environ. Res. Public Health 2021, 18, 8499.
Son, J.Y.; Han, H.J.; Cho, Y.C.; Kang, T.; Im, J.K. Seasonal Variations in the Thermal Stratification Responses and Water Quality of the Paldang Lake. Water 2024, 16, 3057.
American Public Health Association (APHA); American Water Works Association (AWWA); Water Environment Federation (WEF). Method 2540 D—Total Suspended Solids Dried at 103–105 °C. In Standard Methods for the Examination of Water and Wastewater, 23rd ed.; American Public Health Association: Washington, DC, USA, 2017.
Güneş Şen, S. The effects of Forestry Practices on Water Quality in the Kastamonu Region. Doctoral Dissertation, Kastamonu University, Kastamonu, Turkey, 2021.
SKKY. Regulation on Water Pollution Control (Su Kirliliği Kontrolü Yönetmeliği). 2008. Available online: https://www.resmigazete.gov.tr/eskiler/2008/02/20080213-13.htm (accessed on 14 November 2024).
World Health Organization (WHO). Guidelines for Drinking-Water Quality, 4th ed.; World Health Organization: Geneva, Switzerland, 2011. [Google Scholar]
Ayers, R.S.; Westcot, D.W. Water Quality for Agriculture (FAO Irrigation and Drainage Paper No. 29, Rev. 1). Food and Agriculture Organization of the United Nations (FAO). 1985. Available online: https://www.fao.org/4/t0234e/T0234E01.htm (accessed on 6 August 2025).
Brown, R.M.; McClelland, N.I.; Deininger, R.A.; Tozer, R.G. A water quality index—Do we dare? Water Sew. Work. 1970, 117, 339–343. [Google Scholar]
Canadian Council of Ministers of the Environment (CCME). Canadian Water Quality Guidelines for the Protection of Aquatic Life: User’s Manual; Canadian Council of Ministers of the Environment: Winnipeg, MB, Canada, 2001. [Google Scholar]
Lumb, A.; Sharma, T.C.; Bibeault, J. A review of genesis and evolution of water quality index (WQI) and some future directions. Water Qual. Expo. Health 2011, 3, 11–24.
Chidiac, S.; El Najjar, P.; Ouaini, N.; El Rayess, Y.; El Azzi, D. A comprehensive review of water quality indices (WQIs): History, models, attempts and perspectives. Rev. Environ. Sci. Bio/Technol. 2023, 22, 349–395.
Khan, H.; Khan, A.A.; Hall, S. The Canadian water quality index: A tool for water resources management. In Proceedings of the MTERM International Conference, Pathum Thani, Thailand, 6–10 June 2005; Volume 8. Available online: https://www.gov.nl.ca/ecc/files/waterres-quality-background-khan-2005-ait.pdf?utm_source=chatgpt.com (accessed on 7 October 2025).
World Health Organization (WHO). Guidelines for Drinking-Water Quality: Incorporating the First and Second Addenda. 2022. Available online: https://www.who.int/publications/i/item/9789240045064 (accessed on 10 September 2025).
Patel, D.D.; Mehta, D.J.; Azamathulla, H.M.; Shaikh, M.M.; Jha, S.; Rathnayake, U. Application of the weighted arithmetic water quality index in assessing groundwater quality: A case study of the South Gujarat region. Water 2023, 15, 3512.
Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30.
Sperandei, S. Understanding logistic regression analysis. Biochem. Medica 2014, 24, 12–18.
Suthaharan, S. Decision tree learning. In Machine Learning Models and Algorithms for Big Data Classification: Thinking with Examples for Effective Learning; Springer: Boston, MA, USA, 2016; pp. 237–269.
Gültepe, Y. Makine öğrenmesi algoritmaları ile hava kirliliği tahmini üzerine karşılaştırmalı bir değerlendirme. Eur. J. Sci. Technol. 2019, 16, 8–15.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Soria-Lopez, A.; Sobrido-Pouso, C.; Mejuto, J.C.; Astray, G. Assessment of different machine learning methods for reservoir outflow forecasting. Water 2023, 15, 3380.
Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A comparative analysis of XGBoost. arXiv 2019.
Thara, D.K.; PremaSudha, B.G. Auto-detection of epileptic seizure events using deep neural network with different feature scaling techniques. Pattern Recognit. Lett. 2019, 128, 544–550.
Hidayat, T.; Manongga, D.; Nataliani, Y.; Wijono, S.; Prasetyo, S.Y.; Maria, E.; Raharja, U.; Sembiring, I. Performance prediction using cross validation (gridsearchcv) for stunting prevalence. In Proceedings of the 2024 IEEE International Conference on Artificial Intelligence and Mechatronics Systems (AIMS), Boston, MA, USA, 21–23 February 2024; pp. 1–6.
Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6.
Powers, D.M.W. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol. 2020, 2, 37–63.
Bradley, A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997, 30, 1145–1159.
Brodersen, K.H.; Ong, C.S.; Stephan, K.E.; Buhmann, J.M. The balanced accuracy and its posterior distribution. Pattern Recognit. Lett. 2010, 31, 2207–2212.
Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 1960, 20, 37–46.
Naeini, M.P.; Cooper, G.F.; Hauskrecht, M. Obtaining well calibrated probabilities using Bayesian binning. Proc. AAAI Conf. Artif. Intell. 2015, 29, 2901–2907.
Brier, G.W. Verification of forecasts expressed in terms of probability. Mon. Weather. Rev. 1950, 78, 1–3.
Poudel, K.P.; Cao, Q.V. Evaluation of methods to predict Weibull parameters for characterizing diameter distributions. For. Sci. 2013, 59, 243–252.
Abwage, W.D.; Oluwajuwon, T.V.; Adedapo, S.M.; Adamu, A.; Green, P.C.; Ogana, F.N. A stem taper equation for Eucalyptus camaldulensis in northeast Nigeria. J. For. Res. 2025, 36, 1–14.
Güneş Şen, S. Machine Learning-Based Water Level Forecast in a Dam Reservoir: A Case Study of Karaçomak Dam in the Kızılırmak Basin, Türkiye. Sustainability 2025, 17, 8378.
Santiago-García, W.; Ramírez-Arce, J.; Ramírez-Martínez, A.; Nava-Nava, A.; Guzmán-Santiago, J.C.; Santiago-García, E. Additive volume-equation systems for Pinus ayacahuite and Pinus douglasiana in temperate forests of the Sierra Norte, Oaxaca, Mexico. J. For. Sci. 2025, 71, 441–455.
Nishat, M.H.; Khan, M.H.R.B.; Ahmed, T.; Hossain, S.N.; Ahsan, A.; El-Sergany, M.M.; Shafiquzzaman, M.; Imteaz, M.A.; Alresheedi, M.T. Comparative analysis of machine learning models for predicting water quality index in Dhaka’s rivers of Bangladesh. Environ. Sci. Eur. 2025, 37, 31.
Shaheed, H.; Zawawi, M.H.; Hayder, G. Water quality index classification of southeast, south and west asia rivers using machine learning algorithms. J. Ecohumanism 2024, 3, 2752–6801.
Uddin, M.G.; Nash, S.; Rahman, A.; Olbert, A.I. Performance analysis of the water quality index model for predicting water state using machine learning techniques. Process Saf. Environ. Prot. 2023, 169, 808–828.
Mohammadpour, A.; Gharehchahi, E.; Golaki, M.; Gharaghani, M.A.; Ahmadian, F.; Abolfathi, S.; Samaei, M.R.; Uddin, M.G.; Olbert, A.I.; Khaneghah, A.M. Advanced Water Quality Assessment Using Machine Learning: Source Identification and Probabilistic Health Risk Analysis. Results Eng. 2025, 27, 105421.
Elshaarawy, M.K.; Eltarabily, M.G. Machine learning models for predicting water quality index: Optimization and performance analysis for El Moghra, Egypt. Water Supply 2024, 24, 3269–3294.
Muhammad, S.Y.; Makhtar, M.; Rozaimee, A.; Aziz, A.A.; Jamal, A.A. Classification model for water quality using machine learning techniques. Int. J. Softw. Eng. Its Appl. 2015, 9, 45–52.
Piazza, S.; Sambito, M.; Freni, G. Analysis of optimal sensor placement in looped water distribution networks using different water quality models. Water 2023, 15, 559.
Marinelli, M.; Biancalani, R.; Joyce, B.; Djihouessi, M.B. A new methodology to estimate the level of water stress (SDG 6.4.2) by season and by sub-basin avoiding the double counting of water resources. Water 2025, 17, 1543.
Tinoco, C.; Julio, N.; Meirelles, B.; Pineda, R.; Figueroa, R.; Urrutia, R.; Parra, Ó. Water resources management in Mexico, Chile and Brazil: Comparative analysis of their progress on SDG 6.5.1 and the role of governance. Sustainability 2022, 14, 5814.

Figure 1. Geographic location of the study area in Kastamonu Province, Western Black Sea Region, Türkiye.

Figure 2. Relative ranking of machine learning models based on combined performance metrics under the CCME-WQI-WHO classification.

Figure 3. Relative ranking of machine learning models based on combined performance metrics under the WA-WQI-WHO classification.

Figure 4. Relative ranking of machine learning models based on combined performance metrics under the WA-WQI-FAO classification.

Figure 5. Global SHAP feature importance for water quality classification under the CCME-WQI-WHO scheme.

Figure 6. Global SHAP feature importance for water quality classification under the WA-WQI-WHO scheme.

Figure 7. Global SHAP feature importance for water quality classification under the WA-WQI-FAO scheme.

Table 1. Comparison of WHO and FAO standards for selected water quality parameters used in index calculations.

WHO Parameter	WHO Guideline Value	Unit	FAO Parameter	FAO Threshold/Class	Unit
pH	6.5–8.5	-	pH	6.5–8.4	-
Turbidity	≤5.0	NTU	EC-Very good	<0.7	dS/m
EC	2500	µS/cm	EC-Moderate restriction	0.7–3.0	dS/m
TDS	1000	mg/L	EC-Severe restriction	>3.0	dS/m
Nitrate (NO₃⁻)	50.0	mg/L	Sodium	≤3.0	meq/L
Nitrite (NO₂⁻)	0.20	mg/L	Chromium (Cr)	0.10	mg/L
Chromium (Cr)	0.050	mg/L	Nickel (Ni)	0.20	mg/L
Nickel (Ni)	0.070	mg/L	Lead (Pb)	5.00	mg/L
Lead (Pb)	0.010	mg/L	Silver (Ag)	0.10	mg/L
Silver (Ag)	0.100	mg/L	-	-	-
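As an illustration of how the Table 1 thresholds can be applied to a single sample, the following minimal Python sketch flags WHO exceedances and assigns the FAO electrical-conductivity class. The function names, the dictionary keys, and the assumption that EC is reported in µS/cm (converted with 1 dS/m = 1000 µS/cm) are illustrative choices and are not taken from the study's own code. The example sample reuses the maximum values listed in Table 4.

```python
# Upper guideline values copied from the WHO columns of Table 1.
# The key names are illustrative labels, not identifiers from the study.
WHO_MAX = {
    "turbidity_NTU": 5.0,
    "EC_uS_cm": 2500.0,
    "TDS_mg_L": 1000.0,
    "NO3_mg_L": 50.0,
    "NO2_mg_L": 0.20,
}
WHO_PH_RANGE = (6.5, 8.5)


def who_exceedances(sample):
    """Return the parameters in `sample` that violate the Table 1 WHO values."""
    failed = [name for name, limit in WHO_MAX.items()
              if name in sample and sample[name] > limit]
    low, high = WHO_PH_RANGE
    if "pH" in sample and not (low <= sample["pH"] <= high):
        failed.append("pH")
    return failed


def fao_ec_class(ec_us_cm):
    """Map EC in µS/cm to the FAO class of Table 1 (assumes 1 dS/m = 1000 µS/cm)."""
    ec_ds_m = ec_us_cm / 1000.0
    if ec_ds_m < 0.7:
        return "Very good"
    if ec_ds_m <= 3.0:
        return "Moderate restriction"
    return "Severe restriction"


# Example: the maximum values reported in Table 4 for six parameters.
sample = {"pH": 7.91, "EC_uS_cm": 641.0, "TDS_mg_L": 324.0,
          "turbidity_NTU": 16.3, "NO3_mg_L": 235.0, "NO2_mg_L": 0.11}
print(who_exceedances(sample))         # parameters above the WHO values
print(fao_ec_class(sample["EC_uS_cm"]))  # FAO irrigation class for EC
```

For these maxima, only turbidity and nitrate exceed the WHO values, while EC remains in the lowest FAO restriction class.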

Table 2. Tested hyperparameter ranges and optimal values for Logistic Regression (LR) and Decision Tree (DT).

Model	Parameter	Range/Option
Logistic Regression (LR)	C	Logspace(−3, 2, 30)
	Penalty	L2
	Solver	lbfgs
	Class weight	balanced
	Max iterations	3000–4000
Decision Tree (DT)	Max depth	[None, 2–31]
	Min samples split	[2, 5, 10, 20, 50, 100]
	Min samples leaf	[1, 2, 5, 10, 20, 50]
	Criterion	[“gini”, “entropy”, “log_loss”]

Table 3. Tested hyperparameter ranges and optimal values for Random Forest (RF) and XGBoost.

Model	Parameter	Range/Option
Random Forest (RF)	N estimators	[300, 500, 700]
	Max depth	[None, 5, 10, 15, 20, 30]
	Min samples split	[2, 5, 10, 20]
	Min samples leaf	[1, 2, 4, 8]
	Max features	[“sqrt”, “log2”, None]
XGBoost	N estimators	[300, 500, 700]
	Learning rate	[0.03, 0.05, 0.1, 0.2]
	Max depth	[3, 4, 5, 6, 8]
	Subsample	[0.7, 0.8, 0.9, 1.0]
	Colsample by tree	[0.7, 0.8, 0.9, 1.0]
	Min child weight	[1, 2, 3, 5, 7]
	Gamma	[0, 0.1, 0.3, 0.5, 1.0]
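
To give a sense of the size of the search spaces listed in Tables 2 and 3, the sketch below counts the number of hyperparameter combinations for the DT, RF, and XGBoost grids. It assumes that "2–31" denotes every integer depth from 2 to 31 and that every combination would be tried exhaustively; the dictionary keys are illustrative labels, and the actual search strategy is not confirmed by the tables.

```python
from math import prod

# Candidate values copied from Tables 2 and 3; keys are illustrative labels.
grids = {
    "DT": {
        "max_depth": [None] + list(range(2, 32)),  # None plus depths 2..31 (assumed)
        "min_samples_split": [2, 5, 10, 20, 50, 100],
        "min_samples_leaf": [1, 2, 5, 10, 20, 50],
        "criterion": ["gini", "entropy", "log_loss"],
    },
    "RF": {
        "n_estimators": [300, 500, 700],
        "max_depth": [None, 5, 10, 15, 20, 30],
        "min_samples_split": [2, 5, 10, 20],
        "min_samples_leaf": [1, 2, 4, 8],
        "max_features": ["sqrt", "log2", None],
    },
    "XGB": {
        "n_estimators": [300, 500, 700],
        "learning_rate": [0.03, 0.05, 0.1, 0.2],
        "max_depth": [3, 4, 5, 6, 8],
        "subsample": [0.7, 0.8, 0.9, 1.0],
        "colsample_bytree": [0.7, 0.8, 0.9, 1.0],
        "min_child_weight": [1, 2, 3, 5, 7],
        "gamma": [0, 0.1, 0.3, 0.5, 1.0],
    },
}


def grid_size(grid):
    """Number of combinations in a full Cartesian grid."""
    return prod(len(values) for values in grid.values())


for name, grid in grids.items():
    print(name, grid_size(grid))
```

Under these assumptions the grids contain 3348 (DT), 864 (RF), and 24,000 (XGBoost) combinations, which indicates why the boosted model is by far the most expensive to tune exhaustively.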

Table 4. Descriptive statistics of physicochemical parameters measured in the Taşçılar and Yumurtacılar reservoirs.

Parameters	Mean	Std	Min	25%	50%	75%	Max
pH	7.235	0.364	6.28	6.98	7.315	7.51	7.91
EC	389.012	54.956	262	356	378	407.75	641
TDS	194.723	27.531	131	178	189	203.5	324
turbidity	2.359	2.382	0	0.975	1.6	2.682	16.3
DO	2.118	0.882	0.63	1.528	1.96	2.442	6.2
salinity	0.177	0.03	0.1	0.16	0.17	0.19	0.4
temperature	10	4.013	3	7	8	14	17
NO₂	0.065	0.016	0.03	0.05	0.06	0.08	0.11
NO₃	2.209	12.744	0.84	1.23	1.43	1.85	235
TSS	36.394	493.538	0.5	3	5	7	9000
Na	14.192	11.663	2.104	3.558	8.485	27.444	36.379
Mg	7.675	1.843	2.336	6.635	7.896	8.928	11.32
Ca	28.864	9.33	18.167	20.788	22.371	38.133	46.546
K	0.957	0.764	0.107	0.207	0.812	1.676	2.355
Cr	2.896	0.56	2.149	2.405	2.661	3.374	4.357
Pb	48.189	10.743	30.793	36.709	53.086	57.883	64.321
Ni	14.74	4.295	8.172	10.61	14.278	19.042	22.669
P	51.293	8.63	33.752	42.655	52.298	58.676	71.488
Ag	64.922	10.552	44.844	56.181	65.796	74.946	78.505

Table 5. Distribution of Water Quality Indices (WQI) According to International Standards.

CCME-WQI-WHO			WA-WQI-WHO			WA-WQI-FAO
Category	Count	Percent	Category	Count	Percent	Category	Count	Percent
Fair	160	47.62	Good	269	80.06	Poor	174	51.79
Marginal	157	46.73	Poor	43	12.80	Good	139	41.37
Poor	13	3.87	Out of Range	16	4.76	Out of Range	19	5.65
Excellent	6	1.79	Excellent	8	2.38	Excellent	4	1.19
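
The Percent columns of Table 5 follow directly from the Count columns. The short sketch below recomputes them as a consistency check; the counts are copied from Table 5, while the function and variable names are illustrative.

```python
# Class counts copied from Table 5.
counts = {
    "CCME-WQI-WHO": {"Fair": 160, "Marginal": 157, "Poor": 13, "Excellent": 6},
    "WA-WQI-WHO": {"Good": 269, "Poor": 43, "Out of Range": 16, "Excellent": 8},
    "WA-WQI-FAO": {"Poor": 174, "Good": 139, "Out of Range": 19, "Excellent": 4},
}


def class_percentages(class_counts):
    """Share of each class as a percentage of all samples, rounded to 2 decimals."""
    total = sum(class_counts.values())
    return {label: round(100.0 * n / total, 2) for label, n in class_counts.items()}


for scheme, class_counts in counts.items():
    print(scheme, sum(class_counts.values()), class_percentages(class_counts))
```

Each scheme sums to the same 336 samples, so the three classifications can be compared on an identical basis.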

Table 6. Test performance of machine learning models before (Pre) and after (Post) hyperparameter optimization under the CCME-WQI-WHO classification.

Model	Accuracy		Macro-F1		ROC-AUC		Balanced Acc		Kappa		MCC		ECE		Brier
Model	Pre	Post	Pre	Post	Pre	Post	Pre	Post	Pre	Post	Pre	Post	Pre	Post	Pre	Post
LR	0.9706	0.9559	0.9844	0.9766	0.9960	0.9984	0.9765	0.9765	0.9213	0.9204	0.8768	0.9208	0.1141	0.0505	0.1141	0.0270
DT	0.9853	0.9853	0.9462	0.9462	0.9548	0.9548	0.9166	0.9166	0.9731	0.9731	0.9736	0.9736	0.0000	0.0001	0.0147	0.0147
RF	0.9853	0.9853	0.9462	0.9462	1.0000	0.9974	1.0000	0.9166	1.0000	0.9731	1.0000	0.9736	0.1547	0.0095	0.0374	0.0139
XGB	0.9853	0.9853	0.9462	0.9462	0.9987	1.0000	1.0000	0.9166	1.0000	0.9731	1.0000	0.9736	0.0261	0.0250	0.0203	0.0150

Table 7. Test performance of machine learning models before (Pre) and after (Post) hyperparameter optimization under the WA-WQI-WHO classification.

Model	Accuracy		Macro-F1		ROC-AUC		Balanced Acc		Kappa		MCC		ECE		Brier
Model	Pre	Post	Pre	Post	Pre	Post	Pre	Post	Pre	Post	Pre	Post	Pre	Post	Pre	Post
LR	0.8088	0.7794	0.6338	0.6801	0.8819	0.8722	0.6157	0.7268	0.5407	0.5170	0.5609	0.5500	0.1328	0.2338	0.1486	0.1920
DT	0.8971	0.8676	0.6680	0.6631	0.8213	0.8152	0.6851	0.6759	0.7120	0.6512	0.7135	0.6558	0.0000	0.0029	0.1029	0.1309
RF	0.9118	0.8971	0.6832	0.6657	0.9434	0.9379	0.6898	0.6851	0.7207	0.7028	0.7297	0.7046	0.1100	0.1952	0.0879	0.1128
XGB	0.9265	0.9265	0.6321	0.9851	0.9428	0.9422	0.5972	0.5972	0.7671	0.7671	0.7755	0.7755	0.0391	0.0741	0.0625	0.0689

Table 8. Test performance of machine learning models before (Pre) and after (Post) hyperparameter optimization under the WA-WQI-FAO classification.

Model	Accuracy		Macro-F1		ROC-AUC		Balanced Acc		Kappa		MCC		ECE		Brier
Model	Pre	Post	Pre	Post	Pre	Post	Pre	Post	Pre	Post	Pre	Post	Pre	Post	Pre	Post
LR	0.8971	0.8971	0.7366	0.7366	0.9519	0.9512	0.8339	0.8339	0.8228	0.8228	0.8247	0.8247	0.0881	0.0881	0.0761	0.0761
DT	0.8824	0.9118	0.7072	0.7263	0.8324	0.8492	0.7178	0.7178	0.7834	0.7834	0.7861	0.7861	0.0000	0.0000	0.1176	0.1176
RF	0.9118	0.9412	0.4719	0.7333	0.9537	0.9725	0.4910	0.4910	0.8342	0.8342	0.8409	0.8409	0.1195	0.1195	0.0877	0.0877
XGB	0.9265	0.8971	0.4795	0.9552	0.9381	0.8581	0.5000	0.5000	0.8626	0.8626	0.8715	0.8715	0.0269	0.0269	0.0681	0.0681
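
Several of the metrics reported in Tables 6–8 (Accuracy, Macro-F1, Balanced Accuracy, Cohen's Kappa, and MCC) can be derived from a confusion matrix alone. The sketch below shows one standard way to compute them for a multi-class problem, using the multi-class generalization of MCC. The confusion matrix is a made-up example for illustration, not data from this study, and the function name is hypothetical. ROC-AUC, ECE, and the Brier score are not covered because they require predicted probabilities.

```python
from math import sqrt


def summarize(cm):
    """Metrics from a square confusion matrix (rows = true class, columns = predicted)."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    true_counts = [sum(cm[i]) for i in range(k)]
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    correct = sum(cm[i][i] for i in range(k))

    accuracy = correct / total

    # Balanced accuracy: mean recall over classes that occur in the true labels.
    recalls = [cm[i][i] / true_counts[i] for i in range(k) if true_counts[i] > 0]
    balanced_acc = sum(recalls) / len(recalls)

    # Macro-F1: unweighted mean of per-class F1 = 2*TP / (true count + predicted count).
    f1_scores = []
    for i in range(k):
        denom = true_counts[i] + pred_counts[i]
        f1_scores.append(2 * cm[i][i] / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / k

    # Cohen's kappa: observed agreement corrected for chance agreement.
    cross = sum(t * p for t, p in zip(true_counts, pred_counts))
    chance = cross / total ** 2
    kappa = (accuracy - chance) / (1 - chance) if chance < 1 else 0.0

    # Multi-class Matthews correlation coefficient.
    cov_tt = total ** 2 - sum(t * t for t in true_counts)
    cov_pp = total ** 2 - sum(p * p for p in pred_counts)
    mcc = (correct * total - cross) / sqrt(cov_tt * cov_pp) if cov_tt and cov_pp else 0.0

    return {"accuracy": accuracy, "balanced_acc": balanced_acc,
            "macro_f1": macro_f1, "kappa": kappa, "mcc": mcc}


# Hypothetical 3-class confusion matrix with one rare class (illustration only).
example_cm = [[30, 2, 0],
              [3, 28, 1],
              [0, 2, 2]]
for metric, value in summarize(example_cm).items():
    print(f"{metric}: {value:.4f}")
```

Because Macro-F1 and Balanced Accuracy average over classes rather than samples, a single misclassified sample in a rare class can lower them sharply while Accuracy barely changes, which is relevant when reading the gaps between these columns in the tables above.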

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
