Review Reports - Comparative Performance Analysis of Machine Learning Models for Predicting the Weighted Arithmetic Water Quality Index

Round 1

Reviewer 1 Report (New Reviewer)

Comments and Suggestions for Authors

The manuscript “Integration of PCA and Hyperparameter Optimization into Machine Learning Models for Water Quality Prediction: A Case Study in the Semi-Arid Region of Şanlıurfa, Türkiye” is technically sound and well-structured. But this manuscript needs some improvement. My suggestions for improvement of this manuscript are:

The authors have not clearly highlighted the research gaps.
There is some confusion in the manuscript. This study uses regression metrics (MAE, RMSE, R²), but ADASYN is a classification imbalance technique. The authors must clearly state whether they are predicting continuous WAWQI values or WAWQI classes. If it is a regression problem, the use of ADASYN needs stronger justification.
The LSTM needs large amounts of data in the modelling process, but this study contains only 208 observations.
The data period is also very short, limited to January to September 2018. So, how does a small period of the dataset show the ultimate behaviour?
The justification of the LSTM model is needed.
Also, the LSTM and SVR architecture needs to be provided in the methodology section.
The quality of Figure 3 is low.
The selected parameter ranges for Grid Search and Random Search appear arbitrary. A short explanation is needed.
Very high R² values indicated a risk of overfitting, so the authors should have explained the process they used to avoid it.
In some models, PCA reduced performance. The authors should explain why dimensionality reduction did not improve all algorithms.
A comparison with previous studies is required to assess the robustness of the developed models.
The authors should explain each plot in detail, as the heat plot is not well-explained.
Several citations are older than 10–15 years. Since machine learning evolves quickly, more recent studies (last 3–5 years) should be included, especially for LSTM and XGBoost applications in water quality prediction.

Author Response

Reviewer#1

Comment: The authors have not clearly highlighted the research gaps.

Response: We thank the reviewer for this observation. We acknowledge that the research gaps were insufficiently articulated in the original submission.

We have added a dedicated paragraph to the Introduction immediately preceding the study objectives, which identifies four specific research gaps: (1) the absence of studies systematically comparing the joint effects of dimensionality reduction and hyperparameter optimisation within the same modelling framework; (2) the limited application of ADASYN in continuous WAWQI prediction pipelines; (3) the scarcity of studies integrating PCA with both Grid Search and Randomised Search across SVR, XGBoost, and LSTM simultaneously; and (4) the hydrogeochemical distinctiveness of semi-arid drinking water systems relative to the riverine and lacustrine systems predominantly studied in the existing literature.

Manuscript change: New paragraph added to Introduction, immediately before the 'In this study, the evaluated models include...' paragraph. References used: [3, 15, 21, 22, 28, 53]. All references already exist in the manuscript; no new references were added.

Comment: There is some confusion in the manuscript. This study uses regression metrics (MAE, RMSE, R²), but ADASYN is a classification imbalance technique. The authors must clearly state whether they are predicting continuous WAWQI values or WAWQI classes. If it is a regression problem, the use of ADASYN needs stronger justification.

Response: We thank the reviewer for identifying this important methodological clarification point. The study predicts continuous WAWQI values (regression), and regression metrics (MAE, RMSE, R²) are correctly applied throughout. However, we acknowledge that the justification for applying ADASYN — a technique conventionally associated with classification — within a regression pipeline was insufficiently explained.

The WAWQI target variable, whilst continuous, is derived from five discrete quality classes (Excellent, Good, Poor, Very Poor, Unsuitable) that are severely imbalanced in the raw dataset (majority class: Good, 53.2%; minority classes: Very Poor and Unsuitable, 2.3% each). Without correction, regression models trained on this distribution are systematically biased towards the dominant quality range. ADASYN was therefore applied to the discretised class labels as a pre-processing step, generating synthetic samples for minority-class instances based on their local neighbourhood difficulty ratios. The WAWQI values of synthesised observations were then assigned by linear interpolation between each minority instance and its nearest neighbour in feature space. Regression models were trained on this balanced dataset and evaluated on the original, unaugmented test set using standard regression metrics — a two-stage approach consistent with recent practice in environmental data modelling.

Manuscript change: Section 2.4.2 revised: the single closing sentence following Equation (3) has been replaced with two explanatory paragraphs providing full methodological justification for ADASYN use within a regression pipeline. References used: [13, 15, 17, 20, 21, 53, 54, 60]. All existing references.
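The two-stage procedure described in this response can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names `allocate` and `synthesize`, the neighbour count `k`, and all numeric values are hypothetical. It assumes the standard ADASYN rule (more synthetic points for minority instances surrounded by more majority neighbours) and the linear interpolation of the continuous target mentioned above.

```python
import random

def allocate(majority_neighbours, k, n_to_generate):
    """ADASYN-style allocation: each minority instance gets a share of the
    synthetic samples proportional to its difficulty ratio (the fraction of
    its k nearest neighbours that belong to the majority class)."""
    ratios = [m / k for m in majority_neighbours]
    total = sum(ratios)
    if total == 0:
        return [0] * len(ratios)  # no instance is hard to learn
    return [round(n_to_generate * r / total) for r in ratios]

def synthesize(x_i, y_i, x_nn, y_nn, rng):
    """Create one synthetic sample on the segment between a minority
    instance and one of its nearest neighbours in feature space."""
    lam = rng.random()  # interpolation weight in [0, 1)
    x_new = [a + lam * (b - a) for a, b in zip(x_i, x_nn)]
    # The continuous target is interpolated with the same weight, so the
    # synthetic WAWQI value stays between the two parent values.
    y_new = y_i + lam * (y_nn - y_i)
    return x_new, y_new

rng = random.Random(0)
# Hypothetical counts of majority-class neighbours for three minority samples.
counts = allocate([5, 3, 2], k=5, n_to_generate=10)
# Hypothetical feature vectors and WAWQI values for two neighbouring samples.
x_a, y_a = [1.2, 0.4, 3.1], 82.0
x_b, y_b = [1.6, 0.2, 2.7], 90.0
x_s, y_s = synthesize(x_a, y_a, x_b, y_b, rng)
```

Under these assumptions the synthetic target always lies between the targets of its two parents, which is why the augmented set remains usable for regression.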

Comment: The LSTM needs large amounts of data in the modelling process, but this study contains only 208 observations.

Response: We would like to clarify and address this point with the following details:

Dataset Size Correction: The total number of primary drinking water samples collected and analyzed in this study is 308, covering 13 districts across four seasons.
Handling Data Imbalance and Augmentation: To address the limitations of the initial sample size and class imbalance, we utilized the ADASYN method. This approach adaptively generated synthetic data points for minority classes, effectively expanding the training environment to 820 samples. This allowed the LSTM to better define decision boundaries and improve generalization.
Architecture Optimization: We intentionally utilized a shallow LSTM architecture (single layer with 64 units). This design choice minimizes the risk of overfitting while still capturing the non-linear temporal dependencies inherent in the water quality data.
Empirical Evidence of Generalization: As shown in the training and validation loss curves (Figure 7), both metrics converged steadily and remained closely aligned. This lack of divergence between training and validation loss confirms that the model achieved robust generalization on unseen data despite the constraints of the physical sample size.

In light of the reviewer's comment, a detailed technical justification has been added to Section 2.3.3.

Comment: The data period is also very short, limited to January to September 2018. So, how does a small period of the dataset show the ultimate behaviour?

Response: We believe that the sampling period (January–September 2018) is highly representative of the "ultimate behavior" of water quality in the study region for the following reasons:

Full Seasonal Cycle: The monitoring period was strategically chosen to encompass all four distinct climatic seasons in the region. In semi-arid continental climates, the most significant hydrochemical variations are driven by the transition from wet/cold winters to extremely hot/dry summers (where temperatures frequently exceed 40°C). Our data captures these extremes, which define the boundary conditions of the water supply system.
Capturing Critical Stressors: The primary objective was to evaluate water quality variability under intense urban pressure and climatic extremes. The inclusion of the peak summer months allowed the models to learn from the highest evaporation rates and lowest precipitation levels, which typically represent the "worst-case" quality scenarios for public health.
Variance for Predictive Modeling: For ML applications, the range and variance of the data are often more critical than the length of the time series. Our dataset captures a broad gradient of parameters—such as THMs ranging from 0.95 to 526 µg/L and Fluoride reaching 452 mg/L—providing the non-linear complexity necessary to train a robust LSTM model.
Model Validation: The high R² (0.999) and low RMSE (1.206) achieved by the LSTM model indicate that even within this nine-month window, the patterns captured were consistent and sufficient for accurate WAWQI prediction. In response to your feedback, we have revised the Study Area section to better emphasize why this specific period captures the critical environmental dynamics of the region.

Accordingly, a detailed technical justification has been added to Section 2.1 (Study Area).

Comment: The justification of the LSTM model is needed.

Response: The justification for employing LSTM over traditional ML models is based on the following technical considerations:

Modeling Temporal Dynamics: Water quality parameters are inherently time-dependent, influenced by seasonal cycles and preceding environmental conditions. Unlike "static" models such as SVR or XGBoost, which treat each sample as independent, LSTM is specifically designed to capture sequential dependencies through its recurrent gate mechanism.
Handling High-Variance Non-Linearity: Our dataset contains significant anomalies and extreme skewness (e.g., Fluoride and THMs). The deep recurrent layers of the LSTM provide the computational flexibility required to map these complex, stochastic relationships that linear or kernel-based models may oversimplify.
Empirical Superiority: The comparative analysis demonstrated that the LSTM + Randomized Search model significantly outperformed baseline and optimized versions of SVR and XGBoost. It achieved a superior R² of 0.999 and the lowest RMSE (1.206), proving its architectural suitability for this specific WAWQI dataset.
Robustness to Small Sequences: By reshaping the input data into a 3D tensor (samples, time steps, features), the model was able to extract patterns across the four-season monitoring period, even with a finite number of primary samples. We have added a dedicated paragraph in Section 4 (Discussion) to explicitly discuss these advantages and further justify the use of LSTM in environmental forecasting.

Accordingly, a detailed technical justification has been added to Section 2.3.3 (Long Short-Term Memory).
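The reshaping into a (samples, time steps, features) tensor mentioned in this response can be illustrated with a short sketch. The response does not state how time steps were defined, so the sliding-window construction, the helper name `to_sequences`, and the window length below are assumptions made purely for illustration.

```python
def to_sequences(rows, time_steps):
    """Stack consecutive rows into overlapping windows, giving a nested
    list shaped (samples, time_steps, features)."""
    return [rows[i:i + time_steps] for i in range(len(rows) - time_steps + 1)]

# Six hypothetical observations with three features each.
rows = [[float(10 * t + f) for f in range(3)] for t in range(6)]

windows = to_sequences(rows, time_steps=4)
shape = (len(windows), len(windows[0]), len(windows[0][0]))

# With time_steps=1 every observation becomes its own one-step sequence.
single = to_sequences(rows, time_steps=1)
single_shape = (len(single), len(single[0]), len(single[0][0]))
```

With a window of one step the recurrent layer sees no history, so the choice of window length determines how much temporal context the model can actually use.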

Comment: Also, the LSTM and SVR architecture needs to be provided in the methodology section.

Response: We agree that a detailed architectural description is vital for the transparency of the modeling process. We have expanded the methodology section (Section 2.3.1 and 2.3.3) to include the following technical details:

SVR Architecture: We have specified the use of the Radial Basis Function (RBF) kernel, which was selected to map the non-linear water quality parameters into a higher-dimensional feature space. The model employs an epsilon insensitive loss function to ignore errors within a certain distance from the true value, promoting model flatness and generalization.
LSTM Architecture: We have detailed the deep learning framework, which utilizes a three-dimensional input tensor (samples, time steps, features). The hidden layer consists of 64 LSTM units utilizing tanh activation functions, followed by a Dense output layer for regression. This gated architecture allows the model to selectively retain or discard information across the seasonal monitoring intervals.
Optimization Layer: We clarified that both architectures were finalized only after extensive hyperparameter tuning using GridSearch and RandomizedSearch to identify the optimal regularization and learning parameters.

Accordingly, a detailed technical justification has been added to Section 2.3.1 (SVR Architecture).
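The two architectural elements named in this response can be made concrete with a small sketch. The first function is the standard epsilon-insensitive loss used by SVR; the second counts trainable parameters for a single LSTM layer followed by a one-unit Dense output, using the usual formula for an LSTM with four gates. The input width of 19 features is an assumption (the number of measured parameters, before any PCA), and the function names are hypothetical.

```python
def eps_insensitive(y_true, y_pred, epsilon):
    """SVR loss: errors smaller than epsilon are ignored entirely."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def lstm_param_count(units, n_features):
    """Trainable parameters of one LSTM layer: four gates, each with an
    input kernel, a recurrent kernel and a bias vector."""
    return 4 * (units * (units + n_features) + units)

def dense_param_count(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs

inside_tube = eps_insensitive(10.0, 10.05, epsilon=0.1)
outside_tube = eps_insensitive(10.0, 10.5, epsilon=0.1)

# Assumption: 19 input features feeding 64 LSTM units and one output.
total_params = lstm_param_count(64, 19) + dense_param_count(64, 1)
```

Comparing the resulting parameter count with the number of training samples is one simple way to put a number on a parsimony argument.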

Comment: The quality of Figure 3 is low.

Response: We thank the reviewer for this observation. Figure 3 has been replaced with a higher-resolution version in the revised manuscript.

Comment: The selected parameter ranges for Grid Search and Random Search appear arbitrary. A short explanation is needed.

Response: The ranges were not selected arbitrarily but were defined based on the following criteria:

Literature-Based Initialization: The initial ranges for regularization (C), kernel coefficient (γ), and learning rates were set according to standard practices and established literature in water quality modeling using SVR and XGBoost.
Pilot Testing: Prior to the final training, we conducted preliminary runs to identify the regions of the parameter space where model performance (measured by R²) began to plateau. This allowed us to constrain the search space to regions that balance model complexity with predictive accuracy.
Boundary Validation: For the Randomized Search, we selected wide distributions (e.g., n_estimators [100–500] and learning_rate [0.03–0.3]) to ensure the global optimum was captured within the sampling intervals. We verified that the final "best" parameters did not fall on the extreme boundaries of the defined ranges, confirming the search space was sufficiently broad.
Computational Trade-off: The number of iterations (e.g., 30 random combinations yielding 150 models in cross-validation) was chosen to maximize search coverage while maintaining computational feasibility for the dataset size.

Accordingly, a detailed technical justification has been added to Section 2.4.4 (Hyperparameter Optimization) to justify the selection of these ranges.
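The randomised search and the boundary check described in this response can be sketched as below. The ranges for n_estimators and learning_rate are those quoted above; everything else is an assumption for illustration: uniform sampling, the helper names, the 2% boundary tolerance, and the fold count of 5, which is inferred only from the statement that 30 candidates yield 150 fitted models.

```python
import random

RANGES = {"n_estimators": (100, 500), "learning_rate": (0.03, 0.3)}

def sample_config(rng):
    """Draw one candidate configuration from the quoted ranges."""
    lo_n, hi_n = RANGES["n_estimators"]
    lo_lr, hi_lr = RANGES["learning_rate"]
    return {
        "n_estimators": rng.randint(lo_n, hi_n),      # inclusive integer draw
        "learning_rate": rng.uniform(lo_lr, hi_lr),   # assumed uniform
    }

def on_boundary(value, low, high, tol=0.02):
    """True if value lies within tol (as a fraction of the range) of either
    end, which would suggest the search space should be widened."""
    span = high - low
    return value - low <= tol * span or high - value <= tol * span

rng = random.Random(42)
configs = [sample_config(rng) for _ in range(30)]

n_folds = 5  # assumption: 150 fits divided by 30 candidates
total_fits = len(configs) * n_folds
```

In a full pipeline each configuration would be scored by cross-validation, and `on_boundary` would be applied to the best-scoring one.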

Comment: Very high R² values indicated a risk of overfitting, so the authors should have explained the process they used to avoid it.

Response: We implemented several rigorous strategies to ensure that the reported R² values reflect true predictive power rather than over-parameterization:

Strict Data Separation: All performance metrics, including the R² of 0.999 for the optimized LSTM, were calculated using a 25% hold-out test set that was completely isolated from the training and hyperparameter tuning processes.
K-Fold Cross-Validation: To ensure the stability of our results, we utilized 10-fold cross-validation. This technique mitigates the risk of overfitting by iteratively validating the model on different segments of the dataset, providing a reliable estimate of its generalizability.
Adaptive Synthetic Sampling (ADASYN): We applied ADASYN to address class imbalance. By focusing synthetic data generation on "hard-to-learn" minority instances, the model was forced to learn robust features rather than just biasing toward the majority class.
Convergence Analysis: As shown in Figure 7, the validation loss closely tracks the training loss throughout the 50 epochs. The absence of a "gap" or divergence between these curves is standard empirical evidence that the model is generalizing effectively to unseen data.
Model Parsimony: We intentionally selected a single-layer LSTM architecture with 64 units to keep the number of trainable parameters low relative to our dataset size, further reducing the likelihood of overfitting. We have added a summary of these "Overfitting Prevention Measures" to Section 2.4.5 of the revised manuscript.

Accordingly, a detailed technical justification has been added to Section 2.4.5 (Model Performance Validation).
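The hold-out separation and the three regression metrics referred to in this response can be sketched as follows. The sample count (308) and the 25% test fraction are taken from the responses above; the helper names, the random seed, and the example values are hypothetical. The sketch assumes that any oversampling is applied to the training indices only, so that the test set contains measured samples exclusively.

```python
import math
import random

def holdout_split(n_samples, test_fraction, seed):
    """Shuffle sample indices once and reserve a fixed fraction for testing."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_fraction)
    return idx[n_test:], idx[:n_test]  # (train, test)

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R² for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return mae, rmse, 1.0 - ss_res / ss_tot

train_idx, test_idx = holdout_split(308, 0.25, seed=0)
# Oversampling (e.g. ADASYN) would be applied to train_idx only.

# Hypothetical observed and predicted WAWQI values.
mae, rmse, r2 = regression_metrics([10.0, 20.0, 30.0, 40.0],
                                   [12.0, 18.0, 33.0, 39.0])
```

Keeping the split ahead of every data-dependent step (scaling, PCA, oversampling, tuning) is what makes the hold-out figures an estimate of performance on unseen samples.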

Comment: In some models, PCA reduced performance. The authors should explain why dimensionality reduction did not improve all algorithms.

Response: As noted in the results, while PCA effectively reduced multicollinearity, it did not consistently enhance predictive accuracy across all algorithms—most notably in XGBoost, where R² dropped from 0.998 to 0.980. We attribute this to the following factors:

Loss of Informative Variance: By selecting principal components that explained 95% of the variance, a 5% "residual" variance was excluded. In hydrochemical datasets characterized by extreme outliers—such as our Fluoride (452 mg/L) and THM (526 µg/L) values—this excluded variance often contains critical information that deep learning and ensemble models use to map non-linear relationships.
Algorithmic Redundancy: Models like XGBoost and SVR are mathematically robust to high-dimensional spaces. XGBoost, in particular, utilizes internal feature importance and regularization (L1/L2) to handle redundant variables. For these models, applying PCA as a preprocessing step can simplify the feature space to a point where the model loses the granular detail needed for high-precision estimation.
Preservation of Physical Meaning: Some parameters in our study, such as the microbiological indicators, showed weak correlations with the chemical mineralization group. PCA may blend these distinct physical processes into single latent factors, making it more difficult for the model to distinguish between independent contamination sources compared to using the raw, unblended features. We have added a paragraph in the Results and Discussion (Section 3.3) to specifically address these findings and the limitations of dimensionality reduction in high-gradient environmental datasets.

Accordingly, a detailed technical justification has been added to Section 3.3 (R² for WAWQI Prediction).
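The 95% variance criterion discussed in this response can be illustrated with a short sketch that selects the smallest number of principal components whose cumulative explained variance reaches a threshold. The eigenvalues below are hypothetical and the function name is an assumption; the point is only to show how a fixed threshold leaves a residual share of variance out of the model inputs.

```python
def n_components_for(eigenvalues, threshold):
    """Return the smallest number of components whose cumulative share of
    variance reaches the threshold, together with the share retained."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += ev / total
        if cumulative >= threshold:
            return k, cumulative
    return len(eigenvalues), cumulative

# Hypothetical eigenvalues of a correlation matrix.
eigenvalues = [6.0, 3.0, 2.0, 1.5, 1.0, 0.8, 0.4, 0.3]
k, retained = n_components_for(eigenvalues, 0.95)
discarded = 1.0 - retained
```

Whatever falls into `discarded` is unavailable to the downstream model, which is the mechanism the first factor above appeals to.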

Comment: A comparison with previous studies is required to assess the robustness of the developed models.

Response: We thank the reviewer for this important recommendation. A structured comparison with recent studies has been added to the Results and Discussion section as a new subsection (Section 3.6).

The new subsection presents Table 6, which benchmarks the best-performing model (LSTM + Randomised Search: R² = 0.999, RMSE = 1.206, MAE = 0.829) against five recent studies covering diverse geographic settings: Han et al. (2025) [20] — urban rivers, China; Elmotawakkil et al. (2025) [21] — Morocco; Nishat et al. (2025) [34] — Dhaka rivers, Bangladesh; Mo et al. (2024) [15] — coastal city, China; and Talukdar et al. (2024) [26] — Lake Loktak, India. The accompanying narrative discusses agreements and differences between findings, including an explanation of the RMSE scale difference attributable to target variable range differences.

Manuscript change: New Section 3.6 'Comparison with Previous Studies' added after Section 3.5, comprising Table 5 and a comparative narrative paragraph. References used: [15, 17, 20, 21, 26, 34]. All existing references.
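The remark about RMSE scale can be illustrated by normalising RMSE by the range of the target variable. This is a sketch of one common normalisation, not necessarily the one used in the manuscript; the function name and both target ranges below are hypothetical and are not taken from any of the cited studies.

```python
def nrmse(rmse, target_min, target_max):
    """Range-normalised RMSE: error expressed as a fraction of the span
    of the target variable."""
    return rmse / (target_max - target_min)

# The RMSE of 1.206 is reported above; the 0-150 index range is assumed.
this_study = nrmse(1.206, 0.0, 150.0)

# A hypothetical study whose target was scaled to the interval 0-1.
other_study = nrmse(0.07, 0.0, 1.0)
```

On a normalised scale a raw RMSE above 1 can still correspond to a smaller relative error than a raw RMSE below 0.1, which is why raw values from studies with different target ranges are not directly comparable.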

Comment: The authors should explain each plot in detail, as the heat plot is not well-explained.

Response: We thank the reviewer for this comment. The Figure 4 caption has been substantially revised to provide a self-contained interpretation of the correlation matrix. The revised caption now specifies: (i) the colour scale range (−1 to +1); (ii) the strong correlation threshold (|r| ≥ 0.70); (iii) the EC–TDS near-perfect correlation in the context of the TDS calculation protocol used; (iv) the independent clustering of microbiological parameters; and (v) the interpretation of the K⁺–THMs association. Additionally, the introductory sentence of the heatmap discussion paragraph in the main text has been corrected — the original contained a grammatical error ('Figure 4, The Pearson correlation heatmap...') — and a linking sentence connecting the correlation findings to PCA component selection has been added at the end of the paragraph.

Manuscript change: Figure 4 caption replaced with expanded interpretive caption. Opening sentence of the Figure 4 discussion paragraph corrected. One linking sentence added at end of paragraph referencing PCA implications [27, 45].
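The correlation threshold and the EC–TDS relationship described in this response can be sketched with a plain Pearson coefficient. The EC values and the conversion factor of 0.64 are hypothetical; the sketch assumes, as the caption's reference to the TDS calculation protocol suggests, that TDS was derived from EC by a fixed factor, in which case a correlation of essentially 1 follows by construction.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

STRONG = 0.70  # threshold quoted in the revised caption

ec = [420.0, 515.0, 610.0, 780.0, 905.0]  # hypothetical EC values, µS/cm
tds = [0.64 * v for v in ec]              # assumed fixed conversion factor
r = pearson(ec, tds)
is_strong = abs(r) >= STRONG
```

A pair that is correlated by construction adds no independent information, which is relevant when interpreting how many principal components are needed.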

Comment: Several citations are older than 10–15 years. Since machine learning evolves quickly, more recent studies (last 3–5 years) should be included, especially for LSTM and XGBoost applications in water quality prediction.

Response: We thank the reviewer for this recommendation. We have reviewed all references and identified those requiring attention.

Foundational citations — including Cortes & Vapnik (1995) [42] for SVR theory, Hochreiter & Schmidhuber (1997) [51] for LSTM architecture, Chen & Guestrin (2016) [50] for XGBoost, Bergstra & Bengio (2012) [48] for random search, and Jolliffe (2002) [45] for PCA — have been retained as seminal methodological references, consistent with standard practice in applied ML literature.

Three targeted updates have been made: (1) Reference [46] (Carniel et al. 2019 — a venture investment study with no relevance to water quality or ML) has been identified as an erroneous entry and replaced with Mabrouk et al. (2024), a recent study on ML with data augmentation for water quality prediction, published in the Journal of Water and Climate Change. (2) The PCA application references group [30–33] has been supplemented with a 2022 review reference [76] on ML in water quality evaluation. (3) The LSTM + Randomised Search justification in Section 2.3.3 has been supplemented with supporting citations from Han et al. (2025) [20] and Elmotawakkil et al. (2025) [21], confirming the effectiveness of randomised hyperparameter search in recent deep learning applications for water quality prediction.

Manuscript change: Reference [46] replaced with: Mabrouk, M. et al. J. Water Clim. Change 2024, 15, 431–452. New reference [76] added: Zhu, M. et al. Eco-Environ. Health 2022, 1, 107–116. Two sentences added to Section 2.3.3 citing [20, 21].

Author Response File: Author Response.pdf

Reviewer 2 Report (New Reviewer)

Comments and Suggestions for Authors

Good day

I reviewed this interesting article. Authors must attend to the following comments and recommendations before it can be accepted.

Line 46; add comma (,) after “world”

Line 69; indicate if this “A variety of 68 instruments have been developed for the purpose of evaluating water quality data.” Only applies in the study area or globally.

Line 79; Water quality indexes have become a prevalent method for evaluating, in the study area or globally?

Line 154; Unlike previous studies that focus solely on continuous value prediction, this research systematically evaluates the impact of dimensionality reduction and hyperparameter tuning on WAWQI classification across ten integrated ML models.

Line 190; delete comma (,) after phases

Line 225; Table 1 should read as Sampling points and coordinates. Also add information for point 37 to 41 in the Table.

Line 614; is it Figure x or Figure 7?

Line 672; Overall, the findings align with recent studies reporting the effectiveness of LSTM-based models in environmental and water-quality forecasting applications [14,74–76]. Cite more studies here, especially recent ones (2024–2026). The results of this study must be compared with findings of different studies from different countries, and authors must indicate where those studies were conducted and whether they are in agreement with their findings or not.

Line 713; is it theses or thesis?

Line 719; References, authors must also consult and cite recent sources of information (2024–2026), especially where they discuss the results.

Author Response

Reviewer#2

Comment: Line 46; add comma (,) after “world”.

Response: The suggested comma has been inserted after “world” in Line 46 to improve grammatical correctness.

Comment: Line 69; indicate if this “A variety of 68 instruments have been developed for the purpose of evaluating water quality data.” Only applies in the study area or globally.

Response: We thank the Reviewer for identifying this ambiguity. The statement refers to the global development of water quality assessment instruments. The sentence has been revised to explicitly state: “Globally, more than 68 water quality assessment instruments have been developed to evaluate water quality data under diverse hydro-environmental conditions.” This clarification eliminates potential misinterpretation.

Comment: Line 79; Water quality indexes have become a prevalent method for evaluating, in the study area or globally?

Response: The statement has been clarified to indicate that Water Quality Index (WQI) methodologies are widely used at the global scale. The revised sentence now specifies:

“WQI approaches have become a prevalent global method for evaluating and communicating water quality status.”

Comment: Line 154; Unlike previous studies that focus solely on continuous value prediction, this research systematically evaluates the impact of dimensionality reduction and hyperparameter tuning on WAWQI classification across ten integrated ML models.

Response: We thank the reviewer for identifying this terminological inconsistency. While our final results are used to classify water quality into five distinct categories (Excellent to Unsuitable), the machine learning models (SVR, XGBoost, and LSTM) were developed as regression models to predict the continuous numerical index of the WAWQI. We have corrected the word "classification" to "estimation" in the revised manuscript to accurately reflect the mathematical nature of the algorithms used. This approach was chosen to provide more precise insights into the hydrochemical status of the region before the final mapping into quality classes.
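The regression-then-classification workflow described in this response can be sketched as a final mapping step. The five class names are those given in the responses; the cut-off values are an assumption based on the thresholds commonly used for the weighted arithmetic index and may differ from those applied in the manuscript.

```python
# Assumed upper bounds for each class; values above the last bound are
# treated as Unsuitable.
WAWQI_CLASSES = [
    (25.0, "Excellent"),
    (50.0, "Good"),
    (75.0, "Poor"),
    (100.0, "Very Poor"),
]

def wawqi_class(value):
    """Map a continuous WAWQI prediction to its quality class."""
    for upper, label in WAWQI_CLASSES:
        if value <= upper:
            return label
    return "Unsuitable"

# Hypothetical model outputs mapped to classes after prediction.
labels = [wawqi_class(v) for v in (12.4, 48.9, 63.0, 99.5, 131.7)]
```

Because the models output a continuous value, a small numerical error matters most for predictions that fall close to one of these cut-offs.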

Comment: Line 190; delete comma (,) after phases.

Response: The comma has been removed as suggested.

Comment: Line 225; Table 1 should read as Sampling points and coordinates. Also add information for point 37 to 41 in the Table.

Response: Table 1 has been revised. The title now reads: “Sampling Points and Coordinates.” Sampling points 37–41 have been added with their respective geographic coordinates. Coordinate formatting has been standardized in accordance with geospatial reporting conventions.

Comment: Line 614; is it Figure x or Figure 7?

Response: This typographical inconsistency has been corrected. The correct figure number (Figure 7) is now consistently used throughout the manuscript.

Comment: Line 672; Overall, the findings align with recent studies reporting the effectiveness of LSTM-based models in environmental and water-quality forecasting applications [14,74–76]. Cite more studies here, especially recent ones (2024–2026). The results of this study must be compared with findings of different studies from different countries, and authors must indicate where those studies were conducted and whether they are in agreement with their findings or not.

Response: We thank the reviewer for this detailed and constructive comment. The paragraph at Lines 672–679 has been fully revised and substantially expanded to address both requests.

The revised paragraph now provides explicit geographic context and numerical comparison for each cited study: Han et al. (2025) [20] — urban river systems, China (LSTM R² > 0.996, RMSE = 0.061–0.081); Elmotawakkil et al. (2025) [21] — drinking water networks, Morocco (LSTM R² = 0.9999, XGBoost accuracy 99.07–99.99%); Nishat et al. (2025) [34] — riverine environment, Dhaka, Bangladesh (best R² = 0.971, RMSE = 2.34, ANN); and Talukdar et al. (2024) [26] — Lake Loktak, India (RF R² = 0.97). For each study, the degree of agreement or disagreement with the present findings is explicitly stated, and differences (e.g., RMSE scale) are explained in terms of target variable range rather than model accuracy. All four comparison studies were published between 2024 and 2025, fully addressing the reviewer's request for recent citations. This revision also consolidates the response to Reviewer 1, Comment 4, thereby avoiding duplication.

Manuscript change: Lines 672–679 paragraph fully replaced with expanded geographic comparison paragraph. References cited: [15, 17, 20, 21, 26, 34]. All existing references — no new additions required.

Comment: Line 713; is it theses or thesis?

Response: The singular form “thesis” is now used, as the reference is to a single work.

Comment: Line 719; References, authors must also consult and cite recent sources of information (2024–2026), especially where they discuss the results.

Response: We thank the reviewer for this valuable suggestion. The reference list and the Results and Discussion sections were carefully reviewed to ensure that recent literature (2023–2026) is adequately represented. The revised manuscript already includes numerous recent studies published between 2023 and 2025 that address water quality assessment, machine learning–based prediction, and WQI modelling. Examples include Han et al. (2023) [1], Babuji et al. (2023) [8], Rana et al. (2023) [13], Hussein et al. (2023) [22], Gupta and Gupta (2021, updated discussion context) [17], Ding et al. (2023) [58], Saroja et al. (2023) [74], and Gao et al. (2023) [75], as well as several very recent studies published in 2024–2025 such as Saeed et al. (2024) [3], Wieczorek et al. (2024) [4], Mo et al. (2024) [14], Uddin et al. (2024) [15], Chellaiah et al. (2024) [18], Xu et al. (2024) [21], Bai et al. (2024) [34], Talukdar et al. (2024) [25], Mahanty et al. (2023) [26], Nishat et al. (2025) [33], Han et al. (2025) [19], and Elmotawakkil et al. (2025) [20]. These studies are cited particularly in the Results and Discussion sections to contextualize the model performance and to compare our findings with recent advances in machine learning applications for water quality assessment. Therefore, the manuscript already reflects the most recent developments in the field.

Author Response File: Author Response.pdf

Reviewer 3 Report (New Reviewer)

Comments and Suggestions for Authors

Review

Integration of PCA and Hyperparameter Optimization into Machine Learning Models for Water Quality Prediction: A Case Study in the Semi-Arid Region of Şanlıurfa, Türkiye

General Comments.

The manuscript applies a wide range of machine learning models to water quality prediction. This is a very ambitious target, especially considering that the data used are not fully validated from the analytical chemistry point of view, nor are the sampling and sample treatment. I suppose this critical drawback can be remedied. If so, I would recommend major revision. Another major drawback is that the manuscript concerns a very specific location.

Title. Too long. It should be shortened. An effective and professional paper title is typically between 10–15 words (or 31–40 characters), balancing conciseness with informative content to attract readers.

Abstract. Acceptable overall. However, there are too many acronyms (e.g., Cl⁻, pH, Electrical Conductivity (EC), Total Dissolved Solids (TDS), Nitrite (NO₂⁻), Nitrate (NO₃⁻), Ammonium (NH₄⁺), Sulfate (SO₄²⁻), Free Chlorine (Cl₂), Ca²⁺, Mg²⁺, Na⁺, K⁺, F⁻, Trihalomethanes (THMs)). It would be clearer to use full names (e.g., sulfate, ammonium) unless acronyms are essential. If acronyms are used, they must be declared in the abstract.

Keywords. Keywords should be single words, not compound terms. Please rewrite accordingly.

Introduction. Delete the first sentence, as it is implicit in the second. Rephrase the entire opening paragraph for clarity. Move paragraphs currently in lines 143–153 to the end of the section. The last paragraph is confusing; either delete it or reposition it earlier. Overall, the section requires a more logical structure. References should also be improved.

Materials and Methods. Use SI units and notation consistently. For thousands, use a blank space rather than a comma.

Line 193: The sentence “Nineteen parameters — Nineteen physicochemical and microbiological parameters” is awkward. Clarify where samples were taken (tap water, river, etc.), as this is unclear.
Line 214: Provide details on sampling procedures. How was sampling executed? What parameters were measured?
Lines 227–238: Analytical chemistry procedures must be described, including principles, control values (range, detection limits, precision, accuracy), replicates, sample conservation, and calibration. Without this, reproducibility and accuracy are compromised. Since these nineteen parameters form the basis of the study, this omission must be corrected.

Reliable, consistent, precise, and accurate data are essential. As is well known, poor-quality data lead to poor model outcomes.

Programming Codes? Clarify whether programming codes are available or documented.

Results. No comments, as there is a lack of code. However, there is no explicit analysis of model properties, limitations, or advantages.

Discussion. Internal coherence of results is not clear, despite comparisons between models. For example, lines 633–639 do not constitute a proper discussion paragraph. Lines 672–679, however, are strong and should be retained.

Conclusions. The conclusions are consistent with the results and discussion. However, it remains unclear how, in practice, 19 variables—including microorganisms—can be managed effectively. A more practical and feasible perspective should be provided. Additionally, there is no mention of dissolved nutrients, which are arguably among the most important variables.

Comments on the Quality of English Language

English can always be improved. It appears that the authors wrote in their native language and then translated. As the old saying goes, translation always lacks soul. Try to think in English and then write. Sorry.

Author Response

Reviewer#3

Comment: The manuscript applies a broad range of machine learning models to water quality prediction. This is a very ambitious target, especially considering that the data used are not fully validated from an analytical chemistry point of view, nor with respect to sampling and sample treatment. I suppose this critical drawback can be remedied. If so, I would recommend major revision. Another major drawback is that the manuscript covers a very specific location.

Response: We thank the reviewer for raising this important concern regarding analytical data quality. We fully agree that the reliability of input data is fundamental to the validity of any data-driven modelling study, and we acknowledge that the original manuscript did not adequately document the quality assurance measures applied.

A new paragraph has been added at the end of Section 2.2 detailing the following quality assurance elements: (i) factory calibration and pre-campaign recalibration of all in situ instruments (Hach HQ14D, colorimetric chlorine analyser); (ii) wavelength-specific calibration of spectrophotometric analyses using certified standard solutions; (iii) multi-point external calibration and internal standard correction for ICP-OES measurements following EPA Method 200.7; (iv) adherence to TS EN ISO 9308-1 and TS EN ISO 7899-2 for microbiological analyses, both of which incorporate positive and negative control requirements; and (v) conduct of all analyses at the accredited Şanlıurfa Water and Sewerage Administration Drinking Water Laboratory under General Directorate of Public Health oversight.

We acknowledge that the absence of reported duplicate sample recovery rates and CRM data remains a limitation, which is now explicitly stated both in Section 2.2 and in the Conclusions, with a recommendation for future monitoring campaigns.

Regarding the study's focus on a single location, we acknowledge this limitation and thank the reviewer for raising it. The geographical focus on Şanlıurfa Province was a deliberate methodological choice that enabled a spatially and seasonally consistent dataset under uniform regulatory and climatic conditions.

A new paragraph has been added to the Conclusions explicitly acknowledging this as a limitation whilst contextualising the potential transferability of the framework. The paragraph notes that semi-arid regions with comparable hydrogeological conditions — including parts of the Middle East, North Africa, and Central Asia — share similar patterns of elevated mineralisation, seasonal hydrological variability, and intensive agricultural water use, suggesting the framework may be applicable to analogous settings. Future validation across multiple basins is recommended.

Manuscript change: New QA paragraph added at end of Section 2.2. Limitation acknowledged in Conclusions. References: [16, 17, 28, 40, 41, 73]. All existing references.

Comment: Title. Too long. It should be shortened. An effective and professional paper title is typically between 10–15 words (or 31–40 characters), balancing conciseness with informative content to attract readers.

Response: We appreciate this important remark. We have revised the manuscript title to “Comparative Performance Analysis of Machine Learning Models for Predicting the Weighted Arithmetic Water Quality Index” to better reflect the scope.

Comment: Abstract. Acceptable overall. However, there are too many acronyms (e.g., Cl⁻, pH, Electrical Conductivity (EC), Total Dissolved Solids (TDS), Nitrite (NO₂⁻), Nitrate (NO₃⁻), Ammonium (NH₄⁺), Sulfate (SO₄²⁻), Free Chlorine (Cl₂), Ca²⁺, Mg²⁺, Na⁺, K⁺, F⁻, Trihalomethanes (THMs)). It would be clearer to use full names (e.g., sulfate, ammonium) unless acronyms are essential. If acronyms are used, they must be declared in the abstract.

Response: The abstract has been revised to: (a) Replace excessive chemical abbreviations with full terminology where clarity is improved. (b) Define essential abbreviations at first mention.

Comment: Keywords. Keywords should be single words, not compound terms. Please rewrite accordingly.

Response: We have rewritten the manuscript keywords as “machine learning; modeling; evaluation; WAWQI; LSTM”.

Comment: Introduction. Delete the first sentence, as it is implicit in the second. Rephrase the entire opening paragraph for clarity. Move paragraphs currently in lines 143–153 to the end of the section. The last paragraph is confusing; either delete it or reposition it earlier. Overall, the section requires a more logical structure. References should also be improved.

Response: The first sentence was removed and the introductory paragraph was rephrased. Lines 143–153 were moved to the end of the section. The last paragraph was deleted, which was one of the two options the reviewer offered. The references affected by this removal were also updated.

Comment: Materials and Methods. Use SI units and notation consistently. For thousands, use a blank space rather than a comma.

Response: The Materials and Methods section has been revised to ensure consistent use of SI units and notation. Additionally, thousand separators have been corrected to use a blank space instead of a comma throughout the manuscript.

Comment: Line 193: The sentence “Nineteen parameters — Nineteen physicochemical and microbiological parameters” is awkward. Clarify where samples were taken (tap water, river, etc.), as this is unclear.

Response: The water sampling procedure is specified in lines 181–194 as follows: “Samples were taken from water tanks with taps. Samples were taken from active fountains and taps directly connected to the water network at monitoring points designated by the Provincial and District Health Directorates. Groundwater serves as the main drinking water source in several districts. Siverek, Viranşehir, and Halfeti each have one spring, while Akçakale, Harran, Ceylanpınar, and Bozova rely on deep boreholes. In the Hilvan district, the primary water source is the Atatürk Dam Lake. Raw water is conveyed through a 17 km pipeline to a 1 500 m³ reservoir. It is then transferred to a second, adjacent 1 500 m³ reservoir for chlorination before being released into the public distribution network. Notably, the district lacks a formal drinking water treatment plant. The Suruç and Birecik districts utilize caisson wells drilled within the water reserve areas of the Birecik and Karkamış Dam Lakes. Additionally, Suruç benefits from five deep boreholes located in the district center. Birecik also draws from springs located within its streets. In Akçakale and Ceylanpınar, some wells are directly connected to the mains without an intermediate reservoir. The Siverek district also utilizes 28 springs in addition to the Karacadağ springs, with key sources. Similarly, in Viranşehir, 26 of the 40 springs are directly integrated into the public water supply system without a reservoir.” The number of samples collected in each district is given in Table a. The necessary explanation has been added to the manuscript.

Table a. Number of samples collected from tanks and mains water in the districts

District Name	Samples from tank water	Samples from mains water
Metropolitan Districts	3	13
Akçakale	1	4
Birecik	1	6
Bozova	0	6
Ceylanpınar	2	4
Halfeti	1	4
Harran	1	3
Hilvan	1	4
Siverek	2	7
Suruç	1	4
Viranşehir	1	8

Comment: Line 214: Provide details on sampling procedures. How was sampling executed? What parameters were measured?

Response: The following text was added to lines 217–221 of the dataset description section: “The samples were taken in accordance with the methods set out in the ‘Handbook on Sample Collection, Transport and Analysis of Water for Human Consumption’ published by the Ministry of Health in 2008. Microbiological samples were taken in 500 mL pure thiosulfate polypropylene bottles, and physicochemical samples were taken in 1.5 L polyethylene bottles.”

Comment: Lines 227–238: Analytical chemistry procedures must be described, including principles, control values (range, detection limits, precision, accuracy), replicates, sample conservation, and calibration. Without this, reproducibility and accuracy are compromised. Since these nineteen parameters form the basis of the study, this omission must be corrected.

Response: The procedures are described in lines 229–239: “Microbiological analyses were carried out at the ŞUSKİ (General Directorate of Water and Sewerage Administration of Şanlıurfa Metropolitan Municipality) Drinking Water Treatment Plant Laboratory with the financial support of HUBAP (The Coordinating Unit for Scientific Research Projects at Harran University, Project No: 17198). T. Coliform, E. coli (TS EN ISO 9308-1), and Enterococci (TS EN ISO 7899-2) measurements were performed via the membrane filter method. The water samples were gently shaken to ensure homogeneity. A 100 mL water sample was filtered through each of three membrane filters prepared for T. Coliform, E. coli, and Enterococci. After filtration, T. Coliform and E. coli were incubated in chronogeneous agar broth at 37 °C for 24 hours, whereas Enterococci were incubated in Avida broth at 37 °C for 48 hours. Colonies formed at the end of the incubation period were counted. Red colonies were observed for T. Coliform, blue for E. coli, and cherry-colored colonies for Enterococci.” Table b lists the analysis technique and instrument used for each parameter.

Table b. Water Quality Parameters and Analytical Methods

Parameter	Analysis Method
Temp (°C)	Digital Thermometer
Cl₂ (mg/L)	Colorimetric Method with Hach Device
pH	Hach HQ14D Digital Conductivity Meter
EC (µS/cm)	Hach HQ14D Digital Conductivity Meter
TDS (mg/L)	Calculated from EC values with a coefficient (K: 0.65)
NO₂⁻ (mg/L)	Spectrophotometer
NO₃⁻ (mg/L)	Spectrophotometer
NH₄⁺ (mg/L)	Spectrophotometer
SO₄²⁻ (mg/L)	Spectrophotometer
Cl⁻ (mg/L)	Spectrophotometer
Ca²⁺ (mg/L)	ICP-OES (EPA 3005A, EPA 200.7)
Mg²⁺ (mg/L)	ICP-OES (EPA 3005A, EPA 200.7)
Na⁺ (mg/L)	ICP-OES (EPA 3005A, EPA 200.7)
K⁺ (mg/L)	ICP-OES (EPA 3005A, EPA 200.7)
F⁻ (mg/L)	Spectrophotometer
THM (µg/L)	ICP-OES (EPA 3005A, EPA 200.7)
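
For reference, the TDS entry in Table b amounts to a single multiplication. The following is a minimal sketch, assuming EC is reported in µS/cm; the function name `tds_from_ec` and the example reading are hypothetical and are not taken from the manuscript.

```python
# Conversion coefficient reported in Table b (K = 0.65).
TDS_FACTOR = 0.65


def tds_from_ec(ec_us_per_cm: float, factor: float = TDS_FACTOR) -> float:
    """Estimate TDS (mg/L) from electrical conductivity (µS/cm)."""
    if ec_us_per_cm < 0:
        raise ValueError("EC must be non-negative")
    return factor * ec_us_per_cm


# Hypothetical EC reading, for illustration only (not a measured value).
example_tds = tds_from_ec(1000.0)
```

Because TDS obtained this way is a fixed multiple of EC, the two columns carry the same information, which is worth keeping in mind when both are used as model inputs.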

Comment: Reliable, consistent, precise, and accurate data are essential. As is well known, poor-quality data lead to poor model outcomes.

Response: This comment is addressed jointly with Comment 1 above. The QA paragraph added to Section 2.2 directly demonstrates the reliability and consistency of the dataset through documented calibration procedures, certified analytical methods, and accredited laboratory oversight. The acknowledged limitation regarding CRM data further reflects our commitment to transparent reporting of data quality boundaries.

Comment: Programming Codes? Clarify whether programming codes are available or documented.

Response: - The models were developed using Python 3.x.

- We utilized the Scikit-learn library for SVR, PCA, and hyperparameter tuning (Grid Search and Randomized Search).

- The XGBoost library was used for the ensemble tree-based modeling.

- The Keras/TensorFlow API was employed for the construction and training of the LSTM network.

- Data preprocessing steps, including ADASYN for class balancing and Winsorization for outlier treatment, were implemented using standard open-source libraries.

While the full internal scripts are currently maintained within the AIoTx Research Laboratory for ongoing project development, the detailed hyperparameters and architectural configurations provided in the manuscript are sufficient for any researcher to reproduce the findings using the aforementioned standard tools.

Accordingly, a detailed technical justification has been added to Section 2.3 (Machine Learning Algorithms).
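
As an illustration of the tools listed above, the sketch below chains winsorization, scaling, PCA, and an RBF-kernel SVR, and tunes the SVR with scikit-learn's `GridSearchCV`. It is not the authors' script. The data are synthetic stand-ins with the same shape as the study's dataset (208 samples, 19 parameters), and the 5th/95th percentile limits, the search grid, and the train/test split are all assumptions made for the example. ADASYN and the LSTM are left out.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (NOT the study's measurements).
rng = np.random.default_rng(0)
X = rng.normal(size=(208, 19))
y = 50 + 10 * X[:, 0] + 5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=208)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Winsorization: clip each column to percentile limits learned on the
# training split only (the 5th/95th limits are an assumption).
lower, upper = np.percentile(X_train, [5, 95], axis=0)
X_train_w = np.clip(X_train, lower, upper)
X_test_w = np.clip(X_test, lower, upper)

# Scale -> PCA with five components -> SVR with an RBF kernel.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=5)),
    ("svr", SVR(kernel="rbf")),
])

# Hypothetical search grid; the manuscript's actual ranges are not shown here.
param_grid = {
    "svr__C": [1, 10, 100],
    "svr__gamma": ["scale", 0.01, 0.1],
    "svr__epsilon": [0.1, 0.5],
}

search = GridSearchCV(
    pipeline, param_grid, cv=5, scoring="neg_root_mean_squared_error"
)
search.fit(X_train_w, y_train)

best_params = search.best_params_
# With a custom scoring metric, score() returns that metric: negative RMSE.
test_neg_rmse = search.score(X_test_w, y_test)
```

Fitting the scaler and PCA inside the pipeline means they are re-estimated within each cross-validation fold, so the validation folds do not leak into preprocessing. Swapping `GridSearchCV` for `RandomizedSearchCV` (with `param_distributions` and `n_iter`) gives the randomized variant.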

Comment: Results. No comments, as there is a lack of code. However, there is no explicit analysis of model properties, limitations, or advantages.

Response: While the results demonstrate high numerical accuracy, we agree that discussing the qualitative properties of the algorithms adds significant value to the study.

In the revised manuscript, we have added a dedicated subsection in the Discussion to evaluate:

Model Advantages: We discuss how the LSTM’s memory cells captured seasonal dependencies and how XGBoost's internal regularization allowed it to outperform PCA-hybrid versions.
Model Properties: We analyze the role of the RBF kernel in SVR for mapping non-linear physicochemical relationships.
Limitations: We explicitly address the dependency of deep learning on data balancing techniques like ADASYN and the potential for information loss when applying dimensionality reduction to datasets with extreme outliers like Fluoride and THMs.

Accordingly, a detailed technical justification has been added in Section 3.6, Comparative Analysis of Model Properties, Advantages, and Limitations.

Comment: Discussion. Internal coherence of results is not clear, despite comparisons between models. For example, lines 633–639 do not constitute a proper discussion paragraph. Lines 672–679, however, are strong and should be retained.

Response: We thank the reviewer for this precise structural observation. Lines 633–639 in the original manuscript constituted a repetition of sampling and analytical methodology already presented in Section 2.2, and did not contribute analytical value to the Discussion.

This paragraph has been replaced with a substantive discussion paragraph that: (i) affirms the analytical reliability of the dataset with reference to specific QA procedures; (ii) interprets the broad parameter ranges and skewness documented in Table 3 as attributable to genuine spatial and seasonal heterogeneity rather than analytical artefacts; (iii) supports this interpretation through the internal consistency of inter-parameter correlations in Figure 4; and (iv) acknowledges the absence of duplicate sample and CRM data as a limitation with a forward-looking recommendation.

Manuscript change: Lines 633–639 paragraph replaced with analytical discussion paragraph. References: [16, 17, 40, 41]. All existing references.

Comment: Conclusions. The conclusions are consistent with the results and discussion. However, it remains unclear how, in practice, 19 variables—including microorganisms—can be managed effectively. A more practical and feasible perspective should be provided. Additionally, there is no mention of dissolved nutrients, which are arguably among the most important variables.

Response: We thank the reviewer for this practical and insightful comment. Both points have been addressed in the Conclusions.

Regarding the practicality of 19 variables: a new conclusion point (5) has been added stating that feature importance analysis (XGBoost coefficients and PCA loadings) identifies EC, TDS, Ca²⁺, SO₄²⁻, and K⁺ as the dominant predictors of WAWQI variance. A reduced monitoring protocol targeting these five parameters, supplemented by periodic microbiological screening, is proposed as a practical and cost-effective operational approach.

Regarding dissolved nutrients: we note that NO₃⁻, NO₂⁻, and NH₄⁺ were included as input variables in the model (Table 3, Section 2.2). Their apparent absence from the Conclusions was an oversight. The new conclusion point (5) now explicitly acknowledges the inclusion of these nitrogen species and notes their importance for detecting localised agricultural contamination signals within the GAP irrigation region, where intensive fertiliser application represents a potential diffuse contamination pathway.

Manuscript change: New conclusion point (5) added to Conclusions, addressing both the practical variable reduction strategy and the role of dissolved inorganic nitrogen species. References: [3, 38]. Both existing references.

Comment: English can always be improved. It appears that the authors wrote in their native language and then translated. As the old saying goes, translation always lacks soul. Try to think in English and then write. Sorry.

Response: Various sentences and phrasings have been refined for clarity and flow. The following examples illustrate these revisions.

Table c. Examples of language improvements in the revised manuscript

Original Sentence	Improved Version
Accurate prediction of the water quality is essential for sustainable water resource management and public health, particularly in semi-arid regions.	Precise water quality forecasting is vital for sustainable resource management and public health, especially in semi-arid environments.
The models evaluated include Support Vector Regressor (SVR) and SVR combined with Principal Component Analysis (PCA)...	We evaluated ten predictive models, including Support Vector Regressor (SVR) and Extreme Gradient Boosting (XGBoost), both integrated with dimensionality reduction and hyperparameter optimization.
Overall, the results demonstrate that the LSTM+Randomized Search framework provides the most robust performance...	These findings underscore the effectiveness of the LSTM framework in modeling the complex variance of the Weighted Arithmetic Water Quality Index (WAWQI).
The Adaptive Synthetic Sampling (ADASYN) technique was applied to enhance model generalization and mitigate data imbalance effects.	To improve model generalization and mitigate the effects of class imbalance, we implemented the Adaptive Synthetic Sampling (ADASYN) technique.
In addition, a Long Short-Term Memory (LSTM) network was incorporated to capture temporal dependencies in water quality data...	We incorporated a Long Short-Term Memory (LSTM) network to specifically capture the temporal dependencies and seasonal fluctuations inherent in the monitoring data.
The temporal structure of the seasonal sampling allowed the application of LSTM models to capture potential time-dependent patterns.	The seasonal structure of our dataset enabled the LSTM to identify time-varying patterns that stationary models often overlook.
By transforming the 19 original variables into these five uncorrelated latent factors, the SVR-PCA model effectively reduces multicollinearity...	By condensing the 19 original features into five uncorrelated components, the PCA-SVR model mitigated multicollinearity and enhanced predictive stability.
This performance highlights the strong capability of deep learning architectures to model complex nonlinear relationships...	Such performance demonstrates the capacity of deep learning to capture the high-dimensional, non-linear dynamics often found in hydrochemical data.
The convergence behavior of the training and validation loss functions further suggests stable learning without pronounced overfitting.	Stable convergence of both training and validation loss functions confirms the model’s robust generalization without signs of overfitting.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report (New Reviewer)

Comments and Suggestions for Authors

The authors have implemented all the suggested changes. So, the manuscript is ready for publication.

Reviewer 3 Report (New Reviewer)

Comments and Suggestions for Authors

Dear authors, I congratulate you all on your great effort to revise and improve your manuscript.

Good work.

Please give the language a final check. Some sentences could be improved for flow (syntax).

I leave this language correction to the journal's language specialist.

Kind regards.

Comments on the Quality of English Language

There is still some room for improvement, but for the purposes of a scientific paper the language is adequate.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Why did you select the ML models that you used in your study?

Why did you use the 16 water quality parameters in your study?

Table 2: Standard for pH has a range of values and not 8.5. Similarly for F.

Table 2: How did you get Weight and Relative weight?

Table 3 is not required. It is already presented in Line 453-455.

Table 2 shows only 12 parameters and not 16 you studied.

Table 4: Temperature value is wrong.

Table 4: Some values are very high like F, K, THM. Explain why.

Figure 4: It is not clear why you want to find the correlation between WQI and the water quality parameters. WQI is derived from all these parameters.

Section 3.2: What are the parameters in PC1, etc.? Explain.

It is not clear why you want to find the WQI of drinking water. Drinking water is supposed to meet the drinking water quality standards.

Same data presented in tables and figures (Figure 7 and Table 5).

Present the results showing WQI of the samples tested.

Section 4: Repetition of what was already presented earlier.

It is not clear how the use of ML techniques for WQI determination is useful, as claimed in the Introduction.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Dear authors, please clarify the following questions:

Table 2 has parameter measurement disabled
Was sampling carried out upstream or downstream, directly from the river or from a pipeline?
Table 3 shows water quality gradation levels, but the article does not provide such levels for the parameters under study, and there is no detailed analysis
The article contains a statistical analysis of water quality based on modern modeling techniques and identifies the best-performing ones, but no forecasting as such is made. Accordingly, it is necessary either to change the title of the article or to supplement the material with forecast results, for example for 1–5 years.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Title: Why do you want to model WQI? WQI is already a model.

Line 67: "The WQI was first introduced in the 1960s by [12]." This should be rewritten as "... in the 1960s by (authors)". Apply the same change in other places.

Introduction: Discuss studies reported in the literature on WQI of DRINKING Water. Explain how your study is different.

Introduction: Explain the need to determine the WQI of DRINKING water. Also, discuss the use of ML for this application.

Introduction: "The main objective of this study is to develop ML algorithms to make an accurate WQI estimate of reliable water use based on various climate variables in semiarid regions." Address this point in the Results and Discussion, that is, explain how the ML algorithms are able to make an accurate WQI estimate.

Introduction: "The study highlights the effective application of ML algorithms, individually or combined, to enhance the precision and dependability of WQI predictions, .." Are you going to predict WQI using ML? Explain with your results.

Table 2 can be in Supplementary material.

Section 2.6: Why did you use Equation 5 for calculating WQI? There are many others.

Table 3: Some relative weight values are not correct, for example nitrate, sulfate, and chloride.

Microbiological quality is the most important parameter for drinking water. Why was this not included?

Table 3: How did you decide the weights of the parameters?

Equation 5: How did you get Qi values?

Table 3: Is it chlorine or chloride? What is the WHO limit for chloride? The value given is wrong.

Table 5 shows that most of the water quality parameters are within the WHO limits. Still, 87.3% of the water samples tested belong to the "Bad" category. Why this contradiction?

Line 566: "The correlation results indicate that some parameters, such as EC, TDS, SO₄²⁻, and Ca²⁺, exhibit stronger associations with WQI". Explain why a stronger association was found.