Open Access Article

Peer-Review Record

Water Quality Identification: Integrating IoT Sensors and Deep Learning for Near-Real-Time Water Quality Assessment

Appl. Sci. 2026, 16(10), 4868; https://doi.org/10.3390/app16104868

by Christina Tsolaki¹, George Kokkonis², Stavros Valsamidis³ and Sotirios Kontogiannis¹,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Reviewer 4: Anonymous

Submission received: 10 April 2026 / Revised: 7 May 2026 / Accepted: 10 May 2026 / Published: 13 May 2026

(This article belongs to the Special Issue Applications of Industrial Internet of Things (IIoT) Platforms: 2nd Edition)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript "Water Quality Identification: Integrating IoT Sensors and Deep Learning for Near-Real-Time Water Quality Assessment" presents a highly relevant framework for environmental monitoring. To meet the rigorous publication standards of Applied Sciences, please address the following comments:

The system claims to provide "near-real-time" water quality assessment. However, the study lacks an evaluation of the computational overhead required by the deep learning models. Please include an experiment measuring the inference latency, memory usage, and power consumption of the models when deployed on actual IoT edge hardware versus centralized cloud servers to practically validate the "near-real-time" claim.
Enhance the Introduction by explicitly defining the fundamental mathematical equations or standard parameters used to calculate the target Water Quality Index (WQI) before introducing the predictive algorithms.
Expand the Conclusion to include a comparative analysis of the framework's scalability across diverse aquatic environments (e.g., drinking water reservoirs versus urban sewer networks). Furthermore, discuss the ongoing challenges that remain beyond this paper, such as the models' ability to adapt to sudden ecological anomalies (concept drift), and the long-term energy harvesting and maintenance requirements for the remote IoT sensor nodes.
To clearly demonstrate the novelty of the proposed framework, extend the literature review and benchmark study. Compare your specific deep learning architecture's performance (e.g., using RMSE, MAE) against highly efficient, state-of-the-art ensemble machine learning models (such as XGBoost or LightGBM) heavily utilized in 2023–2024 water quality prediction literature.
The section detailing the dataset preparation requires significant improvement. Real-world IoT water sensors are highly susceptible to bio-fouling, calibration drift, and signal dropouts. Please explicitly detail the data imputation techniques, outlier removal processes, and noise-filtering algorithms applied to the raw sensor data before feeding it into the deep learning pipeline.

Author Response

Response: Thank you for taking the time to review our manuscript. Only comment responses have been highlighted in the text to assist reviewers.

Comment 1: The system claims to provide "near-real-time" water quality assessment. However, the study lacks an evaluation of the computational overhead required by the deep learning models. Please include an experiment measuring the inference latency, memory usage, and power consumption of the models when deployed on actual IoT edge hardware versus centralized cloud servers to practically validate the "near-real-time" claim.

Response: We completely agree with the reviewer that a "near-real-time" claim must be backed by a practical evaluation of the system's hardware footprint. It's one thing to run models in a laboratory environment and another to ensure they are viable on actual IoT nodes. To address this, we have added a paragraph in lines 1059–1066. Furthermore, the device's inference latencies are presented in Table 7 of Scenario III. Cloud inference takes less than 0.5 s.

Comment 2: Enhance the Introduction by explicitly defining the fundamental mathematical equations or standard parameters used to calculate the target Water Quality Index (WQI) before introducing the predictive algorithms.

Response: We agree that defining the target metric early on provides much-needed context for the reader. In lines 103–110, we have updated the Introduction to explicitly define the Water Quality Index (WQI) and the five specific parameters (pH, temperature, TDS, EC, and turbidity) that form its basis. We have also clarified that our approach is not grounded in the established Horton and NSF-WQI frameworks. Our experimentation used a more score-based WQI measure, similar to the Air Quality Index, in which a lower score is better and a higher score is worse. To maintain a smooth narrative flow, we now direct the reader to Section 3.2 for the full mathematical equations and weighting factors, ensuring the theoretical foundation is clear before introducing our predictive algorithms.

Comment 3: Expand the Conclusion to include a comparative analysis of the framework's scalability across diverse aquatic environments (e.g., drinking water reservoirs versus urban sewer networks). Furthermore, discuss the ongoing challenges beyond this paper, such as the models' ability to adapt to sudden ecological anomalies (concept drift) and the long-term energy-harvesting and maintenance requirements for the remote IoT sensor nodes.

Response: We appreciate the reviewer pushing us to look at the "big picture" of real-world deployment. In lines 1246–1248, we've expanded the Conclusion to compare how the framework scales between stable drinking water reservoirs and much harsher environments, such as urban sewer networks, which involve significant biofouling and maintenance challenges. We also address the algorithmic challenge of "concept drift" during sudden ecological shifts and discuss the need for future iterations to include online learning and solar-based energy harvesting to ensure the nodes can survive long-term in the field without constant manual intervention.

Comment 4: To clearly demonstrate the novelty of the proposed framework, extend the literature review and benchmark study. Compare your specific deep learning architecture's performance (e.g., using RMSE, MAE) against highly efficient, state-of-the-art ensemble machine learning models (such as XGBoost or LightGBM) heavily utilized in 2023–2024 water quality prediction literature.

Response: We agree that benchmarking against state-of-the-art ensemble models such as XGBoost and LightGBM is essential to establishing the novelty of our work. In lines 1178–1189, we added a comparative analysis explaining why we chose a recurrent GRU architecture over these gradient-boosting methods. While ensemble models are excellent for tabular data, our findings show they often struggle with the temporal flow of high-frequency time series. By prioritizing measurement sequence, our GRU models achieved a competitive validation RMSE of 0.0255. They maintained a test R² above 0.98, demonstrating that modeling time dependence is vital for the accuracy of near-real-time IoT sensor streams.

Comment 5: The section detailing the dataset preparation requires significant improvement. Real-world IoT water sensors are highly susceptible to bio-fouling, calibration drift, and signal dropouts. Please explicitly detail the data imputation techniques, outlier removal processes, and noise-filtering algorithms applied to the raw sensor data before feeding it into the deep learning pipeline.

Response: Dealing with the messy nature of real-world IoT data is indeed a major challenge, and we've now clarified our approach to this in lines 873–880. We've added a detailed breakdown of our three-step cleaning pipeline: first, we strip out outliers using realistic physical thresholds; next, we fill in signal dropouts using fuzzy-interpolation to keep the data sequence unbroken; and finally, we apply a 30-sample rolling average to smooth out the high-frequency jitter typical of budget sensors. These steps ensure that the GRU models focus on actual water quality trends rather than being distracted by hardware glitches or calibration drift.
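The three cleaning steps described in this response can be sketched as follows. This is a minimal illustration, not the authors' code: the column names and the limits in `PHYSICAL_LIMITS` are hypothetical placeholders, and plain linear interpolation stands in for the "fuzzy-interpolation" step, whose details are not given in this record. Only the 30-sample rolling window comes from the response itself.

```python
import numpy as np
import pandas as pd

# Hypothetical plausibility limits per column (assumption: the manuscript's
# actual thresholds are not stated in this record).
PHYSICAL_LIMITS = {
    "ph": (0.0, 14.0),
    "temperature_c": (0.0, 50.0),
    "turbidity_ntu": (0.0, 1000.0),
}


def clean_sensor_frame(df, limits=PHYSICAL_LIMITS, window=30):
    """Apply threshold filtering, gap filling and rolling-mean smoothing."""
    cleaned = df.copy()
    # Step 1: readings outside the physical range become missing values.
    for col, (low, high) in limits.items():
        if col in cleaned.columns:
            cleaned[col] = cleaned[col].where(cleaned[col].between(low, high))
    # Step 2: fill dropouts and removed outliers so the sequence is unbroken
    # (linear interpolation used here as a stand-in).
    cleaned = cleaned.interpolate(method="linear", limit_direction="both")
    # Step 3: rolling mean over `window` samples to damp high-frequency jitter.
    return cleaned.rolling(window=window, min_periods=1).mean()
```

Calling `clean_sensor_frame(raw_df)` on a frame of raw readings returns a frame of the same shape. Note that `min_periods=1` keeps the first samples instead of discarding them, which is a design choice of this sketch rather than something stated in the response.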

Reviewer 2 Report

Comments and Suggestions for Authors

This paper focuses on near-real-time water quality assessment by integrating low-cost IoT sensors and GRU-based deep learning, proposing the Water-QI platform to achieve Water Quality Index (WQI) calculation and forecasting. The method evaluates shallow and deep GRU architectures under hourly and minute-level temporal resolutions, and verifies the effectiveness of the solution through edge device deployment experiments, which is well-suited to the demands of low-cost, scalable water quality monitoring in smart cities. However, the following points could further enhance the paper's rigor and reproducibility:

The manuscript requires professional English polishing to standardize technical terms, optimize sentence logic, and eliminate ambiguous expressions, ensuring consistency in academic writing throughout the text.
Some key formula parameters are not clearly defined. In the proposed water quality comprehensive score WQS formula, the selection basis, application scope of the weight coefficient α, and the specific calculation method of normalized RMSE are not clearly defined or explained. It is recommended to supplement the complete parameter description and calculation logic.
Regarding the dataset partitioning strategy, Figures 3 and 5 illustrate the training and validation RMSE curves for different GRU architectures. Nevertheless, the associated dataset split ratio and validation protocol are not explicitly documented in the figures, which prevents readers from accurately evaluating the convergence behavior and stability of the models. I recommend refining the figure captions and providing additional explanatory text in the manuscript.
The paper directly adopts a long input sequence of 1440 steps for the minute-level prediction task, without explaining the basis for selecting this sequence length, analyzing its potential impact on gradient vanishing and training efficiency, or providing performance comparisons with shorter sequences. The rationality of the model's input dimension design needs to be strengthened, and it is recommended to supplement the selection basis for the sequence length and relevant comparative experiments.
The edge inference experiments are only conducted on a single hardware platform, the Raspberry Pi Zero 2W. Key variables such as memory usage and inference framework version are not controlled, and the model quantization and precision settings are not specified. As a result, the edge-side performance test results lack horizontal comparability, and the standardization of the experimental setup needs to be improved. It is recommended to unify the experimental environment and supplement descriptions of key parameters.
To enrich the research background, several relevant papers are worth referencing:

[a] AIGC video detection based on the fusion of spatial-frequency-optical flow multimodal features, in Journal of Systems Engineering and Electronics, doi: 10.23919/JSEE.2026.000049.

[b] An interpretable deep learning framework for intrusion detection in industrial Internet of Things[J]. Internet of Things, 2025: 101681.

In summary, this study delivers an innovative XAI-MLOps integrated solution for interpretable concept drift detection, with distinct theoretical innovation and engineering application potential. Addressing the above points will substantially improve the technical completeness, reproducibility and practical guidance of the paper. Major revisions are recommended before acceptance.

Author Response

Reviewer 2:

Response: Thank you for your time and effort in reviewing our manuscript. Here, we quote our responses and the amendments we made based on your comments. Only comment responses have been highlighted in the text to assist reviewers.

Comment 1: The manuscript requires professional English polishing to standardize technical terms, optimize sentence logic, and eliminate ambiguous expressions, ensuring consistency in academic writing throughout the text.

Response: The manuscript has been thoroughly revised to correct inconsistencies, improve clarity, and eliminate syntactic and typographical errors, thereby conveying the research more clearly.

Comment 2: Some key formula parameters are not clearly defined. In the proposed water quality comprehensive score WQS formula, the selection basis, application scope of the weight coefficient α, and the specific calculation method of normalized RMSE are not clearly defined or explained. It is recommended to supplement the complete parameter description and calculation logic.

Response: Thank you very much for this comment. As we detailed, the choice of α = 0.8 is a deliberate strategic decision to prioritize predictive precision (RMSE) over goodness-of-fit (R²), as the latter can often be misleadingly high in noisy environmental datasets. Appropriate amendments have been made to the exploratory analysis paragraph in lines 294-309.

Comment 3: Regarding the dataset partitioning strategy, Figures 3 and 5 illustrate the training and validation RMSE curves for different GRU architectures. Nevertheless, the associated dataset split ratio and validation protocol are not explicitly documented in the figures, which prevents readers from accurately evaluating the convergence behavior and stability of the models. I recommend refining the figure captions and providing additional explanatory text in the manuscript.

Response: Thank you very much for this observation. Appropriate amendments have been made in lines 942-928. Also, lines 835-883 have been rewritten to show how minute-resolution experimentation reflects on the standard, high-capacity/heavy model and deep GRU models from the hourly-resolution experiments. The captions for Figures 3 and 5 have also been amended.

Comment 4: The paper directly adopts a long input sequence of 1440 steps for the minute-level prediction task, without explaining the basis for selecting this sequence length, analyzing its potential impact on gradient vanishing and training efficiency, or providing performance comparisons with shorter sequences. The rationality of the model's input dimension design needs to be strengthened, and it is recommended to supplement the selection basis for the sequence length and relevant comparative experiments.

Response: Thank you very much for this comment. An additional paragraph has been added in lines 854-860 to strengthen the rationale for extended sequence lengths and their potential impact on vanishing gradients.

Comment 5: The edge inference experiments are only conducted on a single hardware platform, the Raspberry Pi Zero 2W. Key variables such as memory usage and inference framework version are not controlled, and the model quantization and precision settings are not specified. As a result, the edge-side performance test results lack horizontal comparability, and the standardization of the experimental setup needs to be improved. It is recommended to unify the experimental environment and supplement descriptions of key parameters.

Response: The edge inference experiments were deliberately conducted only on the end nodes implemented for Water-QI. Horizontal comparability with the ESP32 nodes is also presented in lines 1051-1059; however, as mentioned there, these nodes cannot cope with the memory requirements of these models. To this end, Scenario III presents the edge memory and computational limits for inference delivery on a more advanced quad-core ARM device. Single-layer GRU models with more than 2048 units are not examined, since their inference times are well above the 1 min resolution capability. Regarding precision and repeatability, appropriate amendments have been made in lines 1060-1066.
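The feasibility check implied here, whether one inference completes within the 1 min sampling interval, could be measured with a small harness like the sketch below. It is not the authors' benchmark: `benchmark_inference`, its parameters, and the 60 s default budget are illustrative assumptions, and `predict` stands for whatever callable wraps the deployed model.

```python
import statistics
import time
import tracemalloc


def benchmark_inference(predict, sample, runs=20, warmup=3, budget_s=60.0):
    """Time repeated calls of `predict(sample)` and compare with a time budget."""
    # Warm-up calls so one-off initialisation cost does not distort timings.
    for _ in range(warmup):
        predict(sample)

    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        latencies.append(time.perf_counter() - start)

    # Memory is traced in a separate call so tracing overhead does not affect
    # the timings above. tracemalloc only sees Python-level allocations, not
    # native buffers held by an inference runtime.
    tracemalloc.start()
    predict(sample)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
        "peak_python_mib": peak_bytes / 2**20,
        "within_budget": max(latencies) < budget_s,
    }
```

Comparing the worst-case latency, not the median, against the budget is a conservative choice: a node that occasionally overruns the sampling interval would fall behind the sensor stream.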

Comment 6: To enrich the research background, several relevant papers are worth referencing:

[a] AIGC video detection based on the fusion of spatial-frequency-optical flow multimodal features, in Journal of Systems Engineering and Electronics, doi: 10.23919/JSEE.2026.000049.

[b] An interpretable deep learning framework for intrusion detection in industrial Internet of Things[J]. Internet of Things, 2025: 101681.

Response: Thank you very much for this remark. However, the suggested papers are not closely related to multi-parameter prediction or classification, or to sensor-based IoT systems for water quality monitoring; therefore, they have not been included as relevant papers in this study.

Comment 7: In summary, this study delivers an innovative XAI-MLOps integrated solution for interpretable concept drift detection, with distinct theoretical innovation and engineering application potential. Addressing the above points will substantially improve the technical completeness, reproducibility and practical guidance of the paper. Major revisions are recommended before acceptance.

Response: Thank you for your time and effort in reviewing our manuscript and for your targeted remarks and observations.

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript presents a novel IoT-based platform for near-real-time water quality monitoring. The authors investigate the application of Gated Recurrent Unit (GRU) architectures for predicting the Water Quality Index (WQI) under different temporal resolutions. The study addresses a relevant problem in smart city infrastructure, proposing low-cost solutions with edge-computing capabilities. However, the manuscript requires major revisions before publishing.

Major Comments:

The literature review is lengthy, but its contribution to the core issues is limited. It is suggested that the sections 1. Introduction and 2. Related Work be within 3 pages. It is recommended to incorporate a critical analysis of the performance indicators presented in the extensive body of literature summarized in Tables 1 and 2, thereby facilitating a coherent connection to the central theme of the paper.
The Abstract and Introduction sections of the manuscript do not explicitly delineate the primary contribution of the study. It remains unclear whether the author proposes a novel algorithm, a new system architecture, or innovative application scenarios or merely presents an empirical comparison of existing GRU models. Although the paper attempts to address multiple dimensions, none is sufficiently emphasized to establish a distinct contribution.
Section 1. Introduction: The authors assert that current methods are costly and lack real-time monitoring capabilities; however, this issue has been extensively addressed by numerous low-cost IoT solutions documented in the literature [1]-[8]. The manuscript fails to clearly articulate the fundamental distinctions between the proposed Water-QI system and existing approaches, as well as to justify why these differences warrant publication in a high-impact scientific journal.
Section 3 provides a detailed account of the engineering implementation of the Water-QI system but does not engage meaningfully with any theoretical frameworks related to water quality assessment. The inclusion of the NSF-WQI appears solely as a background reference for formula calculations rather than serving as a theoretical foundation. Moreover, the paper does not address the critical theoretical question of why the Water Quality Index (WQI) constitutes an appropriate metric for evaluating urban drinking water quality.
Lines 245-247: The author incorporated the "water quality score (WQS)" into equation (1). While this approach is innovative, the rationale for selecting α = 0.8 is insufficiently substantiated. It remains unclear whether this indicator aligns with established domain standards; if it is a novel metric, a more rigorous validation process is warranted. Additionally, the WQS formula presupposes that the RMSE values are normalized within the interval [0,1]. However, as evidenced in Table 1, WQS values such as -0.242 and 0.9954 are reported. The presence of a negative value suggests that the RMSE exceeds 1, which contradicts the normalization assumption articulated in the manuscript.
Lines 806-810: How does monthly data cover "minute level" experiments? The author mentions "temporally fuzzy-interpolated to provide minute-level measurements" but does not specify interpolation methods, interpolation assumptions, or interpolation errors. It is easy to mistake raw data for minutes, but it is actually synthetic data.
Section 4. Experimental Scenarios: Table 4 presents a prediction horizon of 1440 minutes or 24 hours; however, the rationale for selecting a 24-hour prediction horizon is not provided. Furthermore, the comparison between the hourly and minute-level experiments lacks consistency, as the number of variables differs, thereby limiting the validity of any substantive conclusions.
Section 4.2: The regression analyses report only RMSE and R² metrics, omitting the Mean Absolute Error (MAE), which is a more interpretable measure of error, particularly in noisy contexts such as water quality monitoring. Additionally, although the Water Quality Score (WQS) metric was introduced, it was not employed to assess the performance of the proposed models.
The authors introduce GRU models; however, they do not conduct a comparative analysis against a basic baseline or alternative lightweight models within the presented text. Although the abstract references a comparison among various GRU architectures, it is essential to include evaluations against other algorithms to substantiate the claimed superiority of the selected approach.

Author Response

Comment 1: The literature review is lengthy, but its contribution to the core issues is limited. It is suggested that the sections 1. Introduction and 2. Related work be within 3 pages. It is recommended to incorporate a critical analysis of the performance indicators presented in the extensive body of literature summarized in Tables 1 and 2, thereby facilitating a coherent connection to the central theme of the paper.

Response: Thank you very much for this observation. However, the content of the Introduction and Related Work sections cannot fit within 3 pages, because Tables 1 and 2 of the Related Work section alone occupy one page. The reason for this extension is to give the reader a closer view of existing research, classified into ML and DL methods, to briefly introduce the best method of each examined paper, and to provide a cross-comparison that shows the better performance of DL methods wherever applicable (as also mentioned in some papers) with respect to ML methods, whose results often hide behind datasets of limited size (which is why the WQS score has been introduced). Nevertheless, we agree that not incorporating the WQS into our own experimentation was a critical omission that left this work incomplete. Therefore, an additional WQS column has been added to the tables of Experimental Scenarios I and II, and a coherent connection with the best-case results of Tables 1 and 2 has been established by adding five paragraphs at the end of the Discussion of the Results section, as highlighted.

Comment 2: The Abstract and Introduction sections of the manuscript do not explicitly delineate the primary contribution of the study. It remains unclear whether the author proposes a novel algorithm, a new system architecture, or innovative application scenarios or merely presents an empirical comparison of existing GRU models. Although the paper attempts to address multiple dimensions, none is sufficiently emphasized to establish a distinct contribution.

Response: You're right to point out that we initially tried to cover several angles without clearly highlighting our "anchor" contribution. We've now sharpened the abstract and the Introduction paragraph (lines 94–102) to make it clear that our primary contribution isn't just a model comparison, but the Water-QI platform itself. We've explicitly stated that the novelty lies in the systematic optimization of GRU architectures specifically for high-resolution, minute-level forecasting on resource-constrained edge hardware.

Comment 3: Section 1. Introduction: The authors assert that current methods are costly and lack real-time monitoring capabilities; however, this issue has been extensively addressed by numerous low-cost IoT solutions documented in the literature [1]-[8]. The manuscript fails to clearly articulate the fundamental distinctions between the proposed Water-QI system and existing approaches, as well as to justify why these differences warrant publication in a high-impact scientific journal.

Response: We acknowledge that low-cost IoT is a well-explored field, which is why we've sharpened our focus in lines 123–130. The real distinction of Water-QI isn't just the price tag, but the shift from passive data logging to proactive Edge intelligence. Unlike existing systems that depend on the cloud for analysis, we've optimized GRU models to run complex, minute-level predictions locally on the device. This allows the system to anticipate quality drops at the source, offering a much more resilient, decentralized solution for urban water management.

Comment 4: Section 3 provides a detailed account of the engineering implementation of the Water-QI system but does not engage meaningfully with any theoretical frameworks related to water quality assessment. The inclusion of the NSF-WQI appears solely as a background reference for formula calculations rather than serving as a theoretical foundation. Moreover, the paper does not address the critical theoretical question of why the Water Quality Index (WQI) constitutes an appropriate metric for evaluating urban drinking water quality.

Response: We have revised Section 3.2 (lines 764-766) to explicitly frame the NSF-WQI and Horton models as the theoretical foundations of our system, rather than merely a source of formulas. We now argue that the WQI is the most appropriate metric for urban drinking water because it serves as a translation layer, compressing complex, multivariate chemical data into a single, actionable score that municipal authorities can use for immediate decision-making.

Comment 5: Lines 245-247: The author incorporated the "water quality score (WQS)" into equation (1). While this approach is innovative, the rationale for selecting α = 0.8 is insufficiently substantiated. It remains unclear whether this indicator aligns with established domain standards; if it is a novel metric, a more rigorous validation process is warranted. Additionally, the WQS formula presupposes that the RMSE values are normalized within the interval [0,1]. However, as evidenced in Table 1, WQS values such as -0.242 and 0.9954 are reported. The presence of a negative value suggests that the RMSE exceeds 1, which contradicts the normalization assumption articulated in the manuscript for the WQS score RMSE.

Response: We believe there may have been a slight misinterpretation of the purpose of the Water Quality Score (WQS). As we detailed, the choice of α = 0.8 is a deliberate strategic decision to prioritize predictive precision (RMSE) over goodness-of-fit (R²), as the latter can often be misleadingly high in noisy environmental datasets. Regarding the negative values mentioned (e.g., -0.242), these are actually a core diagnostic feature of the metric rather than a mathematical error. As explained, a negative WQS is intended to serve as a red flag to identify models in the literature where the error exceeds the normalization threshold, signaling either extreme underfitting or inconsistent data scaling. Therefore, these values validate the WQS as an effective filtering tool for benchmarking, as described in the text.
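Equation (1) of the manuscript is not reproduced in this record, so the exact WQS formula cannot be quoted here. Purely as an assumption, one weighted form that matches the description (α weights the normalized-RMSE term, the remainder weights R², and the score turns negative once the normalized RMSE exceeds 1) is sketched below. The function name `wqs_sketch` and the formula itself are hypothetical and should not be read as the authors' definition.

```python
def wqs_sketch(rmse_norm, r2, alpha=0.8):
    """Hypothetical score: alpha * (1 - normalized RMSE) + (1 - alpha) * R^2.

    ASSUMPTION: this form is inferred from the prose description only;
    the manuscript's Equation (1) may differ.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * (1.0 - rmse_norm) + (1.0 - alpha) * r2


# A model with small error and a good fit scores close to 1.
good = wqs_sketch(rmse_norm=0.02, r2=0.98)

# A normalized RMSE above 1 drives the score below zero: the "red flag" case.
flagged = wqs_sketch(rmse_norm=1.5, r2=0.5)
```

Under this assumed form, the sign of the score depends mostly on whether the error term stays below the normalization threshold, which is the behaviour the response describes as a diagnostic feature.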

Comment 6: Lines 806-810: How does monthly data cover "minute level" experiments? The author mentions "temporally fuzzy-interpolated to provide minute-level measurements" but does not specify interpolation methods, interpolation assumptions, or interpolation errors. It is easy to mistake raw data for minutes, but it is actually synthetic data.

Response: The description in lines 833-840 and 855-862 has been amended. We have also updated and added a paragraph at lines 873-881 to mention the linear interpolation of the dataset to achieve minute-level granularity and the data preprocessing steps performed.

Comment 7: section 4. Experimental Scenarios: Table 4 presents a prediction horizon of 1440 minutes or 24 hours; however, the rationale for selecting a 24-hour prediction horizon is not provided. Furthermore, the comparison between the hourly and minute-level experiments lacks consistency, as the number of variables differs, thereby limiting the validity of any substantive conclusions.

Response: We appreciate the opportunity to clarify these points. Regarding the prediction horizon, we selected 24 hours to align with the standard diurnal cycle of urban water systems. Nevertheless, the use of past and future intervals depends heavily on measurement granularity. For example, you may perform training and inference using 1440 depth points and 1440 prediction points on an hourly dataset, thereby significantly increasing the predictive time window. Two additional paragraphs have been added in lines 923-939. We kept the input features constant and varied only the temporal resolution. Therefore, we ensured a technically rigorous comparison between hourly and minute-level measurements, where performance differences are strictly attributed to data granularity.
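How the look-back depth and the prediction horizon turn a measurement series into training pairs can be illustrated with a sliding-window sketch. The function `make_windows` and its argument names are illustrative assumptions; only the 1440-step depth and 1440-step horizon come from the response.

```python
import numpy as np


def make_windows(series, lookback=1440, horizon=1440):
    """Slice a series into (input, target) pairs for sequence-to-sequence training.

    Returns X with shape (n, lookback, features) and
    Y with shape (n, horizon, features).
    """
    values = np.asarray(series, dtype=float)
    if values.ndim == 1:
        values = values[:, None]  # treat a 1-D series as a single feature
    n_windows = values.shape[0] - lookback - horizon + 1
    if n_windows <= 0:
        raise ValueError("series is shorter than lookback + horizon")
    # Each window starts one step after the previous one.
    X = np.stack([values[i:i + lookback] for i in range(n_windows)])
    Y = np.stack([values[i + lookback:i + lookback + horizon]
                  for i in range(n_windows)])
    return X, Y
```

With the same 1440/1440 setting, the window spans 24 hours on minute-level data and 60 days on hourly data, which is the widening of the predictive time window that the response refers to.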

Comment 8: Section 4.2: The regression analyses report only RMSE and R² metrics, omitting the Mean Absolute Error (MAE), which is a more interpretable measure of error, particularly in noisy contexts such as water quality monitoring. Additionally, although the Water Quality Score (WQS) metric was introduced, it was not employed to assess the performance of the proposed models.

Response: We thank the reviewer for this valuable comment. An additional column reporting the WQS metric has been added to Tables 5 and 6. We agree that MAE is an interpretable error metric and can be useful in noisy water-quality monitoring applications. However, in the revised manuscript, we retained RMSE and R² as the principal regression metrics because they are the two components required for the proposed Water Quality Score (WQS). RMSE was selected because it penalizes larger deviations more heavily, which is important in water-quality forecasting, where large prediction errors may indicate critical deviations from normal operating conditions. The coefficient of determination was retained to quantify the goodness of fit and the proportion of variance explained by the model. Therefore, the combination of RMSE and R² provides an evaluation framework that is both error- and fit-sensitive. Following the reviewer's suggestion, we revised Sections 4.2 and 4.3 to explicitly use the WQS metric to assess the proposed models (Tables 5 and 6).

Comment 9: The authors introduce GRU models; however, they do not conduct a comparative analysis against a basic baseline or alternative lightweight models within the presented text. Although the abstract references a comparison among various GRU architectures, it is essential to include evaluations against other algorithms to substantiate the claimed superiority of the selected approach.

Response: Three additional paragraphs have been added to the Discussion of the Results section (lines 1152-1177), comparing the base-case results of the Scenario I and II GRU models with the best-case results reported in the literature for ML and DL models, as listed in Tables 1 and 2 of the Related Work section.

Reviewer 4 Report

Comments and Suggestions for Authors

This study presents Water-QI, a low-cost IoT-based system for near-real-time water quality monitoring. By integrating budget-friendly sensors with deep learning models, Water-QI enables continuous monitoring of key water parameters, such as temperature, turbidity, pH, conductivity, and total dissolved solids. The article needs revision before acceptance.

Clarify the main novelty of the study compared with existing IoT-based water quality monitoring systems.
Explain whether the GRU models were trained using real sensor data or interpolated secondary data.
The minute-level forecasting results should not be overclaimed if they are based on interpolated monthly data.
Provide proper calibration and validation details for all low-cost sensors.
The WQI formulation needs explanation, because the scoring direction differs from conventional NSF-WQI.
Explain and justify the modified parameter weights used in the Water-QI score.
Compare with baseline models such as ARIMA, Random Forest, XGBoost, LSTM, and naïve forecasting.
Provide uncertainty analysis for sensor readings, WQI calculation, and model predictions.

Author Response

Comment 1: Clarify the main novelty of the study compared with existing IoT-based water quality monitoring systems.

Response: Thank you very much for this comment. The abstract has been rewritten to illustrate the capabilities of Water-QI, and an additional paragraph has been added to the Introduction section, lines 112-122, to emphasize the main novelty of this study.

Comment 2: Explain whether the GRU models were trained using real sensor data or interpolated secondary data. The minute-level forecasting results should not be overclaimed if they are based on interpolated monthly data.

Response: We thank the reviewer for raising this important clarification. The GRU models were trained using sensor data derived from open-data EYATH water-quality records for Thessaloniki, rather than long-term, raw, continuous data collected directly from the Water-QI prototype. Specifically, these daily records were temporally interpolated to generate hourly and minute-level sequences suitable for GRU-based sequence-to-sequence forecasting. We have clarified this point in the revised manuscript. The purpose of this interpolation was to create controlled, low- and high-temporal-resolution scenarios for evaluating model behavior, predictive performance, and edge inference feasibility. Therefore, the hourly and minute-level datasets should be interpreted as high-resolution temporal proxies rather than native sensor streams. We also added a limitation noting that future work will require long-term, high-frequency data collected directly from deployed Water-QI nodes to validate the framework under real field conditions. Lines 882-891 have been amended to mention per-month daily measurements. Limitations and future work have been amended in lines 1263-1265.
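The temporal interpolation described in this response can be sketched as follows. The record does not state which interpolation scheme was used, so linear interpolation is an assumption here, and the function name `upsample_linear` and the turbidity values are hypothetical.

```python
def upsample_linear(values, factor):
    """Linearly interpolate between consecutive samples.

    Each interval between two original samples is split into `factor`
    equal steps, so daily data with factor=24 becomes hourly data.
    Linear interpolation is an assumption; the record does not name the scheme.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    if len(values) < 2:
        return list(values)
    out = []
    for a, b in zip(values, values[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(values[-1])  # keep the final original sample
    return out

# Hypothetical daily turbidity readings (NTU) for three consecutive days.
daily_turbidity = [1.0, 2.2, 1.6]
hourly = upsample_linear(daily_turbidity, 24)
print(len(hourly), hourly[0], hourly[24], hourly[48])
```

Every interpolated point lies on a straight line between two original measurements, so the upsampled series carries no sensor noise or sub-daily dynamics. This matches the authors' description of the datasets as temporal proxies rather than native sensor streams.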

Comment 3: Provide proper calibration and validation details for all low-cost sensors.

Response: Thank you very much for this remark. A paragraph describing sensor calibration and validation has been added at lines 635-648.

Comment 4: The WQI formulation needs explanation, because the scoring direction differs from conventional NSF-WQI.

Comment 5: Explain and justify the modified parameter weights used in the Water-QI score.

Response: Thank you very much for this observation. Weights used and appropriate justification have been added in lines 772-780.

Comment 6: Compare with baseline models such as ARIMA, Random Forest, XGBoost, LSTM, and naïve forecasting.

Response: Thank you very much for this observation. Baseline ML models have been bootstrapped to the test set and shown to perform worse on WQS, as mentioned in Section 2.3, leaving room for XGBoost to perform well as a classifier rather than as a measurement predictor. However, the DL LSTM models in Table 2 show performance similar to that of GRU models, and possibly better, as indicated in the bibliography. Future work has been amended in lines 1267-1270.

Comment 7: Provide uncertainty analysis for sensor readings, WQI calculation, and model predictions.

Response: Thank you very much for this remark. An additional uncertainty analysis paragraph has been added at the end of the Discussion of the results section, lines 1190-1209.

Comment 8: The English could be improved to more clearly express the research.

Response: The manuscript has been thoroughly revised to correct inconsistencies, improve clarity, and eliminate syntactical and typographical errors so as to clearly express the research.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have addressed my comments and suggestions on validating the "near-real-time" water quality assessment claim, discussing the standard parameters of the Water Quality Index, including a comparative analysis, extending the literature review and benchmark study, and reporting the dataset preparation. I recommend the manuscript for acceptance.

Author Response

Response: Thank you for taking the time to review our manuscript.

Reviewer 2 Report

Comments and Suggestions for Authors

The author addressed some of my concerns in the previous version's response, but I still have some doubts in the latest version:

The current work only evaluates different widths and depths of GRU models. It is strongly suggested to add fair comparisons with representative time-series forecasting methods, including LSTM, TCN, Transformer-based models (e.g., Informer), and classical machine learning models (e.g., XGBoost), to fully verify the superiority of the proposed GRU architecture.
The experimental dataset is generated by interpolating monthly open data into minute-level sequences, which cannot fully reflect real sensor noise, drift, and dynamic water quality variations. The authors should supplement validation based on real collected high-frequency IoT sensor data to improve the reliability and practicality of the method.
The experiments are only conducted under stable normal water quality conditions. Robustness against sensor anomalies, data loss, noise interference, and sudden water pollution events is not verified. The authors should add abnormality simulation and anti-interference experiments.
To improve this study, it is suggested to cite the related studies:

[a] Towards marine snow removal with fusing Fourier information[J]. Information Fusion, 2025, 117: 102810.

In summary, despite notable progress in refining the manuscript after a round of revisions, the work still demands major revisions to address remaining key issues and meet the acceptance standards.

Author Response

The author addressed some of my concerns in the previous version's response, but I still have some doubts in the latest version:

Response: Thank you for your time and effort in reviewing our manuscript. Here, we quote our responses and the amendments we made based on your comments.

Comment 1: The current work only evaluates different widths and depths of GRU models. It is strongly suggested to add fair comparisons with representative time-series forecasting methods, including LSTM, TCN, Transformer-based models (e.g., Informer), and classical machine learning models (e.g., XGBoost), to fully verify the superiority of the proposed GRU architecture.

Response: We appreciate the reviewer's suggestion to include a wider range of forecasting models. We have now clarified in lines 340-348 and 665–671 that heavier architectures such as TCNs or Transformers were excluded because they typically cause memory overruns or excessive latency on the Raspberry Pi Zero 2W. Comparisons with lighter recurrent models such as LSTMs show performance close to that of GRU, making these models more suitable for device-level inference. Furthermore, ensemble methods (XGBoost), with low WQS scores (RMSE and R²), are already detailed in Section 2.3. We have also updated lines 174-180 and 198-205 and added a record to Table 1 to highlight the low RMSE values of XGBoost as a forecaster compared with LSTM and GRU. To justify why the optimized GRU remains the most practical choice for our decentralized Water-QI framework, we have also revised lines 1175-1180. Finally, we revised Section 2.3 (lines 385-388) and lines 1024-1036 to explicitly mention the superiority of XGBoost and GB for classification tasks.
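The latency and memory constraints cited in this response could be checked on a device with a small profiling harness such as the one sketched below. It is not the authors' measurement procedure: `profile_inference` and `dummy_predict` are hypothetical names, and the dummy function merely stands in for a real model's prediction call.

```python
import statistics
import time
import tracemalloc

def profile_inference(predict, sample, runs=50, warmup=5):
    """Time repeated calls to `predict`, then trace one call for peak memory."""
    for _ in range(warmup):  # warm caches before timing
        predict(sample)

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    # Trace memory in a separate call so tracing overhead does not skew timings.
    tracemalloc.start()
    predict(sample)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
        "peak_kib": peak_bytes / 1024.0,
    }

def dummy_predict(window):
    """Hypothetical stand-in for a model: returns the mean of the input window."""
    return sum(window) / len(window)

report = profile_inference(dummy_predict, [7.0] * 60)
print(report)
```

Note that `tracemalloc` reports only memory allocated through Python's allocator. Memory held by a native inference runtime would have to be measured at the process level instead.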

Comment 2: The experimental dataset is generated by interpolating monthly open data into minute-level sequences, which cannot fully reflect real sensor noise, drift, and dynamic water quality variations. The authors should supplement validation based on real collected high-frequency IoT sensor data to improve the reliability and practicality of the method.

Response: We agree that real-world sensor noise and drift are critical for long-term reliability, and we have explicitly addressed this limitation in lines 730-740 and lines 1088-1093. While the interpolated dataset was used as a high-volume proxy to stress-test the computational capacity of the Raspberry Pi Zero 2W, we have also detailed a robust three-step cleaning pipeline. This pipeline, which includes outlier removal and rolling averages, is specifically designed to filter out the high-frequency jitter and calibration drift typical of budget IoT sensors during live operation, ensuring the system remains practical for real-city deployments.
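A cleaning pipeline of the kind described in this response might look like the sketch below. The record names only outlier removal and rolling averages, so the gap-filling step, the median-absolute-deviation rule, the thresholds, and all function names are assumptions rather than the authors' implementation.

```python
import statistics

def remove_outliers(values, k=3.0):
    """Replace samples farther than k scaled MADs from the median with None."""
    present = [v for v in values if v is not None]
    med = statistics.median(present)
    mad = statistics.median([abs(v - med) for v in present])
    if mad == 0:
        return list(values)  # no spread to judge against; keep everything
    limit = k * 1.4826 * mad  # 1.4826 scales MAD to a standard deviation for normal data
    return [v if v is not None and abs(v - med) <= limit else None for v in values]

def fill_gaps(values):
    """Forward-fill missing samples; leading gaps take the first valid value."""
    first = next((v for v in values if v is not None), None)
    if first is None:
        raise ValueError("series contains no valid samples")
    out, last = [], first
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

def rolling_mean(values, window=3):
    """Trailing moving average; early samples use the shorter history available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Hypothetical pH stream with one missing sample and one spike.
raw = [7.0, 7.1, None, 7.2, 14.0, 7.1, 7.0]
cleaned = rolling_mean(fill_gaps(remove_outliers(raw)))
print([round(v, 3) for v in cleaned])
```

In this sketch the spike is dropped before smoothing so that it cannot leak into the rolling average. Slow calibration drift is not addressed by these three steps as written.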

Comment 3: The experiments are only conducted under stable normal water quality conditions. Robustness against sensor anomalies, data loss, noise interference, and sudden water pollution events is not verified. The authors should add abnormality simulation and anti-interference experiments.

Response: We appreciate the reviewer's emphasis on system robustness, as real-world IoT nodes must withstand conditions beyond stable ones. While our study primarily establishes a performance baseline, we have clarified how our cleaning pipeline is architecturally designed to handle outliers and data gaps. To fully address the reviewer's concern, we have also expanded our Conclusion in lines 1098–1104 to explicitly acknowledge that simulations of abnormality and anti-interference experiments related to sudden pollution spikes are the central focus of our upcoming field validation of near-real-time temporal results.

Comment 4: To improve this study, it is suggested to cite the related studies:

[a] Towards marine snow removal with fusing Fourier information[J]. Information Fusion, 2025, 117: 102810.

Response: We thank the reviewer for the suggestion to include additional related studies. However, after carefully reviewing study [a], we found that it focuses on 'marine snow removal,' which is a technique primarily used in underwater computer vision and image processing. Since our research is strictly dedicated to the time-series forecasting of physicochemical parameters using IoT sensor data, this study falls outside the technical scope of our work. Therefore, we have opted not to include this citation to maintain the thematic focus and clarity of our manuscript.

Comment 5: In summary, despite notable progress in refining the manuscript after a round of revisions, the work still demands major revisions to address remaining key issues and meet the acceptance standards.

Response: We appreciate the reviewer's recognition of the notable progress made in refining the manuscript. We have taken the remaining feedback very seriously and have undertaken a thorough major revision to address each of the key issues identified, specifically regarding the expansion of our theoretical framework, the clarification of our data interpolation methods, and the detailed justification of our architectural choices for edge intelligence. We believe that these systematic improvements have significantly strengthened the technical depth and reliability of the work, bringing it fully in line with the journal's high acceptance standards.

Reviewer 3 Report

Comments and Suggestions for Authors

Most of the comments have been revised, but I still feel that the introduction and related work occupy 10 pages of description, which is still too long. I suggest simplifying it.

Author Response

Most of the comments have been revised, but I still feel that the introduction and related work occupy 10 pages of description, which is still too long. I suggest simplifying it.

Response: Thank you for your time and effort in reviewing our manuscript. We have revised and simplified Section 2 while preserving its meaning and addressing the other reviewers' comments. We shortened Section 2.3 by removing unnecessary repetition. Terms such as MSE and precision, which are not used in this paper's WQS-based evaluation of the literature, have also been removed. Similar amendments have been applied to the Introduction section.

Reviewer 4 Report

Comments and Suggestions for Authors

The authors have addressed all the comments and suggestions very well. I recommend accepting this article in its current form.

Author Response

The authors have addressed all the comments and suggestions very well. I recommend accepting this article in its current form.

Response: Thank you for your time and effort in reviewing our manuscript.
