Review Reports
- Onem Yildiz¹ and
- Hilmi Saygin Sucuoglu²,*
Reviewer 1: Mazhar Iqbal Reviewer 2: Anonymous Reviewer 3: Tiago De Araújo Reviewer 4: Anonymous
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper “Development of Real Time IoT-Based Air Quality Forecasting System Using Machine Learning Approach” presents timely and relevant research on IoT-based air quality forecasting. However, some aspects of the methodology are not sufficiently clear, the overall structure could be improved, and the Results and Discussion section is incomplete. The following points should be addressed before the manuscript can be considered for publication.
- The reference station is mentioned here for the first time, but without any prior introduction. The use of the "EN 15267-certified TSM-17" station as the reference (for AQI labels) is only briefly mentioned and somewhat buried in the methods. I recommend that the authors provide a short introduction to the reference station earlier in the material and methods, and explicitly state where the data come from for calculating the target variable.
- The manuscript would benefit from a clear listing of all features used for training the ML models, along with the target label. I suggest that the authors include a table summarizing the complete feature set and the label (AQI category).
- Tables 3 and 5 in the Results section do not directly present new findings; instead, they provide background information. I recommend moving these tables to the Introduction section or to the Supplementary Materials.
- In the Results and Discussion section, regression models are evaluated, but the evaluation is incomplete without scatter plots of actual vs. predicted AQI values. I recommend including such plots for each model. Additionally, reporting the slope of the regression line in each case would demonstrate whether the models tend to systematically overestimate or underestimate AQI values.
- For the regression models, I recommend including time-series plots of predicted and actual AQI values. Such visualizations would assess how well the models capture temporal patterns and whether performance varies at different times of the day.
- For the evaluation of the classification models, I recommend including confusion matrices.
Minor Comments
- Lines 64-65: Since you have already mentioned 'particulate matter' as a primary pollutant, it would be helpful to mention which types of particulate matter belong to the secondary pollutants.
- Line 66: Provide a reference for the claim "Air pollution varies widely spatially and temporally."
Author Response
Comments 1: The reference station is mentioned here for the first time, but without any prior introduction. The use of the “EN 15267-certified TSM-17” station as the reference (for AQI labels) is only briefly mentioned and somewhat buried in the methods. I recommend that the authors provide a short introduction to the reference station earlier in the material and methods, and explicitly state where the data come from for calculating the target variable.
Response 1: We sincerely thank the reviewer for this valuable suggestion. We fully agree that the introduction of the reference station is essential for clarity, as it defines the ground truth used for generating AQI labels. In the revised manuscript, we have now explicitly described the EN 15267-certified TSM-17 reference station and clarified its role in deriving the target variable. This information has been added immediately after Figure 1 in Section 2.1, so that readers can clearly identify the certified data source before the detailed description of sensor deployment and data acquisition.
Comments 2: The manuscript would benefit from a clear listing of all features used for training the ML models, along with the target label. I suggest that the authors include a table summarizing the complete feature set and the label (AQI category).
Response 2: We thank the reviewer for this helpful suggestion. We agree that explicitly listing all input features and the target label would enhance the clarity of the manuscript. Accordingly, in the revised version we have added a new summary table (Table 2) in Section 2.5 (Training Strategy) that presents the complete feature set used for model training, along with the target label (AQI category). This ensures that readers can quickly identify the predictors and the output variable without having to parse the text in multiple subsections.
Comments 3: Tables 3 and 5 in the Results section do not directly present new findings; instead, they provide background information. I recommend moving these tables to the Introduction section or to the Supplementary Materials.
Response 3: We thank the reviewer for this valuable observation. In the revised manuscript, we have repositioned the background tables as recommended:
Table 3 (now Table 1) (AQI breakpoints) has been moved to the Introduction, where it provides context for how continuous AQI values are mapped into health categories.
Table 5 (now Table 2) (Sensor specifications) has been relocated. In Section 2.1, the reference has been updated to:
“The selected sensors are widely used in IoT-based air quality monitoring and operate within ranges sufficient to cover all relevant AQI brackets. Their nominal specifications are summarized in Table 2.”
Comments 4: In the Results and Discussion section, regression models are evaluated, but the evaluation is incomplete without scatter plots of actual vs. predicted AQI values. I recommend including such plots for each model. Additionally, reporting the slope of the regression line in each case would demonstrate whether the models tend to systematically overestimate or underestimate AQI values.
Response 4: We sincerely thank the reviewer for this constructive suggestion. In the revised manuscript, we have included scatter plots of actual vs. predicted AQI values for all four models at the 60-minute forecast horizon (new Figure 5, Section 3.1). Each plot now shows the fitted regression line (y = α + βx) together with the 1:1 reference. The slope (β) of the regression line is also reported in the figure panels.
The results indicate that the GRU (β = 1.01) and LSTM (β = 0.98) models remain close to unbiased predictions, while the ensemble methods (Stacking: β = 0.94; AdaBoost.R2: β = 0.92) tend to slightly underestimate AQI values. This visualization complements the statistical metrics reported in Table 5 and provides clearer evidence of model behavior, thereby strengthening the discussion of forecast performance.
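The slope check described in this response can be sketched in a few lines. The snippet below is an illustrative stand-in, not the authors' analysis code: the arrays are synthetic, and `fitted_slope` is a hypothetical helper name.

```python
import numpy as np

def fitted_slope(actual, predicted):
    """Fit predicted = alpha + beta * actual by ordinary least squares
    and return beta; beta < 1 indicates systematic underestimation of
    high AQI values, beta > 1 systematic overestimation."""
    beta, alpha = np.polyfit(np.asarray(actual, float),
                             np.asarray(predicted, float), 1)
    return beta

# Synthetic example: predictions that flatten the peaks (beta < 1)
actual = np.array([20.0, 50.0, 100.0, 150.0, 200.0])
predicted = 0.92 * actual + 5.0
print(round(fitted_slope(actual, predicted), 2))  # -> 0.92
```

Plotting the fitted line against the 1:1 reference, as the revised Figure 5 does, makes any such bias visible at a glance.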
Comments 5: For the regression models, I recommend including time-series plots of predicted and actual AQI values. Such visualizations would assess how well the models capture temporal patterns and whether performance varies at different times of the day.
Response 5: We thank the reviewer for this constructive suggestion. In the revised manuscript, we have added time-series plots of predicted and actual AQI values at the 60-minute forecast horizon (new Figure 6, Section 3.1). These plots cover a representative 7-day window from the test dataset, ensuring that diurnal cycles and short-term fluctuations are clearly illustrated.
The results confirm that the recurrent models (LSTM and GRU) accurately capture temporal patterns, closely following daily peaks and troughs such as morning and evening traffic-related pollution events. In contrast, the ensemble methods (Stacking and AdaBoost.R2) generally follow the overall trend but display reduced peak amplitudes and a slight phase lag during high-pollution periods.
This visualization complements the statistical metrics in Table 5 and the scatter plots in Figure 5, providing additional evidence that the GRU model offers the best balance of accuracy, robustness, and temporal fidelity for real-time AQI forecasting.
Comments 6: For the evaluation of the classification models, I recommend including confusion matrices.
Response 6: We thank the reviewer for this important suggestion. While the original manuscript included class-wise precision, recall, and F1-scores (Table 6, Section 3.2), we agree that confusion matrices provide a more intuitive visualization of classification performance.
Accordingly, we have added confusion matrices for the AQI category predictions of the best-performing GRU model (new Figure 7, Section 3.2). The matrices illustrate the distribution of predicted vs. actual AQI categories across the five health-based classes.
The results confirm the numerical findings in Table 6: the GRU model achieves high accuracy for the “Good” and “Moderate” categories, with precision above 0.90, while showing only minor confusion between adjacent classes such as “Unhealthy for Sensitive Groups” and “Unhealthy.” Importantly, false negatives in the “Very Unhealthy” and “Hazardous” categories remain below 2%, satisfying the safety criteria highlighted in the discussion.
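A confusion matrix of the kind added in Figure 7 is straightforward to compute. The sketch below is a generic illustration with toy labels and a hypothetical class indexing, not the authors' evaluation pipeline.

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes):
    """Count predictions per (actual, predicted) pair.
    Rows index the actual class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a, p] += 1
    return cm

# Toy example over three AQI classes (0 = Good, 1 = Moderate, 2 = Unhealthy):
actual    = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 1]
cm = confusion_matrix(actual, predicted, 3)
# Off-diagonal entries reveal confusion between adjacent classes.
```

Reading down a column shows what a predicted class actually contained, which is exactly the adjacent-class confusion the response discusses.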
Comments 7: Lines 64-65: Since you have already mentioned “particulate matter” as a primary pollutant, it would be helpful to mention which types of particulate matter belong to the secondary pollutants.
Response 7: We thank the reviewer for pointing out this clarification. In the revised manuscript, we have specified that secondary particulate matter typically includes sulfates, nitrates, and organic aerosols formed through atmospheric chemical reactions. Accordingly, the text in the Introduction (Lines 64–65) has been revised to:
“Primary pollutants are substances released directly from the source into the atmosphere, such as particulate matter (PM), carbon monoxide (CO), nitrogen dioxide (NO₂), sulphur dioxide (SO₂), and lead (Pb). In contrast, secondary pollutants are formed as a result of chemical reactions in the atmosphere and are usually detected downwind, away from the source. Examples include ozone (O₃) and secondary particulate matter such as sulfates, nitrates, and organic aerosols.”
Comments 8: Line 66: Provide a reference for the claim “Air pollution varies widely spatially and temporally.”
Response 8: We thank the reviewer for this comment. A supporting reference has been added to substantiate the statement. In the revised manuscript (Introduction, Line 66), the sentence now reads:
“Air pollution varies widely spatially and temporally, depending on pollutant type, emission source, and meteorological conditions [8].”
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This study addresses the development and validation of a low-cost IoT-based real-time air quality forecasting system, aiming to demonstrate its technical implementation and effectiveness. To enhance the academic rigor and completeness of the paper, the following points are recommended:
1. Definition of “Low-Cost IoT Sensors”
The criteria for what constitutes “low-cost IoT sensors” should be explicitly defined. A systematic explanation of the sensor configuration, performance, technical specifications, and cost would strengthen the validity of the research.
2. Clarification of Research Problem and Academic Contribution
The Materials and Methods section provides a detailed account of the system architecture, structural design, and machine learning methodology, reflecting a strong technical and engineering design focus. However, the articulation of the research problem’s significance and the academic contribution of this work is relatively limited. The Conclusions section should be expanded to discuss the scholarly value and implications of this study more explicitly.
3. Clarification of Sensor Saturation and Comparison with SDS011-Based Studies
In Section 3.3, the authors state that all sensors functioned without saturation even under high-pollution conditions, thereby overcoming the clipping issue reported in SDS011-based studies. A more detailed explanation of the experimental evidence and technical mechanisms underlying this result would enhance clarity for readers.
4. Further Emphasis on Methodological Advantages
The Results and Discussion section highlights the system’s advantages, such as real-time operability, low power consumption, and functional scalability. In addition, it would be beneficial to elaborate on the methodological strengths of the proposed approach—such as the rationale for ML model selection and optimization strategies, sensor calibration procedures, and data pre-processing techniques—which would further reinforce the academic contribution of the study.
Author Response
Comments 1: Definition of “Low-Cost IoT Sensors”
The criteria for what constitutes “low-cost IoT sensors” should be explicitly defined. A systematic explanation of the sensor configuration, performance, technical specifications, and cost would strengthen the validity of the research.
Response 1: We thank the reviewer for this valuable comment. The manuscript indeed employs the term “low-cost IoT sensors” frequently, and we agree that an explicit definition would improve clarity. As described in Section 2.1, the selected sensors (GP2Y10, MQ-7, ENS160, DHT11) are characterized as low-cost and cost-effective, with explicit references to their affordability and wide adoption in the literature (e.g., GP2Y10 described as “low cost and high accuracy”; MQ-7 as “low cost, compact design”; DHT11 as “preferred considering cost-effectiveness”). Their nominal ranges and operating specifications are further detailed in Table 2.
In the revised version, we clarify our use of “low-cost” by explicitly stating that it refers to commodity sensors with typically low per-unit prices. This definition is supported by the literature (e.g., Karagulian et al. 2019; Concas et al. 2021), which categorizes electrochemical, optical dust, and MOX sensors within this “low-cost” tier when compared against regulatory-grade analyzers.
Thus, our definition of “low-cost IoT sensors” in this study reflects:
- Affordability (per-unit cost under USD 30–50, at least two orders of magnitude cheaper than reference analyzers),
- Commodity availability (widely produced, off-the-shelf units such as MQ-series, Sharp GP2Y, DHT-series), and
- Adequate but limited performance, requiring calibration against certified stations to remain decision-grade.
We believe that this clarification, together with the existing specification details in Table 2, strengthens the validity of the research while highlighting the practical trade-offs of low-cost sensing.
We also added an explanation (Section 2.1) as follows:
“In this study, the term ‘low-cost IoT sensors’ refers to devices that are compact, commercially accessible, widely adopted in IoT-based environmental monitoring, and capable of providing reliable measurements after calibration despite their simplicity and low power consumption. These attributes distinguish them from reference-grade sensors, while making them practical for scalable real-time deployments.”
Comments 2: Clarification of Research Problem and Academic Contribution
The Materials and Methods section provides a detailed account of the system architecture, structural design, and machine learning methodology, reflecting a strong technical and engineering design focus. However, the articulation of the research problem’s significance and the academic contribution of this work is relatively limited. The Conclusions section should be expanded to discuss the scholarly value and implications of this study more explicitly.
Response 2: We thank the reviewer for this insightful comment. We agree that while the Materials and Methods section provides a strong technical description of the system architecture and methodology, the articulation of the broader research problem and the scholarly contribution of our work needed to be more explicit.
The manuscript already identifies the research gap in the Introduction (“no work unifies these strands into a single, self-adjusting node that delivers categorical AQI forecasts and is validated for a humid Mediterranean coastal setting”), thereby framing the significance of the study. In addition, Section 3.5 enumerates the operational strengths of the system, though with a more practical engineering emphasis.
In line with the reviewer’s suggestion, we have revised the Conclusions section to explicitly highlight the academic implications of this study. Beyond the engineering feasibility, we now emphasize that this work contributes to the literature by:
- bridging low-cost IoT sensing with advanced machine learning forecasting in a unified and reproducible framework,
- extending the geographic scope of air quality forecasting research through long-term real-world validation in a Mediterranean urban micro-climate, and
- providing a methodological template—combining calibration protocols, data pipelines, and lightweight deep learning—that future researchers can build upon in environmental informatics and smart city applications.
We added a new part to the Conclusion:
“Beyond its engineering feasibility, this study makes a scholarly contribution by bridging low-cost IoT sensing with advanced machine learning forecasting in a unified, reproducible framework. It extends the literature by demonstrating long-term, real-world validation in a Mediterranean urban micro-climate, an underrepresented context, and by providing a methodological template—combining calibration protocols, data pipelines, and lightweight deep learning—that can inform and guide future academic research in environmental informatics and smart city applications.”
We believe that these revisions strengthen the scholarly positioning of the manuscript without diminishing its practical relevance.
Comments 3: Clarification of Sensor Saturation and Comparison with SDS011-Based Studies
In Section 3.3, the authors state that all sensors functioned without saturation even under high-pollution conditions, thereby overcoming the clipping issue reported in SDS011-based studies. A more detailed explanation of the experimental evidence and technical mechanisms underlying this result would enhance clarity for readers.
Response 3: We thank the reviewer for this helpful observation. In the original manuscript, Section 3.3 already stated that no clipping was observed, in contrast to SDS011-based studies (e.g., Kurnia et al. [61]). To make the underlying evidence clearer, we have revised the text to explicitly link this finding to the sensor specifications and calibration procedures.
The revised Section 3.3 now reads as follows (revision highlighted in bold):
“The manufacturer ranges in Table 2 show that all transducers span at least two full AQI brackets beyond local regulatory limits. That margin, combined with the on-device two-point calibration (Section 2.4.3), prevents hard saturation during high-pollution spikes—an issue that hampered Kurnia et al. [61], who used an SDS011 optical counter capped at 500 µg/m³ and reported 23% clipping in ‘Very Unhealthy’ episodes. No clipping was observed in our trials, eliminating the need for synthetic data augmentation. This absence of clipping is consistent with the manufacturer-specified ranges listed in Table 2, which ensured that all sensors operated within their linear domain throughout the deployment.”
We believe this addition clarifies both the technical mechanism (broader dynamic ranges of the selected sensors) and the empirical evidence (no clipping observed in deployment), thereby addressing the reviewer’s request for more detail.
Comments 4: Further Emphasis on Methodological Advantages
The Results and Discussion section highlights the system’s advantages, such as real-time operability, low power consumption, and functional scalability. In addition, it would be beneficial to elaborate on the methodological strengths of the proposed approach—such as the rationale for ML model selection and optimization strategies, sensor calibration procedures, and data pre-processing techniques—which would further reinforce the academic contribution of the study.
Response 4: We appreciate the reviewer’s suggestion. To strengthen the emphasis on methodological advantages, we have expanded the Discussion to highlight the rationale for model selection, optimization strategies, calibration procedures, and data pre-processing techniques. The following sentence has been added at the end of Section 3.5.
“In addition to these operational benefits, the system’s methodological framework—spanning rigorous calibration, structured data pre-processing, and optimized model selection—provides a reproducible template that strengthens its academic contribution to the field of air quality forecasting.”
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This paper presents the development of a low-cost, real-time air quality monitoring system. The system uses proof-of-concept hardware with sensors measuring particulate matter, carbon monoxide, carbon dioxide, and volatile organic compounds. These data are gathered by a Raspberry Pi board and transmitted to the cloud.
In general, the paper is well written and well structured, however I do have some suggestions to enhance its quality, as follows.
Overall Observations
Introduction: This section seems to be broad and does not directly address the scientific gap that you are intended to cover. Please highlight it.
Work limitations (lack of): The authors have diluted the work limitations throughout the paper. It would be appropriate to concentrate it in a dedicated session, as well as a deeper discussion on the limitations.
Specific Comments
248-253: It is true that DHT22/AM2302 is more expensive than DHT11. However, it is far more accurate, resulting in better cost-effectiveness. What impact did a lower-accuracy sensor have on your model?
218-239: The MQ-7 detection threshold is above the recommended average levels for human health, as stated in directives such as:
https://uk-air.defra.gov.uk/assets/documents/reports/cat13/northern_ireland/chap3a.html
I would not suggest using it if your goal is to detect lower gas concentrations. Moreover, the authors mentioned a calibration process of the MQ-7, but I have not found it. Please provide information on this point. These sensors are particularly challenging to adjust for lower concentrations. Finally, it is unclear whether carbon monoxide was included in the AQI score. Please specify.
Section 3: The authors should provide insight into why the GRU reached better performance (e.g., its capability to capture long-term temporal relationships?)
349 - 352: I do understand that the authors have excluded the ENS160 from the AQI score because it is acting like an "alarm" for specific situations. However, should not this alarm be considered in the score itself?
I look forward to reading your revised version.
Keep up the good work.
Best regards.
Author Response
Comments 1: Introduction: This section seems to be broad and does not directly address the scientific gap that you are intended to cover. Please highlight it.
Response 1: We thank the reviewer for this valuable suggestion. In the revised Introduction, we have explicitly emphasized the scientific gap addressed by this work. Specifically, while prior studies on AQI forecasting have primarily focused on long-term horizons or single-model frameworks, there has been limited attention to short-term, low-latency prediction using low-cost sensors combined with hybrid ensemble–deep learning models. To bridge this gap, our study demonstrates how real-time AQI can be predicted at short horizons (30–60 minutes) with regulatory-grade reliability through calibrated sensor data and hybrid machine learning. We have revised the concluding paragraph of the Introduction to clearly articulate this gap and position our contribution in addressing it.
Comments 2: Work limitations (lack of): The authors have diluted the work limitations throughout the paper. It would be appropriate to concentrate it in a dedicated session, as well as a deeper discussion on the limitations.
Response 2: We thank the reviewer for this constructive suggestion. In the revised manuscript, we have added a dedicated paragraph on study limitations within the Conclusion section, placed immediately before the final remarks. This new paragraph acknowledges the intra-site scope of validation, the limited number of pollutants and monitoring sites, and the short forecasting horizon as key constraints. It also highlights future directions, including multi-site and multi-regional validation, the inclusion of additional pollutants (e.g., NO₂, O₃), and the extension to longer-term forecasts. These additions provide a more transparent discussion of the study’s boundaries while constructively outlining opportunities for future research.
Comments 3: 248-253: It is true that DHT22/AM2302 is more expensive than DHT11. However, it is far more accurate, resulting in better cost-effectiveness. What impact did a lower-accuracy sensor have on your model?
Response 3: We acknowledge that the DHT22/AM2302 provides superior accuracy (±0.5 °C, ±2 %RH) compared to the DHT11 (±2 °C, ±5 %RH). In principle, this higher precision could improve the quality of meteorological inputs. However, our feature relevance analysis (Section 3.1) showed that pollutant concentrations were the dominant predictors of AQI categories: PM2.5 (ρ = 0.71), CO (ρ = 0.46), and TVOC (ρ = 0.38). Temperature and relative humidity contributed only weakly, mainly through interactions with hour-of-day and day-of-week flags. Consequently, the lower accuracy of the DHT11 sensor had only a limited impact on overall forecasting performance. This is consistent with the high R² (> 0.93) achieved in our experiments. Nonetheless, we agree that future revisions should incorporate higher-accuracy sensors such as DHT22 or AM2302 to further strengthen robustness, particularly in extreme meteorological conditions.
Comments 4: 218-239: The MQ7 detection threshold is above the recommended averaged levels for human health, as stated in these directives, for example:
https://uk-air.defra.gov.uk/assets/documents/reports/cat13/northern_ireland/chap3a.html
I would not suggest using it if your goal is to detect lower gas concentrations. Moreover, the authors mentioned a calibration process of the MQ-7, but I have not found it. Please provide information on this point. These sensors are particularly challenging to adjust for lower concentrations. Finally, it is unclear whether carbon monoxide was included in the AQI score. Please specify.
Response 4: We appreciate the reviewer’s detailed comments on the MQ-7 sensor and its calibration. We fully acknowledge that the MQ-7 has a relatively high detection threshold compared to the guideline values for human health. In the present work, our goal was not to resolve sub-threshold CO concentrations but to capture short-term variations at levels relevant to urban outdoor exposure. To ensure reliability, we applied a two-point calibration procedure against the EN 15267-certified TSM-17 reference station. The resulting calibration curves, coefficients, and pre/post plots are now provided in the Supplementary Material (Figures 5 and 6), which demonstrate that the MQ-7 output closely aligns with reference data after adjustment.
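As a rough illustration of what a two-point calibration involves (the manuscript's actual curves and coefficients are in its Supplementary Material), here is a minimal sketch with made-up anchor readings; the function name and values are hypothetical.

```python
def two_point_calibration(raw_low, raw_high, ref_low, ref_high):
    """Derive gain and offset so that corrected = gain * raw + offset
    maps the two anchor readings onto the reference-station values."""
    gain = (ref_high - ref_low) / (raw_high - raw_low)
    offset = ref_low - gain * raw_low
    return gain, offset

# Hypothetical anchors: sensor reads 2.0 / 9.0 (raw units) when the
# reference station reports 1.0 / 10.0 ppm at the same moments.
gain, offset = two_point_calibration(2.0, 9.0, 1.0, 10.0)
corrected = gain * 5.5 + offset  # apply to a new raw reading
```

The two anchors are chosen near the bottom and top of the expected range so that the linear correction holds across the sensor's working span.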
Comments 5: Section 3: The authors should provide insight into why the GRU reached better performance (e.g., its capability to capture long-term temporal relationships?)
Response 5: We thank the reviewer for this helpful remark. In the revised manuscript, we have added a short discussion in Section 3.1 to explain why the GRU achieved slightly superior performance compared to the other models. Specifically, GRUs are designed to capture temporal dependencies through gated mechanisms that effectively balance information retention and update, while using fewer parameters than LSTMs. This architecture makes GRUs less prone to overfitting on relatively small training sets and more efficient in learning short-to-medium horizon dependencies, which are critical for AQI fluctuations. These characteristics likely contributed to the GRU’s better accuracy and stability in our short-term forecasting task.
Comments 6: 349 - 352: I do understand that the authors have excluded the ENS160 from the AQI score because it is acting like an "alarm" for specific situations. However, should not this alarm be considered in the score itself?
Response 6: We thank the reviewer for this insightful comment. In the revised manuscript, we have clarified the role of the ENS160 sensor by explicitly stating in Section 2.1 that it was excluded from the AQI computation because its composite output does not correspond to pollutant-specific breakpoints defined by US-EPA. Instead, it was used as an auxiliary alarm channel to detect sudden changes in volatile compounds, thereby complementing the AQI framework without altering its regulatory definition. This addition makes the rationale for excluding ENS160 from the AQI score transparent.
Author Response File: Author Response.docx
Reviewer 4 Report
Comments and Suggestions for Authors
I appreciate the effort and practical relevance of this study, which integrates low-cost sensors with machine learning techniques for air quality forecasting. The manuscript presents a functional prototype with a clear IoT-cloud architecture, solid experimental results, and an applied perspective that could interest the community. However, the article requires substantial revisions before it can be considered for publication.
First, there is a fundamental methodological ambiguity between predicting AQI as a continuous variable (regression) and classifying into health categories. Some tables report error metrics in AQI units, while others use µg/m³, creating inconsistencies that must be addressed. I strongly recommend unifying the approach: either predict continuous AQI values with subsequent discretization, or train a categorical classifier, but metrics must remain consistent throughout the manuscript.
Second, the threshold tables (especially Tables 3 and 6) do not match internationally recognized standards (EPA, WHO). In particular, the proposed scheme for TVOC and eCO₂ is not part of any official AQI framework and should be explicitly justified as an auxiliary indicator rather than a replacement for the official index.
Another critical issue is sensor validation. Although in-situ calibration is mentioned, no calibration curves, slopes, or adjustment coefficients are provided. This significantly weakens credibility, especially given the use of low-cost sensors such as DHT11, GP2Y10, or ENS160. Comparative plots against reference stations, error analysis, and discussion of sensor drift over time should be included.
Similarly, the comparison with previous work (Table 7) mixes metrics and units that are not directly comparable. It should be clarified whether the evaluation is in terms of PM2.5 concentration or AQI units, and the table should be adjusted accordingly. Additional robustness evaluations, such as leave-one-site-out or seasonal testing, would also add significant value.
Finally, the manuscript should expand its discussion on the inherent limitations of low-cost sensors, the real-world applicability of the models in diverse contexts, and the need for periodic recalibration. Addressing these points would not diminish the contribution but rather strengthen its reliability.
This manuscript offers an interesting and potentially valuable contribution, but it requires major revision to align definitions, correct inconsistencies, and reinforce the empirical validity of the findings.
Comments on the Quality of English Language
The overall quality of English in the manuscript is adequate, yet there are areas where improvement would enhance clarity and precision. In particular, some inconsistencies appear in the use of units (e.g., “µg/m³ AQI”) and certain expressions lack precision, which could lead to confusion. Additionally, several sentences are overly long and would benefit from more concise phrasing. A thorough language review is recommended to unify technical terminology, improve the flow of transitions between sections, and ensure that the methodological differences between regression and classification are expressed with complete clarity. These improvements do not require a full rewrite but rather a careful editorial refinement that will strengthen the scientific presentation of the paper.
Author Response
Comments 1: There is a fundamental methodological ambiguity between predicting AQI as a continuous variable (regression) and classifying into health categories. Some tables report error metrics in AQI units, while others use µg/m³, creating inconsistencies that must be addressed. I strongly recommend unifying the approach: either predict continuous AQI values with subsequent discretization, or train a categorical classifier, but metrics must remain consistent throughout the manuscript.
Response 1: We thank the reviewer for highlighting this important issue. In the revised manuscript, we have clarified the methodology to eliminate ambiguity. Specifically, we now state (Section 2.5):
“In this study, AQI was first predicted as a continuous variable using regression models. These continuous outputs were then discretized into standard AQI categories (Good, Moderate, Unhealthy, etc.) according to CPCB/US-EPA breakpoints. Classification performance metrics were therefore derived from discretized regression outputs rather than from a separately trained categorical model.”
In addition, the table captions have been revised for consistency:
- Table 2 (now Table 5) now explicitly reports regression metrics (MAE, RMSE, R²) in AQI index units only.
- Table 4 (now Table 1) now specifies that classification metrics are calculated from discretized regression outputs according to CPCB/US-EPA breakpoints.
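The regression-then-discretization strategy described in the response can be sketched as follows. This is a minimal illustration, not the authors' implementation; the category names follow the standard US-EPA scale, and the exact breakpoints used in the manuscript are assumed.

```python
# Hedged sketch: mapping a continuous regression-predicted AQI value
# to a US-EPA health category. Upper bounds per category are the
# standard EPA index ranges (assumed, not quoted from the paper).
EPA_CATEGORIES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]

def discretize_aqi(aqi: float) -> str:
    """Return the EPA category whose index range contains the AQI value."""
    for upper, label in EPA_CATEGORIES:
        if aqi <= upper:
            return label
    return "Hazardous"  # values above 500 are capped at the top category
```

Classification metrics (accuracy, F1, etc.) would then be computed by comparing `discretize_aqi` applied to the regression outputs against the reference-station category labels.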
Comments 2: The threshold tables (especially Tables 3 and 6) do not match internationally recognized standards (EPA, WHO). In particular, the proposed scheme for TVOC and eCO₂ is not part of any official AQI framework and should be explicitly justified as an auxiliary indicator rather than a replacement for the official index.
Response 2: We thank the reviewer for raising this point. We would like to clarify that in our study the official AQI computation is strictly based on the CPCB/US-EPA breakpoint scheme, as described in Section 2.1 and Equation (1). All regression and classification results reported in Tables 5–6 are derived exclusively from these official breakpoints.
Regarding Table 1, the PM2.5 thresholds (e.g., 0–9.0 µg/m³ for the “Good” category) reflect the updated US-EPA AQI breakpoints released in February 2024, which lowered the “Good” upper bound from 12.0 µg/m³ to 9.0 µg/m³ [EPA, 2024 PM NAAQS & AQI Fact Sheet (https://www.epa.gov/system/files/documents/2024-02/pm-naaqs-air-quality-index-fact-sheet.pdf)]. This update may explain the apparent discrepancy with older AQI tables.
As for Table 7, we emphasize—as already stated in Section 2.4.1—that TVOC and eCO₂ are not part of the AQI framework and were never included in AQI calculations. They are excluded from AQI calculations and serve solely as advisory flags. In addition, the GUI representation shown in Figure 4 is described in Section 2.3 as “a compact UBA-style labelling that corresponds one-to-one with the EPA breakpoints”, ensuring that the official AQI categories remain unchanged.
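For readers unfamiliar with the breakpoint scheme discussed above, the US-EPA sub-index is a piecewise-linear interpolation between concentration breakpoints; Equation (1) in the manuscript is assumed to be this standard form. The "Good" upper bound of 9.0 µg/m³ is taken from the response; the remaining 2024 PM2.5 breakpoints shown here are the commonly published values and should be checked against the EPA fact sheet.

```python
# Illustrative sketch of the US-EPA piecewise-linear AQI sub-index for
# PM2.5, using the February 2024 breakpoints (Good upper bound lowered
# from 12.0 to 9.0 µg/m³, as noted in the response). Segments beyond
# "Good" are assumed standard values, not quoted from the manuscript.
PM25_BREAKPOINTS = [
    # (C_lo, C_hi, I_lo, I_hi)
    (0.0, 9.0, 0, 50),         # Good
    (9.1, 35.4, 51, 100),      # Moderate
    (35.5, 55.4, 101, 150),    # Unhealthy for Sensitive Groups
    (55.5, 125.4, 151, 200),   # Unhealthy
    (125.5, 225.4, 201, 300),  # Very Unhealthy
    (225.5, 325.4, 301, 500),  # Hazardous
]

def pm25_subindex(c: float) -> float:
    """Interpolate linearly within the breakpoint segment containing c."""
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            return (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
    raise ValueError(f"PM2.5 concentration {c} outside tabulated range")
```

Under the 2024 scheme, a concentration of 4.5 µg/m³ maps to a sub-index of 25, i.e., the midpoint of the "Good" range.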
Comments 3: Another critical issue is sensor validation. Although in-situ calibration is mentioned, no calibration curves, slopes, or adjustment coefficients are provided. This significantly weakens credibility, especially given the use of low-cost sensors such as DHT11, GP2Y10, or ENS160. Comparative plots against reference stations, error analysis, and discussion of sensor drift over time should be included.
Response 3: We thank the reviewer for highlighting the importance of sensor validation. We would like to emphasize that the manuscript incorporates several layers of validation to address precisely these concerns.
- Calibration procedures: As described in Section 2.4.3, each sensor underwent in-situ calibration prior to deployment. For example, the MQ-7 was exposed outdoors for 24 h followed by a two-point calibration at 0 and 50 ppm CO; the GP2Y10 dust sensor was zero-checked with HEPA-filtered air and span-verified at 200 µg/m³; and the ENS160 calibration curves were cross-checked against a benchtop NDIR CO₂ reference. Calibration coefficients derived from these procedures were stored in non-volatile memory and appended to every transmitted payload.
- Validation against reference station: As stated in Section 2.4, all error statistics reported in Section 3 are based on synchronous comparisons with an EN 15267-certified TSM-17 reference station collocated at each site. Thus, the reported MAE, RMSE, and R² values inherently reflect calibration effectiveness relative to an established standard.
- Long-term drift considerations: In Section 2.4.1 (second paragraph), we also discuss sensor drift and stability, supported by multi-month field studies in the literature. For instance, GP2Y10 exhibited only a reversible mean bias drift of +4 µg/m³ after cleaning, MQ-7 retained ≥95% sensitivity with <±5% drift after 180 h of cycling, and MOX arrays such as ENS160 maintained <±2% error during 12-week trials. These findings demonstrate that, under the adopted maintenance schedule, the low-cost sensors remain decision-grade over extended deployments.
While we acknowledge that explicit calibration curves and comparative plots were not included in the submitted version, the manuscript describes the calibration methodology, reference-station validation, and long-term stability analysis. We believe the presented evidence sufficiently demonstrates the credibility of the sensor suite in this context.
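The two-point (zero/span) calibration procedure described in Section 2.4.3 can be sketched as a simple linear mapping. This is a hedged illustration assuming a linear sensor response; the raw ADC counts and the 0/50 ppm CO span values are hypothetical, not the paper's data.

```python
# Minimal sketch of a two-point (zero/span) calibration, as described
# for the MQ-7 in the response. Assumes a linear raw-to-reference
# relationship; all numbers below are illustrative.
def two_point_calibration(raw_zero, raw_span, ref_zero, ref_span):
    """Return (slope, intercept) mapping raw sensor output to reference units."""
    slope = (ref_span - ref_zero) / (raw_span - raw_zero)
    intercept = ref_zero - slope * raw_zero
    return slope, intercept

def apply_calibration(raw, slope, intercept):
    """Convert a raw reading to calibrated units."""
    return slope * raw + intercept

# Hypothetical example: the sensor reads 120 ADC counts in clean air
# (0 ppm CO) and 880 counts at the 50 ppm span gas.
slope, intercept = two_point_calibration(120, 880, 0.0, 50.0)
co_ppm = apply_calibration(500, slope, intercept)
```

Storing `(slope, intercept)` in non-volatile memory and appending them to each payload, as the response describes, makes every transmitted reading traceable to its calibration state.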
Comments 4: The comparison with previous work (Table 7) mixes metrics and units that are not directly comparable. It should be clarified whether the evaluation is in terms of PM2.5 concentration or AQI units, and the table should be adjusted accordingly. Additional robustness evaluations, such as leave-one-site-out or seasonal testing, would also add significant value.
Response 4: We appreciate the reviewer’s observation regarding Table 7 (now Table 8). To clarify, the values reported in Table 8 are RMSEs in terms of PM2.5 concentration (µg/m³), as explicitly stated in Section 3.4. The purpose of this table is to provide a like-for-like comparison with previous works that also report errors in concentration units rather than AQI units. To avoid any confusion, we have updated Table 8 to report all values in µg/m³ of PM2.5 concentration.
Regarding robustness evaluations, the manuscript includes seasonal testing, as described in Section 3.5, where the random-forest forecaster maintained MAE ≈ 5 AQI units consistently across winter, spring, summer, and autumn (ΔMAE < 1 unit), demonstrating year-round stability.
We acknowledge that leave-one-site-out validation was not included in the current version; instead, the dataset was split chronologically within each site to prevent future-to-past leakage. A leave-one-site-out evaluation across the four deployment sites (rooftop, roadside, residential, park) would indeed provide an additional robustness check. While this is beyond the current scope, we agree it represents a valuable direction for extended analysis and plan to include it in follow-up work. We added a new sentence to the Conclusion section:
“Broader geographical testing and integration of additional environmental parameters, as well as cross-site validation strategies such as leave-one-site-out, will be valuable for future studies to further assess robustness across heterogeneous urban micro-environments.”
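The chronological within-site split mentioned above can be sketched as follows. This is a generic illustration of the leakage-avoidance idea, not the authors' code; the record structure and the 80/20 fraction are assumptions.

```python
# Hedged sketch: a chronological train/test split, ensuring the test
# set is strictly later in time than the training set so no
# future-to-past leakage occurs. The "timestamp" key and 0.8 fraction
# are illustrative assumptions.
def chronological_split(records, train_frac=0.8):
    """Split time-ordered records; test data is strictly later than training data."""
    records = sorted(records, key=lambda r: r["timestamp"])
    cut = int(len(records) * train_frac)
    return records[:cut], records[cut:]
```

By contrast, a leave-one-site-out evaluation would hold out all records from one deployment site at a time, testing spatial rather than temporal generalization.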
Comments 5: The manuscript should expand its discussion on the inherent limitations of low-cost sensors, the real-world applicability of the models in diverse contexts, and the need for periodic recalibration. Addressing these points would not diminish the contribution but rather strengthen its reliability.
Response 5: We appreciate the reviewer’s suggestion, and we agree that a more explicit discussion of sensor limitations, applicability, and recalibration would strengthen the manuscript. These aspects are in fact already acknowledged in several sections, but we will make them more prominent in the discussion.
- Low-cost sensor limitations: As noted in the Conclusions, the long-term stability and accuracy of low-cost sensors may be affected by drift or environmental factors. We expand this discussion to highlight that although sensors such as DHT11, GP2Y10, and MQ-7 offer cost and deployment advantages, their readings can deviate under extreme humidity, temperature, or pollution loads, and these limitations must be taken into account in large-scale deployments.
- Applicability in diverse contexts: In Section 3.5, we note that our models were trained in a temperate coastal micro-climate and that domain-adaptation studies are underway for arid and high-altitude regions. We will reinforce this point in the Discussion by underlining that model transferability depends on both pollutant profiles and meteorological regimes, and cross-regional validation is essential before scaling the system internationally.
- Need for recalibration: Section 2.4.3 details the calibration procedures, and Section 2.4.1 references quarterly recalibration requirements for MQ-7. We will explicitly restate this in the Discussion to emphasize that periodic recalibration is critical for sustaining accuracy in real-world deployments, especially when operating in variable environmental conditions.
We added a new explanatory sentence to the Conclusion section:
“Given the inherent limitations of low-cost sensors, such as sensitivity to humidity, temperature extremes, and drift, sustained real-world accuracy will depend on periodic recalibration and validation across diverse environmental contexts.”
We believe these clarifications do not diminish the contribution but rather enhance the credibility and real-world applicability of the proposed system.
4. Response to Comments on the Quality of English Language
Point 1: The overall quality of English in the manuscript is adequate, yet there are areas where improvement would enhance clarity and precision. In particular, some inconsistencies appear in the use of units (e.g., “µg/m³ AQI”) and certain expressions lack precision, which could lead to confusion. Additionally, several sentences are overly long and would benefit from more concise phrasing. A thorough language review is recommended to unify technical terminology, improve the flow of transitions between sections, and ensure that the methodological differences between regression and classification are expressed with complete clarity. These improvements do not require a full rewrite but rather a careful editorial refinement that will strengthen the scientific presentation of the paper.
Response 1: We thank the reviewer for this valuable feedback. In the revised manuscript, we have carefully addressed the language and clarity issues by standardizing unit usage (removing inconsistencies such as “µg/m³ AQI” and consistently referring to AQI index units), refining overly long or imprecise sentences for conciseness, and ensuring smooth transitions between sections. In particular, we clarified the methodological distinction between regression (continuous AQI prediction) and classification (discretized AQI categories) to avoid ambiguity. A thorough editorial review was conducted to unify technical terminology and strengthen the overall readability of the paper without altering its scientific content.
Author Response File:
Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I am happy with the revised version of the manuscript.
Author Response
Thank you for your valuable contributions to our manuscript.
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have addressed most of the points raised during my review round, as well as dispelled the doubts I had.
However, I would suggest mentioning in the work limitations the intrinsic challenge of using the MQ7 sensor due to its threshold and its poor reproducibility: beyond its nominal threshold being above values that might be dangerous for human exposure, each MQ7 sensor has a unique response curve; two sensors might have totally different outputs for the same pollutant exposure. So, a simple sensor replacement might imply algorithm retraining.
Good job.
Author Response
Comments 1: I would suggest mentioning in the work limitations the intrinsic challenge of using the MQ7 sensor due to its threshold and its poor reproducibility: beyond its nominal threshold being above values that might be dangerous for human exposure, each MQ7 sensor has a unique response curve; two sensors might have totally different outputs for the same pollutant exposure. So, a simple sensor replacement might imply algorithm retraining.
Response 1: We thank the reviewer for pointing out this important limitation. We have revised the “Limitations” section to explicitly mention that the MQ-7 metal oxide sensor exhibits an elevated nominal threshold and poor reproducibility across units. As a consequence, two MQ-7 sensors may yield different outputs under the same CO exposure, implying that sensor replacement could necessitate re-calibration or even model retraining. This statement has been added to the conclusion/limitations paragraph (Section 4).
Author Response File:
Author Response.docx
Reviewer 4 Report
Comments and Suggestions for Authors
The manuscript offers a valuable contribution by integrating low-cost hardware, machine learning techniques, and an applied approach to air quality index (AQI) monitoring. The combination of device design, data acquisition and transmission pipeline, and predictive model validation provides methodological soundness and originality.
However, several aspects should be strengthened before publication:
- Methodological clarity (regression vs. classification): The manuscript now specifies that AQI is first predicted as a continuous variable and later discretized into EPA categories. I recommend emphasizing this point also in the captions of tables and figures to avoid any possible ambiguity.
- Consistency of standards and units: The normative reference should be unified strictly under US-EPA 2024, with CPCB mentioned only for comparative purposes. Units and notations should also be harmonized (use R² consistently instead of R2; CO₂ with subscript).
- Sensor calibration: While calibration and validation against the reference station are described, the manuscript would be considerably stronger if calibration curves, numerical coefficients, and comparative pre/post plots were added, at least in supplementary material. This would provide direct and irrefutable evidence of the reliability of the low-cost sensors.
- Model robustness: Results are convincing, but the discussion should explicitly highlight that validation was intra-site. Stating clearly that broader cross-site or multi-regional evaluations remain future work would reinforce transparency about the current scope.
- Language and presentation: The manuscript is generally clear, though some long sentences could be simplified. A typographical error in the figures should be corrected (“LTSM” → “LSTM”). Additionally, including sample sizes and extra metrics such as MAD alongside MAE/RMSE in the figures would facilitate interpretation.
This is a technically sound and practically relevant manuscript. Addressing the above points will significantly strengthen its scientific credibility and academic value. I recommend major revision aimed at improving clarity, consistency, and quantitative evidence.
Comments on the Quality of English Language
The overall quality of English in the manuscript is acceptable and allows readers to follow the technical content. However, there are areas where refinement would significantly improve clarity and precision. In particular, the notation of variables and units should be unified (for example, consistently using R² instead of R2 and CO₂ with subscript). Expressions such as “µg/m³ AQI” should be avoided and replaced with “AQI units” for accuracy. In addition, some lengthy sentences could be shortened to enhance readability. Finally, a typographical error in the figures should be corrected (“LTSM” should read “LSTM”). These adjustments are editorial in nature and do not require rewriting the manuscript, but they will strengthen its scientific presentation and improve readability.
Author Response
Comments 1: Methodological clarity (regression vs. classification): The manuscript now specifies that AQI is first predicted as a continuous variable and later discretized into EPA categories. I recommend emphasizing this point also in the captions of tables and figures to avoid any possible ambiguity.
Response 1: We thank the reviewer for this helpful suggestion. In the revised manuscript, we have emphasized more clearly that the Air Quality Index (AQI) was initially predicted as a continuous variable through regression models and subsequently discretized into health-based categories using the CPCB/US-EPA breakpoints. To avoid ambiguity for readers who may focus primarily on tables and figures, we have explicitly added this clarification in the captions of the relevant tables (Tables 5 and 6) and figures (Figures 5–7).
Comments 2: Consistency of standards and units: The normative reference should be unified strictly under US-EPA 2024, with CPCB mentioned only for comparative purposes. Units and notations should also be harmonized (use R² consistently instead of R2; CO₂ with subscript).
Response 2: We appreciate the reviewer’s careful attention to the consistency of standards and units. In the revised manuscript, we have standardized the normative reference strictly under the US-EPA 2024 guidelines, while retaining CPCB only for comparative purposes when discussing breakpoint similarities. All references to AQI categories and sub-indices are now explicitly aligned with US-EPA standards. Furthermore, we have harmonized all units and notations throughout the text, tables, and figures:
- R² is now used consistently instead of “R2”;
- All chemical notations employ subscript formatting (e.g., CO₂ instead of CO2).
Comments 3: Sensor calibration: While calibration and validation against the reference station are described, the manuscript would be considerably stronger if calibration curves, numerical coefficients, and comparative pre/post plots were added, at least in supplementary material. This would provide direct and irrefutable evidence of the reliability of the low-cost sensors.
Response 3: We fully agree with the reviewer that providing calibration curves and numerical coefficients would substantially strengthen the reliability evidence of the low-cost sensors. In the revised manuscript, we have added supplementary figures and tables presenting the calibration results for the MQ-7 (CO) and GP2Y10 (PM2.5) sensors, including pre- and post-calibration scatter plots against the EN 15267-certified TSM-17 reference station. The corresponding calibration coefficients (slope, intercept, R²) are also reported.
Comments 4: Model robustness: Results are convincing, but the discussion should explicitly highlight that validation was intra-site. Stating clearly that broader cross-site or multi-regional evaluations remain future work would reinforce transparency about the current scope.
Response 4: We thank the reviewer for this important observation. In the revised manuscript, we have made it explicit that the validation procedure was conducted on an intra-site basis, i.e., training, validation, and test splits were strictly chronological within each sensor–reference pair at the same location. We agree that cross-site and multi-regional validations represent critical next steps to establish broader generalizability. Accordingly, we have added a clarifying statement in Section 3.1 and reinforced it in the Conclusion, emphasizing that future work will extend the evaluation to multi-site deployments and diverse regional conditions.
At the end of Section 3.1:
“It should be noted that the validation presented here was performed intra-site, with training/validation/test splits confined to the same station. Broader cross-site or multi-regional evaluations remain as future work to confirm generalizability across heterogeneous environments.”
At the end of Section 4:
“While the present results confirm robust intra-site performance, future studies will extend validation across multiple sites and diverse regions to strengthen the transferability of the proposed approach.”
Comments 5: Language and presentation: The manuscript is generally clear, though some long sentences could be simplified. A typographical error in the figures should be corrected (“LTSM” → “LSTM”). Additionally, including sample sizes and extra metrics such as MAD alongside MAE/RMSE in the figures would facilitate interpretation.
Response 5: We thank the reviewer for these constructive suggestions. In the revised manuscript, we have performed language editing to simplify overly long sentences and corrected the typographical error in the figures (“LTSM” → “LSTM”), ensuring a clearer and more informative presentation of results.
4. Response to Comments on the Quality of English Language
Point 1: The overall quality of English in the manuscript is acceptable and allows readers to follow the technical content. However, there are areas where refinement would significantly improve clarity and precision. In particular, the notation of variables and units should be unified (for example, consistently using R² instead of R2 and CO₂ with subscript). Expressions such as “µg/m³ AQI” should be avoided and replaced with “AQI units” for accuracy. In addition, some lengthy sentences could be shortened to enhance readability. Finally, a typographical error in the figures should be corrected (“LTSM” should read “LSTM”). These adjustments are editorial in nature and do not require rewriting the manuscript, but they will strengthen its scientific presentation and improve readability.
Response 1: We sincerely thank the reviewer for the constructive remarks regarding language and presentation. In response, we have carefully revised the manuscript to unify all notations and units (e.g., consistently using R², CO₂ with subscript), corrected the typographical error in the figures (“LTSM” → “LSTM”), and simplified several overly long sentences to improve readability. These editorial adjustments do not alter the scientific content but significantly enhance clarity, consistency, and overall presentation quality.
Author Response File:
Author Response.docx
Round 3
Reviewer 4 Report
Comments and Suggestions for Authors
The manuscript provides a solid and well-structured contribution to air quality monitoring and forecasting through a low-cost IoT system that combines calibrated hardware, cloud transmission, and machine learning models. The inclusion of pre- and post-calibration curves against a certified reference station is commendable, as it strengthens the reliability of the chosen sensors. Equally noteworthy is the methodological clarity: AQI is first predicted as a continuous variable and then discretized into health-based categories according to US-EPA standards, which is now explicitly stated in the text and figure/table captions.
The results are convincing: the models accurately reproduce short-term AQI dynamics, and the reported metrics demonstrate significant improvements after calibration. The discussion of limitations, particularly the intra-site validation, is transparent and helps define a clear agenda for future work aiming at multi-site generalization.
A few refinements, however, would further enhance the robustness and practical value of the study:
- Strengthen figure and table captions to be fully self-contained, explicitly stating the regression→discretization strategy.
- Add complementary metrics, such as MAD or uncertainty intervals in the time-series results, to provide a more comprehensive view for readers.
- Improve the Data Availability Statement by providing a link to an open repository with calibration data, modeling scripts, or trained models, which would increase reproducibility and scholarly impact.
Overall, this is a valuable contribution with high technical and scientific relevance. With the suggested refinements, the manuscript will reach an even higher level of clarity, rigor, and applicability.
Author Response
Comments 1: Strengthen figure and table captions to be fully self-contained, explicitly stating the regression→discretization strategy.
Response 1: We thank the reviewer for highlighting the importance of fully self-contained figure and table captions. We have revised the captions of Figures 7–9 and Tables 1, 2, and 5–7 to explicitly state the regression-to-discretization strategy and the use of US-EPA health-based categories. These changes ensure that each caption can be understood independently without referring back to the main text. For example, the revised caption for Figure 8 now specifies that regression-based AQI values were subsequently discretized into US-EPA categories, while Table 6 explicitly notes that classification metrics are derived from discretized regression outputs. We believe these clarifications improve readability and methodological transparency.
Comments 2: Add complementary metrics, such as MAD or uncertainty intervals in the time-series results, to provide a more comprehensive view for readers.
Response 2: We thank the reviewer for this helpful suggestion. In the revised manuscript, we have expanded Table 5 to include the complementary error metric MAD (Mean Absolute Deviation) alongside MAE and RMSE for both forecasting horizons (30 min and 60 min). This addition provides readers with a broader perspective on model performance variability.
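For clarity, the three error metrics now reported together can be sketched as follows. Note that MAD is here taken as the mean absolute deviation of the residuals from their mean, one common reading of the abbreviation; the manuscript's exact definition is assumed.

```python
import math

def error_metrics(y_true, y_pred):
    """Return (MAE, RMSE, MAD) for paired observations and predictions.

    MAD is computed as the mean absolute deviation of the residuals from
    their mean (an assumed definition; it measures error variability,
    whereas MAE measures error magnitude).
    """
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mean_r = sum(residuals) / n
    mad = sum(abs(r - mean_r) for r in residuals) / n
    return mae, rmse, mad
```

A constant bias of +1 AQI unit, for example, yields MAE = RMSE = 1 but MAD = 0, which is why reporting MAD alongside MAE/RMSE gives readers a view of performance variability rather than magnitude alone.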
Comments 3: Improve the Data Availability Statement by providing a link to an open repository with calibration data, modeling scripts, or trained models, which would increase reproducibility and scholarly impact.
Response 3: We thank the reviewer for the thoughtful suggestion to provide an open repository. We fully agree that such repositories are valuable for reproducibility and scholarly reuse. In our case, the datasets and scripts are part of an ongoing research program with planned future analyses and publications. For this reason, they cannot be publicly released at this stage. To maintain transparency, however, we have ensured that calibration curves, regression parameters, and hyper-parameter search spaces are presented in detail within the manuscript. These elements allow other researchers to reproduce the workflow and validate our findings. Once the related project phases are completed, we intend to prepare a de-identified dataset suitable for open release in a public repository.
Author Response File:
Author Response.docx