Article
Peer-Review Record

The Design and Deployment of a Self-Powered, LoRaWAN-Based IoT Environment Sensor Ensemble for Integrated Air Quality Sensing and Simulation

by Lakitha O. H. Wijeratne 1, Daniel Kiv 2, John Waczak 3, Prabuddha Dewage 3, Gokul Balagopal 4, Mazhar Iqbal 3, Adam Aker 3, Bharana Fernando 3, Matthew Lary 3, Vinu Sooriyaarachchi 3, Rittik Patra 5, Nora Desmond 6, Hannah Zabiepour 7, Darren Xi 8, Vardhan Agnihotri 9, Seth Lee 10, Chris Simmons 11 and David J. Lary 1,*
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 10 December 2024 / Revised: 20 February 2025 / Accepted: 24 February 2025 / Published: 12 March 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Minor:

- The term simulation is mentioned in the title, but that is the only occurrence of the term. Please expand on this topic, where suitable, in the revised manuscript.

- line 60: "such as Long Range Wide Area Network (LoRaWAN)-based sensors": rephrase this to make it more accurate and general, since a sensor network would help regardless of the wireless communication protocol used, e.g. GSM/LTE-based or WiFi-based.

- line 80: "eschew low-power networks": how does this affect the need for regulatory-grade monitoring in the process of in-field calibration of low-cost sensors?

- In Figure 1, also add a photo of a real deployed device in addition to the 3D CAD model.

- In Figure 2, label the battery and solar panel; consider using "power sensor" instead of INA219 and "low-power timer" instead of "TPL5110" in the figure to make it more readable; and in the caption of the figure, define the "current life cycle" more precisely.

- line 98: "48 MHz" replace with "48 MHz maximum clock frequency"

- line 102: Comment on the urban range of communications, and also substantiate the LoRaWAN urban range with additional references for LOS vs NLOS conditions.

- Titles in sections 2.3 and 2.4 should be expanded e.g. 2.3 Power sensor and 2.4 Low-power timer

- Title of section 2.5: please be consistent when naming the different parts of the device. In Figure 1 this is called the Air module, and in this section the ESU (external sensing unit).

- line 156: is this "solar radiation shield" a cylindrical Stevenson screen? It seems so from Figure 1.

- line 160: Since this is a particle counter, elaborate on the flow of sampled air. What is used to promote air flow, a fan, a pump or some other method? What is the specified range of the measurements? What is the uncertainty of the sensor?

- For Figure 3, adjust in a similar manner to Figure 2, to make it more easily readable.

- Lines 190 and 193, what is the most commonly used mode in your application and corresponding data rate?

- Lines 204-217: Expand on what happens in the critical condition when the instrument is out of battery power and there is insufficient solar power. Does the unit check every 15 minutes whether the power requirements are satisfied?

- line 235: "To date, our sensor network has collected roughly 6 TB of data, and we expect our storage needs to continue to grow as we deploy new sensors each month." Please expand on this by adding the time period over which the 6 TB of data was collected, how much data a single sensor produces in a given timeframe, and whether the data is compressed in some manner (state the algorithm used) and where (in the sensor nodes or at the server).

- Section 6.1 is nicely written, but please expand it a bit for readers not fully familiar with Node-RED. Describe how one sample from one device goes through the processing flow depicted in Figure 6.

- In Figure 7 fix spelling of PM2.5

- In Figure 8, an interesting visualization is presented, plotting traces of 6 different PM fractions (using a log scale for PM mass concentration). A more common approach is to state the number concentration for the ultrafine fraction. Furthermore, all traces look very similar; the correlation coefficients between the traces would be of interest to the reader.

- In Figure 10, provide a reference for the MINTS sensor, since this is the only time it is mentioned in the manuscript.

Major:

- Lines 361-364: The authors note a significant difference between the R2 achieved on the training and validation datasets. This usually suggests overfitting. When applying ML algorithms to time series data this can happen easily, especially for more complex models (the model described here, with 11 inputs, seems too complex). Please expand on this entire section and give more details about the training/test split used (if only a random split was done, repeat the analysis using a sequential training/test split), the total amount of data used for the model, the update procedure for the calibration model, how much additional data this requires, and how this additional data is obtained; also give R2 and RMSE results not only for complex ML models but also for simpler ones (linear etc.). Compare the observed RMSE to the dynamic range of PM2.5 during the relevant time period. It is known that multilinear models can perform better over longer time ranges than more complex ML models, so it would be interesting to see additional analysis here. Also give more details on the final model, but only after expanding the performed analysis, since I have a strong suspicion that a multilinear model will have similar performance to the more complex ML model, with less difference between training and test performance.

Author Response

Thank you very much for your thoughtful and constructive comments. Below, I have provided detailed responses to each of your points and outlined the corresponding revisions made to the manuscript.

  • The term simulation is mentioned in the title, but that is the only occurrence of the term. Please expand on this topic, where suitable, in the revised manuscript.

I have added a paragraph in Section 9, specifically between lines 458 and 464, to address the concerns raised.

  • line 60: "such as Long Range Wide Area Network (LoRaWAN)-based sensors": rephrase this to make it more accurate and general, since a sensor network would help regardless of the wireless communication protocol used, e.g. GSM/LTE-based or WiFi-based.

I have revised line 60 (now line 66 in the updated manuscript) as requested.

  • line 80: "eschew low-power networks": how does this affect the need for regulatory-grade monitoring in the process of in-field calibration of low-cost sensors?

Although low-cost sensors as individual units cannot replace regulatory systems, their integration with more accessible regulatory-grade monitoring units, as discussed in this article, can significantly enhance in-field calibration. This, in turn, ensures the collection of more reliable data from low-cost sensors in the field.

  • In Figure 1, also add a photo of a real deployed device in addition to the 3D CAD model.

A real device in operation has been added (Figure 9).

  • In Figure 2, label the battery and solar panel; consider using "power sensor" instead of INA219 and "low-power timer" instead of "TPL5110" in the figure to make it more readable; and in the caption of the figure, define the "current life cycle" more precisely.

Figure 2 and its caption have been modified as requested.

  • line 98: "48 MHz" replace with "48 MHz maximum clock frequency"

Requested change made (line 100).

  • line 102: Comment on the urban range of communications, and also substantiate the LoRaWAN urban range with additional references for LOS vs NLOS conditions (lines 114-120).

I have provided comments on the urban communication range and further substantiated the urban range of LoRaWAN by adding additional references, specifically addressing Line-of-Sight (LOS) versus Non-Line-of-Sight (NLOS) conditions (lines 114-120).

  • Titles in sections 2.3 and 2.4 should be expanded e.g. 2.3 Power sensor and 2.4 Low-power timer

Requested changes made.

  • Title of section 2.5: please be consistent when naming the different parts of the device. In Figure 1 this is called the Air module, and in this section the ESU (external sensing unit).

Figure 1, along with the rest of the article, has been revised to improve naming consistency throughout.

  • line 156: is this "solar radiation shield" a cylindrical Stevenson screen? It seems so from Figure 1.

Yes, it is a more sophisticated Stevenson shield that protects the sensing ensemble from climatic factors while ensuring accurate readings by facilitating airflow around the sensing elements.

  • line 160: Since this is a particle counter, elaborate on the flow of sampled air. What is used to promote air flow, a fan, a pump or some other method? What is the specified range of the measurements? What is the uncertainty of the sensor?

The requested modifications have been made in Section 2.5; a more detailed description of the optical particle counter is given.

  • For Figure 3, adjust in a similar manner to Figure 2, to make it more easily readable.

Figure 3 has also been revised to enhance readability.

  • Lines 190 and 193, what is the most commonly used mode in your application and corresponding data rate?

This is addressed in Section 3 (line 209) and in Section 4 (lines 234-241).
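
For orientation, the raw LoRa bit rate follows directly from the spreading factor, bandwidth, and coding rate; the sketch below uses the standard LoRa relation with generic parameter values, not necessarily our deployed settings:

```python
# Standard LoRa relation: Rb = SF * (BW / 2**SF) * CR, where SF is the
# spreading factor, BW the bandwidth in Hz, and CR the coding rate.
def lora_bit_rate(sf: int, bw_hz: float = 125_000, cr: float = 4 / 5) -> float:
    """Return the raw LoRa bit rate in bits per second."""
    return sf * (bw_hz / 2**sf) * cr

for sf in range(7, 13):  # SF7 (fastest) through SF12 (longest range)
    print(f"SF{sf}: {lora_bit_rate(sf):7.1f} bps")
```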

  • Lines 204-217: Expand on what happens in the critical condition when the instrument is out of battery power and there is insufficient solar power. Does the unit check every 15 minutes whether the power requirements are satisfied?

Low power and critical conditions are discussed in the last paragraph of Section 4 (lines 242-249).
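
To sketch the wake-up logic for the reader (written in Python for clarity; the actual firmware is microcontroller code, and the threshold voltage here is a placeholder, not our configured value):

```python
import random
import time

BATTERY_CRITICAL_V = 3.3    # placeholder cutoff voltage, not our real setting
CHECK_INTERVAL_S = 15 * 60  # the 15-minute wake-up cadence

def read_bus_voltage() -> float:
    """Stand-in for an INA219-style power-sensor reading."""
    return random.uniform(3.2, 4.2)

def sample_and_transmit() -> None:
    print("sampling sensors and transmitting over LoRaWAN")

def wake_cycle() -> None:
    v = read_bus_voltage()
    if v < BATTERY_CRITICAL_V:
        # Critical condition: skip the measurement entirely so the solar
        # panel can recharge the battery before the next wake-up.
        print(f"{v:.2f} V is below the cutoff; skipping this cycle")
    else:
        sample_and_transmit()

for _ in range(3):    # the firmware loops indefinitely; three cycles for demo
    wake_cycle()
    time.sleep(0)     # on hardware, the low-power timer enforces the interval
```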

  • line 235: "To date, our sensor network has collected roughly 6 TB of data, and we expect our storage needs to continue to grow as we deploy new sensors each month." Please expand on this by adding the time period over which the 6 TB of data was collected, how much data a single sensor produces in a given timeframe, and whether the data is compressed in some manner (state the algorithm used) and where (in the sensor nodes or at the server).

Data storage requirements are further explained in the first paragraph of Section 6 (lines 268-274).
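
As a back-of-the-envelope illustration of that accounting (both figures below are placeholders, not the manuscript's actual packet size or cadence):

```python
# All figures below are placeholders for illustration, not the
# manuscript's actual packet size or reporting cadence.
PACKET_BYTES = 512        # assumed size of one decoded record
PACKETS_PER_HOUR = 120    # assumed 30-second reporting cadence

bytes_per_day = PACKET_BYTES * PACKETS_PER_HOUR * 24
print(f"one node: {bytes_per_day / 1e6:.2f} MB/day, "
      f"{bytes_per_day * 365 / 1e9:.2f} GB/year")
```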

  • Section 6.1 is nicely written, but please expand it a bit for readers not fully familiar with Node-RED. Describe how one sample from one device goes through the processing flow depicted in Figure 6.

Lines 297-306 describe an example of how a single data packet is processed in Node-RED; a simplified sketch of that flow is given below.
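
To make the flow concrete for readers unfamiliar with Node-RED, here is a schematic Python equivalent of the decode, enrich, and store stages that a single packet passes through (the field names and payload layout are illustrative, not our actual decoder):

```python
import base64
import json
from datetime import datetime, timezone

def decode(uplink: dict) -> dict:
    """Decode the base64 LoRaWAN payload into named fields (schematic)."""
    raw = base64.b64decode(uplink["payload"])
    # Illustrative two-byte PM2.5 value and one-byte humidity; the real
    # decoder matches the firmware's packet layout.
    return {"pm2_5": int.from_bytes(raw[0:2], "big") / 10, "humidity": raw[2]}

def enrich(record: dict, uplink: dict) -> dict:
    """Attach the node ID and a server-side timestamp."""
    record["node_id"] = uplink["device_id"]
    record["received_at"] = datetime.now(timezone.utc).isoformat()
    return record

def store(record: dict) -> None:
    """Stand-in for the database write at the end of the flow."""
    print(json.dumps(record))

# One uplink traversing the flow: decode -> enrich -> store.
uplink = {"device_id": "node-001",
          "payload": base64.b64encode(bytes([0, 123, 45])).decode()}
store(enrich(decode(uplink), uplink))
```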

  • In Figure 7 fix spelling of PM2.5

The spelling of PM2.5 in Figure 7 has been corrected.

  • In Figure 8, an interesting visualization is presented, plotting traces of 6 different PM fractions (using a log scale for PM mass concentration). A more common approach is to state the number concentration for the ultrafine fraction. Furthermore, all traces look very similar; the correlation coefficients between the traces would be of interest to the reader.

We aim to distinguish ourselves by presenting more detailed information than is typically made available to the public, which often includes only PM2.5 data. This limitation is usually due to a lack of available data rather than anything else. In our case, we provide data for 7 different PM fractions instead of just one. Additionally, we have displayed the latest readings for all PM fractions on a separate panel to increase visibility.
The PM1 fraction is always a subset of PM2.5, and PM2.5 is included within PM10, meaning a degree of correlation between these measurements is anticipated.

  • In Figure 10, provide a reference for the MINTS sensor, since this is the only time it is mentioned in the manuscript.

The reasoning behind naming the sensor "MINTS node" is explained in the modified article (Figure 11).

Major:

  • Lines 361-364: The authors note a significant difference between the R2 achieved on the training and validation datasets. This usually suggests overfitting. When applying ML algorithms to time series data this can happen easily, especially for more complex models (the model described here, with 11 inputs, seems too complex). Please expand on this entire section and give more details about the training/test split used (if only a random split was done, repeat the analysis using a sequential training/test split), the total amount of data used for the model, the update procedure for the calibration model, how much additional data this requires, and how this additional data is obtained; also give R2 and RMSE results not only for complex ML models but also for simpler ones (linear etc.). Compare the observed RMSE to the dynamic range of PM2.5 during the relevant time period. It is known that multilinear models can perform better over longer time ranges than more complex ML models, so it would be interesting to see additional analysis here. Also give more details on the final model, but only after expanding the performed analysis, since I have a strong suspicion that a multilinear model will have similar performance to the more complex ML model, with less difference between training and test performance.

A more comprehensive analysis has been added in Section 8, which includes the additional information requested on machine learning models, along with a baseline model using linear regression.
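
To give a concrete picture of the added comparison, here is a minimal, self-contained sketch using synthetic data; the analysis in Section 8 itself uses the Fort Worth co-location dataset and our eleven measured inputs, and the models here are illustrative stand-ins:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(5000, 11))                 # stand-in for the 11 inputs
y = X @ rng.uniform(size=11) + 0.1 * rng.normal(size=5000)  # stand-in PM2.5

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("linear baseline", LinearRegression()),
                    ("random forest", RandomForestRegressor(random_state=0))]:
    model.fit(X_tr, y_tr)
    for split, Xs, ys in [("train", X_tr, y_tr), ("test", X_te, y_te)]:
        pred = model.predict(Xs)
        rmse = mean_squared_error(ys, pred) ** 0.5
        print(f"{name:15s} {split:5s} R2={r2_score(ys, pred):.3f} "
              f"RMSE={rmse:.3f}")

# Context the reviewer asked for: RMSE relative to the target's dynamic range.
print(f"target dynamic range: {y.max() - y.min():.3f}")
```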

Reviewer 2 Report

Comments and Suggestions for Authors

This paper is very interesting since it describes an architecture for real-time PM measurement with low-cost sensors. It is suitable for publication in Air. However, some important information is missing.

This is mainly related to the final accuracy of the system in measuring the mass concentration of particulate matter. Scaling particle number over several size intervals to mass concentration is not an easy process. As explained by the Authors in Chapter 8, there are several variables affecting the conversion factors, with pressure, temperature, and relative humidity being the most important. The correction for relative humidity, as shown in Figure 12, provides a first-level correction. The machine learning correction increases the corrected mass concentration by a factor of about 2. Some explanation should be given, identifying the parameters that affect the response by such a large deviation.

In other words, the paper does not properly address the accuracy of the system. This is probably the most important information that a monitoring system should provide. Unfortunately, no comparison with a reference method is reported, so the accuracy remains undefined. Maybe the Authors have some comparison data that could be included in the paper.

Other comments are:

-Line 30: Before diameter, add aerodynamic

-Line 32: Same

-Line 51: After this paragraph, include some comment about exposure to PM in indoor environments

-Line 62: After LoRaWAN based sensors, add fully described in this paper

-Line 92: The MCU is reported to include three components. However, only two are considered

-Line 93,94: Specify the role of these components

-Line 96: Specify the vendor name, brand, address and website

-Line 113: As above

-Line 222: cite vendor name and info

-Line 320: Cite references by number

 

Author Response

Thank you very much for your thoughtful and constructive comments. Below, I have provided detailed responses to each of your points and outlined the corresponding revisions made to the manuscript.

This paper is very interesting since it describes an architecture for real-time PM measurement with low-cost sensors. It is suitable for publication in Air. However, some important information is missing.

This is mainly related to the final accuracy of the system in measuring the mass concentration of particulate matter. Scaling particle number over several size intervals to mass concentration is not an easy process. As explained by the Authors in Chapter 8, there are several variables affecting the conversion factors, with pressure, temperature, and relative humidity being the most important. The correction for relative humidity, as shown in Figure 12, provides a first-level correction. The machine learning correction increases the corrected mass concentration by a factor of about 2. Some explanation should be given, identifying the parameters that affect the response by such a large deviation. In other words, the paper does not properly address the accuracy of the system. This is probably the most important information that a monitoring system should provide.

Section 8 provides a more comprehensive analysis, including an examination of machine learning models alongside the use of linear models. While machine learning models are often considered 'black boxes', Figure 12c offers valuable insights into the most influential parameters. Furthermore, a comparative analysis of different models has been included to highlight the superior performance of the ultimately selected machine learning model.
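
For readers who want the shape of such a first-level correction, below is the generic κ-Köhler-style humidity correction common in the low-cost OPC literature; it is illustrative only (κ = 0.4 is a placeholder), and not necessarily the exact correction behind Figure 12:

```python
def rh_corrected_pm(pm_raw: float, rh: float, kappa: float = 0.4) -> float:
    """Deflate an optical PM reading for hygroscopic particle growth.

    Generic kappa-Kohler form; kappa is an aerosol-dependent
    hygroscopicity parameter, and 0.4 here is purely illustrative.
    """
    if rh >= 100:
        raise ValueError("correction diverges as RH approaches 100%")
    growth = 1 + (kappa / 1.65) * rh / (100 - rh)
    return pm_raw / growth

# At 85% RH, a raw reading of 20 ug/m3 deflates to roughly 8.4 ug/m3.
print(round(rh_corrected_pm(20.0, 85.0), 1))
```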

Unfortunately, no comparison with a reference method is reported, so the accuracy remains undefined. Maybe the Authors have some comparison data that could be included in the paper.

Typically, the gold standard for PM2.5 sensing is the Federal Reference Method (FRM). This method involves drawing a known volume of air through appropriately sized filters and then weighing the collected particulate matter to determine the PM2.5 concentration. However, this approach can often be impractical and time-consuming. The next best approach involves using Federal Equivalent Method (FEM) sensors, which provide the closest possible readings to those obtained with FRM methods. In this paper, we have attempted to map the readings from the low-cost sensing units deployed on the LoRaWAN nodes to the readings of an FEM unit, specifically the BAM 1022, which is used as the reference in this study. The effectiveness of this mapping is depicted in Figures 12a and 12b, along with supporting data presented in Table 3.

Please suggest any further tasks that may help clarify your concerns.


The following comments are addressed in the modified article.

Other comments are:

-Line 30: Before diameter, add aerodynamic

-Line 32: Same

-Line 51: After this paragraph, include some comment about exposure to PM in indoor environments

-Line 62: After LoRaWAN based sensors, add fully described in this paper

-Line 92: The MCU is reported to include three components. However, only two are considered

-Line 93,94: Specify the role of these components

-Line 96: Specify the vendor name, brand, address and website

-Line 113: As above

-Line 222: cite vendor name and info

-Line 320: Cite references by number

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Minor comments:

1. From previous round:

"- is this "solar radiation shield" a cylindrical Stevenson screen, it seems so from Figure 1.

Yes, it is a more sophisticated Stevenson shield that protects the sensing ensemble from climatic factors while ensuring accurate readings by facilitating airflow around the sensing elements."

Please make this change in text, not only in reply to reviewer.

2. Line 181 in revised manuscript:

Reconcile the text at line 177 with line 181. The PM0.1 fraction is the mass of particles below 0.1 µm. OPCs have a certain cutoff size below which they can't detect particles (the authors claim this is 0.1 µm in line 177, although the typical cutoff for OPCs is around 200-300 nm), but then they wouldn't be able to measure the PM0.1 fraction. Probably sensor manufacturers use a somewhat different convention for PM fraction naming, but please check this and make the needed changes.

3. Line 271: "our other systems" and "our sensor network" are mentioned. Please be more specific about what other systems/sensors feed into Analytical Toolbox other than the LoRaWAN nodes.

4. Line 416: Use a more common term than "PM abundance"

5. Major comment from previous round was partially addressed. But please answer the remaining questions:

In line 379 the authors say that "The machine learning model is continuously updated"; please elaborate (in the revised manuscript) on the update procedure for the model(s).

Elaborate on the training/test split (amount of data; how was the split done, randomly or sequentially? For time series data random splits are not a good approach, so it was suggested to use a sequential split). The authors only show training/validation results; what about an independent test set?

6. Explain the difference between training/validation/test in this specific use case.

7. In Table 3 add a column with the degrees of freedom for the models.

8. In Figure 13 the signals are very different; comment on this, and also add a figure from the model training period, where the training/validation split and performance would be visible.

Author Response

Thank you for your valuable feedback! Below is a summary of the changes made to address your comments.

  • From previous round: "- is this "solar radiation shield" a cylindrical Stevenson screen? It seems so from Figure 1. Yes, it is a more sophisticated Stevenson shield that protects the sensing ensemble from climatic factors while ensuring accurate readings by facilitating airflow around the sensing elements." Please make this change in the text, not only in the reply to the reviewer.

Thank you for highlighting this. I have added a paragraph about the solar radiation shield to the caption of Figure 1.

  • Line 181 in revised manuscript: Reconcile the text at line 177 with line 181. The PM0.1 fraction is the mass of particles below 0.1 µm. OPCs have a certain cutoff size below which they can't detect particles (the authors claim this is 0.1 µm in line 177, although the typical cutoff for OPCs is around 200-300 nm), but then they wouldn't be able to measure the PM0.1 fraction. Probably sensor manufacturers use a somewhat different convention for PM fraction naming, but please check this and make the needed changes.

Thank you for bringing this to my attention. Yes, particle counts for a particle diameter of 0.1 micrometers are estimated values, as indicated in the sensor datasheet. I have made the necessary revisions in Section 2.5 under the bullet point IPS7100.

  • Line 271: "our other systems" and "our sensor network" are mentioned. Please be more specific about what other systems/sensors feed into Analytical Toolbox other than the LoRaWAN nodes.

Thank you for bringing this to my attention. A more concise summary of our sensing systems can be found in Section 6, lines 273–283.

  • Line 416: Use a more common term than "PM abundance"

Thank you for highlighting this. Necessary changes are made in Section 8.1.

  • Major comment from previous round was partially addressed. But please answer the remaining questions: In line 379 the authors say that "The machine learning model is continuously updated"; please elaborate (in the revised manuscript) on the update procedure for the model(s).

Thank you for highlighting this. The dataset, collected from December 2023 to July 2024, will continue to grow, aiding in the ongoing refinement of our model. As new data from Fort Worth's co-located systems (the BAM 1022 and our sensing system, which has a similar sensor package to the LoRaWAN nodes) becomes available, we will retrain the model to enhance its ability to capture climate and particulate matter variability. This is highlighted in Section 8, lines 396–399 of the updated manuscript.

  • Elaborate on the training/test split (amount of data; how was the split done, randomly or sequentially? For time series data random splits are not a good approach, so it was suggested to use a sequential split).

Thank you for your insightful comment. The data split was performed randomly rather than sequentially because our dataset does not yet encompass the full range of atmospheric and particulate matter concentrations. Given the significant seasonal climate variations in Texas, a sequential split would have resulted in a training dataset that did not adequately represent the climate conditions during the testing months, potentially compromising the model's effectiveness. Once a sufficient amount of data has been collected, we will consider adopting a sequential split for our training procedures.
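
For completeness, the sequential split the reviewer suggests is a small change once records are ordered by time; a sketch with placeholder data:

```python
import numpy as np

# Placeholder time-ordered data; in practice, sort records by timestamp.
n = 1000
X = np.arange(n, dtype=float).reshape(-1, 1)
y = np.sin(X[:, 0] / 50)

split = int(0.8 * n)                 # first 80% of the record for training,
X_tr, X_te = X[:split], X[split:]    # last (most recent) 20% for testing
y_tr, y_te = y[:split], y[split:]
print(len(X_tr), "training rows,", len(X_te), "held-out later rows")
```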

  • Authors only show training/validation results; what about an independent test set? Explain the difference between training/validation/test in this specific use case.

Thank you for raising this concern, and I apologize for any confusion caused by the previous version. When I referred to "validation," I was specifically addressing the independent dataset that the models had not encountered during training. Typically, a three-way split (training, validation, and independent validation (test)) is used with neural networks, where separate training and validation datasets can be explicitly designated during the training phase. However, since I am working with seven different machine learning models, I divided the data into only two sets: training and independent validation (test). For the neural network, the training algorithm automatically splits the training dataset to create a separate internal validation set.

I have updated Section 8.1 to clarify this, and have revised Table 3, changing the column header from "Validation" to "Independent Validation" to make this distinction clearer. Similar updates have been made in Figure 12 and throughout the article.
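
Schematically, with scikit-learn's MLPRegressor standing in for our neural network (its early-stopping option carves the internal validation set out of the training data, which is the behavior described above):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 11))
y = X.sum(axis=1) + 0.1 * rng.normal(size=2000)

# Two-way split: training set vs. independent validation (test) set.
X_tr, X_test, y_tr, y_test = train_test_split(X, y, test_size=0.2,
                                              random_state=1)

# The neural network additionally holds out an internal validation set from
# its own training data; the other six models see only the two-way split.
nn = MLPRegressor(early_stopping=True, validation_fraction=0.1,
                  max_iter=500, random_state=1).fit(X_tr, y_tr)
print("independent validation R2:", round(nn.score(X_test, y_test), 3))
```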

  • In Table 3 add a column with the degrees of freedom for the models.

A column has been added to Table 3 giving the degrees of freedom for each of the models.

  • In Figure 13 the signals are very different; comment on this, and also add a figure from the model training period, where the training/validation split and performance would be visible.

Thank you for your comment. To clarify the purpose of Figure 13: the training was conducted using data from our systems in Fort Worth, as they are conveniently located next to a reference-grade PM2.5 sensor. Figure 13 presents a time series plot from a LoRaWAN node, which includes the same two low-cost sensors as those used in the Fort Worth system. Since we have the same input variables from the LoRaWAN sensor, we can apply the model trained on the Fort Worth data to map the LoRaWAN inputs to the expected values of the reference-grade sensor. I have provided further elaboration on this in Section 8, lines 384–394.

However, since these datasets come from different systems, conducting training and independent validation on the data from the LoRaWAN nodes is not feasible; the training data originates from the system in Fort Worth, not from the LoRaWAN node.

Also, as I pointed out before, the dataset collected at Fort Worth does not yet encompass the full range of atmospheric and particulate matter concentrations, and probably does not cover the full range of input conditions found at the location of the LoRaWAN node. This will improve as more training data is collected.

Reviewer 2 Report

Comments and Suggestions for Authors

This version of the paper is fine. Congratulations!

Author Response

I would like to sincerely thank you for taking the time to review my paper and for your valuable feedback. I truly appreciate your thoughtful comments and the constructive suggestions you provided. Your insights have significantly contributed to improving the quality of the manuscript.

Thank you again for your support and for accepting the paper. I look forward to your continued feedback.
