Article
Peer-Review Record

A PM2.5 Concentration Prediction Model Based on CART–BLS

Atmosphere 2022, 13(10), 1674; https://doi.org/10.3390/atmos13101674
by Lin Wang 1,2,*, Yibing Wang 1,*, Jian Chen 1 and Xiuqiang Shen 3
Submission received: 19 September 2022 / Revised: 11 October 2022 / Accepted: 11 October 2022 / Published: 13 October 2022
(This article belongs to the Special Issue Air Quality Prediction and Modeling)

Round 1

Reviewer 1 Report

General comments:

In general, the paper is not easy to follow. The ideas are not well connected, so in some parts it is difficult to discern what the authors want to point out. It is important that they improve the English, especially the connections between paragraphs and ideas.

 

The plots have to be greatly improved

 

The paper has a nice idea for improving the predictive capacity and precision of machine learning, but it is not well written and the model is not well validated. The authors need to make substantial improvements before it can be published.

 

Major comments:

Line 224: I understand the imputation method they used, but why this method and not another? In addition, there is no reference that makes a case for using it. Many papers have compared different imputation methods; in fact, the examples below use more than three methods of imputation. Moreover, if the imputation method is changed, are the results the same?
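The sensitivity question raised above can be checked by hiding values that are actually known, imputing them with several methods, and comparing the reconstruction error. The sketch below is only an illustration under stated assumptions: the three methods (mean fill, forward fill, linear interpolation) are generic choices and not necessarily the method used in the manuscript, and the series and masked positions are made-up numbers.

```python
from statistics import mean

def impute_mean(series):
    """Replace each missing value (None) with the mean of the observed values."""
    fill = mean(v for v in series if v is not None)
    return [fill if v is None else v for v in series]

def impute_ffill(series):
    """Carry the last observed value forward; leading gaps take the first observed value."""
    last = next(v for v in series if v is not None)
    out = []
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out

def impute_linear(series):
    """Linearly interpolate interior gaps; edge gaps take the nearest observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    out = []
    for i, v in enumerate(series):
        if v is not None:
            out.append(v)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out.append(series[right])
        elif right is None:
            out.append(series[left])
        else:
            weight = (i - left) / (right - left)
            out.append(series[left] + weight * (series[right] - series[left]))
    return out

def masked_rmse(truth, imputed, masked_idx):
    """RMSE computed only at the positions that were artificially hidden."""
    return (sum((truth[i] - imputed[i]) ** 2 for i in masked_idx) / len(masked_idx)) ** 0.5

# Hypothetical hourly PM2.5 values (made up for illustration only).
truth = [35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 65.0, 70.0]
masked_idx = [2, 5]  # positions hidden on purpose
gappy = [None if i in masked_idx else v for i, v in enumerate(truth)]

methods = {"mean": impute_mean, "ffill": impute_ffill, "linear": impute_linear}
scores = {name: masked_rmse(truth, fn(gappy), masked_idx) for name, fn in methods.items()}
print(scores)
```

On real data the same comparison would be repeated for many random masks, and the downstream prediction error would be reported for each imputation method, which answers whether the results change with the method.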

 

The authors have to improve their use of references; references help to back up arguments and methods. I consider that they need to make a big effort to improve this.

 

Section 3.1.1. This section is a little messy and the information is not fully supported. Trend analysis needs more data; two years are not enough to argue that seasonal variations occur. Why is there seasonality in the data? The authors try to mention this, but there is no discussion. The plots have to be improved: what are the units? If the authors want to show two different variables, they need to state the units and even use two different axes. I do not see how a figure of PM and temperature can prove that they are related. Why are they related? Is there statistical significance?
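One way to attach a significance statement to a PM-versus-temperature relationship is a permutation test on the correlation coefficient. The following is a minimal sketch, not the analysis of the manuscript: the function names, the number of permutations, and the monthly values are all assumptions made for illustration.

```python
import random
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided p-value: share of shuffles whose |r| is at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    # The +1 terms count the observed arrangement itself, so p is never exactly 0.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical monthly means (made up): temperature in degrees C, PM2.5 in ug/m3.
temperature = [-2, 1, 8, 15, 21, 26, 28, 27, 22, 15, 7, 0]
pm25 = [95, 88, 70, 55, 42, 35, 30, 33, 45, 58, 75, 90]

r = pearson_r(temperature, pm25)
p = permutation_p_value(temperature, pm25)
print(f"r = {r:.3f}, permutation p-value = {p:.4f}")
```

A plain permutation test assumes the samples are exchangeable. Hourly or daily air-quality series are autocorrelated, so on real data a block permutation or a test on aggregated values would be more defensible.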

This section has to be greatly improved. I list here two very good recent papers from different parts of the world that carry out a strong trend analysis, only to show that the arguments have to be better presented and supported. I know this is not the objective of the paper, but it is better to exclude a section like this than to include it without supporting information, statistical analysis, and references.

 

V. Singh, S. Singh, A. Biswal. Exceedances and trends of particulate matter (PM2.5) in five Indian megacities. Sci. Total Environ., 750 (2021), p. 141461, 10.1016/j.scitotenv.2020.141461

 

Casallas A., Castillo-Camacho M.P., Guevara-Luna M.A., Gonzalez Y., Sánchez E., Belalcázar L.C., 2022. Spatio-temporal analysis of PM2.5 and policies in Northwestern South America. Sci. Total Environ., 852, 158504. https://doi.org/10.1016/j.scitotenv.2022.158504

 

I don’t agree with the conclusions of this section; they need to be better described and supported by more reliable arguments.

 

Section 3.2. The performance of ML techniques degrades when multicollinearity is present. The authors use 10 variables related to PM10 here, but they do not assess multicollinearity, which many other authors have found essential. I suggest that they carry out this statistical validation to prevent a misinterpretation of the results.
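A common diagnostic for the multicollinearity check requested above is the variance inflation factor (VIF), which is also the statistic discussed in Round 2. The sketch below relies on the identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The variable names and values are hypothetical, and nothing here reproduces the authors' computation.

```python
from statistics import mean

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length predictor columns."""
    centered = [[v - mean(col) for v in col] for col in columns]
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    p = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(p)]
        for i in range(p)
    ]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: some predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical predictors (made up): PM10 and CO move together, wind speed does not.
pm10 = [10, 20, 30, 40, 50, 60, 70, 80]
co = [1.1, 1.9, 3.2, 3.8, 5.1, 6.1, 6.8, 8.2]
wind_speed = [3, 1, 2, 4, 4, 2, 1, 3]

v = vif([pm10, co, wind_speed])
for name, value in zip(["pm10", "co", "wind_speed"], v):
    print(f"{name}: VIF = {value:.2f}")
```

In this toy example the two nearly proportional predictors receive very large VIF values while the unrelated one stays close to 1. Any cut-off used to drop variables is a convention and should be cited.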

 

a.     Ghahremanloo M, Lops Y, Choi Y, Jung J, Mousevinezhad S, Hammond D (2022). A comprehensive study of the COVID-19 impact on PM2.5 levels over the contiguous United States: A deep learning approach. Atmos Environ 118944. https://doi.org/10.1016/j.atmosenv.2022.118944 

b.     Ghahremanloo M, Lops Y, Choi Y, Yeganeh B (2021). Deep learning estimation of daily ground level NO2 concentrations from remote sensing data. J. Geophys. Res Atmos 126:e2021JD034925, https://doi.org/10.1029/2021JD034925

 

 

Section 3.4: The model is validated using only one statistical parameter; I strongly suggest the authors use more than that. The Celis et al. reference (in its Supplementary Material) has many parameters, which are explained and accompanied by their equations. The references therein also include several papers that have used, and even set benchmarks for, statistics to evaluate model performance (e.g., Emery et al. (2017) Recommendations on statistics and benchmarks to assess photochemical model performance. Journal of the Air & Waste Management Association 67:582-598. https://doi.org/10.1080/10962247.2016.1265027).
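To make the request for several statistics concrete, the sketch below computes a small set of widely used continuous metrics: mean bias (MB), mean absolute error (MAE), root mean square error (RMSE), normalized mean bias (NMB), normalized mean error (NME), and the correlation coefficient (r). The selection is an assumption for illustration, not the exact list from Celis et al. or Emery et al., and the observed and predicted values are made up.

```python
from statistics import mean

def evaluation_metrics(observed, predicted):
    """Return several continuous evaluation statistics for paired values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs, mean_pred = mean(observed), mean(predicted)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return {
        "MB": sum(errors) / n,                                # mean bias
        "MAE": sum(abs(e) for e in errors) / n,               # mean absolute error
        "RMSE": (sum(e * e for e in errors) / n) ** 0.5,      # root mean square error
        "NMB": sum(errors) / sum(observed),                   # normalized mean bias
        "NME": sum(abs(e) for e in errors) / sum(observed),   # normalized mean error
        "r": cov / (var_obs * var_pred) ** 0.5,               # Pearson correlation
    }

# Hypothetical PM2.5 concentrations in ug/m3 (made up for illustration).
observed = [20, 40, 60, 80, 100]
predicted = [25, 35, 65, 75, 110]

m = evaluation_metrics(observed, predicted)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Reporting bias, error, and correlation together separates a model that is systematically offset from one that fails to follow the temporal behavior, which a single error statistic cannot do.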

 

A validation with only one parameter is not enough to analyze model performance. For example, is the model able to follow the PM2.5 behavior? Is it able to capture high pollution events? Does it produce false alarms?

 

Minor comments:

Line 12: BLS is defined after this line, at line 13

 

Lines 18 and 19: From the information in the abstract, it is not entirely clear what the global model and the local model are. A reader may work it out, but it would be better to be more explicit, especially in an abstract.

 

Line 21: The abstract is difficult to follow, and the sequence of the ideas has to be better written.

 

Introduction: Many statements in the introduction lack references. Additionally, I understand that the study is in China, but there are many papers from other parts of the world that are worth mentioning. There are examples from the US, Latin America (Colombia, Ecuador), and Europe (Germany), so a wider range of places and papers would improve the introduction considerably. Some papers (a and c) compare multiple machine learning techniques, as the authors did in this paper.

 

Some examples:

a.     C.J. Huang, P.H. Kuo. A deep CNN-LSTM model for particulate matter (PM2.5) forecasting in smart cities. Sensors, 1 (2018)

b.     A. Sayeed, Y. Lops, Y. Choi, J. Jung, A.K. Salman. Bias correcting and extending the PM forecast by CMAQ up to 7 days using deep convolutional neural networks. Atmos. Environ., 253 (2021)

c.     N. Celis, A. Casallas, E. Lopez-Barrera, H. Martinez, C.A. Peña, R. Arenas, C. Ferro. Design of an early alert system for PM2.5 through a stochastic method and machine learning models. Environmental science and policy., 127 (2022)

d.    Yeo, I., Choi, Y., Lops, Y. et al. Efficient PM2.5 forecasting using geographical correlation based on integrated deep learning algorithms. Neural Comput & Applic 33

e.     Here is an example of a review from Ecuador: https://www.mdpi.com/2076-3417/8/12/2570

 

Lines 39-40: Saying that the deterministic model has poor generalization, without references, is a strong statement; it must be made with caution.

 

Lines 88 to 91: Chen et al. are the authors of the grayscale prediction. In “BP neural network, we found”, I don’t understand what the word “we” is doing in the text, since the authors are not Chen. The authors need to proofread carefully before resubmitting the paper to avoid this type of mistake.

 

Lines 150-151: “subsets, respectively”: the comma is missing.

 

Lines 152 and 153: The transition between ideas has to be improved; each new idea is introduced abruptly rather than smoothly, as it should be.

 

Line 155: RVFKNN is not defined

 

Figure 2: The plot has to be improved, and the names of the stations and places have to be in English, as they are in the text. Use GIS software or Cartopy (the authors are using Python, so this is a good tool) to improve the plot. See how other papers in this journal or in other journals make their maps.

 

Line 215: What is the temporal resolution of the data?

 

Line 217: I don’t think it is necessary to mention that the authors used Anaconda 3; Python 3.9.7 is enough information.

 

Figure 7: This is very informative but it is very hard to read

 

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

To predict the PM2.5 concentration, this paper proposes the CART-BLS model, which divides the training set into subsets using a CART decision tree and trains a BLS on each node of the division with that node's own training samples. The validity of each model is then judged according to its prediction accuracy on the data samples of each child node. The experimental results demonstrate that the CART-BLS model has better prediction performance and can greatly reduce the complexity of the model. Generally speaking, this paper can be accepted after minor revision, as it is clearly organized and intelligible apart from several minor drawbacks.

1. The paper is not well formatted in its references.

2. The novelty and advantages could be described in more detail when discussing the contributions of this paper.

3. There are faults in the sentences at Line 129 and Line 159.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Review:

A PM2.5 Concentration Prediction Model Based on CART-BLS

 

Major comments:

Trend of pollution factor

I think this is the major drawback of this paper. First, I do not see any trend analysis. I sent two papers that worked on something like this, and they really made an effort to understand trends and their factors. I know this is not the objective of the paper, but if the authors include it as a section, the arguments need to be greatly improved.

 

The plots are a lot better, but I do not trust the conclusions or hypotheses. What are the squares in the peaks or valleys of the plots? I suppose the idea is to show the correlation, but this must be stated in the caption.

 

The authors need to do a better job of presenting their ideas. First, it could be a good idea to put all the variables into one plot with three panels sharing the same x-axis to facilitate the comparison between variables, especially because the plots appear to have different x-ticks, labels, and time periods.

 

Now I understand the hypothesis, but the authors need to couple it with statistical results, for example by calculating the Rho or R2, or by plotting one variable as a function of the other. Again, I sent some papers in the previous review with ideas on how to proceed, since I feel this analysis is really lacking. Now, supposing there is a relation between the variables, is it only because of changes in meteorology, or are there more physical reasons? Are there any other ideas or hypotheses that could help explain this from a physical point of view?
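The two statistics named above can be obtained with a few lines of code. In this sketch, Rho is taken to mean Spearman's rank correlation (the Pearson correlation of the ranks, with ties given average ranks) and R2 is taken as the coefficient of determination of a simple linear fit, which equals the squared Pearson correlation. Both readings are assumptions, and the paired values are made up.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical paired values (made up): relative humidity in % and PM2.5 in ug/m3.
humidity = [30, 40, 50, 60, 70, 80]
pm25 = [12, 15, 21, 32, 50, 85]

rho = spearman_rho(humidity, pm25)
r_squared = pearson_r(humidity, pm25) ** 2
print(f"Spearman rho = {rho:.3f}, R2 of a linear fit = {r_squared:.3f}")
```

The toy data are monotonic but not linear, so rho is 1 while R2 is lower. Reporting both shows whether a relationship is merely monotonic or well described by a straight line.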

 

Global and Local Model

I still think that the validation is not enough. The authors need at least one statistical parameter that accounts for the behavior of the PM, something like the Rho or correlation coefficient. It is also important that the authors use at least the HIT and FAR categorical parameters to test the ability of their model to detect events or produce false events. Celis's paper and the references therein have many different statistics that could give a complete idea of how the model performs. I also think that the authors have to make an effort to present the results in a more readable and aesthetic way.
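Categorical scores of this kind are computed from a 2x2 contingency table of observed versus predicted exceedances. The sketch below assumes that HIT means the hit rate (hits divided by observed events) and that FAR means the false alarm ratio (false alarms divided by predicted events). The exact definitions in Celis et al. are not quoted in this review, and the threshold of 75 and all values are hypothetical.

```python
def contingency_counts(observed, predicted, threshold):
    """Count hits, misses, false alarms, and correct negatives for one threshold."""
    hits = misses = false_alarms = correct_negatives = 0
    for obs, pred in zip(observed, predicted):
        obs_event = obs >= threshold
        pred_event = pred >= threshold
        if obs_event and pred_event:
            hits += 1
        elif obs_event:
            misses += 1
        elif pred_event:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives

def hit_rate(hits, misses):
    """Fraction of observed events that the model also predicted."""
    total = hits + misses
    return hits / total if total else float("nan")

def false_alarm_ratio(hits, false_alarms):
    """Fraction of predicted events that were not observed."""
    total = hits + false_alarms
    return false_alarms / total if total else float("nan")

# Hypothetical PM2.5 values in ug/m3 and a hypothetical event threshold.
observed = [30, 80, 90, 40, 76, 20, 100, 60]
predicted = [35, 85, 70, 78, 80, 25, 95, 50]
threshold = 75

hits, misses, false_alarms, correct_negatives = contingency_counts(
    observed, predicted, threshold
)
hit = hit_rate(hits, misses)
far = false_alarm_ratio(hits, false_alarms)
print(f"hits={hits}, misses={misses}, false alarms={false_alarms}, "
      f"correct negatives={correct_negatives}")
print(f"HIT = {hit:.2f}, FAR = {far:.2f}")
```

The false alarm ratio should not be confused with the false alarm rate, which divides false alarms by the number of observed non-events. Whichever definition is used should be stated with its equation.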

 

Minor comments:

 

1.     I still think that the authors can improve Figure 2. There are many examples in the literature of a clear map of the area with the stations as markers. In fact, the references I gave in the last report contain many examples. Cartopy, from the Met Office, is an amazing tool for this. Nevertheless, this is only an opinion, and the authors are not obliged to address it.

2.     The phrases in lines 18 to 19 about the data have to be improved; they are hard to follow.

3.     Line 46: “For the PM2.5 prediction models that have been proposed, they can broadly be classified into two types: deterministic models and statistical models”. This can be improved, for example: “PM2.5 prediction models can be broadly classified into two types: …”. This is only one example; there are many other sentences where an easy improvement can be made.

4.     I still think that links between Asian papers and papers from other places are important. See Huang and Kuo and the paper of Celis et al., which use strategies fairly similar to those used by reference 32. Unsurprisingly, in all three cases, despite very different characteristics, the models work very precisely.

5.     Line 168: “rapidity” can be changed for a more elegant word

6.     Line 167 to 169: That piece of text has to be improved

7.     Line 170 to 172: This paragraph needs to be improved; use more common and elegant words, and avoid words such as “messy”. In fact, the authors could even start the sentence with: Here, we split…

8.     Line 190 has a grammatical mistake in the new version: “is splits into” is not correct; either the “s” at the end or the “is” should be removed.

9.     Line 210: "Is shown in Figure 1", the word figure is missing

10.  I understand the response of the authors about the imputation method, and the method is completely fine, but I still think that the authors lack references in that section. They must have based their method on other papers, so please add them; I gave many examples in the last report.

11.  Line 339: The threshold is 10; that is fine, but you need references! I am concerned that many ideas lack foundation, even though they are correct. Also, why VIF, and from where? Following whom?

12.  Line 342: It would be better if the authors made a table with the values given here; that way it would be easier to follow and understand.

13.  In all the line plots, please specify in the captions what the red boxes are.

14.  Lines 460 to 469: I do not understand the idea behind this table. The table is also not very clear, maybe a color map would improve the readability.

15.  Conclusions: I think describing some of the reasons why points 1, 2, and 3 are true could improve the section.

Author Response

Please see the attachment.

Author Response File: Author Response.docx
