Article
Peer-Review Record

Simulation of the Spatiotemporal Distribution of PM2.5 Concentration Based on GTWR-XGBoost Two-Stage Model: A Case Study of Chengdu Chongqing Economic Circle

Atmosphere 2023, 14(1), 115; https://doi.org/10.3390/atmos14010115
by Minghao Liu 1,*, Xiaolin Luo 1, Liai Qi 1, Xiangli Liao 1 and Chun Chen 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 14 October 2022 / Revised: 13 November 2022 / Accepted: 27 December 2022 / Published: 5 January 2023
(This article belongs to the Topic Advanced Research in Precipitation Measurements)

Round 1

Reviewer 1 Report

The authors propose a GTWR-XGBoost two-stage hybrid model to simulate the spatial and temporal distribution of PM2.5 concentrations. To help readers better understand the paper, I suggest the following changes.

1. The contribution and novelty of the paper should be described in the introduction.

2. Machine learning and deep learning have been widely applied to PM2.5 concentration prediction and inversion. The review of current research on PM2.5 concentration prediction in the introduction is not comprehensive enough. See, for example, “PM2.5 volatility prediction by XGBoost-MLP based on GARCH models” and “Prediction of Air Pollutant Concentration Based on One-Dimensional Multi-Scale CNN-LSTM Considering Spatial-Temporal Characteristics”.

3. I suggest adding the MAPE indicator in Section 2.2.

4. Combine Sections 1.3.3, 1.3.4 and 1.3.5.

5. Write out the detailed parameters of the proposed model and the comparison models, as well as the test environment.

6. Figures 4 and 5 should be deleted; this content is already described in sections 7 and 8.

Some other minor issues are as follows:

1. Please standardise the format of PM2.5: choose one way of writing it (for example, with or without the subscript) and use it throughout.

2. Formulae should be set to the right, but must not extend beyond the text.

3. Line 266 should end with a full stop, not a colon.

4. Line 310: use either m·s⁻¹ or m/s, and please be consistent throughout.

5. Line 352: check whether this should be RMSE/µg·m⁻³. The same applies to all subsequent occurrences.

6. The compass in Figure 7 is not fully displayed.

Please check carefully for minor errors.

Author Response

1) Lines 114-115.

2) Lines 100-104.

3) After adding MAPE in this paper, its value is roughly 40%, which is not satisfactory. There are two possible reasons: first, the time scale is different, since this paper uses a monthly scale; second, the data set is not standardized, which leads to a poor MAPE.

4) Line 254: 1.3.3. Two-stage model construction and Experimental scheme design.

5) Lines 293-296: In the second stage, XGBoost uses a maximum depth of 4, a maximum of 301 iterations, and an iteration step of 0.25. Software: ArcGIS, PyCharm, Python, NumPy, pandas, sklearn. Hardware: i5 processor, 16 GB RAM, 500 GB SSD.

6) The other issues have also been addressed.
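As an illustration of the MAPE indicator discussed in comment 3 and its response, the following is a minimal sketch of how the metric is usually defined. The function name `mape` and the sample values are hypothetical and are not taken from the manuscript.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, returned in percent."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    # Each absolute error is divided by the observed value before averaging.
    relative_errors = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical monthly PM2.5 values in ug/m3 (illustration only).
observed = [20.0, 40.0, 50.0, 100.0]
predicted = [25.0, 30.0, 55.0, 90.0]

print(f"MAPE = {mape(observed, predicted):.1f}%")
```

Because every error is divided by the observed value, months with low observed concentrations contribute large relative errors even when the absolute error is small.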

Author Response File: Author Response.docx

Reviewer 2 Report

In general, the paper is within the scope of the journal. However, the language, editorial, grammar, and other errors make it hard to read. I also encourage the authors to take a look at recent papers about GWR and about meteorological and terrain factors related to air pollution (see the recommendation below). I also do not see full research support for the conclusions made in the paper. The authors used quite a standard ML approach, but this is OK. I would expect a more sophisticated analysis of features etc. The language is not specific in many places in the paper.

There are PLENTY OF EDITORIAL mistakes. It is hard to read the paper in some places.

Line 51 (and others in the methods) - the authors did not mention that, besides the sparse reference ground stations, there are also dense ground networks of low-cost sensors. They have higher accuracy than AOD and higher spatial coverage than the reference stations. Please discuss this third option and its use in the context of GWR, in comparison with the paper by Elzbieta Weglinska et al. (2022) in Scientific Reports, a geostatistical study of air pollution that analyses meteorological and terrain factors using GWR.

Lines 181-182 - please explain how the different resolutions were incorporated into the final results.

Line 191 - what are the other operations? Please specify them clearly.

How did you prepare the data for GWR? Were they normalized beforehand? Please discuss this.

Line 228 - what matching? Describe it clearly.

Line 236 - what correlation? I guess you meant correlation coefficients. Please be specific.

Line 326 - The CV description should be moved to the methods part.

Line 411 - “mountains, which is not conducive to the diffusion”: not always! See the paper mentioned in the first remark. Please be specific about why it is not conducive in this particular case.

Section 2.4 has one significant defect, I think. Is it calculated for the whole period? The feature importance should be calculated in windows such as winter/summer.

Lines 459-464 - This is a very serious distinction of factors. This is what your research proved.

Author Response

1) 2) Modified according to the comments of Reviewer 1.

3) The cited study shows the importance and variability of the analyzed factors’ influence on air pollution inflow and outflow from the city. It studies the impact of meteorological and topographic factors on PM2.5 through GWR, but does not consider temporal heterogeneity.

4) Lines 187-190: Buffer zones with different radii (0.5 km, 1 km, 2 km) were created around each PM2.5 ground monitoring station for the human activity intensity data, and the buffer data with the greatest correlation with PM2.5 concentration were selected as the influencing factors.

5) Line 195: Next, data fusion needs to be applied to the driving factors.

6) Lines 256-258: The variables suitable for the GTWR model were selected by variable screening and the variance inflation factor. The data were not standardized beforehand.

7) Lines 232-233: Natural environmental factors were extracted from the spatiotemporal data set processed above.

8) Lines 252-253: correlation coefficient.

9) Lines 299-302: Fitting and evaluating the model on the same data set cannot reveal whether the model is over-fitted. Therefore, the 10-fold cross-validation (CV) method (Hu et al., 2021) was used to test the prediction accuracy of the model.

10) Line 411: Supported by the literature [Guo Xiaomei. Observation and simulation of air quality climate characteristics and its effect on large terrain in Sichuan Basin [D]. Nanjing University of Information Science and Technology, 2016.]

11) See Figure 7.
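The 10-fold cross-validation mentioned in response 9 can be sketched as follows. This is only an illustration of the general splitting scheme, assuming a random shuffle of the samples; the helper name `k_fold_indices`, the seed, and the sample count are hypothetical, and the manuscript does not state how its folds were formed.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical data set with 25 station-month samples.
for fold_no, (train_idx, test_idx) in enumerate(k_fold_indices(25, k=10), start=1):
    print(f"fold {fold_no}: {len(train_idx)} training / {len(test_idx)} test samples")
```

Each sample is held out exactly once, so the accuracy averaged over the ten test folds reflects predictions on data the model was not fitted to.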

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The authors did not take my comments seriously.

1) 2) I don't see answers/modifications.

4) What is buffer data? I still can't find straightforward information about the final resolution selection.

6) Why?

Line 411 - This is a good reference, but it adds nothing new in relation to my comment. I wrote “not always”, and you should discuss other possibilities. The reference is a case study, which does not cover all possibilities.

Author Response

1) 2) There are many changes; they are marked in red in the document.

4) Buffer data refer to the following:

Buffer zones with different radii (0.5 km, 1 km, 2 km) were created around each PM2.5 ground monitoring station for the human activity intensity data, and the buffer data with the greatest correlation coefficient with PM2.5 concentration were selected as the influencing factor. Finally, the proportion of human activity intensity data in the buffer zone with a radius of 1 km, centered on the PM2.5 ground monitoring station, was selected to represent the impact on PM2.5 concentration. For example, NL is the average night light intensity in the buffer zone, calculated by zonal statistics; WAY is the road length within the buffer, obtained through spatial overlay analysis, and represents the road traffic condition (Liu et al., 2020); LU represents the land use status as the proportion of land use types within the buffer zone, obtained through the histogram in ArcGIS.
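The buffer selection described above can be sketched as a comparison of correlation coefficients across candidate radii. The values below are hypothetical and chosen only for illustration, and ranking by the absolute value of the Pearson coefficient is an assumption; the response only says "the greatest correlation coefficient".

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Hypothetical PM2.5 concentrations at five stations (ug/m3).
pm25 = [30.0, 45.0, 50.0, 62.0, 80.0]

# Hypothetical mean night-light intensity per station, keyed by buffer radius in km.
night_light = {
    0.5: [5.0, 9.0, 4.0, 8.0, 7.0],
    1.0: [3.0, 4.5, 5.0, 6.2, 8.0],
    2.0: [6.0, 5.0, 7.0, 6.5, 7.5],
}

correlations = {radius: pearson(values, pm25) for radius, values in night_light.items()}
# Assumption: the "greatest" correlation is judged by absolute value.
best_radius = max(correlations, key=lambda radius: abs(correlations[radius]))

for radius, r in sorted(correlations.items()):
    print(f"{radius} km buffer: r = {r:.3f}")
print(f"selected radius: {best_radius} km")
```

With these made-up numbers the 1 km buffer is selected, mirroring the radius the authors report choosing.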

6) See lines 217-221:

Not all natural environmental factor variables with significant correlation are suitable for the modeling prediction of the GTWR model in Stage 1. The variance inflation factor (VIF) is used to check the collinearity of the variables in the model. The natural environmental factor variables with VIF > 4 are eliminated, and the variables with VIF ≤ 4 are input into the GTWR model (Wong et al., 2021; Xie et al., 2022). The data were not standardized beforehand, because geographically and temporally weighted regression can build a regression equation to predict PM2.5 concentration from raw data.
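The VIF screening described above can be illustrated with a small sketch. It relies on the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The variable names and values are hypothetical; only the threshold of 4 comes from the response.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Return the variance inflation factor of each predictor column."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inverse = invert(corr)
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical predictors (illustration only): two nearly collinear, one not.
predictors = {
    "temperature": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "dew_point": [1.1, 2.0, 2.9, 4.2, 5.0, 5.8],
    "wind_speed": [2.0, 5.0, 1.0, 4.0, 6.0, 3.0],
}

vifs = vif(list(predictors.values()))
# Threshold from the response: keep variables with VIF <= 4.
kept = [name for name, value in zip(predictors, vifs) if value <= 4]

for name, value in zip(predictors, vifs):
    print(f"{name}: VIF = {value:.2f}")
print("kept:", kept)
```

This sketch drops every variable above the threshold in a single pass. In practice variables are often removed one at a time and the VIFs recomputed, since removing one collinear variable lowers the VIF of its partner; the response does not say which procedure was used.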

Line 411 - This is a good reference, but it adds nothing new in relation to my comment. I wrote “not always”, and you should discuss other possibilities. The reference is a case study, which does not cover all possibilities.

See lines 397-402:

The main reason is that the eastern part of Sichuan has more developed industry, so PM2.5 emissions from industry, automobile exhaust, human activities and other sources are high. Secondly, in terms of landform, Sichuan is a basin, Chengdu lies on a plain, and Chongqing is blocked by mountains (Guo Xiaomei, 2016), which is not conducive to the diffusion of PM2.5. Finally, in winter, the southeasterly wind in Chengdu and Chongqing, combined with the terrain, may cause PM2.5 to accumulate in the southeast region.

Author Response File: Author Response.docx
