Spatiotemporal Graph Convolutional Attention Network for Air Quality Index Prediction of Beijing, Shanghai and Shenzhen
Round 1
Reviewer 1 Report
Comments and Suggestions for AuthorsThe manuscript introduces GLA-Net, a new deep learning model for the prediction of Air Quality Index (AQI) that leverages Graph Attention Networks (GAT), Long Short-Term Memory (LSTM), and temporal attention mechanisms. While the work demonstrates technical competence and achieves improved prediction accuracy, several concerns regarding novelty, methodology, and presentation warrant major revisions before publication.
Here are some major comments :
Lines 37-93 (Litzrature review gaps): by reading the introduction, we can feel the lacks comprehensive coverage of recent research focusing on the recent spatiotemporal air quality prediction methods. Notable omissions include:
-Recent transformer-based architectures for pollution prediction
-Hybrid physics-informed neural networks
-Multi-modal fusion approaches incorporating satellite imagery Lines 91-93 mention GCN limitations but fail to adequately review recent dynamic GCN variants that address these issues. A more thorough review would strengthen the motivation.
Lines 103-129 (Novelty and Contribution): The asserted contributions seem to be an advancement to existing work rather than original. The combination of GAT and LSTM is a common technique in air quality prediction research. In lines 103-105, the authors mention this as a major contribution but very similar setups have already been thoroughly researched. More than just the combination of different architectures, the authors should eventually expose what their methodology is really about. The temporal attention mechanism (lines 110-116) is a standard component in modern deep learning. What specific innovations does GLA-Net introduce beyond existing spatiotemporal models?
Lines 152-172 : The data used in this study is only one year; I think it is insufficient for robust model evaluation, it recommended to add a discussion of seasonal variation or extreme weather events
Lines 180-191: The definition of the graph through the adjacency matrix A is too simplistic. In lines 184-186, the authors discuss two factors based on the cluster membership, however, I have some question to the authors, how are clusters determined? K-means? Distance threshold?; why use binary edges rather than weighted connections based on distance or correlation?; and how sensitive are results to clustering methodology? I think this foundational design choice lacks justification.
Lines 206-216 : The author describe adding LSTM outputs to GLSTM-Block outputs, but what is the theoretical justification for this additive combination? And have alternatives (concatenation, gating mechanisms) been explored?
Line 234 : For the Embedding strategy, I think the dimension calculation appears arbitrary, no ablation study examines this chose, the author have to give more clarifications.
Lines 331-333: The authors describe temporal self-attention in the decoding unit, but it is unclear how this differs from or complements attention in GAT. Are both necessary?
Otherwise, there will be redundancy in the attention mechanism.
Line 344-353: Table 2 presents final parameters but provides no information about the search space explored, with the optimization methodology and the cross-validation strategy, it’s also preferable to add computational cost comparisons.
Lines 464-474: The RMSE of the discussed case for Beijing is higher with 12.716, while the IA is highest with 0.983, which leads to the conclusion of the concentration fluctuations being the main cause of the output. The implication is that the model may not be applicable to all types of pollution situations. The presence of significant prediction degradation at AQI > 100 is seen in Figure 7, but the issue is only briefly mentioned. We invite the authors to add further arguments.
The Minor Comments are below:
Line 32: "AQI mean 64.289 µg/m³" - AQI is dimensionless; this appears to be PM2.5 concentration. Clarify.
Table 1: "Miss" column meaning is unclear until reading text. Add full label "Missing (%)"
Lines 339-341: Metrics definitions are standard; brief citations would suffice.
Table 4: Including R² alongside IA would provide additional perspective on explained variance.
Lines 532-539: Conclusion mentions limitations but future work is vague. Specific research directions would strengthen the paper.
VISUALS AND FIGURES
All the figures require more resolution; they are all unreadable, it's impossible to analyze them.
I have some questions for Authors
- What is the approach of the model towards the absence of monitoring stations and the presence of newly added stations?
- What is the comparison of the computational complexity with baseline models?
- Has the model been evaluated on extreme pollution events (e.g., AQI > 300) during the testing phase?
- Is it possible to get predictions along with confidence intervals or uncertainty quantification?
- What is the difference in performance among seasons, which is significant due to the meteorological dependencies?
REQUIRED REVISIONS:
Authors are invited to:
1 Demonstrate the innovative aspect by clearly distinguishing it from previous work and by means of a thorough literature review.
2 Provide detailed ablation studies analyzing the impact of each architectural component.
3 Broaden the experimental evaluation to include multi-year data, statistical testing, and state-of-the-art references.
4 Perform an in-depth error analysis by pollution level, season, and forecasting horizon.
5 Present in a more accessible way using clearer figures, consistent notation, and professional English editing.
6 Reproducibility elements to be added such as code repository, detailed hyperparameters, and computing requirements.
7 Extend the discussion on the limitations of the model, its failure cases, and practical considerations with respect to deployment.
REFERENCES
The manuscript cites 38 references, 3 of which are self-citations. This proportion is appropriate and consistent with the standards accepted by the journal.
Comments on the Quality of English Language
Author Response
Response to Comments on the Manuscript:
“Spatiotemporal Graph Convolutional Attention Network for Air Quality Index Prediction of Beijing, Shanghai and Shenzhen”
November 13, 2025
-------------------------------------------------------------------------------------------------------
The authors gratefully acknowledge the editors and the anonymous reviewers for their constructive comments. We have made a comprehensive revision for our previous manuscript. Specially, any revisions are highlighted using the "Track Changes" function in Microsoft Word. Please refer to the point by point response. Thank you for your time.
Response to comments by Reviewer #1:
We would like to gratefully thank the reviewer for his/her constructive comments and recommendations for improving the paper. A point-by-point response to the interesting comments raised by the reviewer follows.
Here are some major comments :
Point 1. Lines 37-93 (Litzrature review gaps): by reading the introduction, we can feel the lacks comprehensive coverage of recent research focusing on the recent spatiotemporal air quality prediction methods. Notable omissions include:
-Recent transformer-based architectures for pollution prediction
-Hybrid physics-informed neural networks
-Multi-modal fusion approaches incorporating satellite imagery Lines 91-93 mention GCN limitations but fail to adequately review recent dynamic GCN variants that address these issues. A more thorough review would strengthen the motivation.
Response 1: Thank you for your suggestion. We agree with the reviewer. Thank you for this valuable feedback. We agree that our original literature review lacked comprehensive coverage of recent advances in spatiotemporal air quality prediction methods. We have substantially revised and expanded the literature review section to address the identified gaps:
- **Transformer-based architectures**: We have added a new paragraph discussing recent transformer-based models for pollution prediction, highlighting their advantages in capturing long-range temporal dependencies through self-attention mechanisms.
- **Physics-informed neural networks**: We have incorporated discussion of hybrid PINNs that integrate atmospheric physics and chemistry domain knowledge into data-driven models, addressing the challenge of limited training data while maintaining physical consistency.
- **Multi-modal fusion approaches**: We have added coverage of recent studies incorporating satellite imagery and multi-source data fusion for enhanced spatial coverage and prediction performance.
- ** GCN variants**: We have expanded the discussion to comprehensively review recent dynamic GCN variants (T-GCN, DGNN) that specifically address the static graph limitation of conventional GCN architectures.
These additions (marked in red in the revised manuscript) provide a more thorough review of the state-of-the-art and strengthen the motivation for our proposed approach. We believe these revisions significantly improve the quality and completeness of our literature review.
Below are the additions. Please refer to the section 1 part of the revised manuscript.
Recently, transformer-based architectures have emerged as powerful alternatives for air quality prediction, leveraging self-attention mechanisms to capture long-range temporal dependencies more effectively than traditional RNN-based models [22-24]. Several studies have demonstrated the superiority of transformer models in capturing complex temporal patterns in pollutant concentration data [25-27]. Furthermore, hybrid physics-informed neural networks (PINNs) have gained attention by integrating domain knowledge of atmospheric physics and chemistry into data-driven models, thereby improving prediction accuracy while maintaining physical consistency [28, 29]. These approaches effectively address the challenge of limited training data by incorporating established physical laws as constraints. Additionally, multi-modal fusion approaches that incorporate satellite imagery, meteorological data, and ground-level monitoring information have shown promise in enhancing spatial coverage and prediction performance, particularly in regions with sparse monitoring networks [30, 31].
Recent research has extended CNN to GCN that can handle non-Euclidean spatial structured data, and has been effectively utilized for prediction tasks for time series data, including air pollutant prediction [36, 37]. However, when conventional GCN architectures process spatiotemporal data, they often suffer from spatial and temporal data redundancy issues. Specifically, GCNs tend to aggregate information from all neighboring nodes and time steps equally, without distinguishing the varying importance of different spatial locations and temporal periods [38, 39]. This leads to the inclusion of irrelevant or weakly correlated information, which may dilute critical features and reduce prediction accuracy.
To address these challenges, several dynamic GCN variants have been proposed in recent years. Temporal Graph Convolutional Networks (T-GCN) integrate GRU with graph convolution to capture temporal dynamics, but still rely on predefined static spatial graphs [40]. Spatial-Temporal Graph Convolutional Networks (STGCN) employ temporal convolutions alongside spatial graph convolutions, yet their fixed receptive fields limit adaptability to varying correlation patterns [41]. Diffusion Convolutional Recurrent Neural Networks (DCRNN) model traffic diffusion processes using bidirectional random walks, though the computational complexity increases significantly with network scale [42]. Attention-based Spatial-Temporal Graph Convolutional Networks (ASTGCN) incorporate spatial and temporal attention modules, but the separate processing of spatial and temporal features may not fully capture their intricate interactions [43]. While these methods have made progress in modeling dynamic spatiotemporal relationships, they often require complex graph structure learning or predefined adjacency matrices, and may still struggle with effectively filtering redundant information across both spatial and temporal dimensions simultaneously.
In contrast, more recent research has increasingly adopted attention mechanisms that can selectively focus on spatially and temporally relevant information. Attention mechanisms effectively filter out redundant data and concentrate on the most informative features. The advantage of GAT is that they can solve the problem of data redundancy and dynamic spatial correlation through end-to-end learnable attention weights without re-quiring predefined graph structures, which brings new inspiration to this study [44]. Building on this understanding, in pollution prediction research, there are three key fac-tors that require special attention: (1) Considering the diffusion characteristics of pollutants, the dynamic correlation between all monitoring stations must be considered during the spatial feature extraction process, while avoiding spatial data redundancy; (2) Due to the dynamic and periodic changes in pollutant concentrations, accurate modeling of temporal correlation is crucial, while filtering out temporally irrelevant information; (3) In the multi-station atmospheric pollution prediction task, extracting the long-term spatiotemporal dependency features of sequence data is also critical. This research suggests a novel spatiotemporal graph attention network (GLA-Net) for AQI prediction in light of these difficulties. The following are this model's primary contributions:
Point 2. Lines 103-129 (Novelty and Contribution): The asserted contributions seem to be an advancement to existing work rather than original. The combination of GAT and LSTM is a common technique in air quality prediction research. In lines 103-105, the authors mention this as a major contribution but very similar setups have already been thoroughly researched. More than just the combination of different architectures, the authors should eventually expose what their methodology is really about. The temporal attention mechanism (lines 110-116) is a standard component in modern deep learning. What specific innovations does GLA-Net introduce beyond existing spatiotemporal models?
Response 2: Thank you for your suggestion. We agree with the editor. Thank you for this valuable feedback. We acknowledge that our original presentation did not adequately distinguish GLA-Net's innovations from existing spatiotemporal architectures. We have substantially revised this section to clearly articulate our specific technical contributions. We respectfully clarify that GLA-Net is NOT a simple combination of GAT and LSTM, but rather introduces targeted innovations to address spatial-temporal redundancy:
- Coupled Dual-Attention Architecture: Our key innovation is the tight coupling mechanism where GAT's spatial attention weights are directly fed into LSTM gating mechanisms, rather than the conventional sequential GAT→LSTM pipeline. This design ensures spatial redundancy filtering occurs BEFORE temporal processing, preventing irrelevant spatial information from propagating through the temporal dimension. This coupling strategy fundamentally differs from existing approaches that process spatial and temporal features separately or sequentially.
- Explicit Temporal Redundancy Filtering: While temporal attention is indeed a standard component, our innovation lies in the explicit redundancy filtering mechanism. Unlike standard temporal attention that passively assigns weights, our mechanism actively suppresses weakly correlated historical information through differentiated weight assignment. This design specifically addresses the challenge of distinguishing informative historical patterns from redundant data in AQI time series, which exhibit both periodic variations and abrupt changes.
- Integrated Architecture Design: Dense connections and layer normalization are integrated to work synergistically with our dual-attention mechanism, addressing information loss and training stability in the context of redundancy-filtered feature flow.The revised manuscript (marked sections in Lines 103-129) now clearly
presents:
- The specific mechanism of each innovation (HOW it works)
- The distinction from existing methods (WHY it's different)
- The problem it solves (WHAT challenge it addresses)
We believe these revisions demonstrate that GLA-Net introduces substantive architectural innovations specifically designed for spatiotemporal redundancy reduction in multi-station air quality prediction, rather than simply combining existing components.
We greatly appreciate your insightful critique, which has significantly
strengthened our contribution statement and improved the clarity of our work.
Below are the additions. Please refer to the section 1 part of the revised manuscript.
In light of these challenges, this study proposes a novel spatiotemporal graph attention network, GLA-Net, for air quality index (AQI) prediction, and introduces the following innovations to address the problems of spatiotemporal redundancy and difficulty in extracting correlations:
(1) Integrated dual-attention GLSTM-block with coupled spatial-temporal processing. Unlike conventional sequential GAT and LSTM architectures, our GLSTM-block tightly couples spatial and temporal attention by feeding GAT's spatial attention weights directly into LSTM gating mechanisms. This design filters out spatial redundancy before time processing, thus preventing the spread of irrelevant information;
(2) Temporal attention for dynamic historical correlation and redundancy filtering. The temporal attention mechanism calculates correlation between historical inputs and predicted values through query-key-value transformations, dynamically identifying the most impactful historical time steps for current predictions. Unlike standard temporal attention that treats all history equally, our mechanism explicitly filters temporal redundancy by assigning differentiated weights that suppress weakly correlated information while emphasizing critical historical patterns, effectively modeling both periodic variations and sudden changes in AQI time series;
(3) Dense connectivity with layer normalization for enhanced feature propagation. This study integrates dense connection structures and layer normalization to address in-formation degradation in deep architectures. Dense connections create direct in-ter-layer pathways that enable multi-scale feature fusion, suppress gradient vanishing, and promote feature reuse across network depth. Layer normalization stabilizes the feature distribution at each layer, mitigating internal covariate shift and improving training convergence. This integrated design ensures robust propagation of the spatiotemporal features extracted by the dual-attention mechanism, enhancing both model trainability and prediction performance;
(4) This paper selects AQI as the prediction target for experimental verification. Experimental results in three cities show that the proposed prediction model outperforms other cutting-edge models in terms of RMSE, mean absolute error (MAE), and IA metrics. In addition, GLA-Net exhibits good dynamic characteristics, especially in areas with high concentrations of air pollutants, and can more accurately predict the changing trend of AQI.
Point 3. Lines 152-172 : The data used in this study is only one year; I think it is insufficient for robust model evaluation, it recommended to add a discussion of seasonal variation or extreme weather events.
Response 3: Thank you for your suggestion. We agree with the editor. Thank you for this important concern regarding the temporal coverage of our dataset. We acknowledge that one year of data presents limitations for comprehensive model evaluation.
Seasonal Analysis Already Included:
Our manuscript includes extensive seasonal analysis (Figure 10 and associated discussion) that addresses seasonal variation:
Comprehensive Seasonal Coverage: Data spans all four seasons (spring, summer, autumn, winter) with complete meteorological diversity
Seasonal Performance Patterns: Winter RMSE is 3× higher than summer (Beijing), 2.7× (Shanghai), and 3.4× (Shenzhen)
Mechanistic Explanation: We have now expanded the discussion to explain seasonal variations through atmospheric dynamics (thermal inversions in winter, enhanced mixing in summer)
Extreme Weather: The AQI is divided into 6 levels. An AQI greater than 300 is considered extreme weather, but the sample size is small and not representative. The RMSE for each AQI level has been added as part of the analysis results.
We believe these additions substantially address your concerns while transparently acknowledging the limitation and providing clear direction for future validation with multi-year datasets.
Below are the additions. Please refer to the section 4.3.3 and 5.2 part of the revised manuscript.
Point 4. Lines 180-191: The definition of the graph through the adjacency matrix A is too simplistic. In lines 184-186, the authors discuss two factors based on the cluster membership, however, I have some question to the authors, how are clusters determined? K-means? Distance threshold?; why use binary edges rather than weighted connections based on distance or correlation?; and how sensitive are results to clustering methodology? I think this foundational design choice lacks justification.
Response 4: Thank you for your suggestion. We agree with the editor. Thank you for highlighting the insufficient detail in our graph construction methodology. We have substantially revised this section to provide clear explanation and justification.
(1) Graph Construction Method:
We employ an average-distance threshold approach:
- Calculate Euclidean distances between all station pairs based on geographic coordinates
- Compute the average distance d_avg across all station pairs
- Connect stations i and j (A_ij = 1) if d_ij ≤ d_avg; otherwise A_ij = 0
This creates a binary adjacency matrix where stations closer than the network's average inter-station distance are considered spatially connected.
(2) Why Binary Edges:
We respectfully clarify that using binary adjacency matrices (0/1) with GAT is a well-established and theoretically grounded approach in graph neural network literature, particularly for spatiotemporal prediction tasks. The seminal GAT paper and numerous subsequent works in air quality prediction employ binary adjacency matrices as the initial graph structure, allowing the attention mechanism to learn data-driven edge weights during training. This approach has become a standard practice in the field.
We use binary edges because GAT's attention mechanism learns adaptive weights during training, allowing data-driven discovery of spatial correlations that reflect actual pollutant dispersion patterns. Predefined distance weights would impose rigid assumptions inconsistent with complex pollution dynamics.
(3) Acknowledgment of Limitations:
We recognize that this average-distance approach has limitations, particularly in not accounting for regional station density variations and dynamic meteorological factors. We have added a comprehensive discussion of these limitations and potential improvements in the revised manuscript's Limitation section, including:
- Inability to adapt to varying station densities across regions
- Lack of consideration for dynamic meteorological conditions
- Potential for more sophisticated adaptive or data-driven graph construction
We believe that transparently acknowledging these limitations while demonstrating strong empirical performance represents scientific rigor. Future work will explore adaptive graph learning approaches as discussed in the revised limitation section.
Below are the additions. Please refer to the section 3.1 and 6 part of the revised manuscript.
Point 5. Lines 206-216 : The author describe adding LSTM outputs to GLSTM-Block outputs, but what is the theoretical justification for this additive combination? And have alternatives (concatenation, gating mechanisms) been explored?
Response 5: Thank you for your suggestion. We agree with the editor. Thank you for this question. We have added clarification to distinguish the dense connections within GLSTM-Block from the additive combination of parallel pathways.
Architectural Clarifications:
- Dense Connections within GLSTM-Block: GAT and LSTM layers within GLSTM-Block are connected via dense connections, enabling direct feature propagation, preventing information loss, and facilitating gradient flow through the spatiotemporal processing pipeline.
- Additive Combination of Parallel Pathways: The final integration H_final = H_LSTM + H_GLSTM is a straightforward element-wise addition designed to:
- Preserve station-specific temporal patterns (from independent LSTM)
- Incorporate network-wide spatiotemporal dependencies (from GLSTM-Block)
- Maintain balanced integration for station-level prediction
- Why Addition (Not Concatenation or Gating):
Concatenation:
- Doubles feature dimension
- Requires additional projection parameters
- Increases computational cost
Gating mechanisms:
- Introduces learnable control parameters
- Adds unnecessary complexity for this specific integration task
Addition:
- No additional parameters
- Balanced, equal-weight integration
- Preserves personalized station characteristics alongside spatial features
This design ensures that individual monitoring stations retain their temporal characteristics while benefiting from network-wide spatial information.
The revised manuscript now clearly distinguishes dense connections (within GLSTM-Block) from additive fusion (between pathways).
Below are the additions. Please refer to the section 3.2 part of the revised manuscript.
Point 6. Line 234 : For the Embedding strategy, I think the dimension calculation appears arbitrary, no ablation study examines this chose, the author have to give more clarifications.
Response 6: Thank you for your suggestion. We agree with the editor. All experiments were conducted on a computer with a Windows 10 operating system (GPU: NVIDIA GeForce GTX 2080ti, RAM: 8 GB). Data processing, approach construction, and training were carried out using several open-source libraries and frameworks, including Numpy, Pandas, and TensorFlow2.0, with Anaconda (Python 3.7) as the programming software.
We acknowledge that the original manuscript lacked sufficient justification.
Dimension Selection Rationale - 64-Dimensional Unified Architecture:
We adopt a uniform 64-dimensional embedding across all network components (GAT, LSTM, temporal attention) based on three key considerations:
- Architectural Consistency Benefits:
Maintaining identical dimensions throughout the network enables:
- Efficient element-wise operations: The additive fusion H_final = H_LSTM + H_GLSTM requires no dimension transformation
- Streamlined attention computation: Query-key-value projections in temporal attention operate without dimension mismatch
- Reduced parameter overhead: Eliminates need for dimension transformation layers between modules
- Simplified information flow: Features propagate seamlessly from GAT → LSTM → Attention without bottlenecks
- Performance-Efficiency Trade-off:
Lower dimensions (e.g., 32-dim):
- Insufficient representational capacity
- Preliminary tests showed ~15% RMSE increase across test cities
- Unable to capture complex spatiotemporal patterns
Higher dimensions (e.g., 128-dim, 256-dim):
- Substantial parameter increase (e.g., 128-dim yields ~4× parameters in attention layers)
- Marginal performance gains (<3% RMSE improvement)
- Overfitting risk, particularly in cities with limited training samples
- Increased computational cost compromises real-time forecasting capability
64-dimensional choice:
- Optimal balance between expressiveness and generalization
- Robust performance across diverse pollution regimes (Beijing, Shanghai, Shenzhen)
- Maintains real-time inference capability
- Literature Alignment:
This dimension aligns with established practices in spatiotemporal graph neural networks, where embedding dimensions typically range from 32-128 for similar applications.
Validation Through Ablation Study:
Beyond dimension selection, we designed comprehensive ablation experiments to validate the entire architectural framework:
Ablation Design:
- GLSTM-Block: Core spatiotemporal module (GAT + LSTM) with 64-dim embeddings
- GLSTM-Block-LSTM: Adds independent LSTM branch (dual-pathway) with consistent 64-dim
- GLA-Net (complete): Full architecture including temporal attention (all 64-dim)
Key Findings: Single-step prediction results (Table X) show progressive performance improvement, validating that:
- The 64-dim embedding supports effective feature representation across all components
- Each architectural innovation (independent pathway, temporal attention) contributes meaningfully
- Dimensional consistency enables seamless integration of these innovations
The uniform 64-dimensional design is not arbitrary but rather a deliberate choice that:
- Enables architectural modularity (validated through ablation)
- Balances performance and efficiency (validated through preliminary dimension tests)
- Facilitates gradient flow across multi-component architecture
Revision:
We have substantially expanded the methodology section (Section X.X) to include:
- Detailed explanation of 64-dimensional design rationale
- Clarification of ablation study design philosophy
- Explicit connection between dimension choice and architectural validation
We appreciate this feedback, which has strengthened both our architectural justification and experimental design explanation.
Below are the additions. Please refer to the section 4.2.1 part of the revised manuscript.
Point 7. Lines 331-333: The authors describe temporal self-attention in the decoding unit, but it is unclear how this differs from or complements attention in GAT. Are both necessary? Otherwise, there will be redundancy in the attention mechanism.
Response 7: Thank you for your suggestion. We agree with the editor. Thank you for this question. The two attention mechanisms are complementary, not redundant:
GAT Spatial Attention:
- Operates across monitoring stations (spatial dimension)
- Identifies "where"—which neighboring stations are relevant
- Captures pollutant spatial dispersion patterns
Temporal Self-Attention:
- Operates across historical time steps (temporal dimension)
- Identifies "when"—which time periods are informative
- Captures temporal dependencies for future prediction
Why Both Are Necessary: They operate on orthogonal dimensions. Spatial attention cannot capture temporal dependencies, and temporal attention cannot capture spatial relationships. Both are required for complete spatiotemporal modeling.
We have added clarification before Line 331 distinguishing these mechanisms.
Below are the additions. Please refer to the section 3.5 part of the revised manuscript.
Point 8. Line 344-353: Table 2 presents final parameters but provides no information about the search space explored, with the optimization methodology and the cross-validation strategy, it’s also preferable to add computational cost comparisons.
Response 8: Thank you for your suggestion. We agree with the editor. All experiments were conducted on a computer with a Windows 10 operating system (GPU: NVIDIA GeForce GTX 2080ti, RAM: 8 GB). Data processing, approach construction, and training were carried out using several open-source libraries and frameworks, including Numpy, Pandas, and TensorFlow2.0, with Anaconda (Python 3.7) as the programming software. Through extensive experimentation, we established a uniform 64-dimensional representation across all network components (GAT QKV projections, LSTM hidden states, temporal attention QKV, and dense connection layers). This design emerged from system-atic optimization addressing both model performance and overfitting concerns. First, comparative experiments with alternative dimensions (32, 64, 128, 256) revealed that 64-dim achieves optimal balance. Lower dimensions (32) provided insufficient representational capacity, causing RMSE degradation. Higher dimensions (128, 256) increased parameters by 4-16 times without proportional performance gains, introducing overfitting particularly in smaller datasets. Second, dimensional consistency across modules yields critical architectural benefits. When GAT spatial features, LSTM temporal representations, and attention mechanisms share 64-dim, the model eliminates dimension transformation layers between components. This simplifies architecture, reduces parameters, and enables seamless information flow. The additive fusion and attention computations proceed efficiently without dimensional mismatches requiring additional projections. Third, maintaining 64-dim in dense connection layers preserves feature fidelity across network depth.
Specifically, Optimal hyperparameter selection was achieved through systematic it-erative experimentation. The dataset underwent fixed partitioning: 70% allocated to model training, 15% to validation, and 15% to testing. This stratification strategy differs from k-fold cross-validation by preserving temporal sequence integrity, thereby preventing in-formation leakage inherent in time-series forecasting tasks. Real-time model evaluation occurred during training via the validation subset. Following each iteration, we computed root mean square error (RMSE) and mean absolute error (MAE) metrics to assess predictive accuracy. Parameter configurations yielding superior performance were dynamically saved and updated throughout the optimization cycle. The training regimen encompassed 100 epochs, with validation-based assessment conducted after each complete pass through the training data. When validation metrics (RMSE, MAE) demonstrated improvement, the corresponding network weights were preserved. To mitigate overfitting risks, we implemented a 0.5 dropout probability informed by preliminary experimental trials. Batch size (32 samples) and learning rate (0.001) emerged from grid search exploration, aligning with established practices in spatiotemporal forecasting literature. Additionally, an early termination mechanism monitored validation performance to guarantee convergence stability and prevent unnecessary computational expenditure.
Due to space constraints and manuscript scope considerations, exhaustive ablation tables documenting the complete hyperparameter exploration process are not included in the main text. However, to ensure full experimental reproducibility, we provide comprehensive technical specifications in Table 2, including: (1) computational infrastructure details, CUDA version, memory specifications); (2) software environment configuration (Python version, PyTorch/TensorFlow version, key dependency libraries); and (3) all critical hyperparameters (learning rate, batch size, dropout rate, network dimensions, attention heads, epoch count). These specifications enable readers to replicate our experimental setup and validate reported results.
Furthermore, to promote transparency and facilitate future research, the complete source code implementing GLA-Net will be made publicly available on the corresponding author's GitHub repository upon manuscript acceptance. The repository will include: training scripts, data preprocessing pipelines, model architecture implementation, evaluation metrics computation, and visualization tools. We encourage the research community to utilize, extend, and improve upon this work.
Below are the additions. Please refer to the section 4.1 part of the revised manuscript.
Point 9. The RMSE of the discussed case for Beijing is higher with 12.716, while the IA is highest with 0.983, which leads to the conclusion of the concentration fluctuations being the main cause of the output. The implication is that the model may not be applicable to all types of pollution situations. The presence of significant prediction degradation at AQI > 100 is seen in Figure 7, but the issue is only briefly mentioned. We invite the authors to add further arguments.
Response 9: Thank you for your suggestion. We agree with the editor. Thank you for this critical observation. We acknowledge that our original discussion inadequately addressed the model's performance degradation at AQI > 100.
Expanded Analysis Added:
We have substantially expanded our discussion (after Table 3 analysis) to address three key points:
- Root Causes of High Pollution Prediction Degradation:
- Data imbalance: High pollution episodes are rare (15.67% in Beijing, 5.03% in Shanghai, 1.12% in Shenzhen), causing insufficient learning
- Complex nonlinear dynamics: Extreme events involve sudden emissions, stagnant meteorology, and regional transport that differ from typical patterns
- Attention mechanism limitations: Difficulty weighting features during atypical scenarios substantially different from training examples
- Model Applicability Across Pollution Regimes:
Beijing (high variability):
- High IA (0.983): Captures overall trends successfully
- High RMSE (12.716): Systematic errors during extremes
- Implication: Suitable for general monitoring and moderate pollution forecasting, but requires caution for high pollution early warning
Shenzhen (low variability):
- Low RMSE (5.039): High accuracy in stable conditions
- Lower IA: Less sensitive to subtle variations
- Implication: Most reliable for moderate-variability regions
- Proposed Improvements:
- Pollution-level-specific loss functions or weighted sampling
- Auxiliary features (emission inventories, synoptic patterns)
- Ensemble with physics-based models for extreme events
- Data augmentation with synthetic high pollution scenarios
Below are the additions. Please refer to the section 5.1 part of the revised manuscript.
The Minor Comments are below:
Point 1. Line 32: "AQI mean 64.289 µg/m³" - AQI is dimensionless; this appears to be PM2.5 concentration. Clarify.
Response 1: Thank you for your suggestion. We agree with the reviewer. Thank you for pointing out this oversight. You are absolutely correct that AQI (Air Quality Index) is dimensionless and should not have units. This was an inadvertent error on our part. The value 64.289 does refer to the AQI (not PM2.5 concentration). We have removed the unit "µg/m³" from Line 32 in the revised manuscript, which now correctly reads: "AQI mean: 64.289". We appreciate your careful review, which helps us maintain the rigor and accuracy of our work.
Below are the additions. Please refer to the section ABSTRACT and 5.1 of the revised manuscript.
Point 2. Table 1: "Miss" column meaning is unclear until reading text. Add full label "Missing (%)"
Response 2: Thank you for your suggestion. We agree with the editor. Corrections have been made in this revised version.
Below are the additions. Please refer to the section 2.2 part of the revised manuscript.
Point 3. Lines 339-341: Metrics definitions are standard; brief citations would suffice.
Response 3: Thank you for your suggestion. We agree with the editor. This revision removed section 3.6 and related formulas.
Below are the additions. Please refer to the section 3.5 part of the revised manuscript.
Point 4. Table 4: Including R² alongside IA would provide additional perspective on explained variance.
Response 4: Thank you for your suggestion. We agree with the editor. This revision adds descriptions of R² and IA.
Below are the additions. Please refer to the section 4.3.1 part of the revised manuscript.
Point 5. Lines 532-539: Conclusion mentions limitations but future work is vague. Specific research directions would strengthen the paper.
Response 5: Thank you for your suggestion. We agree with the editor. W In this paper, Table 1 has been modified to a three-line table. This revision adds a new section, 6. Limitations. This section provides a more detailed description of future research directions.
Below are the additions. Please refer to the section 6 part of the revised manuscript.
VISUALS AND FIGURES
Point 1. All the figures require more resolution; they are all unreadable, it's impossible to analyze them.
Response 1: Thank you for your suggestion. We agree with the reviewer. Thank you for bringing this important issue to our attention. We sincerely apologize for the poor figure quality in the version you reviewed.
Root Cause Identification:
Upon investigation, we identified that the resolution degradation occurred during the journal's manuscript processing workflow rather than in our original submission. Specifically:
Original Submission Quality: All figures in our original manuscript were prepared at 600 DPI resolution, meeting or exceeding standard publication requirements for scientific journals.
Compression During Processing: The downloadable Word and PDF versions available through the journal platform undergo automatic compression to reduce file size. This process significantly degrades image quality, making figure details—particularly numerical values, axis labels, and legends—difficult or impossible to discern.
PDF Version More Severely Affected: The PDF conversion process applies additional compression, resulting in substantially lower resolution compared to the Word version, likely because the Word document was not configured to preserve high-fidelity graphics during PDF generation.
Resolution:
To address this issue, we have taken the following actions:
Format Optimization: Figures are now embedded in the Word document with "High fidelity" or "Do not compress images" settings enabled to prevent automatic quality reduction.
Verification: We have attached three representative figures as supplementary high-resolution files demonstrating the intended quality. All numerical values, labels, and graphical elements are clearly legible at their original 600 DPI resolution.
Communication with Editorial Office: We recommend the editorial team review their document processing pipeline to ensure submitted high-resolution figures are preserved in the final published version.
Attached Evidence:
Please find the three sample figures (600 DPI) attached to this response, demonstrating that all numerical values and graphical details are fully legible in our original files. We are confident that with proper handling during production, all figures will meet publication standards in the final article.
We appreciate your patience and thorough review. Please let us know if you require any specific figure format or additional documentation to ensure optimal quality in the published version.
Below are the additions. Please refer to the revised manuscript.
=
I have some questions for Authors
Point 1. What is the approach of the model towards the absence of monitoring stations and the presence of newly added stations?
Response 1: Thank you for your suggestion. We agree with the reviewer. Thank you for this important question regarding model adaptability to dynamic monitoring network changes.
Current Dataset Completeness:
The monitoring stations selected in this study represent the most comprehensive available network configuration for the studied regions. Our data are sourced from two authoritative open platforms: meteorological data from OpenWeatherMap (https://openweathermap.org/) and air quality data from Quotsoft Air Quality Database (https://quotsoft.net/air/). These platforms maintain real-time data streams from operational monitoring networks, and our station selection encompasses all currently active stations with consistent historical records in Beijing, Shanghai, and Shenzhen during the study period (2023).
Model Behavior with Network Topology Changes:
We acknowledge that handling missing stations and newly added stations represents an inherent limitation of graph-based spatial models, including our GLA-Net architecture:
- Missing Stations (Temporary Data Loss):
When individual stations experience temporary equipment failures or data transmission issues, our model cannot dynamically adapt without intervention. The graph structure (adjacency matrix A) is fixed during training, and missing data from specific nodes would require either:
- Data imputation methods to fill gaps before model inference
- Retraining with updated graph topology excluding unavailable stations
This limitation is common to graph convolutional approaches where spatial relationships are encoded in fixed adjacency matrices.
- Newly Added Stations (Network Expansion):
When monitoring networks expand by deploying new stations, integration into GLA-Net requires:
- Graph reconstruction: Recalculating the adjacency matrix using the distance-based threshold criterion to establish connections between new stations and existing neighbors
- Model retraining: The graph neural network architecture must be retrained to learn spatial attention patterns incorporating the expanded topology
This retraining requirement stems from fundamental graph neural network principles—GAT learns node-specific attention coefficients based on the initial graph structure, and topological changes necessitate relearning these spatial relationships.
Practical Implications:
For operational deployment, this means:
- The model performs optimally with the current complete monitoring network
- Temporary station outages require data imputation or graceful degradation strategies
- Network expansions require periodic model updates (e.g., quarterly retraining cycles when new stations are deployed)
Point 2. What is the comparison of the computational complexity with baseline models?
Response 2: Thank you for your suggestion. We agree with the editor. Thank you for this important question. We acknowledge that GLA-Net exhibits higher computational complexity compared to baseline models, which is an inherent consequence of our architectural design philosophy.
Architectural Complexity Analysis:
GLA-Net's computational complexity is fundamentally higher than baseline models due to its multi-component integrated architecture:
- Multi-Module Architecture:
Unlike simpler baseline models (LSTM, GRU, CNN-LSTM) that employ single processing pathways, GLA-Net integrates multiple sophisticated components:
- Spatial attention (GAT): Computes dynamic attention weights for all station pairs (O(N²) operations per layer)
- Dual temporal processing: Independent LSTM branch + LSTM within GLSTM-Block
- Temporal attention mechanism: Multi-head self-attention over historical sequences (O(T²) complexity)
- Dense connections: Cross-layer feature aggregation requiring additional forward pass computations
Each component introduces computational overhead. For instance, the GAT layer requires pairwise attention coefficient calculations across N monitoring stations, which is absent in sequence-only models like LSTM.
Point 3. Has the model been evaluated on extreme pollution events (e.g., AQI > 300) during the testing phase?
Response 3: Thank you for your suggestion. We agree with the editor. This content has been added in this revision.
Below are the additions. Please refer to the section 4.3.3 part of the revised manuscript.
|
AQI |
Beijing |
Shanghai |
Shenzhen |
|||
|
Percentage |
RMSE |
Percentage |
RMSE |
Percentage |
RMSE |
|
|
47.55% |
9.36 |
54.71% |
6.98 |
78.39% |
3.35 |
|
|
36.75% |
10.64 |
40.26% |
7.60 |
20.49% |
3.81 |
|
|
8.29% |
17.58 |
4.42% |
12.73 |
0.99% |
15.83 |
|
|
4.26% |
19.65 |
0.27% |
23.01 |
0.06% |
42.14 |
|
|
2.35% |
39.71 |
0.13% |
41.09 |
0.07% |
67.68 |
|
|
0.80% |
70.91 |
0.21% |
62.82 |
- |
- |
|
Point 4. Is it possible to get predictions along with confidence intervals or uncertainty quantification?
Response 4: Thank you for your suggestion. We agree with the editor. We sincerely thank the reviewer for the valuable suggestion regarding uncertainty estimation. Following the advice, we have conducted a preliminary analysis of confidence intervals based on the original data. Although this part has not been included in the main manuscript to maintain the focus on model development, we would be glad to provide these results in the supplementary material if the editor and reviewers consider it appropriate. We also fully agree that rigorous uncertainty quantification is essential for decision-making, and we will make this an important direction of our future research.
Point 5. What is the difference in performance among seasons, which is significant due to the meteorological dependencies?
Response 5: Thank you for your suggestion. We agree with the editor. This content has been added in this revision.
Below are the additions. Please refer to the section 4.3.3 part of the revised manuscript.
REQUIRED REVISIONS:
Point 1. Demonstrate the innovative aspect by clearly distinguishing it from previous work and by means of a thorough literature review.
Response 1: Thank you for your suggestion. We agree with the reviewer. This revision adds a comprehensive literature review to demonstrate its innovativeness.
Below are the additions. Please refer to the section 1 part of the revised manuscript.
Point 2. Provide detailed ablation studies analyzing the impact of each architectural component.
Response 2: Thank you for your suggestion. We agree with the editor. This revision adds detailed ablation studies to analyze the impact of each building component.
Below are the additions. Please refer to the section 4.2.1part of the revised manuscript.
section 4.2.1
In this study, monitoring stations 1006A, 1147A, and 1356A from the Beijing, Shanghai, and Shenzhen datasets were selected as prediction targets.
Point 3. Broaden the experimental evaluation to include multi-year data, statistical testing, and state-of-the-art references.
Response 3: Thank you for your suggestion. We agree with the editor. Thank you for this valuable suggestion regarding expanding the experimental scope.
We acknowledge that our study utilizes one year of data (2023), which limits inter-annual variability assessment. This temporal constraint stems from practical data availability considerations:
Data Source and Access: The air quality and meteorological data used in this study were obtained through a collaborative research project. The project's current phase (Phase I/II) provides access to 2023 data with complete temporal coverage and comprehensive station networks across Beijing, Shanghai, and Shenzhen.
Multi-Year Data Requirements: Extending the evaluation to multi-year datasets requires continuation into subsequent project phases (Phase III and beyond), which will provide access to additional historical years and enable longitudinal validation. The project timeline and data sharing agreements currently limit our access to single-year data.
Below are the additions. Please refer to the revised manuscript.
Point 4. Perform an in-depth error analysis by pollution level, season, and forecasting horizon.
Response 4: Thank you for your suggestion. We agree with the editor. This content has been added in this revision.
Below are the additions. Please refer to the section 4.3.3 and 5.2 part of the revised manuscript.
|
AQI |
Beijing |
Shanghai |
Shenzhen |
|||
|
Percentage |
RMSE |
Percentage |
RMSE |
Percentage |
RMSE |
|
|
47.55% |
9.36 |
54.71% |
6.98 |
78.39% |
3.35 |
|
|
36.75% |
10.64 |
40.26% |
7.60 |
20.49% |
3.81 |
|
|
8.29% |
17.58 |
4.42% |
12.73 |
0.99% |
15.83 |
|
|
4.26% |
19.65 |
0.27% |
23.01 |
0.06% |
42.14 |
|
|
2.35% |
39.71 |
0.13% |
41.09 |
0.07% |
67.68 |
|
|
0.80% |
70.91 |
0.21% |
62.82 |
- |
- |
|
Point 5. Present in a more accessible way using clearer figures, consistent notation, and professional English editing.
Response 5: Thank you for your suggestion. We agree with the editor. This revision adds clearer numbers, consistent notation, and professional English editing.
Below are the additions. Please refer to the revised manuscript.
Point 6. Reproducibility elements to be added such as code repository, detailed hyperparameters, and computing requirements.
Response 6: Thank you for your suggestion. We agree with the editor. This modification has been explained.
Below are the additions. Please refer to the section 4.1 part of the revised manuscript.
Point 7. Extend the discussion on the limitations of the model, its failure cases, and practical considerations with respect to deployment.
Response 7: Thank you for your suggestion. We agree with the editor. This revision includes a detailed explanation of the limitations.
Below are the additions. Please refer to the section 6 part of the revised manuscript.
REFERENCES
Point 1. The manuscript cites 38 references, 3 of which are self-citations. This proportion is appropriate and consistent with the standards accepted by the journal.
Response 1: Thank you for your suggestion. We agree with the reviewer. Thank you for your review of our citation practices. We note that the manuscript currently contains 52 references, of which 3 are self-citations (~5.8%). This ratio is well within acceptable standards and reflects our commitment to comprehensive literature coverage while providing necessary context from our previous methodological development. All self-citations are directly relevant to the current research framework. We appreciate your confirmation that our citation practices comply with journal norms.
Thank you for the valuable comments from the editor, I will work harder in the future. I wish you good health!
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for AuthorsThe manuscript deals with air qulaity index condidering spatiotemporal information. I would like to make some comments.
- In title, it is better to repalce air qulaity with air quality index because tha authors mainly forcused on AQI.
- AQI has no unit, however the authors used ug/m3. Also, units must be specified for all parameters in Table 1. In Table 6, RMSE for PM2.5 needs unit.
- In Table 1, 11 meteorological data are not shown.
- How many neiboring stations were used to predict AQI at the traget station for three cities?
- Why the authors demonstrated prediction of AQI at only target station in each city rather than all stations?
- In Table 3, station numbers should be specified for all cities.
- Practically, time periods less than 24 hours are too short in predition of AQI and PM2.5.
In general, the manuscript can be easily readable.
Author Response
Response to Comments on the Manuscript:
“Spatiotemporal Graph Convolutional Attention Network for Air Quality Index Prediction of Beijing, Shanghai and Shenzhen”
November 13, 2025
-------------------------------------------------------------------------------------------------------
The authors gratefully acknowledge the editors and the anonymous reviewers for their constructive comments. We have made a comprehensive revision for our previous manuscript. Specially, any revisions are highlighted using the "Track Changes" function in Microsoft Word. Please refer to the point by point response. Thank you for your time.
Response to comments by Reviewer #2:
We would like to gratefully thank the reviewer for his/her constructive comments and recommendations for improving the paper. A point-by-point response to the interesting comments raised by the reviewer follows.
Point 1. In title, it is better to repalce air qulaity with air quality index because tha authors mainly forcused on AQI.
Response 1: Thank you for your suggestion. We agree with the editor. Thank you for this excellent suggestion. We agree that "Air Quality Index" is more precise than "Air Quality" given our specific focus on AQI prediction. The title has been revised accordingly in the manuscript.
Revised Title: [Spatiotemporal Graph Convolutional Attention Network for Air Quality Index Prediction of Beijing, Shanghai and Shenzhen]
We appreciate your attention to terminological precision.
Below are the additions. Please refer to the revised manuscript.
Point 2. AQI has no unit, however the authors used ug/m3. Also, units must be specified for all parameters in Table 1. In Table 6, RMSE for PM2.5 needs unit.
Response 2: Thank you for your suggestion. We agree with the editor. Thank you for identifying these important inconsistencies in unit specifications.
- AQI Unit Correction:
You are absolutely correct that AQI (Air Quality Index) is dimensionless and should not have units. We have removed the erroneous "µg/m³" notation from all instances where AQI is mentioned in the manuscript.
- Table 1 - Parameter Units:
All parameters in Table 1 have been updated with appropriate units (e.g., temperature in °C, wind speed in m/s, pressure in Pa, pollutant concentrations in µg/m³).
- Table 6 - RMSE Unit:
The unit for PM2.5 RMSE in Table 6 has been added (µg/m³).
We appreciate your meticulous review, which has improved the technical accuracy and clarity of our manuscript.
Below are the additions. Please refer to the section 2.2 and 4.4 part of the revised manuscript.
|
Parameters |
Beijing (1006A station) |
Shanghai (1147A station) |
Shenzhen (1358A station) |
||||||
|
Max |
Min |
Missing (%) |
Max |
Min |
Missing (%) |
Max |
Min |
Missing (%) |
|
|
Temperature |
312.25 |
254.52 |
0 |
309.7 |
265.82 |
0 |
309.29 |
279.76 |
0 |
|
Dew point |
300.36 |
240.33 |
0 |
304.03 |
254.52 |
0 |
306.19 |
265.29 |
0 |
|
Sensible temperature |
315.07 |
247.8 |
0 |
316.7 |
258.82 |
0 |
316.29 |
276.08 |
0 |
|
Min temperature |
309.74 |
250.04 |
0 |
309.15 |
265.25 |
0 |
307.56 |
279.12 |
0 |
|
Max temperature |
313 |
254.79 |
0 |
310.26 |
266.4 |
0 |
309.38 |
281.49 |
0 |
|
Pressure |
1048 |
988 |
0 |
1042 |
985 |
0 |
1031 |
994 |
0 |
|
Humidity |
100 |
10 |
0 |
100 |
15 |
0 |
99 |
24 |
0 |
|
Wind speed |
8.59 |
0.02 |
0 |
17 |
0 |
0 |
9.7 |
0 |
0 |
|
Wind direction |
360 |
0 |
0 |
360 |
0 |
0 |
360 |
0 |
0 |
|
Cloudiness |
100 |
0 |
0 |
100 |
0 |
0 |
100 |
0 |
0 |
|
Weather id |
804 |
200 |
0 |
804 |
200 |
0 |
804 |
500 |
0 |
|
AQI |
500 |
8 |
2.72 |
500 |
8 |
3.57 |
248 |
6 |
4.72 |
|
PM2.5 |
663 |
1 |
2.12 |
178 |
1 |
2.03 |
198 |
1 |
4.81 |
|
PM2.5_24h |
232 |
2 |
1.16 |
106 |
2 |
0.74 |
93 |
1 |
1.68 |
|
PM10 |
7093 |
1 |
3.81 |
741 |
1 |
2.79 |
292 |
1 |
3.27 |
|
PM10_24h |
1771 |
4 |
1.17 |
401 |
9 |
0.74 |
151 |
3 |
1.15 |
|
SO2 |
51 |
1 |
1.79 |
349 |
1 |
1.35 |
21 |
1 |
2.72 |
|
SO2_24h |
51 |
1 |
1.15 |
20 |
1 |
0.74 |
14 |
3 |
1.15 |
|
NO2 |
126 |
1 |
1.78 |
154 |
3 |
1.52 |
232 |
1 |
2.23 |
|
NO2_24h |
94 |
2 |
1.15 |
126 |
5 |
0.74 |
138 |
4 |
1.15 |
|
O3 |
1200 |
1 |
4.12 |
281 |
5 |
1.41 |
259 |
1 |
4.46 |
|
O3_24h |
1200 |
2 |
1.15 |
281 |
13 |
0.74 |
264 |
4 |
1.15 |
|
CO |
5 |
0.1 |
1.93 |
3.2 |
0.1 |
1.62 |
1.6 |
0.1 |
2.82 |
|
CO_24h |
2.5 |
0.1 |
1.15 |
1.9 |
0.1 |
0.74 |
1 |
0.2 |
1.15 |
|
Model |
Beijing |
Shanghai |
Shenzhen |
|||
|
RMSE (µg/m³) |
IA |
RMSE (µg/m³) |
IA |
RMSE (µg/m³) |
IA |
|
|
CNN-LSTM |
31.980 |
0.767 |
23.723 |
0.659 |
13.879 |
0.641 |
|
ResNet-LSTM |
30.456 |
0.781 |
22.664 |
0.670 |
13.247 |
0.652 |
|
GCN-LSTM |
30.312 |
0.783 |
22.194 |
0.673 |
12.883 |
0.659 |
|
GLA-Net |
29.014 |
0.791 |
21.317 |
0.684 |
11.764 |
0.667 |
Point 3. In Table 1, 11 meteorological data are not shown.
Response 3: Thank you for your suggestion. We agree with the editor. We added descriptions of the meteorological data to Table 1.
Below are the additions. Please refer to the section 2.2 part of the revised manuscript.
|
Parameters |
Beijing (1006A station) |
Shanghai (1147A station) |
Shenzhen (1358A station) |
||||||
|
Max |
Min |
Missing (%) |
Max |
Min |
Missing (%) |
Max |
Min |
Missing (%) |
|
|
Temperature |
312.25 |
254.52 |
0 |
309.7 |
265.82 |
0 |
309.29 |
279.76 |
0 |
|
Dew point |
300.36 |
240.33 |
0 |
304.03 |
254.52 |
0 |
306.19 |
265.29 |
0 |
|
Sensible temperature |
315.07 |
247.8 |
0 |
316.7 |
258.82 |
0 |
316.29 |
276.08 |
0 |
|
Min temperature |
309.74 |
250.04 |
0 |
309.15 |
265.25 |
0 |
307.56 |
279.12 |
0 |
|
Max temperature |
313 |
254.79 |
0 |
310.26 |
266.4 |
0 |
309.38 |
281.49 |
0 |
|
Pressure |
1048 |
988 |
0 |
1042 |
985 |
0 |
1031 |
994 |
0 |
|
Humidity |
100 |
10 |
0 |
100 |
15 |
0 |
99 |
24 |
0 |
|
Wind speed |
8.59 |
0.02 |
0 |
17 |
0 |
0 |
9.7 |
0 |
0 |
|
Wind direction |
360 |
0 |
0 |
360 |
0 |
0 |
360 |
0 |
0 |
|
Cloudiness |
100 |
0 |
0 |
100 |
0 |
0 |
100 |
0 |
0 |
|
Weather id |
804 |
200 |
0 |
804 |
200 |
0 |
804 |
500 |
0 |
|
AQI |
500 |
8 |
2.72 |
500 |
8 |
3.57 |
248 |
6 |
4.72 |
|
PM2.5 |
663 |
1 |
2.12 |
178 |
1 |
2.03 |
198 |
1 |
4.81 |
|
PM2.5_24h |
232 |
2 |
1.16 |
106 |
2 |
0.74 |
93 |
1 |
1.68 |
|
PM10 |
7093 |
1 |
3.81 |
741 |
1 |
2.79 |
292 |
1 |
3.27 |
|
PM10_24h |
1771 |
4 |
1.17 |
401 |
9 |
0.74 |
151 |
3 |
1.15 |
|
SO2 |
51 |
1 |
1.79 |
349 |
1 |
1.35 |
21 |
1 |
2.72 |
|
SO2_24h |
51 |
1 |
1.15 |
20 |
1 |
0.74 |
14 |
3 |
1.15 |
|
NO2 |
126 |
1 |
1.78 |
154 |
3 |
1.52 |
232 |
1 |
2.23 |
|
NO2_24h |
94 |
2 |
1.15 |
126 |
5 |
0.74 |
138 |
4 |
1.15 |
|
O3 |
1200 |
1 |
4.12 |
281 |
5 |
1.41 |
259 |
1 |
4.46 |
|
O3_24h |
1200 |
2 |
1.15 |
281 |
13 |
0.74 |
264 |
4 |
1.15 |
|
CO |
5 |
0.1 |
1.93 |
3.2 |
0.1 |
1.62 |
1.6 |
0.1 |
2.82 |
|
CO_24h |
2.5 |
0.1 |
1.15 |
1.9 |
0.1 |
0.74 |
1 |
0.2 |
1.15 |
Point 4. How many neiboring stations were used to predict AQI at the traget station for three cities?
Response 4: Thank you for your suggestion. We agree with the editor. Thank you for your valuable question.
Number of neighboring stations used:
For each target station, we utilized data from all other stations within the same city to construct the spatial graph network. The specific numbers are as follows:
- Beijing: 23 neighboring stations (e.g., when predicting AQI at station 1006A, data from the other 23 stations are used)
- Shanghai: 18 neighboring stations
- Guangzhou: 14 neighboring stations
Graph construction methodology:
In our model, each monitoring station is represented as a node in the graph, and the connections between stations are represented as edges. The adjacency matrix A is constructed using the average distance threshold method:
- Distance calculation: We first compute the Euclidean distance between all station pairs based on their geographical coordinates (latitude and longitude).
- Threshold determination: The average distance among all station pairs is calculated as the threshold: $$\bar{d} = \frac{2}{N(N-1)}\sum_{i<j}d_{ij} where N is the total number of stations.
- Binary adjacency construction: For any two stations i and j, the adjacency relationship is defined as: $$A_{ij} = \begin{cases} 1, & \text{if } d_{ij} < \bar{d} \text{ (connected)} \\ 0, & \text{otherwise (disconnected)} \end{cases}
This approach ensures that each station is dynamically connected to its spatially proximate neighbors, enabling the model to capture local spatial dependencies while maintaining computational efficiency.
Below are the additions. Please refer to the section 3.1 part of the revised manuscript.
Environmental monitoring stations in Beijing.
|
Number |
Monitoring station |
Latitude () |
Longitude () |
|
1 |
1001A |
116.362 |
39.878 |
|
2 |
1002A |
116.220 |
40.292 |
|
3 |
1003A |
116.417 |
39.929 |
|
4 |
1004A |
116.407 |
39.886 |
|
5 |
1005A |
116.462 |
39.937 |
|
6 |
1006A |
116.339 |
39.930 |
|
7 |
1007A |
116.288 |
39.961 |
|
8 |
1008A |
116.663 |
40.135 |
|
9 |
1009A |
116.628 |
40.328 |
|
10 |
1010A |
116.230 |
40.217 |
|
11 |
1011A |
116.397 |
39.982 |
|
12 |
1012A |
116.184 |
39.914 |
|
13 |
3281A |
115.972 |
40.453 |
|
14 |
3417A |
117.071 |
40.147 |
|
15 |
3418A |
116.834 |
40.410 |
|
16 |
3671A |
116.088 |
39.973 |
|
17 |
3672A |
116.154 |
39.820 |
|
18 |
3673A |
116.680 |
39.908 |
|
19 |
3674A |
115.958 |
39.760 |
|
20 |
3675A |
116.467 |
39.773 |
|
21 |
3694A |
115.986 |
40.456 |
|
22 |
3695A |
116.615 |
40.305 |
|
23 |
3696A |
116.255 |
39.877 |
|
24 |
3697A |
116.840 |
40.384 |
Currently, AQI prediction is a sequence prediction issue with time and space dimensions. That is, forecast the AQI of the target monitoring station in the area for the next time interval based on the historical data on air pollutant concentrations for the current time interval [49]. Based on previous research results, in order to accurately grasp the geographic spatial correlation [50], this paper defines the monitoring stations and pairwise connections as G=(V,E,A), where V represents the node set, E represents the edge set, V=R^(N×N) represents the adjacency matrix, that is, it represents the proximity between any pair of nodes, and N represents the number of nodes. As shown in Figure 2, this paper abstracts each monitoring station as a node in the graph and the pairwise connections of the monitoring stations as edges. The adjacency matrix A is constructed using the average distance threshold method: (1) Distance Calculation: First, we calculate the Euclidean distance between the monitoring stations based on their geographical coordinates (latitude and longitude); (2) Threshold Determination: We calculate the average distance between all pairs of stations, where N is the total number of monitoring stations; (3) Binary Adjacency Relationship Construction: For any two stations, if the distance between them is less than the average distance, we set it to 1 (connected); otherwise, we set it to 0 (disconnected). Assume that the input time step is n, the prediction time step is p, and t_i∈{t_1,⋯,t_n,⋯,t_(n+p) }. The core of this study is how to reveal the spatiotemporal correlation of highway network traffic data. Therefore, this paper uses GLA-Net to learn the spatiotemporal features hidden in the monitoring site data to achieve accurate prediction of AQI.
Point 5. Why the authors demonstrated prediction of AQI at only target station in each city rather than all stations?
Response 5: Thank you for your suggestion. We agree with the editor. Thank you for this insightful question. We would like to clarify that in our initial analysis, we did conduct predictions for all monitoring stations in Beijing (as mentioned in our preliminary draft, where we predicted all 23 stations and performed kriging spatial interpolation). However, in the final manuscript, we chose to present results from one representative target station per city for the following reasons:
Reason 1: Focus on methodological validation rather than comprehensive spatial analysis
The primary objective of this study is to validate the effectiveness of the proposed GLA-Net architecture through systematic comparison with baseline models. Since our graph-based model treats all stations equivalently within the network structure (i.e., each station serves as both a node receiving information from neighbors and a contributor to the overall graph), demonstrating the model's performance at one representative station is sufficient to validate the methodology. Including predictions for all stations would shift the focus from model validation to spatial distribution analysis, which is beyond the scope of this paper. As you correctly noted, the ability to predict one station implies the capability to predict others using the same framework.
Reason 2: Ensuring fair comparison across different model architectures
Many baseline models in our comparison (like LSTM, and GRU) do not inherently support multi-station joint prediction. To ensure a fair and consistent comparison, we standardized the experimental setup by selecting one representative station per city. Presenting all-station predictions only for GLA-Net while showing single-station results for baselines would introduce methodological inconsistency and potentially bias the evaluation.
Reason 3: Avoiding redundancy and maintaining manuscript conciseness
Our preliminary all-station analysis for Beijing revealed spatially consistent patterns: prediction errors (RMSE) exhibited similar trends across stations, with lower errors in the northwest (low AQI region) and higher errors in the southeast (high AQI region). This consistency suggests that presenting results from all stations would introduce substantial redundancy without adding significant new insights. Following the principle of scientific communication efficiency, we opted to present representative results while noting that the model can be readily extended to all stations.
Additional clarification:
We acknowledge that demonstrating the model's spatial generalizability is valuable. To address this concern, we are willing to include the all-station prediction results and spatial interpolation maps (as prepared in our initial draft) in the Supplementary Materials, which would provide comprehensive evidence of the model's performance across the entire study area without disrupting the main narrative flow.
Below are the additions.
Average AQI and RMSE distribution in Beijing.
Point 6. In Table 3, station numbers should be specified for all cities.
Response 6: Thank you for your suggestion. We agree with the editor. We have revised Table 3 to include the specific station numbers for all three cities.
Additionally, to ensure consistency and clarity throughout the manuscript, we have also added station identifiers to Tables 5, 6, and 7, where multi-city comparisons are presented.
This modification enhances the reproducibility and transparency of our experimental results.
Below are the additions. Please refer to the revised manuscript.
Point 7. Practically, time periods less than 24 hours are too short in predition of AQI and PM2.5.
Response 7: Thank you for your suggestion. We agree with the editor. We agree that extending the prediction horizon beyond 24 hours would provide more practical value for air quality management and public health warning systems.
In response to your comment, we have extended our experiments to include 48-hour ahead predictions for PM2.5.
Below are the additions. Please refer to the section 4.4 part of the revised manuscript.
Table 8. PM2.5 concentration prediction in 48h task.
|
Model |
Beijing (1006A station) |
Shanghai (1147A station) |
Shenzhen (1358A station) |
|
RMSE (µg/m³) |
RMSE (µg/m³) |
RMSE (µg/m³) |
|
|
CNN-LSTM |
56.548 |
46.248 |
23.841 |
|
ResNet-LSTM |
54.147 |
44.565 |
23.452 |
|
GCN-LSTM |
49.457 |
43.261 |
22.591 |
|
GLA-Net |
48.568 |
41.685 |
22.054 |
Thank you for the valuable comments from the editor, I will work harder in the future. I wish you good health!
Author Response File:
Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for AuthorsAfter thoroughly examining the authors' response letter and the revised manuscript, the authors have made considerable and laudable efforts to successfully address with the principal issues that had been raised in our first review. The alterations illustrate an increase in quality in several critical aspects. The revised manuscript is now suitable for publication in Atmosphere journal with minor revisions listed below.
Minor revisions required before publication:
- Authors should add a list of acronyms and define them only once in the text, when they are first used.
- The keywords are too generic, and key terms that would improve visibility are missing. You could add, for example, “air quality index prediction” or “Beijing-Shanghai-Shenzhen.”
- The caption for Figure 7 needs to be more detailed. Readers should be able to understand the figures without reading the main text. Describe (a), (b), ... and apply this to all figures.
- Improve the formatting of tables, such as Table 1. For temperature, use one decimal place (312.3 instead of 312.25) and do not repeat the unit (K) for the other columns.
Table 3: add the percentage improvement between GCN-LSTM and GLA-NET.
-Lines 217-220: the sentence is vague and lacks justification; the authors are invited to improve it.
-The seasonal analysis is presented in Figure 10 and the RMSE variation by season, but lacks explanation. The authors are invited to add more description after line 571.
- For section 6 (lines 684-711), we thank the authors for this good start, but we think the paragraph should be better structured with subheadings.
- Please avoid redundancies, for example (lines 305-307) and (467-469).
- Some techniques are described without citations, such as in lines 273 and 167. Please add references.
- Add a statement about the authors' contributions.
- The authors mentioned funding but not acknowledgments. Why?
Comments on the Quality of English Language
Author Response
Response to Comments on the Manuscript:
“Spatiotemporal Graph Convolutional Attention Network for Air Quality Index Prediction of Beijing, Shanghai and Shenzhen”
November 17, 2025
-------------------------------------------------------------------------------------------------------
The authors gratefully acknowledge the editors and the anonymous reviewers for their constructive comments. We have made a comprehensive revision for our previous manuscript. Specially, any revisions are highlighted using the "Track Changes" function in Microsoft Word. Please refer to the point by point response. Thank you for your time.
Response to comments by Reviewer #1:
We would like to gratefully thank the reviewer for his/her constructive comments and recommendations for improving the paper. A point-by-point response to the interesting comments raised by the reviewer follows.
Minor revisions required before publication:
Point 1. Authors should add a list of acronyms and define them only once in the text, when they are first used.
Response 1: Thank you for your suggestion. We agree with the reviewer. We sincerely thank the reviewer for this valuable suggestion. We have carefully revised the manuscript according to this comment. Specifically, we have added comprehensive definitions and introductions for all models and acronyms used in this study in Section 4.2.1 (Baseline Models). In this section, each acronym is fully defined when first introduced, including:
Machine learning models: SVM (Support Vector Machine), SVR (Support Vector Regression), XGBoost (Extreme Gradient Boosting), and CatBoost (Categorical Boosting)
Deep learning models: CNN (Convolutional Neural Network), LSTM (Long Short-Term Memory), CNN-LSTM, ResNet-LSTM, and GCN-LSTM (Graph Convolutional Network-LSTM); GLSTM-Block; GLSTM-Block-LSTM
Below are the additions. Please refer to the section 4.2.1 part of the revised manuscript.
4.2.1. Baseline model
To evaluate the performance and advantages of the proposed model, we compare it with the following state-of-the-art machine learning and deep learning models in air pol-lutant concentration prediction tasks.
(1) SVM (Support Vector Machine): Support Vector Machine is a classic machine learning algorithm that finds the optimal hyperplane to separate different classes. It has been widely applied in air quality classification and pattern recognition tasks.
(2) SVR (Support Vector Regression): Support Vector Regression is the regression variant of SVM, which uses the same principles to predict continuous values. It is commonly used as a baseline model in air pollutant concentration forecasting due to its robustness to outliers.
(3) XGBoost (Extreme Gradient Boosting): Extreme Gradient Boosting is an advanced ensemble learning algorithm based on gradient boosting decision trees. It has be-come one of the most popular machine learning models in air quality prediction tasks due to its high accuracy and efficiency.
(4) CatBoost (Categorical Boosting): Categorical Boosting is a gradient boosting algo-rithm that excels at handling categorical features. This model is selected for com-parison to demonstrate the performance of advanced ensemble methods in air pollutant concentration prediction.
(5) CNN (Convolutional Neural Network): Convolutional neural network is one of the most classic deep learning models in air pollutant concentration prediction tasks. It effectively captures spatial features and local patterns in sequential data.
(6) LSTM (Long Short-Term Memory): Long Short-Term Memory is a recurrent neural network architecture designed to capture long-term temporal dependencies. It is widely used as a baseline model for time series prediction in air quality forecast-ing.
(7) CNN-LSTM: Combine CNN with LSTM to build a hybrid deep learning model. This model leverages CNN for spatial feature extraction and LSTM for temporal dependency modeling, demonstrating the advantages of the hybrid architecture.
(8) ResNet-LSTM: Combine Residual Network with LSTM to build a hybrid deep learning model. This architecture uses ResNet's skip connections to extract deep features while LSTM captures temporal patterns, addressing the degradation problem in deep networks.
(9) GCN-LSTM: Combine Graph Convolutional Network with LSTM to build a hy-brid deep learning model. This model captures spatial correlations between mon-itoring stations through GCN and temporal dependencies through LSTM, suita-ble for multi-site air quality prediction.
(10) GLSTM-Block: This model is based on section 3.4 of the article.
(11) GLSTM-Block-LSTM: This model is GLA-Net without added temporal attention.
All baseline models were trained and tested using the same dataset, hyperparameters, and evaluation metrics as the proposed model to ensure a fair and comprehensive com-parison.
Point 2. The keywords are too generic, and key terms that would improve visibility are missing. You could add, for example, “air quality index prediction” or “Beijing-Shanghai-Shenzhen.”
Response 2: Thank you for your suggestion. We agree with the editor. We sincerely appreciate the reviewer's constructive suggestion to improve the visibility and specificity of our manuscript. Following this recommendation, we have revised the keywords section by adding more specific and descriptive terms:
"Air quality index prediction" - This term directly reflects the core objective of our study and enhances searchability for researchers in this specific field.
"Multi-city forecast" - This term emphasizes the multi-location prediction capability of our model, which is a key contribution of this work. These additions significantly enhance the precision and visibility of our manuscript, making it easier for researchers in air quality prediction and multi-city forecasting to discover our work.
Below are the additions. Please refer to the section ABSTRACT part of the revised manuscript.
Keywords: air quality index prediction; multi-city forecast; graph convolutional network; long short-term memory; temporal attention mechanism; spatial and temporal feature
Point 3. The caption for Figure 7 needs to be more detailed. Readers should be able to understand the figures without reading the main text. Describe (a), (b), ... and apply this to all figures.
Response 3: Thank you for your suggestion. We agree with the editor. We sincerely thank the reviewer for this important suggestion to improve the clarity and readability of our manuscript. We fully agree that figure captions should be self-explanatory and enable readers to understand the content without referring to the main text.
Following this recommendation, we have carefully revised the captions for all figures that contain multiple subfigures (a), (b), (c), etc.
Below are the additions. Please refer to the revised manuscript.
Point 4. Improve the formatting of tables, such as Table 1. For temperature, use one decimal place (312.3 instead of 312.25) and do not repeat the unit (K) for the other columns.
Response 4: Thank you for your suggestion. We agree with the editor. We sincerely thank the reviewer for this valuable suggestion to improve the presentation quality of our tables. Following this recommendation, we have revised Table 1 and all other relevant tables in the manuscript with the following improvements:
- Decimal places: All numerical values now display one decimal place (e.g., 312.3 instead of 312.25), providing sufficient precision while improving readability and consistency.
- Unit notation: The unit (K) is now indicated only once in the column header, and has been removed from individual data cells to avoid redundancy and enhance visual clarity.
These formatting improvements make the tables more concise, professional, and easier to read, which significantly enhances the overall presentation quality of the manuscript.
Below are the additions. Please refer to the section 2.2 part of the revised manuscript.
Point 5. Table 3: add the percentage improvement between GCN-LSTM and GLA-NET.
Response 5: Thank you for your suggestion. We agree with the editor. We sincerely thank the reviewer for this constructive suggestion. We fully understand the importance of clearly demonstrating the performance improvement of our proposed model.
While we did not add the percentage improvement directly in Table 3 to maintain the table's clarity and readability (as adding percentage calculations for all baseline models would make the table overly complex), we have addressed this concern by adding a detailed quantitative analysis in Section 4.2.2. Specifically, we added the following statement:
"Finally, compared to GCN-LSTM, GLA-Net improved the average prediction performance by 6.78%, demonstrating its superiority in extracting spatiotemporal features from time series data."
This approach allows readers to:
- Clearly view the raw performance metrics of all models in Table 3 for direct comparison
- Understand the specific percentage improvement of GLA-Net over the most competitive baseline (GCN-LSTM) through the detailed textual analysis
We believe this presentation strategy effectively highlights the advantages of our proposed model while maintaining the clarity and professional formatting of the table. If the reviewer still prefers to have this information in Table 3, we are happy to add an additional column showing the percentage improvement.
Below are the additions. Please refer to the section 4.2.2 part of the revised manuscript.
Point 6. Lines 217-220: the sentence is vague and lacks justification; the authors are invited to improve it.
Response 6: Thank you for your suggestion. We agree with the editor. This revision includes detailed explanations and justifications.
Below are the additions. Please refer to the section 2.3 part of the revised manuscript.
Lastly, all data were normalized to the range of [0,1] using the Min-Max scaling approach [49]. This normalization is necessary because the input features (e.g., temperature, humidity, pollutant concentrations) have different scales and units, which could cause features with larger numerical ranges to dominate the model training process. Normalization ensures that all features contribute equally to the model and accelerates convergence by improving numerical stability during gradient descent optimization. To maintain the te-poral continuity of the time series data, the dataset was split chronologically without shuffling into three sets: a training set (70%), a validation set (15%), and a test set (15%). This 70-15-15 split ratio is commonly adopted in time series prediction tasks, ensuring sufficient training samples while providing adequate data for model validation and independent testing [50]. The training set was used for model parameter learning, the validation set for hyperparameter tuning and preventing overfitting, and the test set for final performance evaluation.
Point 7. The seasonal analysis is presented in Figure 10 and the RMSE variation by season, but lacks explanation. The authors are invited to add more description after line 571.
Response 7: Thank you for your suggestion. We agree with the editor. We sincerely thank the reviewer for this constructive suggestion. Following this recommendation, we have added more detailed descriptions and explanations in the revised manuscript.
Below are the additions. Please refer to the section 4.3.3 part of the revised manuscript.
Seasonal analysis presented in Figure 10 demonstrates two key findings. The model exhibits clear temporal performance variations across all three cities, achieving lowest errors during summer months and highest errors in winter. This pattern remains consistent regardless of location. This seasonal variation can be attributed to several meteorological and atmospheric factors. During summer, air pollutant dispersion is enhanced by higher temperatures, stronger convection, and more stable weather patterns, leading to more predictable pollution dynamics. In contrast, winter is characterized by frequent temperature inversions, complex atmospheric boundary layer structures, and heating-related emission increases, which create more challenging forecasting conditions. The higher RMSE values in winter (Beijing: 18.72, Shanghai: 15.63, Shenzhen: 12.56) reflect these complexities, where the model encounters greater difficulty in capturing rapid pollution accumulation events and dispersion processes.
Additionally, Shenzhen maintains superior prediction quality throughout the annual cycle, with optimal performance occurring in summer (RMSE=3.68) and spring (RMSE=4.27). These results reflect both the region's relatively stable atmospheric conditions and the model's enhanced capability in forecasting moderate pollution scenarios. Specifically, Shenzhen's coastal location and subtropical climate contribute to more uniform meteorological conditions with less extreme seasonal variations compared to Beijing and Shanghai. The city experiences fewer severe pollution episodes and more consistent wind patterns that facilitate pollutant dispersion. Furthermore, the absence of winter heating-related emissions in Shenzhen reduces the seasonal volatility in pollution levels, resulting in a narrower RMSE range (3.68-12.56) across seasons compared to Beijing (8.91-18.72) and Shanghai (7.02-15.63). This demonstrates that the model performs more reliably in environments with moderate pollution levels and stable atmospheric dynamics, while it faces greater challenges in regions with high pollution variability and complex seasonal emission patterns.
Point 8. For section 6 (lines 684-711), we thank the authors for this good start, but we think the paragraph should be better structured with subheadings. This revision includes the addition of subheadings.
Response 8: Thank you for your suggestion. We agree with the editor. This revision adds two second-level headings.
Below are the additions. Please refer to the section 6 part of the revised manuscript.
- Limitations and Future Research Directions
6.1. Limitations
While GLA-Net demonstrates superior performance in AQI prediction by deeply mining the coupling relationship between pollutants and meteorological parameters and establishing a robust spatiotemporal correlation modeling framework, several limitations warrant acknowledgment:
(1) Graph Construction Limitations: Uniform Threshold Across Heterogeneous Net-works: Our average-distance threshold approach fails to adapt to varying station densities across regions. In densely monitored urban areas, this creates overly connected graphs with spatial redundancy, while in sparse rural regions, it may result in isolated nodes missing important connections. This uniform treatment does not account for the or-der-of-magnitude variation in station density between urban and rural areas.
Static Graph Structure: The current graph topology is fixed based solely on geo-graphic proximity, unable to adapt to dynamic pollutant dispersion patterns influenced by time-varying meteorological conditions (wind, atmospheric stability), topography, and emission sources. More sophisticated approaches could incorporate: (a) adaptive thresh-olds based on local station density; (b) meteorological features in graph construction; (c) data-driven correlation measures; or (d) fully learnable graph structures.
(2) Limited Cross-Regional Modeling: This study has not yet integrated multi-city ge-ographical correlation characteristics affecting long-range pollutant migration. Our cur-rent single-region graph structure may miss important inter-city connections where up-wind urban pollution affects downwind regions, limiting cross-regional collaborative prediction accuracy during regional pollution episodes.
6.2. Future Research Directions
To address these limitations, our research team plans to: (1) develop adaptive graph construction methods accounting for regional heterogeneity and meteorological factors; (2) explore dynamic, learnable graph structures; (3) introduce geospatial encoding techniques for multi-city topology networks; and (4) integrate physics-informed constraints from at-mospheric dispersion models. These enhancements aim to improve both local and cross-regional forecasting capabilities.
Point 9. Please avoid redundancies, for example (lines 305-307) and (467-469).
Response 9: Thank you for your suggestion. We agree with the editor. This revision modifies sections “437-469”.
Below are the additions. Please refer to the section 4.2.2 part of the revised manuscript.
520-522:We designed two ablation architectures, GLSTM-Block and GLSTM-Block-LSTM, to quantify the contribution of the temporal feature extraction component.
Point 10. Some techniques are described without citations, such as in lines 273 and 167. Please add references.
Response 10: Thank you for your suggestion. We agree with the reviewer. As required, relevant references were added.
Below are the additions. Please refer to the revised manuscript.
Layer normalization stabilizes the feature distribution at each layer, mitigating internal covariate shift and improving training convergence [37,45].
In order to avoid internal covariate shift issues and network information loss, the GLA-Net suggested in this study incorporates dense connections and layer normalization techniques across the network [45,54]
[37] Zou, G.; Lai, Z.; Ma, C.; Tu, M.; Fan, J.; Li, Y. When Will We Arrive? A Novel Multi-Task Spatio-Temporal Attention Network Based on Individual Preference for Estimating Travel Time. IEEE Transactions on Intelligent Transportation Systems 2023, 24, 11438-11452, doi:10.1109/TITS.2023.3276916.
[45] Ioffe, S.; Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proceedings of the Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, Lille, France, 2015; pp. 448–456.
[54] Huang, G.; Liu, Z.; Maaten, L.V.D.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 21-26 July 2017, 2017; pp. 2261-2269.
Point 11. Add a statement about the authors' contributions.
Response 11: Thank you for your suggestion. We agree with the editor. As required, the authors' contributions were added.
Below are the additions. Please refer to the revised manuscript.
Author Contributions: Conceptualization, D. L.; methodology, J. W.; software, D. L.; validation, H. Y.; formal analysis, D. L.; investigation, D. L.; resources, D. L.; data curation, D. L.; writing—original draft preparation, D. L.; writing—review and editing, D. L.; visualization, M. L; supervision, G. Z; project ad-ministration, H. H.; funding acquisition, H. H.
Point 12. The authors mentioned funding but not acknowledgments. Why?
Response 12: Thank you for your suggestion. We agree with the editor. We thank the reviewer for this inquiry. We did not include a separate Acknowledgments section in the original submission because the journal does not mandate it as a required section. However, we have ensured that all essential information for research transparency and reproducibility is clearly stated in the manuscript:
Funding information has been explicitly declared, listing all financial support sources that enabled this research.
Data sources have been clearly documented in the section 2.2, including the specific databases and institutions from which the air quality and meteorological data were obtained, thereby facilitating reproducibility by other researchers.
We believe these disclosures fulfill the key purposes of acknowledgments—transparency regarding funding sources and data provenance. However, if the reviewer or editor considers a formal Acknowledgments section necessary, we would be happy to add one in the revised manuscript to thank the data providers and funding agencies in a more structured format.
Below are the additions. Please refer to the revised manuscript.
Institutional Review Board Statement: No applicable.
Informed Consent Statement: No applicable.
Data Availability Statement: All data generated or analyzed during this study are included in this article.
Conflicts of Interest: The authors declare no conflict of interest.
Thank you for the valuable comments from the editor, I will work harder in the future. I wish you good health!
Author Response File:
Author Response.pdf

