Air Pollutant Concentration Prediction Using a Generative Adversarial Network with Multi-Scale Convolutional Long Short-Term Memory and Enhanced U-Net
Abstract
1. Introduction
- (1)
- A novel generative adversarial architecture is proposed. The generator employs a multi-scale ConvLSTM integrated with a map-masking layer, which enhances multi-scale spatiotemporal feature extraction while effectively suppressing boundary blurring in predictions. The discriminator utilizes an architecturally enhanced U-Net. By incorporating spectral normalization and a raw score map output mechanism, it mitigates the premature convergence commonly encountered in traditional GANs during pixel-level training.
- (2)
- A joint optimization training mechanism is constructed. A composite loss function combining adversarial loss, feature matching loss, and spatiotemporal smoothness constraints is designed, significantly improving gradient propagation efficiency. Using historical pollutant grid sequences as sliding window conditions, the mechanism iteratively refines the generated sequences by leveraging the discriminator’s multi-level features, effectively alleviating memory degradation in long-term forecasting.
- (3)
- A comprehensive spatiotemporal PM2.5 concentration prediction framework is established. Based on CWGAN-GP, the framework enables refined spatiotemporal forecasting of atmospheric pollutant concentrations over long periods for target cities. Experimental results on real-world datasets demonstrate that the model achieves significantly higher accuracy in 12 h prediction tasks compared to various state-of-the-art deep learning models.
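The composite loss described in contribution (2) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the WGAN-style adversarial term, the L1 feature-matching term, the squared-difference smoothness penalty, and the weights `lam_fm` and `lam_sm` are all illustrative choices.

```python
import numpy as np

def generator_loss(d_fake_scores, feats_real, feats_fake, gen_seq,
                   lam_fm=10.0, lam_sm=1.0):
    """Composite generator loss: adversarial + feature matching +
    spatiotemporal smoothness (weights are illustrative)."""
    # WGAN-style adversarial term: the generator maximizes the critic's score
    adv = -np.mean(d_fake_scores)
    # Feature matching: L1 distance between discriminator features of
    # real and generated sequences, averaged over feature levels
    fm = np.mean([np.mean(np.abs(fr - ff))
                  for fr, ff in zip(feats_real, feats_fake)])
    # Spatiotemporal smoothness: penalize large temporal and spatial
    # differences in the generated grid sequence of shape (T, H, W)
    smooth = (np.mean(np.diff(gen_seq, axis=0) ** 2)
              + np.mean(np.diff(gen_seq, axis=1) ** 2)
              + np.mean(np.diff(gen_seq, axis=2) ** 2))
    return adv + lam_fm * fm + lam_sm * smooth

rng = np.random.default_rng(0)
scores = rng.normal(size=16)                           # critic scores on fakes
feats_r = [rng.normal(size=(8, 8)) for _ in range(3)]  # multi-level features
feats_f = [rng.normal(size=(8, 8)) for _ in range(3)]
seq = rng.normal(size=(12, 50, 75))                    # 12-step predicted grids
loss = generator_loss(scores, feats_r, feats_f, seq)
```

For a perfectly matched, temporally constant prediction with zero critic scores, all three terms vanish, which is the behavior the joint optimization targets.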
2. Study Area and Dataset Analysis
2.1. Study Area
2.2. Data Sources
2.3. Data Preprocessing
2.3.1. Concentration Grid Generation

2.3.2. Concentration Grid Processing
3. Methodology
3.1. Framework Overview
3.2. CWGAN-GP
3.3. Inception-Style ConvLSTM
3.3.1. Multi-Scale Feature Extraction and Fusion
3.3.2. Gated Spatiotemporal Modeling
3.4. U-Net with Architectural Enhancements
- (1)
- Spectral normalization and activation removal: Apply spectral normalization (SN) to all convolutional kernel weights and remove all activation functions within layers to enforce Lipschitz continuity throughout the discriminator. The normalized weights are computed as
  $$W_{SN} = \frac{W}{\sigma(W)}$$
  where $\sigma(W)$ denotes the spectral norm (largest singular value) of the weight matrix $W$, approximated via power iteration; $W_{SN}$ represents the spectrally normalized weights; and $W$ corresponds to the original convolutional kernel weights.
- (2)
- Dual-attention guidance mechanism based on map masking: Because valid information in grid map datasets is present only within target regions, this study introduces a spatial attention mechanism constrained by map masks into the feature maps at various network layers, improving the pixel-level feature extraction accuracy of both skip connections and convolutional layers. Additionally, a feature guidance mechanism is deployed along the skip-connection paths, which dynamically adjusts the fusion ratio between encoder and decoder features through channel attention weights, effectively suppressing the transmission of redundant background information. The channel attention weight $\alpha$ and the adaptively fused output feature map $F_{out}$ are defined as
  $$\alpha = \sigma\left(W_2\left(W_1\left[F_{enc} \odot M,\; F_{dec} \odot M\right]\right)\right)$$
  $$F_{out} = \alpha \odot \left(F_{enc} \odot M\right) + \left(1 - \alpha\right) \odot \left(F_{dec} \odot M\right)$$
  where $M$ is the binary mask matrix (1 for target regions, 0 otherwise) with spatial dimensions $H$ (height) and $W$ (width) matching the current feature map; $W_1$ and $W_2$ are convolutional weight matrices; $\odot$ denotes the Hadamard product; $[F_{enc}, F_{dec}]$ represents channel-wise concatenation of encoder and decoder features; and $\sigma$ is the sigmoid activation function.
- (3)
- Instance normalization in shallow encoder layers: Instance Normalization (IN) layers are incorporated into shallow encoder blocks. The normalized output features are calculated as
  $$\mu_{tc} = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W} x_{tchw}, \qquad \sigma_{tc}^{2} = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\left(x_{tchw} - \mu_{tc}\right)^{2}$$
  $$y_{tchw} = \gamma_c \frac{x_{tchw} - \mu_{tc}}{\sqrt{\sigma_{tc}^{2} + \epsilon}} + \beta_c$$
  where $x_{tchw}$ is the feature value at spatial position $(h, w)$ of channel $c$ for the $t$-th sample; $H$ and $W$ are the height and width of the feature map; $\mu_{tc}$ and $\sigma_{tc}^{2}$ are the mean and variance for channel $c$; $\gamma_c$ is the learnable scale parameter and $\beta_c$ the learnable shift parameter for channel $c$; and $\epsilon$ is a numerical stability constant preventing division by zero.
- (4)
- Linear raw score map output: A linear convolutional transformation is applied to the final feature map to directly output a raw score map instead of probability values. This avoids discriminator saturation, preserves accurate expression of real/generated data discrepancies, and ensures continuous gradient signals for the generator:
  $$S = W * F + b$$
  where $W$ is the convolutional kernel, $*$ denotes the convolution operation, $F$ is the final U-Net feature map ($H$, $W$, and $C$ being its height, width, and channel dimensions), and $b$ is the bias term.
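Enhancement (1) can be sketched in NumPy. This is a generic power-iteration implementation of spectral normalization, not the authors' code; the kernel shape and iteration count are illustrative.

```python
import numpy as np

def spectral_norm(W, n_iters=50):
    """Return W / sigma(W), where sigma(W) is the largest singular value
    of the kernel flattened to a 2-D matrix, estimated by power iteration."""
    W2d = W.reshape(W.shape[0], -1)            # flatten conv kernel to a matrix
    u = np.ones(W2d.shape[0]) / np.sqrt(W2d.shape[0])
    for _ in range(n_iters):
        v = W2d.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = W2d @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ W2d @ v                        # estimated spectral norm
    return W / sigma

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3, 3, 3))              # out_ch x in_ch x kH x kW
W_sn = spectral_norm(W)
# Largest singular value of the normalized kernel should be close to 1
s_max = np.linalg.svd(W_sn.reshape(8, -1), compute_uv=False)[0]
```

Dividing by the spectral norm caps the layer's Lipschitz constant at roughly 1, which is what stabilizes the discriminator's gradients.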
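Enhancement (2) can be illustrated as follows. This is a hedged sketch, not the paper's exact layer: global average pooling and 1 × 1 convolutions realized as matrix products are assumptions, as are the tensor shapes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def masked_attention_fusion(f_enc, f_dec, mask, w1, w2):
    """Mask-constrained channel-attention fusion of skip-connection features.
    f_enc, f_dec: (C, H, W) encoder/decoder feature maps.
    mask: (H, W) binary map mask (1 inside the target region).
    w1, w2: weight matrices playing the role of 1x1 convolutions."""
    masked_enc = f_enc * mask                   # suppress out-of-region pixels
    masked_dec = f_dec * mask
    concat = np.concatenate([masked_enc, masked_dec], axis=0)   # (2C, H, W)
    pooled = concat.mean(axis=(1, 2))           # global average pooling -> (2C,)
    alpha = sigmoid(w2 @ (w1 @ pooled))         # (C,) channel attention weights
    # adaptive fusion ratio between encoder and decoder features
    return (alpha[:, None, None] * masked_enc
            + (1 - alpha)[:, None, None] * masked_dec)

rng = np.random.default_rng(2)
C, H, W = 4, 8, 8
f_enc = rng.normal(size=(C, H, W))
f_dec = rng.normal(size=(C, H, W))
mask = np.zeros((H, W)); mask[2:6, 2:6] = 1.0   # valid region only
w1 = rng.normal(size=(C, 2 * C)) * 0.1
w2 = rng.normal(size=(C, C)) * 0.1
fused = masked_attention_fusion(f_enc, f_dec, mask, w1, w2)
```

Because both branches are masked before fusion, the output is identically zero outside the target region, which is how background information is kept out of the skip connections.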
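Enhancement (3) is standard Instance Normalization; a minimal NumPy version for an (N, C, H, W) tensor is shown below (the input shapes are illustrative, not the paper's actual batch dimensions).

```python
import numpy as np

def instance_norm(x, gamma, beta, eps=1e-5):
    """Instance Normalization: normalize each sample's each channel over
    its spatial dimensions, then apply learnable scale and shift."""
    mu = x.mean(axis=(2, 3), keepdims=True)     # per-sample, per-channel mean
    var = x.var(axis=(2, 3), keepdims=True)     # per-sample, per-channel variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=(2, 4, 50, 75))
y = instance_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```

With unit scale and zero shift, every (sample, channel) slice of the output has mean ≈ 0 and standard deviation ≈ 1, removing per-instance contrast differences in the shallow encoder features.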
3.5. Loss Function
3.6. Metrics
4. Results
4.1. Parameter Setting
4.2. Multi-Timescale Prediction
4.3. Model-Centric Performance Validation
4.4. Scenario-Oriented Robustness Evaluation
5. Discussion
5.1. Comparison with Previous Prediction Models
5.2. Long-Term Series Prediction and Model Comparison
6. Conclusions
- (1)
- The generator integrates physical diffusion mechanisms with deep learning. This integration enables adaptive capture of multi-scale spatial features in pollutant concentration fields.
- (2)
- The adversarial training framework provides strong robustness. It effectively learns nonlinear dynamic characteristics at concentration field boundaries. This capability suppresses error accumulation caused by traditional models’ reliance on stationarity assumptions.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Ma, X.; Liu, H.; Peng, Z. Improving WRF-Chem PM2.5 predictions by combining data assimilation and deep-learning-based bias correction. Environ. Int. 2025, 195, 109199.
- Wu, D.; Zheng, H.; Li, Q.; Jin, L.; Lyu, R.; Ding, X.; Huo, Y.; Zhao, B.; Jiang, J.; Chen, J.; et al. Toxic potency-adjusted control of air pollution for solid fuel combustion. Nat. Energy 2022, 7, 194–202.
- Fan, Y.; Chen, Z.; He, T. The Impact of Carbon-Emission Trading Scheme Policies on Air Quality in Chinese Cities. Sustainability 2024, 16, 10023.
- Zhao, M.; Wang, K. Short-term effects of PM2.5 components on the respiratory infectious disease: A global perspective. Environ. Geochem. Health 2024, 46, 293.
- Ding, L.; Fang, X.; Cheng, K. The impact of PM2.5 pollution on residents’ health and economic loss accounting in China. Econ. Geogr. 2021, 41, 82–92.
- Shi, H.; Chen, L.; Zhang, S.; Li, R.; Wu, Y.; Zou, H.; Wang, C.; Cai, M.; Lin, H. Dynamic association of ambient air pollution with incidence and mortality of pulmonary hypertension: A multistate trajectory analysis. Ecotoxicol. Environ. Saf. 2023, 262, 115126.
- Wen, C.; Liu, S.; Yao, X.; Peng, L.; Li, X.; Hu, Y.; Chi, T. A novel spatiotemporal convolutional long short-term neural network for air pollution prediction. Sci. Total Environ. 2019, 654, 1091–1099.
- Liu, X.; Li, W. MGC-LSTM: A deep learning model based on graph convolution of multiple graphs for PM2.5 prediction. Int. J. Environ. Sci. Technol. 2023, 20, 10297–10312.
- Lolli, S. Urban PM2.5 concentration monitoring: A review of recent advances in ground-based, satellite, model, and machine learning integration. Urban Clim. 2025, 63, 102566.
- Grell, G.A.; Peckham, S.E.; Schmitz, R.; McKeen, S.A.; Frost, G.; Skamarock, W.C.; Eder, B. Fully coupled "online" chemistry within the WRF model. Atmos. Environ. 2005, 39, 6957–6975.
- Seinfeld, J.H.; Pandis, S.N. Atmospheric Chemistry and Physics: From Air Pollution to Climate Change; John Wiley & Sons: Hoboken, NJ, USA, 2016.
- Chi, X.; Li, Z.; Liu, H.; Chen, J.; Gao, J. Predicting air pollutant emissions of the foundry industry: Based on the electricity big data. Sci. Total Environ. 2024, 917, 170323.
- Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C. Time Series Analysis: Forecasting and Control, 5th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2015.
- Kutner, M.H.; Nachtsheim, C.J.; Neter, J.; Li, W. Applied Linear Statistical Models, 5th ed.; McGraw-Hill Irwin: New York, NY, USA, 2004.
- Wang, J.; Ogawa, S. Effects of Meteorological Conditions on PM2.5 Concentrations in Nagasaki, Japan. Int. J. Environ. Res. Public Health 2015, 12, 9089–9101.
- Brunsdon, C.; Fotheringham, A.S.; Charlton, M.E. Geographically Weighted Regression: A Method for Exploring Spatial Nonstationarity. Geogr. Anal. 1996, 28, 281–298.
- Mogollón-Sotelo, C.; Casallas, A.; Vidal, S.; Celis, N.; Ferro, C.; Belalcazar, L. A support vector machine model to forecast ground-level PM2.5 in a highly populated city with a complex terrain. Air Qual. Atmos. Health 2021, 14, 399–409.
- Wei, C.; Zhao, C.; Hu, Y.; Tian, Y. Predicting the Concentration Levels of PM2.5 and O3 for Highly Urbanized Areas Based on Machine Learning Models. Sustainability 2025, 17, 9211.
- Kim, H.Y.; Won, C.H. Forecasting the volatility of stock price index: A hybrid model integrating LSTM with multiple GARCH-type models. Expert Syst. Appl. 2018, 103, 25–37.
- Mao, X.; Liu, G.; Wang, J.; Lai, Y. BiTCN-ISInformer: A Parallel Model for Regional Air Pollutant Concentration Prediction Using Bidirectional Temporal Convolutional Network and Enhanced Informer. Sustainability 2025, 17, 8631.
- Mu, L.; Bi, S.; Ding, X.; Xu, Y. Transformer-based ozone multivariate prediction considering interpretable and priori knowledge: A case study of Beijing, China. J. Environ. Manag. 2024, 366, 121883.
- Xia, H.; Chen, X.; Wang, Z.; Chen, X.; Dong, F. A Multi-Modal Deep-Learning Air Quality Prediction Method Based on Multi-Station Time-Series Data and Remote-Sensing Images: Case Study of Beijing and Tianjin. Entropy 2024, 26, 91.
- Su, I.F.; Chung, Y.C.; Lee, C.; Huang, P.M. Effective PM2.5 concentration forecasting based on multiple spatial–temporal GNN for areas without monitoring stations. Expert Syst. Appl. 2023, 234, 121074.
- Zhang, B.; Zou, G.; Qin, D.; Ni, Q.; Mao, H.; Li, M. RCL-Learning: ResNet and convolutional long short-term memory-based spatiotemporal air pollutant concentration prediction model. Expert Syst. Appl. 2022, 207, 118017.
- Kalajdjieski, J.; Zdravevski, E.; Corizzo, R.; Lameski, P.; Kalajdziski, S.; Pires, I.M.; Garcia, N.M.; Trajkovik, V. Air pollution prediction with multi-modal data and deep neural networks. Remote Sens. 2020, 12, 4142.
- Yin, C.; Mao, Y.; Deng, L.; Chen, M.; Rong, Y.; He, X.; Zhou, X. STLLM-GAN: Spatio-temporal LLM Generative Adversarial Network for PM2.5 prediction. Expert Syst. Appl. 2025, 292, 128250.
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein GAN. arXiv 2017, arXiv:1701.07875.
- Arjovsky, M.; Bottou, L. Towards Principled Methods for Training Generative Adversarial Networks. arXiv 2017, arXiv:1701.04862.
- Schonfeld, E.; Schiele, B.; Khoreva, A. A U-Net based discriminator for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Online, 14–19 June 2020; pp. 8207–8216.
- Zhang, K.; Yang, X.; Cao, H.; Thé, J.; Tan, Z.; Yu, H. Multi-step forecast of PM2.5 and PM10 concentrations using convolutional neural network integrated with spatial–temporal attention and residual learning. Environ. Int. 2023, 171, 107691.
- Yu, B.; Yin, H.; Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv 2017, arXiv:1709.04875.
- Huang, C.-J.; Kuo, P.-H. A deep CNN-LSTM model for particulate matter (PM2.5) forecasting in smart cities. Sensors 2018, 18, 2220.
- Park, S.; Kim, M.; Kim, M.; Namgung, H.G.; Kim, K.T.; Cho, K.H.; Kwon, S.B. Predicting PM10 concentration in Seoul metropolitan subway stations using artificial neural network (ANN). J. Hazard. Mater. 2018, 341, 75–82.
- Li, L.; Khalili, R.; Lurmann, F.; Pavlovic, N.; Wu, J.; Xu, Y.; Liu, Y.; O’Sharkey, K.; Ritz, B.; Oman, L.; et al. Knowledge-informed deep learning to mitigate bias in joint air pollutant prediction. Environ. Int. 2025, 206, 109915.
- Fang, T.; Li, X.; Shi, C.; Zhang, X.; Xiao, W.; Kou, Y.; Mumtaz, I.; Huang, Z. Memo-UNet: Leveraging historical information for enhanced wave height prediction. Neurocomputing 2025, 634, 129840.
- Tong, Q.; Wang, L.; Dai, Q.; Zheng, C.; Zhou, F. Enhanced cloud removal via temporal U-Net and cloud cover evolution simulation. Sci. Rep. 2025, 15, 4544.
- Sahragard, E.; Farsi, H.; Mohamadzadeh, S. Advancing semantic segmentation: Enhanced UNet algorithm with attention mechanism and deformable convolution. PLoS ONE 2025, 20, e0305561.
- Li, Q.; Zhu, H. Target classification with low-resolution radars based on cyclic bispectrum and improved ACGAN. Measurement 2025, 259, 119715.
- Choudhury, A.; Middya, A.I.; Roy, S. Attention enhanced hybrid model for spatiotemporal short-term forecasting of particulate matter concentrations. Sustain. Cities Soc. 2022, 86, 104112.
| Network Layer | Layer Hierarchy | Parameters | Input Size | Output Size |
|---|---|---|---|---|
| Input sequence | — | — | — | (2, 7/8, 1, 50, 75) |
| Encoder | IS ConvLSTMBlock (Layer 1) | f = [1 × 1, 3 × 3, 5 × 5]; s = 1; p = [1, 1, 2]; d = 32 | (2, 7/8, 1, 50, 75) | (2, 7/8, 16, 50, 75) |
| | IS ConvLSTMBlock (Layer 2) | f = [1 × 1, 3 × 3, 5 × 5]; s = 1; p = [1, 1, 2]; d = 64 | (2, 7/8, 16, 50, 75) | (2, 7/8, 32, 50, 75) |
| | IS ConvLSTMBlock (Layer 3) | f = [1 × 1, 3 × 3, 5 × 5]; s = 1; p = [1, 1, 2]; d = 32 | (2, 7/8, 32, 50, 75) | (2, 7/8, 64, 50, 75) |
| Decoder | IS ConvLSTMBlock (Layer 1) | f = [1 × 1, 3 × 3, 5 × 5]; s = 1; p = [1, 1, 2]; d = 64 | (2, 7/8, 64, 50, 75) | (2, 7/8, 32, 50, 75) |
| | IS ConvLSTMBlock (Layer 2) | f = [1 × 1, 3 × 3, 5 × 5]; s = 1; p = [1, 1, 2]; d = 32 | (2, 7/8, 32, 50, 75) | (2, 7/8, 16, 50, 75) |
| | IS ConvLSTMBlock (Layer 3) | f = [1 × 1, 3 × 3, 5 × 5]; s = 1; p = [1, 1, 2]; d = 1 | (2, 7/8, 16, 50, 75) | (2, 7/8, 1, 50, 75) |
| Mapping layer | — | seq_len = 7/8; pre_len = 3/12 | (2, 7/8, 50, 75, 1) | (2, 3/12, 50, 75, 1) |
| Output sequence | — | — | — | (2, 3/12, 1, 50, 75) |
| Model | RMSE | MAE | R² |
|---|---|---|---|
| ConvLSTM | 4.69 | 7.23 | 0.50 |
| CNN-LSTM | 5.88 | 8.10 | 0.78 |
| ConvGRU | 6.08 | 8.22 | 0.70 |
| STA-ResCNN [29] | 11.72 | 7.72 | - |
| ST-Transformer [30] | 6.92 | 4 | - |
| AGCTCN [31] | 8.75 | 11.76 | 0.64 |
| ICLU-CWGAN | 2.77 | 5.48 | 0.89 |
| Model | RMSE | MAE | R² |
|---|---|---|---|
| ConvLSTM | 8.92 | 18.76 | 0.41 |
| CNN-LSTM | 10.45 | 9.73 | 0.66 |
| ConvGRU | 11.06 | 11.66 | 0.63 |
| ICLU-CWGAN | 4.61 | 6.42 | 0.80 |
| Observed Area | Observed Value (μg/m³) | Predicted Value (μg/m³) |
|---|---|---|
| Huangjiang | 16.9 | 14.9 |
| Hongmei | 31.3 | 34.2 |
| Qishi | 21.8 | 15.8 |
| Houjie | 19.3 | 20.8 |
| Zhongtang | 10.8 | 9.4 |
| Shilong | 7.3 | 16.1 |
| Qiaotou | 35.7 | 36.9 |
| Xiegang | 12.2 | 11.5 |
| Liaobu | 17.8 | 16.6 |
| Dalang | 16.6 | 12.1 |
| Changan | 3.1 | 11.2 |
| Humen | 14.9 | 8.8 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, J.; Su, P.; Wang, J.; Cai, Z. Air Pollutant Concentration Prediction Using a Generative Adversarial Network with Multi-Scale Convolutional Long Short-Term Memory and Enhanced U-Net. Sustainability 2025, 17, 11177. https://doi.org/10.3390/su172411177
