PM2.5 Concentration Prediction in the Cities of China Using Multi-Scale Feature Learning Networks and Transformer Framework
Abstract
1. Introduction
- A new air quality prediction framework, MSTTNet, is proposed to model the local correlations and global dynamic changes of air quality data in a fine-grained manner.
- MSTTNet integrates multi-scale TCNs and a transformer, exploiting their complementary architectural strengths: the multi-scale TCNs extract local anomalous variability and spatial information across feature dimensions, while the transformer adaptively learns the significance of individual time steps and global features, weakening the impact of noisy information.
- Extensive experiments on four benchmark air quality datasets verify the predictive capacity of MSTTNet. Numerical analyses show that MSTTNet outperforms its eight competitors.
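The local-feature extractor named above rests on causal dilated convolutions applied at several kernel sizes in parallel. The following NumPy sketch illustrates that idea only; the layer shapes, random filters, and names such as `multi_scale_features` are our assumptions, not the authors' implementation:

```python
import numpy as np

def causal_dilated_conv(x, w, dilation=1):
    """Causal 1D convolution: output[t] depends only on x[t], x[t-1], ..."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left-pad so no future values leak in
    return np.array([sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
                     for t in range(len(x))])

def multi_scale_features(x, kernel_sizes=(1, 3, 5), seed=0):
    """One random filter per scale; outputs stacked as (n_scales, T)."""
    rng = np.random.default_rng(seed)
    return np.stack([causal_dilated_conv(x, rng.standard_normal(k))
                     for k in kernel_sizes])

feats = multi_scale_features(np.sin(np.arange(48) / 4.0))
print(feats.shape)  # (3, 48): one feature map per kernel size
```

The left padding is what makes the convolution causal, so a prediction for hour t never peeks at hours after t; the differing kernel sizes give each branch a different local receptive field.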
2. Related Work
2.1. Statistical Analysis Models
2.2. Machine Learning Models
2.3. Hybrid Models
3. Proposed Method
3.1. Step 1: Data Preprocessing
3.2. Step 2: Feature Extraction
- 1. Multi-scale TCNs block
- 2. Transformer block
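The core of the transformer block is scaled dot-product self-attention, which re-weights every time step by its similarity to all others. A minimal single-head NumPy sketch (no learned query/key/value projections, a deliberate simplification of the real block):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """X: (T, d) window of T time steps with d features.
    Returns the attended sequence and the (T, T) weight matrix."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)   # pairwise similarity between time steps
    A = softmax(scores, axis=-1)    # each row sums to 1
    return A @ X, A                 # every output step mixes all input steps

out, A = self_attention(np.random.default_rng(1).standard_normal((24, 8)))
print(out.shape)  # (24, 8)
```

The attention matrix `A` is what lets the model amplify informative time steps and down-weight noisy ones across the whole window, complementing the TCNs' local view.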

3.3. Step 3: PM Prediction
4. Experimentation and Results
4.1. Datasets Description and Evaluation Indicators
4.2. Baseline Models and Parameter Settings
4.3. Prediction Results Analysis and Comparisons
4.4. Performance Overhead Analysis
4.5. Parameter Sensitivity Analysis
- Regarding the parameter ’epoch’: Simply increasing this parameter does not guarantee improved accuracy; beyond a certain point the model’s performance plateaus and ceases to improve with additional training iterations, a limitation that may stem from the architecture itself or from the data. For instance, the model trained for 300 epochs performs worse than the 200-epoch configuration, suggesting the onset of overfitting beyond that threshold. Consequently, we set the epoch value to 200 for the Beijing dataset to balance accuracy against resource expenditure.
- Regarding the parameter ’filter’: The model’s parallel multi-scale design delivers robust predictions even with few filters, demonstrating efficient feature extraction. Increasing the filter count in the TCN layers fails to yield commensurate performance gains; more critically, it significantly inflates the model’s parameter volume, elevating the risk of overfitting, and an excessively high filter count adds unnecessary computational cost and training time. We therefore empirically set the number of filters to 32 for the Beijing dataset.
- Regarding the parameter ’head’: Increasing the number of attention heads in the transformer block does not guarantee improved performance. An inappropriate head count, whether too large or too small, may induce overfitting or underfitting, both of which degrade model efficacy. We therefore empirically set the number of heads to 2 to balance performance and efficiency.
- Regarding the number of TCN blocks: Changing this structural parameter (e.g., increasing the number of TCN blocks) does not bring the desired accuracy improvement. Overly deep and complex architectures are harder to train, and the gradient issues they introduce can easily degrade performance. More critically, blindly expanding the model also inflates its parameter volume and computational cost. We therefore configure 2 TCN blocks, as this architecture demonstrated acceptable prediction error during hyperparameter tuning.
- Regarding the number of transformer blocks: The transformer block count also affects prediction performance, as observed in Figure 10m–o. Neither increasing nor decreasing the number of blocks achieves a better performance trade-off, and adding blocks raises computational overhead without significant gains. We therefore empirically set the number of transformer blocks to 2 to balance model performance and computational complexity.
- Regarding the parameter ’time lag’: As the time lag increases, the performance of MSTTNet gradually deteriorates and fluctuates; lengthening the historical input window does not necessarily improve performance, mainly because a long window contains more noisy data that hampers model learning. Among the tested values, the 36 h time lag yields the largest prediction error, while the 12 h and 24 h windows achieve comparable performance, indicating that the 24 h time lag chosen in this paper is reasonable.
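The time-lag experiment amounts to re-cutting the hourly series into supervised windows of different lengths. A sketch of that windowing step (the function name and dummy series are ours, not the authors' code):

```python
import numpy as np

def make_windows(series, lag=24):
    """Each sample: `lag` consecutive hours; target: the following hour."""
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y

# Dummy hourly PM2.5 values just to show the resulting shapes.
pm25 = np.random.default_rng(0).uniform(5, 300, size=200)
for lag in (12, 24, 36):
    X, y = make_windows(pm25, lag)
    print(lag, X.shape, y.shape)
```

A larger lag shrinks the number of usable samples while widening each input, which is one mechanical reason longer windows can admit more noise without adding proportionate signal.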

4.6. Ablation Analysis
- MSTTNet_1 (single-scale TCNs with transformer): This alternative abandons the multi-scale structure, retaining only a single-channel TCN; the transformer block remains unchanged.
- MSTTNet_2 (multi-scale TCNs without transformer): This alternative removes the transformer block, retaining only the multi-scale TCN architecture.
- MSTTNet_3 (transformer without multi-scale TCNs): To verify the effectiveness of the transformer module, this alternative removes the multi-scale TCN block, retaining only the transformer block.
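The three ablations differ only in which components are kept, so they can be expressed as one configuration table. A sketch (flag names are ours; the single-scale kernel size of 3 for MSTTNet_1 is an assumption, as the paper's list does not state it):

```python
# Which components each ablation variant keeps, per the list above.
VARIANTS = {
    "MSTTNet":   {"kernel_sizes": (1, 3, 5), "transformer": True},   # full model
    "MSTTNet_1": {"kernel_sizes": (3,),      "transformer": True},   # single-scale TCN
    "MSTTNet_2": {"kernel_sizes": (1, 3, 5), "transformer": False},  # no transformer
    "MSTTNet_3": {"kernel_sizes": (),        "transformer": True},   # no TCN block
}
for name, cfg in VARIANTS.items():
    print(name, cfg)
```

Driving all variants from one mapping like this keeps the ablation runs honest: every model is built by the same code path, with only the flags changed.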
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Méndez, M.; Merayo, M.G.; Núñez, M. Machine learning algorithms to forecast air quality: A survey. Artif. Intell. Rev. 2023, 56, 10031–10066. [Google Scholar] [CrossRef] [PubMed]
- Mak, H.W.L.; Ng, D.C.Y. Spatial and socio-classification of traffic pollutant emissions and associated mortality rates in high-density hong kong via improved data analytic approaches. Int. J. Environ. Res. Public Health 2021, 18, 6532. [Google Scholar] [CrossRef] [PubMed]
- Cekim, H.O. Forecasting PM10 concentrations using time series models: A case of the most polluted cities in Turkey. Environ. Sci. Pollut. Res. Int. 2020, 27, 25612–25624. [Google Scholar] [CrossRef]
- Sohrab, S.; Csikós, N.; Szilassi, P. Landscape metrics as ecological indicators for PM10 prediction in European cities. Land 2024, 13, 2245. [Google Scholar] [CrossRef]
- Wei, Q.; Zhang, H.; Yang, J.; Niu, B.; Xu, Z. PM2.5 concentration prediction using a whale optimization algorithm based hybrid deep learning model in Beijing, China. Environ. Pollut. 2025, 371, 125953. [Google Scholar] [CrossRef]
- Du, S.; Li, T.; Yang, Y.; Horng, S.J. Deep air quality forecasting using hybrid deep learning framework. IEEE Trans. Knowl. Data Eng. 2019, 33, 2412–2424. [Google Scholar] [CrossRef]
- Ong, B.T.; Sugiura, K.; Zettsu, K. Dynamically pre-trained deep recurrent neural networks using environmental monitoring data for predicting PM2.5. Neural Comput. Appl. 2016, 27, 1553–1566. [Google Scholar]
- Govande, A.; Attada, R.; Shukla, K.K. Predicting PM2.5 levels over Indian metropolitan cities using Recurrent Neural Networks. Earth Sci. Inform. 2025, 18, 1. [Google Scholar] [CrossRef]
- Lin, M.D.; Liu, P.Y.; Huang, C.W.; Lin, Y.H. The application of strategy based on LSTM for the short-term prediction of PM2.5 in city. Sci. Total Environ. 2024, 906, 167892. [Google Scholar] [CrossRef]
- He, J.; Zhang, S.; Yu, M.; Liang, Q.; Cao, M.; Xu, H.; Liu, Z.; Liu, J. Predicting indoor PM2.5 levels in shared office using LSTM method. J. Build. Eng. 2025, 104, 112407. [Google Scholar] [CrossRef]
- Wang, X.; Yan, J.; Wang, X.; Wang, Y. Air quality forecasting using the GRU model based on multiple sensors nodes. IEEE Sens. Lett. 2023, 7, 6003804. [Google Scholar] [CrossRef]
- Liu, B.; Yan, S.; Li, J.; Li, Y.; Lang, J.; Qu, G. A spatiotemporal recurrent neural network for prediction of atmospheric PM2.5: A case study of Beijing. IEEE Trans. Comput. Soc. Syst. 2021, 8, 578–588. [Google Scholar] [CrossRef]
- Amnuaylojaroen, T. Prediction of PM2.5 in an urban area of northern Thailand using multivariate linear regression model. Adv. Meteorol. 2022, 2022, 3190484. [Google Scholar] [CrossRef]
- Hao, X.; Hu, X.; Liu, T.; Wang, C.; Wang, L. Estimating urban PM2.5 concentration: An analysis on the nonlinear effects of explanatory variables based on gradient boosted regression tree. Urban Clim. 2022, 44, 101172. [Google Scholar] [CrossRef]
- Wang, L.; Jin, X.; Huang, Z.; Zhu, H.; Chen, Z.; Liu, Y.; Feng, H. Short-Term PM2.5 prediction based on multi-modal meteorological data for consumer-grade meteorological electronic systems. IEEE Trans. Consum. Electr. 2024, 70, 3464–3474. [Google Scholar] [CrossRef]
- Xia, Y.; McCracken, T.; Liu, T.; Chen, P.; Metcalf, A.; Fan, C. Understanding the disparities of PM2.5 air pollution in urban areas via deep support vector regression. Environ. Sci. Technol. 2024, 58, 8404–8416. [Google Scholar] [CrossRef] [PubMed]
- Zaman, N.A.F.K.; Kanniah, K.D.; Kaskaoutis, D.G.; Latif, M.T. Improving the quantification of fine particulates (PM2.5) concentrations in Malaysia using simplified and computationally efficient models. J. Clean. Prod. 2024, 448, 141559. [Google Scholar] [CrossRef]
- Zhang, M.; Wu, D.; Xue, R. Hourly prediction of PM2.5 concentration in Beijing based on Bi-LSTM neural network. Multimed. Tools Appl. 2021, 80, 24455–24468. [Google Scholar] [CrossRef]
- Kumar, S.; Kumar, V. Multi-view Stacked CNN-BiLSTM (MvS CNN-BiLSTM) for urban PM2.5 concentration prediction of India’s polluted cities. J. Clean. Prod. 2024, 444, 141259. [Google Scholar] [CrossRef]
- Zhu, M.; Xie, J. Investigation of nearby monitoring station for hourly PM2.5 forecasting using parallel multi-input 1D-CNN-biLSTM. Expert Syst. Appl. 2023, 211, 118707. [Google Scholar] [CrossRef]
- Wu, S.; Li, H. Prediction of PM2.5 concentration in urban agglomeration of China by hybrid network model. J. Clean. Prod. 2022, 374, 133968. [Google Scholar] [CrossRef]
- Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process Syst. 2017, 30, 6000–6010. [Google Scholar]
- Jia, K.; Yu, X.; Zhang, C.; Xie, W.; Zhao, D.; Xiang, J. TTAFPred: Prediction of time to aging failure for software systems based on a two-stream multi-scale features fusion network. Softw. Qual. J. 2024, 32, 1481–1513. [Google Scholar] [CrossRef]
- Liu, X.; Zhi, X.; Zhou, T.; Zhao, L.; Tian, L.; Gao, R.; Luo, J.; Cui, W.; Wang, Q. A holistic air monitoring dataset with complaints and POIs for anomaly detection and interpretability tracing. Sci. Data 2025, 12, 1288. [Google Scholar] [CrossRef]
- Peng, H.; Jiang, B.; Mao, Z.; Liu, S. Local enhancing transformer with temporal convolutional attention mechanism for bearings remaining useful life prediction. IEEE Trans. Instrum. Meas. 2023, 72, 3522312. [Google Scholar] [CrossRef]
- Sun, L.; Liu, M.; Liu, G.; Chen, X.; Yu, X. FD-TGCN: Fast and dynamic temporal graph convolution network for traffic flow prediction. Inf. Fusion 2024, 106, 102291. [Google Scholar] [CrossRef]
- Zhang, Q.; Liu, Q.; Ye, Q. An attention-based temporal convolutional network method for predicting remaining useful life of aero-engine. Eng. Appl. Artif. Intell. 2024, 127, 107241. [Google Scholar] [CrossRef]
- Yin, Z.; Kong, X.; Yin, C. Semi-supervised log anomaly detection based on bidirectional temporal convolution network. Comput. Secur. 2024, 140, 103808. [Google Scholar] [CrossRef]
- Li, L.; Li, Y.; Mao, R.; Li, L.; Hua, W.; Zhang, J. Remaining useful life prediction for lithium-ion batteries with a hybrid model based on TCN-GRU-DNN and dual attention mechanism. IEEE Trans. Transp. Electrif. 2023, 9, 4726–4740. [Google Scholar] [CrossRef]
- Li, Z.; Xie, Y.; Zhang, W.E.; Wang, P.; Zou, L.; Li, F.; Luo, X.; Li, C. Disentangle interest trend and diversity for sequential recommendation. Inf. Process. Manag. 2024, 61, 103619. [Google Scholar] [CrossRef]
- Akbar, S.; Zou, Q.; Raza, A.; Alarfaj, F.K. iAFPs-Mv-BiTCN: Predicting antifungal peptides using self-attention transformer embedding and transform evolutionary based multi-view features with bidirectional temporal convolutional networks. Artif. Intell. Med. 2024, 151, 102860. [Google Scholar] [CrossRef]
- Li, H.; Wu, X.J. CrossFuse: A novel cross attention mechanism based infrared and visible image fusion approach. Inf. Fusion 2024, 103, 102147. [Google Scholar] [CrossRef]
- Guo, Z.; Liu, Q.; Zhang, L.; Li, Z.; Li, G. L-tla: A lightweight driver distraction detection method based on three-level attention mechanisms. IEEE Trans. Reliab. 2024, 73, 1731–1742. [Google Scholar] [CrossRef]
- Kang, H.; Kang, P. Transformer-based multivariate time series anomaly detection using inter-variable attention mechanism. Knowl.-Based Syst. 2024, 290, 111507. [Google Scholar] [CrossRef]
- Sheng, Z.; Cao, Y.; Yang, Y.; Feng, Z.K.; Shi, K.; Huang, T.; Wen, S. Residual temporal convolutional network with dual attention mechanism for multilead-time interpretable runoff forecasting. IEEE Trans. Neural Netw. Learn Syst. 2024, 36, 8757–8771. [Google Scholar] [CrossRef]
- Yuan, X.; Luo, Z.; Zhang, N.; Guo, G.; Wang, L.; Li, C.; Niyato, D. Federated Transfer Learning for Privacy-Preserved Cross-City Traffic Flow Prediction. IEEE Trans. Intell. Transp. Syst. 2025. [Google Scholar] [CrossRef]
- Zhang, Z.; Song, W.; Wu, Q.; Sun, W.; Li, Q.; Jia, L. A novel local enhanced channel self-attention based on Transformer for industrial remaining useful life prediction. Eng. Appl. Artif. Intell. 2025, 141, 109815. [Google Scholar] [CrossRef]
- Luo, Q.; He, S.; Han, X.; Wang, Y.; Li, H. LSTTN: A long-short term transformer-based spatiotemporal neural network for traffic flow forecasting. Knowl.-Based Syst. 2024, 293, 111637. [Google Scholar] [CrossRef]
- Forecasting Air Quality of Delhi Using ARIMA Model. In Advances in Data Sciences, Security and Applications: Proceedings of ICDSSA 2019; Springer: Berlin/Heidelberg, Germany, 2019; Volume 612, p. 315. [Google Scholar]
- Aladağ, E. Forecasting of particulate matter with a hybrid ARIMA model based on wavelet transformation and seasonal adjustment. Urban Clim. 2021, 39, 100930. [Google Scholar] [CrossRef]
- Abdullah, S.; Napi, N.N.L.M.; Ahmed, A.N.; Mansor, W.N.W.; Mansor, A.A.; Ismail, M.; Abdullah, A.M.; Ramly, Z.T.A. Development of multiple linear regression for particulate matter (PM10) forecasting during episodic transboundary haze event in Malaysia. Atmosphere 2020, 11, 289. [Google Scholar] [CrossRef]
- Zhou, W.; Wu, X.; Ding, S.; Cheng, Y. Predictive analysis of the air quality indicators in the Yangtze River Delta in China: An application of a novel seasonal grey model. Sci. Total Environ. 2020, 748, 141428. [Google Scholar] [CrossRef]
- Talepour, N.; Birgani, Y.T.; Kelly, F.J.; Jaafarzadeh, N.; Goudarzi, G. Analyzing meteorological factors for forecasting PM10 and PM2.5 levels: A comparison between MLR and MLP models. Earth Sci. Inform. 2024, 17, 5603–5623. [Google Scholar] [CrossRef]
- Zheng, Y.; Yi, X.; Li, M.; Li, R.; Shan, Z.; Chang, E.; Li, T. Forecasting fine-grained air quality based on big data. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015; pp. 2267–2276. [Google Scholar]
- Samal, R.; Krishna, K. Auto imputation enabled deep Temporal Convolutional Network (TCN) model for PM2.5 forecasting. EAI Endorsed Trans. Scalable Inf. Syst. 2025, 12. [Google Scholar] [CrossRef]
- Pak, U.; Ma, J.; Ryu, U.; Ryom, K.; Juhyok, U.; Pak, K.; Pak, C. Deep learning-based PM2.5 prediction considering the spatiotemporal correlations: A case study of Beijing, China. Sci. Total Environ. 2020, 699, 133561. [Google Scholar] [CrossRef] [PubMed]
- Qi, Y.; Li, Q.; Karimian, H.; Liu, D. A hybrid model for spatiotemporal forecasting of PM2.5 based on graph convolutional neural network and long short-term memory. Sci. Total Environ. 2019, 664, 1–10. [Google Scholar] [CrossRef]
- Ren, Y.; Wang, S.; Xia, B. Deep learning coupled model based on TCN-LSTM for particulate matter concentration prediction. Atmos Pollut. Res. 2023, 14, 101703. [Google Scholar] [CrossRef]
- Li, A.; Wang, Y.; Qi, Q.; Li, Y.; Jia, H.; Zhou, X.; Guo, H.; Xie, S.; Liu, J.; Mu, Y. Improved PM2.5 prediction with spatio-temporal feature extraction and chemical components: The RCG-attention model. Sci. Total Environ. 2024, 955, 177183. [Google Scholar] [CrossRef]
- Nirmala, G.; Nayudu, P.P.; Kumar, A.R.; Sagar, R. Automatic cervical cancer classification using adaptive vision transformer encoder with CNN for medical application. Pattern Recogn. 2025, 160, 111201. [Google Scholar] [CrossRef]
- Liu, Z.; Feng, Y.; Liu, H.; Tang, R.; Yang, B.; Zhang, D.; Jia, W.; Tan, J. TVC Former: A transformer-based long-term multivariate time series forecasting method using time-variable coupling correlation graph. Knowl.-Based Syst. 2025, 314, 113147. [Google Scholar] [CrossRef]
- Liang, X.; Li, S.; Zhang, S.; Huang, H.; Chen, S.X. PM2.5 data reliability, consistency, and air quality assessment in five Chinese cities. J. Geophys. Res. 2016, 121, 10–220. [Google Scholar] [CrossRef]
- Lu, Y.; Wang, J.; Wang, D.; Yoo, C.; Liu, H. Incorporating temporal multi-head self-attention convolutional networks and LightGBM for indoor air quality prediction. Appl. Soft. Comput. 2024, 157, 111569. [Google Scholar] [CrossRef]
- Zou, R.; Huang, H.; Lu, X.; Zeng, F.; Ren, C.; Wang, W.; Zhou, L.; Dai, X. PD-LL-Transformer: An Hourly PM2.5 Forecasting Method over the Yangtze River Delta Urban Agglomeration, China. Remote Sens. 2024, 16, 1915. [Google Scholar] [CrossRef]
- Sohrab, S.; Csikós, N.; Szilassi, P. Effect of geographical parameters on PM10 pollution in European landscapes: A machine learning algorithm-based analysis. Environ. Sci. Eur. 2024, 36, 152. [Google Scholar] [CrossRef]
- Shetty, S.; Schneider, P.; Stebel, K.; Hamer, P.D.; Kylling, A.; Berntsen, T.K. Estimating surface NO2 concentrations over Europe using Sentinel-5P TROPOMI observations and Machine Learning. Remote Sens. Environ. 2024, 312, 114321. [Google Scholar] [CrossRef]
- Panaite, F.A.; Rus, C.; Leba, M.; Ionica, A.C.; Windisch, M. Enhancing air-quality predictions on university campuses: A machine-learning approach to PM2.5 forecasting at the University of Petroșani. Sustainability 2024, 16, 7854. [Google Scholar] [CrossRef]
- Owusu-Sekyere, K.; Chen, Y.; Tian, J.; Wang, J.; Dong, Q.; Wang, Z. A comprehensive study of interpolation methods in electrohydrodynamic cone-jet across diverse liquid conductivities. Phys. Fluids 2025, 37, 082071. [Google Scholar] [CrossRef]
- Sun, Y.; Li, J.; Xu, Y.; Zhang, T.; Wang, X. Deep learning versus conventional methods for missing data imputation: A review and comparative study. Expert Syst. Appl. 2023, 227, 120201. [Google Scholar] [CrossRef]
- Xue, Y.; Tang, Y.; Xu, X.; Liang, J.; Neri, F. Multi-objective feature selection with missing data in classification. IEEE Trans. Emerg. Top. Comput. Intell. 2021, 6, 355–364. [Google Scholar] [CrossRef]
- Hung, C.Y.; Wang, C.C.; Lin, S.W.; Jiang, B.C. An empirical comparison of the sales forecasting performance for plastic tray manufacturing using missing data. Sustainability 2022, 14, 2382. [Google Scholar] [CrossRef]
- Chen, Y.; Ye, C.; Wang, W.; Yang, P. Research on air quality prediction model based on bidirectional gated recurrent unit and attention mechanism. In Proceedings of the 4th International Conference on Advances in Image Processing, Chengdu, China, 13–15 November 2020; pp. 172–177. [Google Scholar]
- Mak, H.W.L.; Laughner, J.L.; Fung, J.C.H.; Zhu, Q.; Cohen, R.C. Improved satellite retrieval of tropospheric NO2 column density via updating of air mass factor (AMF): Case study of Southern China. Remote Sens. 2018, 10, 1789. [Google Scholar] [CrossRef]

| Variable Type | Variable Name | Data Type |
|---|---|---|
| Air quality data | PM2.5 | numerical |
| Meteorological data | Dew point | numerical |
| | Temperature | numerical |
| | Humidity | numerical |
| | Pressure | numerical |
| | Combined wind direction | categorical (N/E/S/W/SE/NE/SW/NW) |
| | Cumulated wind speed | numerical |
| | Hourly precipitation | numerical |
| | Cumulated precipitation | numerical |
| Timestamp | year | numerical |
| | month | numerical |
| | day | numerical |
| | hour | numerical |
| | season | numerical |
| Parameters | Beijing | Shanghai | Chengdu | Guangzhou |
|---|---|---|---|---|
| Epoch | 200 | 150 | 200 | 300 |
| Head | 2 | 2 | 3 | 2 |
| Filter | 32 | 32 | 32 | 32 |
| TCN block | 2 | 2 | 2 | 2 |
| Transformer block | 2 | 2 | 2 | 2 |
| Kernel size | (1, 3, 5) | (1, 3, 5) | (1, 3, 5) | (1, 3, 5) |
| Time lag | 24 | 24 | 24 | 24 |
| Optimizer | Adam | Adam | Adam | Adam |
| Batch size | 32 | 32 | 32 | 32 |
| Learning rate | 0.001 | 0.001 | 0.001 | 0.001 |
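The settings above split naturally into city-specific and shared values, which can be collected into a single configuration mapping. A convenience sketch (key names are ours, not the authors' code):

```python
# Hyperparameters shared across all four cities (from the table above).
SHARED = {"kernel_sizes": (1, 3, 5), "time_lag": 24, "optimizer": "Adam",
          "batch_size": 32, "learning_rate": 0.001,
          "filters": 32, "tcn_blocks": 2, "transformer_blocks": 2}

# City-specific settings.
PER_CITY = {
    "Beijing":   {"epochs": 200, "heads": 2},
    "Shanghai":  {"epochs": 150, "heads": 2},
    "Chengdu":   {"epochs": 200, "heads": 3},
    "Guangzhou": {"epochs": 300, "heads": 2},
}

def config_for(city):
    """Merge shared defaults with the city's overrides."""
    return {**SHARED, **PER_CITY[city]}

print(config_for("Chengdu")["heads"])  # 3
```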
| Datasets | Indicators | MLP | LSTM | GRU | BiLSTM | BiGRU-Attention | 1DCNN | 1DCNN-LSTM | 1DCNN-BiLSTM-Attention | MSTTNet (Proposed) |
|---|---|---|---|---|---|---|---|---|---|---|
| Beijing | MAE | 11.85 | 11.59 | 11.95 | 11.37 | 11.71 | 11.81 | 11.69 | 13.93 | 11.06 |
| | RMSE | 21.79 | 21.61 | 22.13 | 21.37 | 21.85 | 21.39 | 21.53 | 22.68 | 20.85 |
| | MAPE | 29.71 | 26.72 | 27.80 | 25.26 | 25.40 | 29.12 | 26.18 | 35.08 | 23.37 |
| | R² | 0.9337 | 0.9348 | 0.9316 | 0.9369 | 0.9346 | 0.9361 | 0.9353 | 0.9326 | 0.9393 |
| Shanghai | MAE | 7.21 | 7.04 | 7.10 | 6.98 | 7.07 | 8.63 | 7.58 | 7.70 | 6.82 |
| | RMSE | 11.56 | 11.55 | 11.51 | 11.46 | 11.40 | 12.50 | 11.62 | 11.71 | 11.09 |
| | MAPE | 19.86 | 18.68 | 18.62 | 18.40 | 18.81 | 28.21 | 22.83 | 23.39 | 19.31 |
| | R² | 0.9079 | 0.9080 | 0.9088 | 0.9095 | 0.9103 | 0.8920 | 0.9064 | 0.9056 | 0.9152 |
| Chengdu | MAE | 9.00 | 8.94 | 8.87 | 8.68 | 8.78 | 9.10 | 8.89 | 9.76 | 8.53 |
| | RMSE | 13.11 | 12.79 | 12.82 | 12.63 | 12.76 | 13.09 | 12.78 | 13.51 | 12.38 |
| | MAPE | 18.62 | 18.50 | 17.42 | 17.10 | 17.13 | 19.43 | 18.13 | 22.34 | 17.63 |
| | R² | 0.9325 | 0.9356 | 0.9353 | 0.9373 | 0.9360 | 0.9325 | 0.9358 | 0.9283 | 0.9398 |
| Guangzhou | MAE | 6.13 | 5.83 | 5.87 | 5.80 | 5.91 | 6.08 | 5.83 | 6.61 | 5.63 |
| | RMSE | 9.36 | 8.66 | 8.70 | 8.62 | 8.81 | 9.24 | 8.71 | 9.25 | 8.53 |
| | MAPE | 22.99 | 22.30 | 22.46 | 22.72 | 22.68 | 24.21 | 22.08 | 29.70 | 19.98 |
| | R² | 0.8958 | 0.9107 | 0.9100 | 0.9115 | 0.9077 | 0.8981 | 0.9098 | 0.8998 | 0.9135 |
| | Indicators | MLP | LSTM | GRU | BiLSTM | BiGRU-Attention | 1DCNN | 1DCNN-LSTM | 1DCNN-BiLSTM-Attention |
|---|---|---|---|---|---|---|---|---|---|
| Imp. | MAE | 6.36% | 3.93% | 4.83% | 2.42% | 4.17% | 10.25% | 5.72% | 14.87% |
| | RMSE | 5.70% | 3.05% | 3.70% | 2.17% | 3.36% | 6.73% | 3.23% | 7.38% |
| | MAPE | 10.62% | 6.06% | 5.51% | 2.87% | 3.57% | 19.50% | 9.60% | 26.15% |
| | R² | 1.04% | 0.51% | 0.60% | 0.34% | 0.52% | 1.36% | 0.56% | 1.13% |
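These improvement figures are consistent with averaging the per-dataset relative gains over the four cities. For example, the 6.36% MAE improvement over the MLP baseline can be reproduced from the results table (the averaging scheme is our inference, not stated by the authors):

```python
# MAE of MLP vs. MSTTNet on Beijing, Shanghai, Chengdu, Guangzhou (results table).
mlp  = [11.85, 7.21, 9.00, 6.13]
mstt = [11.06, 6.82, 8.53, 5.63]

# Relative gain per dataset, then the mean across datasets.
gains = [(b - m) / b for b, m in zip(mlp, mstt)]
imp = 100 * sum(gains) / len(gains)
print(round(imp, 2))  # 6.36 — matches the MAE improvement reported for MLP
```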
| Model | Parameter Scale |
|---|---|
| MLP | 1057 |
| LSTM | 5281 |
| GRU | 14,273 |
| BiLSTM | 37,505 |
| BiGRU-Attention | 41,049 |
| 1DCNN | 1185 |
| 1DCNN-LSTM | 9153 |
| 1DCNN-BiLSTM-Attention | 27,357 |
| MSTTNet | 43,041 |
| Datasets | Indicators | 1DCNN-BiLSTM-Attention | MSTTNet_1 | MSTTNet_2 | MSTTNet_3 | MSTTNet (Proposed) |
|---|---|---|---|---|---|---|
| Beijing | MAE | 13.93 | 11.29 | 11.36 | 12.42 | 11.06 |
| | RMSE | 22.68 | 20.78 | 21.36 | 21.74 | 20.85 |
| | MAPE | 35.08 | 27.10 | 25.19 | 34.73 | 23.37 |
| | R² | 0.9326 | 0.9397 | 0.9363 | 0.9340 | 0.9393 |
| Shanghai | MAE | 7.70 | 6.94 | 7.63 | 7.28 | 6.82 |
| | RMSE | 11.71 | 11.11 | 11.77 | 11.67 | 11.09 |
| | MAPE | 23.39 | 20.84 | 23.08 | 20.00 | 19.31 |
| | R² | 0.9056 | 0.9149 | 0.9046 | 0.9062 | 0.9152 |
| Chengdu | MAE | 9.76 | 8.62 | 8.63 | 9.01 | 8.53 |
| | RMSE | 13.51 | 12.42 | 12.58 | 12.99 | 12.38 |
| | MAPE | 22.34 | 18.31 | 16.86 | 16.38 | 17.63 |
| | R² | 0.9283 | 0.9394 | 0.9378 | 0.9337 | 0.9398 |
| Guangzhou | MAE | 6.61 | 5.85 | 6.08 | 5.66 | 5.63 |
| | RMSE | 9.25 | 8.70 | 8.95 | 8.67 | 8.53 |
| | MAPE | 29.70 | 19.89 | 23.47 | 21.62 | 19.98 |
| | R² | 0.8998 | 0.9101 | 0.9048 | 0.9105 | 0.9135 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, Z.; Jia, K.; Zhang, W.; Zhang, C. PM2.5 Concentration Prediction in the Cities of China Using Multi-Scale Feature Learning Networks and Transformer Framework. Sustainability 2025, 17, 8891. https://doi.org/10.3390/su17198891