Predicting the Concentration Levels of PM2.5 and O3 for Highly Urbanized Areas Based on Machine Learning Models
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area and Dataset
2.2. Models Based on Machine Learning Algorithms
2.2.1. Fundamental Principles of the Modeling Framework and Variable Selection
2.2.2. Necessity and Advantages of Lagged Variables in Model II
- (1) Capturing long-range transport and cumulative effects: Lagged variables are indispensable for modeling cross-regional pollution transport. For instance, dust plumes or industrial emissions from upwind regions may take 6–18 h to reach the monitoring site, with day (t−1) PM10 levels serving as a proxy for day t PM2.5 impacts [54].
- (2) Enhancing the prediction stability: By incorporating historical data, Model II mitigates the impact of real-time data anomalies (e.g., sensor malfunctions) and captures seasonal emission patterns (e.g., winter coal heating cycles), thereby improving forecast robustness.
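The lagged-variable idea described above can be illustrated with a minimal sketch. It assumes one record per day for a single monitoring site, sorted by time. The helper name `add_lagged_features`, the field names, and the sample values are hypothetical and are not taken from the study's code or data.

```python
from typing import Dict, List, Optional


def add_lagged_features(
    rows: List[Dict[str, float]],
    columns: List[str],
    lag: int = 1,
) -> List[Dict[str, Optional[float]]]:
    """Return copies of `rows` with a `<column>_lag<lag>` field per column.

    `rows` must be sorted by time with a constant step (here: one row per day).
    The first `lag` rows have no history, so their lagged fields are None.
    """
    out: List[Dict[str, Optional[float]]] = []
    for i, row in enumerate(rows):
        new_row: Dict[str, Optional[float]] = dict(row)
        for col in columns:
            # Value observed `lag` steps earlier, i.e. day (t - lag) for day t.
            new_row[f"{col}_lag{lag}"] = rows[i - lag][col] if i >= lag else None
        out.append(new_row)
    return out


# Hypothetical daily means (ug/m3), made up for illustration only.
daily = [
    {"pm10": 80.0, "pm25": 40.0},
    {"pm10": 120.0, "pm25": 55.0},
    {"pm10": 95.0, "pm25": 70.0},
]
lagged = add_lagged_features(daily, columns=["pm10"], lag=1)

# Rows without a complete history are dropped before model training.
train_rows = [r for r in lagged if r["pm10_lag1"] is not None]
```

Each remaining row pairs the day t PM2.5 value with the day (t−1) PM10 value, which is the kind of predictor the text attributes to Model II.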
2.2.3. Introduction to Model Design
2.3. Model Performance Evaluation
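The results tables report RMSE, MAE, and R2 for the training and testing sets. The sketch below shows how these three metrics are conventionally computed. It assumes the standard textbook definitions, since the paper's own formulas are not reproduced in this excerpt. The function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up observed and predicted concentrations (ug/m3), for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

scores = {
    "RMSE": rmse(observed, predicted),  # sqrt(18 / 4)
    "MAE": mae(observed, predicted),    # 8 / 4
    "R2": r2(observed, predicted),      # 1 - 18 / 500
}
```

RMSE and MAE carry the units of the pollutant concentration, while R2 is dimensionless, so only R2 can be compared directly between PM2.5 and O3.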
3. Results and Discussion
3.1. The Temporal and Spatial Distribution of PM2.5 and O3
3.2. PM2.5 and O3 Predictions Using Machine Learning Algorithms
3.3. Major Impact Factors of PM2.5 and O3 Predictions
3.4. Policy Implications
Suggestions for Different Seasons
1. Strengthening Winter Pollution Prevention and Control
2. Consolidating Summer Pollution Prevention and Control
3. Responding to Interannual Fluctuations
3.5. Suggestions Based on PM2.5 and O3 Concentration Differences Among Cities
1. For PM2.5-Focused Cities (Handan, Xingtai, and Shijiazhuang)
2. For Low-PM2.5 Cities (Chengde and Zhangjiakou)
3. For High-O3 Cities (Cangzhou, Hengshui, and Xingtai)
4. Conclusions
1. Spatiotemporal Patterns of Pollutants
2. Machine Learning Models for Pollution Prediction
3. Key Influencing Factors
4. Policy Implications and Future Directions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Li, C.; van Donkelaar, A.; Hammer, M.S.; McDuffie, E.E.; Burnett, R.T.; Spadaro, J.V.; Chatterjee, D.; Cohen, A.J.; Apte, J.S.; Southerland, V.A.; et al. Reversal of trends in global fine particulate matter air pollution. Nat. Commun. 2023, 14, 5349.
2. Zhao, X.; Zhang, Z.; Xu, J.; Gao, J.; Cheng, S.; Zhao, X.; Xia, X.; Hu, B. Impacts of aerosol direct effects on PM2.5 and O3 respond to the reductions of different primary emissions in Beijing-Tianjin-Hebei and surrounding area. Atmos. Environ. 2023, 309, 119948.
3. Wang, S.; Ren, Y.; Xia, B. PM2.5 and O3 concentration estimation based on interpretable machine learning. Atmos. Pollut. Res. 2023, 14, 101866.
4. Zeng, Q.; Li, Y.; Tao, J.; Fan, M.; Chen, L.; Wang, L.; Wang, Y. Full-coverage estimation of PM2.5 in the Beijing-Tianjin-Hebei region by using a two-stage model. Atmos. Environ. 2023, 309, 119956.
5. Luo, Z.; Lu, P.; Chen, Z.; Liu, R. Ozone Concentration Estimation and Meteorological Impact Quantification in the Beijing-Tianjin-Hebei Region Based on Machine Learning Models. Earth Space Sci. 2024, 11, e2023EA003346.
6. Cao, J.J.; Lee, S.C.; Ho, K.F.; Fung, K. Characteristics, sources, and health impacts of atmospheric particulate matter in China. Sci. Total Environ. 2019, 659, 400–413.
7. Wang, G.H.; Zhang, Y.H.; Zhang, Q.; Zheng, B.; He, K.B.; Cofala, J. China’s anthropogenic sulfur dioxide, nitrogen oxides, and primary fine particulate matter emissions, 1990–2015: A high-resolution emission inventory. Atmos. Chem. Phys. 2017, 17, 11925–11952.
8. Liu, F.; Zhang, Y.; Zhang, Q.; Wang, S.; He, K. Vehicle emission control in China: Review and outlook. Sci. Total Environ. 2018, 619–620, 1195–1208.
9. Li, X.; Zhang, Q.; He, K.B.; Zheng, B.; Streets, D.G. Anthropogenic mercury emissions in China during 1990–2010: A provincial-level inventory. Atmos. Chem. Phys. 2016, 16, 15609–15622.
10. Crutzen, P.J. The role of NO and NO2 in the chemistry of the troposphere and stratosphere. Ann. N. Y. Acad. Sci. 1979, 322, 16–47.
11. Lin, Y.; Zhang, Q.; Streets, D.G.; He, K.B.; Wang, S.X. Ozone trends in China from 1991 to 2012: Analysis of surface observations and model simulations. Atmos. Chem. Phys. 2014, 14, 5819–5834.
12. Pope, C.A.; Dockery, D.W. Health effects of fine particulate air pollution: Lines that connect. J. Air Waste Manag. Assoc. 2006, 56, 709–742.
13. Brook, R.D.; Rajagopalan, S.; Pope, C.A., 3rd; Brook, J.R.; Bhatnagar, A.; Diez-Roux, A.V.; Holguin, F.; Hong, Y.; Luepker, R.V.; Mittleman, M.A.; et al. Particulate matter air pollution and cardiovascular disease: An update to the scientific statement from the American Heart Association. Circulation 2010, 121, 2331–2378.
14. Krewski, D.; Jerrett, M.; Burnett, R.T.; Ma, R.; Hughes, E.; Shi, Y.; Turner, M.C.; Pope, C.A., 3rd; Thurston, G.; Calle, E.E.; et al. Extended Follow-Up and Spatial Analysis of the American Cancer Society Study Linking Particulate Air Pollution and Mortality; Research Report; Health Effects Institute: Boston, MA, USA, 2009; pp. 5–86, discussion 115–136.
15. Wang, S.; Zhang, Q.; Zheng, B.; Cofala, J.; He, K.B. Emission trends and future projections of air pollutants in China: Implications for air quality improvement. Atmos. Chem. Phys. 2023, 23, 3229–3246.
16. Atkinson, R.; Arey, J. Atmospheric degradation of volatile organic compounds. Chem. Rev. 2003, 103, 4605–4638.
17. Bell, M.L.; McDermott, A.; Zeger, S.L.; Samet, J.M.; Dominici, F. Ozone and short-term mortality in 95 US urban communities, 1987–2000. JAMA 2006, 295, 1087–1095.
18. Zheng, J.; Zhang, Q.; He, K.B.; Wang, S.; Streets, D.G. Tropospheric ozone in China: Concentrations, trends, and sources. Atmos. Chem. Phys. 2021, 21, 13239–13258.
19. Huang, Z.; Zhong, Z.; Sha, Q.; Xu, Y.; Zhang, Z.; Wu, L.; Wang, Y.; Zhang, L.; Cui, X.; Tang, M.; et al. An updated model-ready emission inventory for Guangdong Province by incorporating big data and mapping onto multiple chemical mechanisms. Sci. Total Environ. 2021, 769, 144535.
20. Ren, X.; Mi, Z.; Georgopoulos, P.G. Comparison of Machine Learning and Land Use Regression for fine scale spatiotemporal estimation of ambient air pollution: Modeling ozone concentrations across the contiguous United States. Environ. Int. 2020, 142, 105827.
21. Abdullah, S.; Ismail, M.; Ahmed, A.N. Identification of air pollution potential sources through principal component analysis (PCA). Int. J. Civ. Eng. Technol. 2018, 9, 1435–1442.
22. Jain, A.; Lella, R.L. Pearson correlation coefficient based attribute weighted k-nn for air pollution prediction. In Proceedings of the 2020 IEEE 17th India Council International Conference (INDICON), New Delhi, India, 10–13 December 2020; pp. 1–8.
23. Hoek, G.; Beelen, R.; de Hoogh, K.; Vienneau, D.; Gulliver, J.; Fischer, P.; Briggs, D. A review of land-use regression models to assess spatial variation of outdoor air pollution. Atmos. Environ. 2008, 42, 7561–7578.
24. Zhou, Q.; Wang, C.; Fang, S. Application of geographically weighted regression (GWR) in the analysis of the cause of haze pollution in China. Atmos. Pollut. Res. 2019, 10, 835–846.
25. Han, Y.; Zhang, Q.; Li, V.O.K.; Lam, J.C.K. Deep-AIR: A Hybrid CNN-LSTM Framework for Air Quality Modeling in Metropolitan Cities. arXiv 2021, arXiv:2103.14587.
26. Hu, X.; Belle, J.H.; Meng, X.; Wildani, A.; Waller, L.A.; Strickland, M.J.; Liu, Y. Estimating PM2.5 Concentrations in the Conterminous United States Using the Random Forest Approach. Environ. Sci. Technol. 2017, 51, 6936–6944.
27. Zhang, K.; Batterman, S. Air pollution and health risks due to vehicle traffic. Sci. Total Environ. 2013, 450–451, 307–316.
28. Zhang, K.; Batterman, S. Air pollution prediction using XGBoost with multiple spatiotemporal features. Atmos. Pollut. Res. 2019, 10, 768–775.
29. Jogin, M.; Mohana; Madhulika, M.S.; Divya, G.D.; Meghana, R.K.; Apoorva, S. Feature Extraction using Convolution Neural Networks (CNN) and Deep Learning. In Proceedings of the 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), Bangalore, India, 18–19 May 2018; pp. 2319–2323.
30. Zhou, C.; Wang, F.; Guo, Y.; Liu, C.; Ji, D.; Wang, Y.; Xu, X.; Lu, X.; Wang, Y.; Carmichael, G.; et al. Reconstructed daily ground-level O3 in China over 2005–2021 for climatological, ecological, and health research. Earth Syst. Sci. Data Discuss. 2022.
31. Gui, K.; Che, H.; Zeng, Z.; Wang, Y.; Zhai, S.; Wang, Z.; Luo, M.; Zhang, L.; Liao, T.; Zhao, H.; et al. Construction of a virtual PM2.5 observation network in China based on high-density surface meteorological observations using the Extreme Gradient Boosting model. Environ. Int. 2020, 141, 105801.
32. Yin, J.; Jiang, J.; Tong, L.; Huang, P. FCNN+: An Improved Fully Connected Neural Network High-accuracy Prediction Model. In Proceedings of the 2023 8th International Conference on Information Systems Engineering (ICISE), Dalian, China, 23–25 June 2023; pp. 539–542.
33. Tian, H.; Zhao, Y.; Luo, M.; He, Q.; Han, Y.; Zeng, Z. Estimating PM2.5 from multisource data: A comparison of different machine learning models in the Pearl River Delta of China. Urban Clim. 2021, 35, 100740.
34. Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012.
35. Gupta, P.; Zhan, S.; Mishra, V.; Aekakkararungroj, A.; Markert, A.; Paibong, S.; Chishtie, F. Machine learning algorithm for estimating surface PM2.5 in Thailand. Aerosol Air Qual. Res. 2021, 21, 210105.
36. Vu, T.V.; Shi, Z.; Cheng, J.; Zhang, Q.; He, K.; Wang, S.; Harrison, R.M. Assessing the impact of clean air action on air quality trends in Beijing using a machine learning technique. Atmos. Chem. Phys. 2019, 19, 11303–11314.
37. Zhao, J.; Deng, F.; Cai, Y.; Chen, J. Long short-term memory-Fully connected (LSTM-FC) neural network for PM2.5 concentration prediction. Chemosphere 2019, 220, 486–492.
38. Rauschmayr, N.; Kumar, V.; Huilgol, R.; Olgiati, A.; Bhattacharjee, S.; Harish, N.; Kenthapadi, K. Amazon sagemaker debugger: A system for real-time insights into machine learning model training. Proc. Mach. Learn. Syst. 2021, 3, 770–782.
39. Takoutsing, B.; Heuvelink, G.B.M. Comparing the prediction performance, uncertainty quantification and extrapolation potential of regression kriging and random forest while accounting for soil measurement errors. Geoderma 2022, 428, 116192.
40. Vaysse, K.; Lagacherie, P. Using quantile regression forest to estimate uncertainty of digital soil mapping products. Geoderma 2017, 291, 55–64.
41. Xiao, Z.; Li, H.; Gao, Y. Analysis of the impact of the Beijing-Tianjin-Hebei coordinated development on environmental pollution and its mechanism. Environ. Monit. Assess. 2022, 194, 91.
42. Monks, P.S.; Archibald, A.T.; Colette, A.; Cooper, O.; Coyle, M.; Derwent, R.; Fowler, D.; Granier, C.; Law, K.S.; Mills, G.E.; et al. Tropospheric ozone and its precursors from the urban to the global scale from air quality to short-lived climate forcer. Atmos. Chem. Phys. 2015, 15, 8889–8973.
43. Jacob, D.J.; Winner, D.A. Effect of climate change on air quality. Atmos. Environ. 2009, 43, 51–63.
44. Zhang, Q.; Jimenez, J.L.; Canagaratna, M.R.; Ulbrich, I.M.; Ng, N.L.; Worsnop, D.R.; Sun, Y. Understanding atmospheric organic aerosols via factor analysis of aerosol mass spectrometry: A review. Anal. Bioanal. Chem. 2011, 401, 3045–3067.
45. Arya, S.P. Air Pollution Meteorology and Dispersion; Oxford University Press: Oxford, UK, 1999.
46. Han, L.; Lan, T.; Cheng, S.; Wang, Y.; Qi, C.; Tian, J.; Wang, H.; Han, D.; Wang, S. Evolution Characteristics of PM2.5 and O3 and Their Synergistic Effects on Atmospheric Compound Pollution in Tangshan. Environ. Sci. 2024, 45, 4385–4397.
47. Alves, C.; Evtyugina, M.; Vicente, E.; Vicente, A.; Rienda, I.C.; de la Campa, A.S.; Duarte, I. PM2.5 chemical composition and health risks by inhalation near a chemical complex. J. Environ. Sci. 2023, 124, 860–874.
48. Zeng, J.; Zhang, L.; Yao, C.; Xie, T.; Rao, L.; Lu, H.; Lu, S. Relationships between chemical elements of PM2.5 and O3 in Shanghai atmosphere based on the 1-year monitoring observation. J. Environ. Sci. 2020, 95, 49–57.
49. Seinfeld, J.H.; Pandis, S.N. Atmospheric Chemistry and Physics: From Air Pollution to Climate Change; John Wiley & Sons: Hoboken, NJ, USA, 2016.
50. Ervens, B.; Turpin, B.J.; Weber, R.J. Secondary organic aerosol formation in cloud droplets and aqueous particles (aqSOA): A review of laboratory, field and model studies. Atmos. Chem. Phys. 2011, 11, 11069–11129.
51. Tao, J.; Zhang, L.; Cao, J.; Zhong, L.; Chen, D.; Yang, Y.; Zhang, R. Source apportionment of PM2.5 at urban and suburban areas of the Pearl River Delta region, south China-With emphasis on ship emissions. Sci. Total Environ. 2017, 574, 1559–1570.
52. Liu, L.; Long, X.; Li, Y.; Zang, Z.; Wang, F.; Han, Y.; Yang, J. Impacts of meteorology and emission reductions on haze pollution during the lockdown in the North China Plain. Atmos. Chem. Phys. 2025, 25, 1569–1585.
53. Shi, C.; Yuan, R.; Wu, B.; Meng, Y.; Zhang, H.; Zhang, H.; Gong, Z. Meteorological conditions conducive to PM2.5 pollution in winter 2016/2017 in the Western Yangtze River Delta, China. Sci. Total Environ. 2018, 642, 1221–1232.
54. Bhatti, U.A.; Yan, Y.; Zhou, M.; Ali, S.; Hussain, A.; Qingsong, H.; Yuan, L. Time series analysis and forecasting of air pollution particulate matter (PM 2.5): An SARIMA and factor analysis approach. IEEE Access 2021, 9, 41019–41031.
55. Wang, D.; Ban, X.; Ji, L.; Guan, X.; Liu, K.; Qian, X. An Adaptive Shrinking Grid Search Chaotic Wolf Optimization Algorithm Using Standard Deviation Updating Amount. Comput. Intell. Neurosci. 2020, 2020, 1–15.
56. Mahajan, S.; Liu, H.-M.; Tsai, T.-C.; Chen, L.-J. Improving the Accuracy and Efficiency of PM2.5 Forecast Service Using Cluster-Based Hybrid Neural Network Model. IEEE Access 2018, 6, 19193–19204.
57. Park, C.B.; Sugimoto, N.; Matsui, I.; Shimizu, A.; Tatarov, B.; Kamei, A.; Westphal, D.L. Long-range transport of Saharan dust to east Asia observed with lidars. Sola 2005, 1, 121–124.
58. Li, R.; Cui, L.; Meng, Y.; Zhao, Y.; Fu, H. Satellite-based prediction of daily SO2 exposure across China using a high-quality random forest-spatiotemporal Kriging (RF-STK) model for health risk assessment. Atmos. Environ. 2019, 208, 10–19.
59. Huang, K.; Xiao, Q.; Meng, X.; Geng, G.; Wang, Y.; Lyapustin, A.; Gu, D.; Liu, Y. Predicting monthly high-resolution PM2.5 concentrations with random forest model in the North China Plain. Environ. Pollut. 2018, 242 Pt A, 675–683.
60. Wei, J.; Li, Z.; Lyapustin, A.; Sun, L.; Peng, Y.; Xue, W.; Cribb, M. Reconstructing 1-km-resolution high-quality PM2.5 data records from 2000 to 2018 in China: Spatiotemporal variations and policy implications. Remote Sens. Environ. 2021, 252, 112136.
61. Wei, J.; Zhang, C.; Li, Z.; Pinker, R.T.; Wang, J.; Sun, L.; Xue, W.; Li, R.; Cribb, M. Himawari-8-derived diurnal variations in ground-level PM 2.5 pollution across China using the fast space-time Light Gradient Boosting Machine (LightGBM). Atmos. Chem. Phys. 2021, 21, 7863–7880.
62. Jung, C.R.; Hwang, B.F.; Chen, W.T. Incorporating long-term satellite-based aerosol optical depth, localized land use data, and meteorological variables to estimate ground-level PM2.5 concentrations in Taiwan from 2005 to 2015. Environ. Pollut. 2018, 237, 1000–1010.
63. Ma, R.; Ban, J.; Wang, Q.; Zhang, Y.; Yang, Y.; He, M.Z.; Li, T. Random forest model based fine scale spatiotemporal O3 trends in the Beijing-Tianjin-Hebei region in China, 2010 to 2017. Environ. Pollut. 2021, 276, 116635.
64. Hu, X.; Zhang, J.; Xue, W.; Zhou, L.; Che, Y.; Han, T. Estimation of the near-surface ozone concentration with full spatiotemporal coverage across the Beijing-Tianjin-Hebei region based on extreme gradient boosting combined with a WRF-chem model. Atmosphere 2022, 13, 632.
65. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
66. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
67. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.





| Meteorological Factor | Description | Annual, Mean (std) | Spring, Mean (std) | Summer, Mean (std) | Autumn, Mean (std) | Winter, Mean (std) |
|---|---|---|---|---|---|---|
| T | Temperature (°C) | 12.29 (12.20) | 13.20 (8.25) | 25.71 (4.79) | 12.64 (8.56) | −2.59 (5.22) |
| PS | Surface Pressure (kPa) | 98.95 (0.996) | 98.75 (0.75) | 97.87 (0.46) | 99.27 (0.71) | 99.89 (0.65) |
| RH | Relative Humidity (%) | 60.11 (20.95) | 49.96 (21.58) | 65.07 (21.10) | 64.57 (19.29) | 60.53 (18.27) |
| TDEW | Dew/Frost Point (°C) | 3.52 (11.57) | 1.06 (6.62) | 17.33 (4.62) | 5.25 (8.26) | −9.80 (4.87) |
| WS | Wind Speed (m/s) | 3.52 (1.46) | 4.23 (1.63) | 3.07 (1.11) | 3.24 (1.36) | 3.53 (1.43) |
| WD | Wind Direction (Degrees) | 189.74 (67.63) | 193.21 (66.45) | 172.03 (56.6) | 192.64 (70.32) | 197.69 (71.69) |
| PREP | Precipitation Corrected (mm/h) | 0.082 (0.29) | 0.041 (0.16) | 0.22 (0.48) | 0.061 (0.24) | 0.010 (0.064) |
| IRRA | All-Sky Surface Shortwave Downward Irradiance (Wh/m2) | 174.25 (243.4) | 219.59 (282.4) | 221.13 (270.11) | 145.70 (214.03) | 110.46 (170.79) |

| Item | PM2.5 | PM10 | O3 | SO2 | NO2 | CO | T | PS | RH | TDEW | WS | WD | PREP | IRRA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PM2.5 | 1.00 | |||||||||||||
| PM10 | 0.74 | 1.00 | ||||||||||||
| O3 | −0.31 | −0.19 | 1.00 | |||||||||||
| SO2 | 0.49 | 0.35 | −0.21 | 1.00 | ||||||||||
| NO2 | 0.64 | 0.42 | −0.65 | 0.59 | 1.00 | |||||||||
| CO | 0.83 | 0.47 | −0.37 | 0.71 | 0.71 | 1.00 | ||||||||
| T | −0.40 | −0.27 | 0.69 | −0.28 | −0.53 | −0.38 | 1.00 | |||||||
| PS | 0.31 | 0.21 | −0.56 | 0.22 | 0.42 | 0.30 | −0.80 | 1.00 | ||||||
| RH | 0.10 | −0.04 | −0.33 | −0.14 | 0.25 | 0.10 | −0.19 | −0.05 | 1.00 | |||||
| TDEW | −0.34 | −0.28 | 0.49 | −0.33 | −0.38 | −0.32 | 0.85 | −0.78 | 0.33 | 1.00 | ||||
| WS | −0.02 | 0.07 | 0.09 | 0.02 | −0.13 | −0.05 | −0.04 | 0.02 | −0.44 | −0.28 | 1.00 | |||
| WD | 0.05 | 0.08 | −0.18 | 0.07 | 0.16 | 0.04 | −0.19 | 0.04 | −0.11 | −0.25 | 0.09 | 1.00 | ||
| PREP | −0.12 | −0.12 | 0.10 | −0.14 | −0.16 | −0.08 | 0.21 | −0.23 | 0.24 | 0.32 | 0.06 | −0.17 | 1.00 | |
| IRRA | −0.15 | −0.08 | 0.34 | 0.06 | −0.34 | −0.08 | 0.44 | −0.17 | −0.68 | 0.05 | 0.26 | −0.04 | −0.04 | 1.00 |

| Model | Pollutant | Algorithm | RMSE (Training) | RMSE (Testing) | MAE (Training) | MAE (Testing) | R2 (Training) | R2 (Testing) |
|---|---|---|---|---|---|---|---|---|
| Model I | PM2.5 | XGBoost | 1.92 | 4.69 | 1.36 | 3.05 | 0.996 | 0.979 |
| Model I | PM2.5 | RF | 1.95 | 5.23 | 1.27 | 3.34 | 0.996 | 0.973 |
| Model I | PM2.5 | FCNN | 3.31 | 4.37 | 2.45 | 3.09 | 0.989 | 0.981 |
| Model I | O3 | XGBoost | 4.51 | 11.3 | 3.10 | 7.90 | 0.990 | 0.938 |
| Model I | O3 | RF | 4.99 | 13.2 | 3.39 | 9.06 | 0.988 | 0.915 |
| Model I | O3 | FCNN | 8.64 | 11.3 | 6.38 | 8.23 | 0.964 | 0.938 |
| Model II | PM2.5 | XGBoost | 5.88 | 13.0 | 4.04 | 8.60 | 0.966 | 0.830 |
| Model II | PM2.5 | RF | 16.1 | 17.5 | 11.15 | 12.2 | 0.745 | 0.693 |
| Model II | PM2.5 | FCNN | 11.0 | 13.4 | 8.16 | 9.68 | 0.881 | 0.821 |
| Model II | O3 | XGBoost | 7.01 | 15.8 | 4.87 | 11.2 | 0.976 | 0.878 |
| Model II | O3 | RF | 15.1 | 17.4 | 11.4 | 12.9 | 0.891 | 0.853 |
| Model II | O3 | FCNN | 13.7 | 15.4 | 10.2 | 11.3 | 0.910 | 0.885 |

| City | Pollutant | Model I RMSE (Training) | Model I RMSE (Testing) | Model I MAE (Training) | Model I MAE (Testing) | Model I R2 (Training) | Model I R2 (Testing) | Model II RMSE (Training) | Model II RMSE (Testing) | Model II MAE (Training) | Model II MAE (Testing) | Model II R2 (Training) | Model II R2 (Testing) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Beijing | PM2.5 | 3.04 | 8.69 | 2.15 | 4.97 | 0.993 | 0.945 | 7.03 | 15.66 | 4.79 | 10.25 | 0.962 | 0.807 |
| Beijing | O3 | 4.64 | 11 | 3.24 | 7.62 | 0.99 | 0.947 | 7.04 | 15.17 | 5.02 | 10.82 | 0.978 | 0.896 |
| Tianjin | PM2.5 | 3.39 | 8.54 | 2.36 | 5.37 | 0.993 | 0.953 | 8.09 | 17.42 | 5.59 | 11.88 | 0.958 | 0.799 |
| Tianjin | O3 | 5.26 | 12.9 | 3.61 | 8.75 | 0.989 | 0.933 | 7.43 | 15.82 | 5.23 | 11.2 | 0.978 | 0.901 |
| Shijiazhuang | PM2.5 | 2.67 | 6.98 | 1.89 | 4.24 | 0.996 | 0.976 | 7.39 | 16.85 | 5.08 | 11.05 | 0.972 | 0.851 |
| Shijiazhuang | O3 | 5.36 | 12.9 | 3.7 | 8.72 | 0.989 | 0.936 | 7.16 | 16.55 | 5.09 | 11.81 | 0.980 | 0.895 |
| Baoding | PM2.5 | 3.28 | 9.41 | 2.38 | 5.22 | 0.995 | 0.958 | 5.23 | 12.42 | 3.67 | 8.05 | 0.987 | 0.922 |
| Baoding | O3 | 5.69 | 13.5 | 3.96 | 9.33 | 0.989 | 0.935 | 8.5 | 18.68 | 6.04 | 13.58 | 0.974 | 0.877 |
| Tangshan | PM2.5 | 3.25 | 8.42 | 2.34 | 5.31 | 0.993 | 0.954 | 8.63 | 19.51 | 5.93 | 12.88 | 0.952 | 0.741 |
| Tangshan | O3 | 6.18 | 15.3 | 4.16 | 10.3 | 0.985 | 0.911 | 8.71 | 19.17 | 6.17 | 13.53 | 0.971 | 0.856 |
| Qinhuangdao | PM2.5 | 2.68 | 6.43 | 1.94 | 4.08 | 0.993 | 0.958 | 7.04 | 15.56 | 4.72 | 10.14 | 0.950 | 0.768 |
| Qinhuangdao | O3 | 5.74 | 12.9 | 4.06 | 9.19 | 0.984 | 0.918 | 8.68 | 18.50 | 6.25 | 13.50 | 0.964 | 0.834 |
| Handan | PM2.5 | 3.92 | 9.63 | 2.84 | 6.34 | 0.993 | 0.954 | 9.57 | 22.02 | 6.65 | 14.49 | 0.959 | 0.763 |
| Handan | O3 | 6.02 | 14.5 | 4.25 | 10.2 | 0.987 | 0.923 | 8.14 | 18.15 | 5.82 | 13.10 | 0.976 | 0.875 |
| Zhangjiakou | PM2.5 | 2.85 | 7.61 | 2.09 | 3.89 | 0.989 | 0.946 | 4.62 | 18.31 | 3.14 | 7.17 | 0.971 | 0.648 |
| Zhangjiakou | O3 | 5.79 | 12.3 | 4.18 | 9 | 0.974 | 0.886 | 6.37 | 13.70 | 4.49 | 9.78 | 0.969 | 0.856 |
| Chengde | PM2.5 | 2.78 | 7.44 | 2.04 | 4.07 | 0.987 | 0.918 | 5.57 | 11.55 | 3.77 | 7.48 | 0.948 | 0.751 |
| Chengde | O3 | 5.63 | 13.3 | 3.88 | 9.19 | 0.985 | 0.916 | 8.28 | 17.33 | 5.84 | 12.32 | 0.966 | 0.856 |
| Langfang | PM2.5 | 3.14 | 12.1 | 2.32 | 5.01 | 0.994 | 0.914 | 8.67 | 18.52 | 5.95 | 12.16 | 0.952 | 0.772 |
| Langfang | O3 | 5.99 | 14.1 | 4.11 | 9.59 | 0.987 | 0.927 | 9.48 | 19.64 | 6.79 | 14.14 | 0.967 | 0.859 |
| Cangzhou | PM2.5 | 3.29 | 7.99 | 2.39 | 5.09 | 0.993 | 0.96 | 8.36 | 18.77 | 5.74 | 12.30 | 0.957 | 0.777 |
| Cangzhou | O3 | 6.21 | 14.3 | 4.33 | 10.2 | 0.984 | 0.915 | 8.21 | 17.77 | 5.89 | 13.28 | 0.973 | 0.872 |
| Hengshui | PM2.5 | 3.65 | 8.45 | 2.68 | 5.76 | 0.993 | 0.963 | 9.03 | 19.97 | 6.24 | 13.21 | 0.958 | 0.785 |
| Hengshui | O3 | 6.04 | 13.8 | 4.23 | 9.89 | 0.986 | 0.929 | 8.13 | 17.46 | 5.82 | 12.87 | 0.974 | 0.879 |
| Xingtai | PM2.5 | 4.08 | 9.31 | 2.94 | 6.25 | 0.993 | 0.959 | 8.75 | 21.31 | 6.08 | 14.07 | 0.965 | 0.790 |
| Xingtai | O3 | 6.41 | 14.9 | 4.49 | 10.4 | 0.986 | 0.925 | 8.75 | 19.21 | 6.29 | 14.10 | 0.974 | 0.873 |

| Study Area | Pollutant | Spatiotemporal Resolution | Model | Model Fitting (Training Dataset) R2 | Model Fitting (Training Dataset) RMSE (μg/m3) | Cross-Validation Method | Cross-Validation (Testing Dataset) R2 | Cross-Validation (Testing Dataset) RMSE (μg/m3) | Source |
|---|---|---|---|---|---|---|---|---|---|
| China | PM2.5 | 1 km (monthly) | RF | - | - | 10-fold | 0.88 | 14.89 | (Huang et al., 2018) [59] |
| China | PM2.5 | 1 km (daily) | Space-time extra trees | 0.92–0.94 | 5.11–9.92 | 10-fold | 0.86–0.90 | 10.00–18.40 | (Wei et al., 2021) [60] |
| China | PM2.5 | 5 km (hourly) | STLG | 0.97–0.98 | 4.18–7.31 | 10-fold | 0.81–0.85 | 11.24–15.56 | (Wei et al., 2021) [61] |
| Taiwan | PM2.5 | 10 km (twice-daily) | LME | 0.77 | 11.4 | 10-fold | 0.66 | 12.9 | (Jung et al., 2018) [62] |
| BTH | PM2.5 | 5 km (hourly) | XGBoost | 0.98–0.99 | 2.67–4.08 | 10-fold | 0.91–0.98 | 6.43–12.1 | Our study (Model I) |
| BTH | PM2.5 | 5 km (hourly) | XGBoost | 0.95–0.99 | 4.62–9.57 | 10-fold | 0.65–0.92 | 11.55–22.02 | Our study (Model II) |
| BTH | O3 | Daily (MDA8H) | RF | - | - | 10-fold | 0.84 | | (Ma et al., 2021) [63] |
| BTH | O3 | Daily | WRFC-XGB | 0.91–0.95 | 15.50–17.7 | 10-fold | 0.91–0.95 | 13.57–17.7 | (Hu et al., 2022) [64] |
| BTH | O3 | 5 km (hourly) | XGBoost | 0.97–0.99 | 4.64–6.41 | 10-fold | 0.89–0.95 | 11–15.3 | Our study (Model I) |
| BTH | O3 | 5 km (hourly) | XGBoost | 0.96–0.98 | 6.37–9.48 | 10-fold | 0.83–0.90 | 13.7–19.64 | Our study (Model II) |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wei, C.; Zhao, C.; Hu, Y.; Tian, Y. Predicting the Concentration Levels of PM2.5 and O3 for Highly Urbanized Areas Based on Machine Learning Models. Sustainability 2025, 17, 9211. https://doi.org/10.3390/su17209211

