Application of ANN, XGBoost, and Other ML Methods to Forecast Air Quality in Macau
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Acquisition
2.2. Procedure of Study
2.3. Learning Algorithms
2.4. Implementation and Evaluation Methods of ML Models
3. Results and Discussion
3.1. Performance of ML Models in 2020 (24 h)
3.2. Performance of ML Models in 2021 (24 h)
3.3. Performance of ML Models (48-h)
3.4. Limitation of the Study
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest

Data Type | Source | Variables | Description |
---|---|---|---|
Air pollutant concentration (PM2.5, PM10, CO) | SMG surface air quality station (Taipa Ambient, Macau); daily means converted from hourly data over the past 24 h | PM_23D1, PM_23D2, PM_23D3, PM_16D1; CO_23D1, CO_23D2, CO_23D3, CO_16D1 | Daily mean concentrations of PM10, PM2.5, and CO for the last 3 days (23D1, 23D2, 23D3), plus 16D1, the mean from 16:00 yesterday to 15:00 today (µg/m3) |
Meteorological data | Upper-air observation (Upper-air Sounding System), King’s Park Station (Number 45004); data collected at 1200 UTC | H_1000, H850, H700, H_500 | Geopotential height at pressure levels (m) |
Meteorological data | | TAR_925, TAR_850, TAR_700 | Air temperature at pressure levels (°C) |
Meteorological data | | HR_925, HR_850, HR_700 | Relative humidity at pressure levels (%) |
Meteorological data | | TD_925, TD_850, TD_700 | Dew point at pressure levels (°C) |
Meteorological data | | THI_850, THI_700, THI_500 | Thickness at pressure levels, connected to the mean temperature in the layer (m) |
Meteorological data | | STB_925, STB_850, STB_700 | Stability at pressure levels, an indicator of atmospheric stability (°C) |
Meteorological data | SMG surface air quality station (Taipa Ambient, Macau); daily means converted from hourly data over the past 24 h | T_AIR_MD, T_AIR_MX, T_AIR_MN | Mean, maximum, and minimum air temperature, an indicator of air stability at surface level (°C) |
Meteorological data | | HRMD, HRMX, HRMN | Mean, maximum, and minimum relative humidity at the surface (%) |
Other data | Geographical data and community activities in Macau | DD | Duration of sunshine (h) |
Other data | | FF | Weekday indicator (flag): weekday = 0, weekend = 1 |

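
The lagged pollutant variables in the table above can be illustrated with a minimal sketch. It assumes that 23D1 to 23D3 are means over the calendar days (00:00 to 23:00) one to three days before today, and that 16D1 is the mean from 16:00 yesterday to 15:00 today, as the description column states. The function names `mean_over` and `lag_features` and the dictionary layout are hypothetical, not taken from the study.

```python
from datetime import date, datetime, timedelta


def mean_over(hourly, start, hours=24):
    """Mean of the hourly readings in [start, start + hours); None if none exist."""
    stamps = (start + timedelta(hours=h) for h in range(hours))
    values = [hourly[t] for t in stamps if t in hourly]
    return sum(values) / len(values) if values else None


def lag_features(hourly, today):
    """Build the 23D1..23D3 and 16D1 features for one pollutant (assumed windows)."""
    midnight = datetime(today.year, today.month, today.day)
    features = {}
    for k in (1, 2, 3):
        # Assumption: 23Dk is the full calendar day k days before today.
        features[f"23D{k}"] = mean_over(hourly, midnight - timedelta(days=k))
    # 16D1: 16:00 yesterday through 15:00 today (24 hourly values).
    features["16D1"] = mean_over(hourly, midnight - timedelta(hours=8))
    return features


# Synthetic hourly readings: one constant value per day, today only up to 15:00.
today = date(2020, 1, 4)
hourly = {}
for day, value in ((1, 30.0), (2, 20.0), (3, 10.0)):
    for h in range(24):
        hourly[datetime(2020, 1, day, h)] = value
for h in range(16):
    hourly[datetime(2020, 1, 4, h)] = 40.0

features = lag_features(hourly, today)
```

With these synthetic readings, 16D1 mixes 8 hours from yesterday with 16 hours from today, which is why it differs from 23D1.
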
Model | Parameter or Hyperparameter | Value |
---|---|---|
MLR | All features | |
ANN | Number of neurons, first layer | 1024 |
ANN | Number of neurons, second layer | 2048 |
ANN | Number of neurons, third layer | 1024 |
ANN | Learning rate | 0.00002 |
ANN | Epochs | 100 |
ANN | Batch size | 32 |
ANN | Validation_split | 0.3 |
SVM | C | 0.1 |
RF | n_estimators | 80 |
XGBoost | Eta | 0.3 |
XGBoost | max_depth | 6 |

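
As a rough illustration of how the settings in the table above could be expressed in code, the sketch below instantiates comparable regressors with scikit-learn and, if installed, XGBoost. This excerpt does not name the libraries used in the study, so every class choice here is an assumption. In particular, `MLPRegressor` is a stand-in for the ANN, and the 0.3 validation split is not reproduced.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

try:
    from xgboost import XGBRegressor  # optional dependency
except ImportError:
    XGBRegressor = None

models = {
    # MLR: ordinary least squares on all features.
    "MLR": LinearRegression(),
    # ANN stand-in: three hidden layers; max_iter counts epochs for the adam solver.
    "ANN": MLPRegressor(
        hidden_layer_sizes=(1024, 2048, 1024),
        learning_rate_init=0.00002,
        batch_size=32,
        max_iter=100,
    ),
    # SVM: support vector regression with regularization parameter C.
    "SVM": SVR(C=0.1),
    # RF: 80 trees.
    "RF": RandomForestRegressor(n_estimators=80),
}

if XGBRegressor is not None:
    # XGBoost: eta is exposed as learning_rate in the scikit-learn wrapper.
    models["XGBoost"] = XGBRegressor(learning_rate=0.3, max_depth=6)
```

Parameters not listed in the table are left at each library's defaults, which may differ from what the authors used.
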
ML Model | Pollutant | R2 | RMSE | MAE | BIAS | Model Built with SHAP/Feature Selection: Yes | Model Built with SHAP/Feature Selection: No |
---|---|---|---|---|---|---|---|
RF | PM10 | 0.92 | 5.86 | 4.36 | 2.40 | √ | |
RF | PM2.5 | 0.88 | 3.64 | 2.71 | 1.36 | √ | |
RF | CO | 0.92 | 0.06 | 0.04 | 0.01 | √ | |
RF | With DD | 0.89 | 0.06 | 0.04 | −0.02 | | |
ANN | PM10 | 0.83 | 8.21 | 6.46 | 4.03 | √ | |
ANN | PM2.5 | 0.82 | 4.45 | 3.03 | 1.51 | √ | |
ANN | CO | 0.87 | 0.07 | 0.05 | 0.02 | √ | |
ANN | With DD | −0.96 | 0.27 | 0.23 | −0.23 | | |
XGBoost | PM10 | 0.89 | 6.65 | 4.49 | 2.88 | √ | |
XGBoost | PM2.5 | 0.83 | 4.42 | 3.51 | 2.41 | √ | |
XGBoost | CO | 0.90 | 0.06 | 0.04 | 0.00 | √ | |
XGBoost | With DD | 0.88 | 0.06 | 0.05 | 0.00 | | |
SVM | PM10 | 0.90 | 6.31 | 5.02 | 3.51 | √ | |
SVM | PM2.5 | 0.86 | 4.04 | 3.21 | 2.32 | √ | |
SVM | CO | 0.94 | 0.05 | 0.03 | 0.00 | √ | |
SVM | With DD | 0.43 | 0.15 | 0.12 | 0.12 | | |
MLR | PM10 | 0.90 | 6.27 | 5.01 | 3.53 | √ | |
MLR | PM2.5 | 0.85 | 4.18 | 3.37 | 2.51 | √ | |
MLR | CO | 0.88 | 0.07 | 0.05 | 0.01 | √ | |
MLR | With DD | −2.09 | 0.34 | 0.30 | 0.30 | | |

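
The four indicators reported in these performance tables can be computed as in the sketch below. It uses the standard definitions of R2, RMSE, and MAE. For BIAS it assumes the mean of predicted minus observed, since this excerpt does not state the sign convention. The function name `forecast_metrics` is hypothetical.

```python
import math


def forecast_metrics(observed, predicted):
    """Return R2, RMSE, MAE, and BIAS for paired observed/predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "BIAS": sum(errors) / n,  # assumed convention: predicted minus observed
    }


observed = [1.0, 2.0, 3.0, 4.0]
good = forecast_metrics(observed, [1.5, 2.5, 2.5, 4.5])
bad = forecast_metrics(observed, [4.0, 3.0, 2.0, 1.0])
```

In this toy data the second forecast is worse than always predicting the observed mean, so its R2 is negative. The same property explains how negative R2 values can appear in a results table.
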
Forecast (model) | Variables | R2 | Adjusted R2 | Result (√ = highest adjusted R2) |
---|---|---|---|---|
PM10 24-h forecast 2020 (RF) | Pollutant (16D1, 23D1 to 23D3) | 0.890 | 0.889 | |
PM10 24-h forecast 2020 (RF) | Pollutant + Upper Air | 0.910 | 0.904 | |
PM10 24-h forecast 2020 (RF) | Pollutant + near ground surface | 0.906 | 0.901 | |
PM10 24-h forecast 2020 (RF) | All variables included | 0.916 | 0.907 | √ |
PM2.5 24-h forecast 2020 (RF) | Pollutant (16D1, 23D1 to 23D3) | 0.86 | 0.643 | |
PM2.5 24-h forecast 2020 (RF) | Pollutant + Upper Air | 0.85 | 0.574 | |
PM2.5 24-h forecast 2020 (RF) | Pollutant + near ground surface | 0.87 | 0.752 | |
PM2.5 24-h forecast 2020 (RF) | All variables included | 0.88 | 0.767 | √ |
CO 24-h forecast 2020 (SVM) | Pollutant (16D1, 23D1 to 23D3) | 0.647 | 0.643 | |
CO 24-h forecast 2020 (SVM) | Pollutant + Upper Air | 0.602 | 0.574 | |
CO 24-h forecast 2020 (SVM) | Pollutant + near ground surface | 0.764 | 0.752 | |
CO 24-h forecast 2020 (SVM) | All variables included | 0.790 | 0.767 | √ |

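
The feature-set comparison above ranks models by adjusted R2, which penalizes R2 for the number of predictors. A minimal sketch of the standard formula follows. The sample size and predictor counts behind the table are not given in this excerpt, so the numbers in the example are purely illustrative, and the function name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need more samples than features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Illustrative values only: R2 = 0.5 with 11 samples and 2 predictors.
example = adjusted_r2(0.5, n_samples=11, n_features=2)
```

Because the penalty grows with the number of predictors, a model with more variables is preferred only when its gain in R2 outweighs that penalty.
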
ML Model | Pollutant | R2 | RMSE | MAE | BIAS | Model Built with SHAP/Feature Selection: Yes | Model Built with SHAP/Feature Selection: No |
---|---|---|---|---|---|---|---|
RF | PM10 | 0.91 | 6.67 | 4.42 | 0.67 | √ | |
RF | PM2.5 | 0.89 | 4.28 | 3.02 | 0.21 | √ | |
RF | CO | 0.67 | 0.10 | 0.08 | 0.07 | √ | |
ANN | PM10 | 0.88 | 7.55 | 5.18 | 1.25 | √ | |
ANN | PM2.5 | 0.82 | 3.31 | 2.38 | 0.56 | √ | |
ANN | CO | 0.79 | 0.08 | 0.06 | 0.00 | √ | |
XGBoost | PM10 | 0.88 | 7.52 | 5.05 | 0.43 | √ | |
XGBoost | PM2.5 | 0.87 | 3.61 | 2.67 | 0.69 | √ | |
XGBoost | CO | 0.65 | 0.11 | 0.08 | 0.07 | √ | |
SVM | PM10 | 0.92 | 6.28 | 4.13 | 0.50 | √ | |
SVM | PM2.5 | 0.88 | 3.48 | 2.51 | 0.45 | √ | |
SVM | CO | 0.76 | 0.09 | 0.07 | 0.07 | √ | |
MLR | PM10 | 0.91 | 6.52 | 4.20 | 0.68 | √ | |
MLR | PM2.5 | 0.87 | 3.65 | 2.54 | 0.39 | √ | |
MLR | CO | 0.77 | 0.09 | 0.07 | 0.06 | √ | |

Forecast (model) | Variables | R2 | Adjusted R2 | Result (√ = highest adjusted R2) |
---|---|---|---|---|
PM10 24-h forecast 2021 (SVM) | Pollutant (16D1, 23D1 to 23D3) | 0.890 | 0.889 | |
PM10 24-h forecast 2021 (SVM) | Pollutant + Upper Air | 0.905 | 0.898 | |
PM10 24-h forecast 2021 (SVM) | Pollutant + near ground surface | 0.907 | 0.902 | |
PM10 24-h forecast 2021 (SVM) | All variables included | 0.916 | 0.907 | √ |
PM2.5 24-h forecast 2021 (RF) | Pollutant (16D1, 23D1 to 23D3) | 0.855 | 0.853 | |
PM2.5 24-h forecast 2021 (RF) | Pollutant + Upper Air | 0.873 | 0.864 | |
PM2.5 24-h forecast 2021 (RF) | Pollutant + near ground surface | 0.872 | 0.866 | |
PM2.5 24-h forecast 2021 (RF) | All variables included | 0.899 | 0.888 | √ |
CO 24-h forecast 2021 (ANN) | Pollutant (16D1, 23D1 to 23D3) | 0.607 | 0.603 | |
CO 24-h forecast 2021 (ANN) | Pollutant + Upper Air | 0.656 | 0.632 | |
CO 24-h forecast 2021 (ANN) | Pollutant + near ground surface | 0.773 | 0.762 | |
CO 24-h forecast 2021 (ANN) | All variables included | 0.787 | 0.764 | √ |

ML Model | Pollutant | R2 | RMSE | MAE | BIAS | 48-h Model Built with SHAP/Feature Selection: Yes | 48-h Model Built with SHAP/Feature Selection: No |
---|---|---|---|---|---|---|---|
RF | PM10 | 0.66 | 11.70 | 8.84 | 3.58 | √ | |
RF | PM2.5 | 0.49 | 7.59 | 5.72 | 2.14 | √ | |
RF | CO | 0.61 | 0.12 | 0.09 | 0.01 | √ | |
ANN | PM10 | 0.65 | 11.96 | 9.28 | 3.69 | √ | |
ANN | PM2.5 | 0.50 | 7.56 | 5.68 | 2.27 | √ | |
ANN | CO | 0.57 | 0.13 | 0.10 | 0.01 | √ | |
XGBoost | PM10 | 0.66 | 11.68 | 8.56 | 3.61 | √ | |
XGBoost | PM2.5 | 0.43 | 8.03 | 5.99 | 1.96 | √ | |
XGBoost | CO | 0.54 | 0.13 | 0.10 | 0.01 | √ | |
SVM | PM10 | 0.64 | 12.10 | 9.55 | 3.29 | √ | |
SVM | PM2.5 | 0.55 | 6.97 | 5.02 | 1.03 | √ | |
SVM | CO | 0.62 | 0.12 | 0.09 | 0.01 | √ | |
MLR | PM10 | 0.61 | 12.57 | 10.18 | 4.77 | √ | |
MLR | PM2.5 | 0.53 | 7.33 | 5.64 | 1.93 | √ | |
MLR | CO | 0.59 | 0.12 | 0.09 | 0.01 | √ | |

Forecast (model) | Variables | R2 | Adjusted R2 | Result (√ = highest adjusted R2) |
---|---|---|---|---|
PM10 48-h forecast 2020 (RF) | Pollutant (16D2, 23D2 to 23D4) | 0.598 | 0.594 | |
PM10 48-h forecast 2020 (RF) | Model (without feature reduction) | 0.496 | 0.441 | |
PM10 48-h forecast 2020 (RF) | Reduced-feature model with meteorological features (number of variables: 10) | 0.662 | 0.652 | √ |
CO 48-h forecast 2020 (SVM) | Pollutant (16D2, 23D2 to 23D4) | 0.556 | 0.551 | |
CO 48-h forecast 2020 (SVM) | Model (without feature reduction) | −9.463 | −10.611 | |
CO 48-h forecast 2020 (SVM) | Reduced-feature model with meteorological features (number of variables: 13) | 0.622 | 0.608 | √ |
PM2.5 48-h forecast 2020 (SVM) | Pollutant (16D2, 23D2 to 23D4) | 0.528 | 0.523 | |
PM2.5 48-h forecast 2020 (SVM) | Model (without feature reduction) | 0.086 | −0.014 | |
PM2.5 48-h forecast 2020 (SVM) | Reduced-feature model with meteorological features (number of variables: 10) | 0.552 | 0.539 | √ |

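
The reduced-feature models above keep a fixed number of variables (10 or 13). One common way to arrive at such a subset is to rank features by their mean absolute SHAP value and keep the top k. The sketch below shows only that ranking step, on a precomputed matrix of SHAP values. Whether the authors ranked features exactly this way is an assumption, and the function name `top_k_by_mean_abs_shap` and the toy values are hypothetical.

```python
def top_k_by_mean_abs_shap(shap_values, feature_names, k):
    """Rank features by mean |SHAP| across samples and return the top k names.

    shap_values: list of rows, one per sample, each with one value per feature.
    """
    n_samples = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n_samples
        for j, name in enumerate(feature_names)
    }
    ranked = sorted(feature_names, key=lambda name: importance[name], reverse=True)
    return ranked[:k]


# Toy SHAP matrix: 2 samples, 3 features (names borrowed from the variable table).
toy_shap = [
    [0.1, -2.0, 0.5],
    [-0.3, 1.0, 0.5],
]
selected = top_k_by_mean_abs_shap(toy_shap, ["HRMD", "PM_16D1", "T_AIR_MD"], k=2)
```

In practice the SHAP matrix would come from an explainer fitted to the trained model. Only the selected columns would then be used to retrain the 48-h model.
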
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lei, T.M.T.; Ng, S.C.W.; Siu, S.W.I. Application of ANN, XGBoost, and Other ML Methods to Forecast Air Quality in Macau. Sustainability 2023, 15, 5341. https://doi.org/10.3390/su15065341