Forecasting the Concentration of Particulate Matter in the Seoul Metropolitan Area Using a Gaussian Process Model
Abstract
1. Introduction
2. Previous Research
2.1. Development of Prediction Model Structures
2.1.1. Prediction with a Linear Model
2.1.2. Prediction with a Neural Network Model
2.1.3. Prediction with a Nonlinear and Nonparametric Regression Model
2.2. Integration of Societal and Urban Information into Prediction
3. Prediction Model of Particulate Matter Concentration
3.1. Prediction Models
Vector Autoregressive Integrated Moving Average with Linear Regression (VARIMA + LR)
3.2. Prediction on Diverse Locations
Gaussian Process Regression
- Periodic kernel
- RBF kernel
- Matérn 1/2 (M12) kernel
- Matérn 3/2 (M32) kernel
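The four kernels listed above have standard closed forms. The NumPy sketch below implements them, using the exp-sine-squared form for the periodic kernel, together with the exact GP predictive mean. The hyperparameter names (`variance`, `lengthscale`, `period`), their defaults (including an assumed 24-hour period), and the use of exact GP inference are illustrative assumptions, not the paper's training procedure; the abbreviation list mentions a stochastic variational GP (SVGP), which this sketch does not reproduce.

```python
import numpy as np


def _pairwise_dist(x1, x2):
    """Euclidean distances between the rows of x1 (n, d) and x2 (m, d)."""
    diff = x1[:, None, :] - x2[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def rbf(x1, x2, variance=1.0, lengthscale=1.0):
    # k(r) = s^2 * exp(-r^2 / (2 l^2))
    r = _pairwise_dist(x1, x2)
    return variance * np.exp(-0.5 * (r / lengthscale) ** 2)


def matern12(x1, x2, variance=1.0, lengthscale=1.0):
    # k(r) = s^2 * exp(-r / l), i.e. the exponential kernel
    r = _pairwise_dist(x1, x2)
    return variance * np.exp(-r / lengthscale)


def matern32(x1, x2, variance=1.0, lengthscale=1.0):
    # k(r) = s^2 * (1 + sqrt(3) r / l) * exp(-sqrt(3) r / l)
    s = np.sqrt(3.0) * _pairwise_dist(x1, x2) / lengthscale
    return variance * (1.0 + s) * np.exp(-s)


def periodic(x1, x2, variance=1.0, lengthscale=1.0, period=24.0):
    # Exp-sine-squared form: k(r) = s^2 * exp(-2 sin^2(pi r / p) / l^2).
    # Intended for scalar cyclic inputs such as hour of day (period assumed 24).
    r = _pairwise_dist(x1, x2)
    return variance * np.exp(-2.0 * np.sin(np.pi * r / period) ** 2 / lengthscale ** 2)


def gp_posterior_mean(kernel, x_train, y_train, x_test, noise=1e-2):
    """Exact GP predictive mean: k(x*, X) (K + noise * I)^-1 y."""
    k_train = kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_cross = kernel(x_test, x_train)
    # Two solves against the Cholesky factor avoid forming an explicit inverse.
    chol = np.linalg.cholesky(k_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    return k_cross @ alpha
```

For example, fitting two days of hourly values of a 24-hour sinusoid with `gp_posterior_mean(periodic, x, y, x_new)` carries the daily cycle beyond the training window, whereas an RBF or Matérn kernel alone reverts toward the prior mean a few length-scales away from the data.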
3.3. Input Data for the Prediction Model
3.3.1. Particulate Matter and Air Quality Data
3.3.2. Location and Time
3.3.3. Meteorological Data
3.3.4. Topographic Data
3.3.5. Traffic Data
3.3.6. Ultraviolet Information
3.3.7. Power Plant Data
3.4. Performance Indicator of the Forecasting Model
3.4.1. Root Mean Squared Error (RMSE)
3.4.2. Index-of-Agreement (IOA)
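The two performance indicators named above are commonly computed as follows. The formula text of these subsections is not part of this extract, so the sketch assumes the IOA is Willmott's original index of agreement d, which ranges from 0 to 1 with 1 meaning perfect agreement; the function names are hypothetical.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean squared error: sqrt(mean((P_i - O_i)^2))."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((p - o) ** 2)))


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement (assumed definition):

    d = 1 - sum((P_i - O_i)^2) / sum((|P_i - mean(O)| + |O_i - mean(O)|)^2)
    """
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    o_bar = o.mean()
    numerator = np.sum((p - o) ** 2)
    denominator = np.sum((np.abs(p - o_bar) + np.abs(o - o_bar)) ** 2)
    return float(1.0 - numerator / denominator)
```

For observations [30, 45, 60, 25] and predictions [28, 50, 55, 30], this gives an RMSE of about 4.44 and an IOA of about 0.969. RMSE carries the units of the target (μg/m³ for PM), while IOA is dimensionless, which is why the two are reported side by side.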
4. Experiments
4.1. Experimental Setting
4.2. Experimental Results
4.2.1. Quantitative Results
4.2.2. Temporal Patterns from Gaussian Process Regression
4.2.3. Spatial Patterns from Gaussian Process Regression
4.2.4. Ablation Study
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
PM | Particulate Matter |
CO | Carbon Monoxide |
NO2 | Nitrogen Dioxide |
SO2 | Sulfur Dioxide |
O3 | Ozone |
UV | Ultraviolet |
RMSE | Root Mean Squared Error |
IOA | Index of Agreement |
GP | Gaussian Process |
GPR | Gaussian Process Regression |
SVGP | Stochastic Variational Gaussian Process |
RBF | Radial Basis Function |
LR | Linear Regression |
AR | Auto-Regressive Model |
MA | Moving Average Model |
ARMA | Autoregressive Moving Average Model |
ARIMA | Auto-Regressive Integrated Moving Average |
VAR | Vector Auto-Regressive Model |
VMA | Vector Moving Average Model |
VARMA | Vector Auto-Regressive Moving Average Model |
VARIMA | Vector Auto-Regressive Integrated Moving Average Model |
GLM | Generalized Linear Model |
NN | Neural Network |
PLS | Partial Least Squares |
LSTM | Long Short-Term Memory |
FC | Fully Connected |
CNN | Convolutional Neural Network |
MLP | Multi-Layer Perceptron |
UFP | Ultra-Fine Particle |
CPSO | Chaotic Particle Swarm Optimization |
ANN | Artificial Neural Network |
AOD | Aerosol Optical Depth |
MODIS | Moderate Resolution Imaging Spectroradiometer |
eXGB | eXtreme Gradient Boosting |
ME | Ministry of Environment |
MDPI | Multidisciplinary Digital Publishing Institute |
References
- Heo, J. Important sources and chemical species of ambient fine particles related to adverse health effects. AGUFM 2017, 2017, A24B-05.
- Lee, D.; Choi, J.-Y.; Myoung, J.; Kim, O.; Park, J.; Shin, H.-J.; Ban, S.-J.; Park, H.-J.; Nam, K. Analysis of a Severe PM2.5 Episode in the Seoul Metropolitan Area in South Korea from 27 February to 7 March 2019: Focused on Estimation of Domestic and Foreign Contribution. Atmosphere 2019, 10, 756.
- Oh, H.R.; Ho, C.H.; Koo, Y.S.; Baek, K.G.; Yun, H.Y.; Hur, S.K.; Shim, J.S. Impact of Chinese air pollutants on a record-breaking PMs episode in the Republic of Korea for 11–15 January 2019. Atmos. Environ. 2020, 223, 117262.
- Park, H.; Lim, W.; Oh, H. Cross-Border Spillover Effect of Particulate Matter Pollution between China and Korea. Korean Econ. Rev. 2020, 36, 227–248.
- Park, E.H.; Heo, J.; Kim, H.; Yi, S.M. Long term trends of chemical constituents and source contributions of PM2.5 in Seoul. Chemosphere 2020, 126371.
- Choi, J.; Park, R.J.; Lee, H.M.; Lee, S.; Jo, D.S.; Jeong, J.I.; Lim, C.S. Impacts of local vs. trans-boundary emissions from different sectors on PM2.5 exposure in South Korea during the KORUS-AQ campaign. Atmos. Environ. 2019, 203, 196–205.
- Chudnovsky, A.A.; Koutrakis, P.; Kloog, I.; Melly, S.; Nordio, F.; Lyapustin, A.; Schwartz, J. Fine particulate matter predictions using high resolution Aerosol Optical Depth (AOD) retrievals. Atmos. Environ. 2014, 89, 189–198.
- Garcia, J.M.; Teodoro, F.; Cerdeira, R.; Coelho, L.M.R.; Kumar, P.; Carvalho, M.G. Developing a methodology to predict PM10 concentrations in urban areas using generalized linear models. Environ. Technol. 2016, 37, 2316–2325.
- Zhang, T.; Liu, P.; Sun, X.; Zhang, C.; Wang, M.; Xu, J.; Huang, L. Application of an advanced spatiotemporal model for PM2.5 prediction in Jiangsu Province, China. Chemosphere 2020, 246, 125563.
- Lal, B.; Sanjaya, S.T. Prediction of dust concentration in open cast coal mine using artificial neural network. Atmos. Pollut. Res. 2012, 3, 211–218.
- Lu, W.; Yu, W. Prediction of particulate matter at street level using artificial neural networks coupling with chaotic particle swarm optimization algorithm. Build. Environ. 2014, 78, 111–117.
- Zhou, S.; Li, W.; Qiao, J. Prediction of PM2.5 concentration based on recurrent fuzzy neural network. In Proceedings of the 2017 36th Chinese Control Conference (CCC), Dalian, China, 26–28 July 2017.
- Park, S.; Kim, M.; Kim, M.; Namgung, H.G.; Kim, K.T.; Cho, K.H.; Kwon, S.B. Predicting PM10 concentration in Seoul metropolitan subway stations using artificial neural network (ANN). J. Hazard. Mater. 2018, 341, 75–82.
- Shtein, A.; Kloog, I.; Schwartz, J.; Silibello, C.; Michelozzi, P.; Gariazzo, C.; Stafoggia, M. Estimating Daily PM2.5 and PM10 over Italy Using an Ensemble Model. Environ. Sci. Technol. 2019, 54, 120–128.
- Zhao, J.; Deng, F.; Cai, Y.; Chen, J. Long short-term memory-Fully connected (LSTM-FC) neural network for PM2.5 concentration prediction. Chemosphere 2019, 220, 486–492.
- Zamani Joharestani, M.; Cao, C.; Ni, X.; Bashir, B.; Talebiesf, S. PM2.5 Prediction Based on Random Forest, XGBoost, and Deep Learning Using Multisource Remote Sensing Data. Atmosphere 2019, 10, 373.
- Pak, U.; Ma, J.; Ryu, U.; Ryom, K.; Juhyok, U.; Pak, K.; Pak, C. Deep learning-based PM2.5 prediction considering the spatiotemporal correlations: A case study of Beijing, China. Sci. Total Environ. 2020, 699, 133561.
- Cheng, Y.; Li, X.; Li, Z.; Jiang, S.; Jiang, X. Fine-grained air quality monitoring based on Gaussian process regression. In Proceedings of the International Conference on Neural Information Processing, Kuching, Malaysia, 3–6 November 2014.
- Reggente, M.; Peters, J.; Theunis, J.; Van Poppel, M.; Rademaker, M.; Kumar, P.; De Baets, B. Prediction of ultrafine particle number concentrations in urban environments by means of Gaussian process regression based on measurements of oxides of nitrogen. Environ. Model. Softw. 2014, 61, 135–150.
- Liu, H.; Yang, C.; Huang, M.; Wang, D.; Yoo, C. Modeling of subway indoor air quality using Gaussian process regression. J. Hazard. Mater. 2018, 359, 266–273.
- Reinsel, G.C. Vector ARMA Time Series Models and Forecasting. In Elements of Multivariate Time Series Analysis; Springer: New York, NY, USA, 1993; pp. 21–51.
- Hensman, J.; Fusi, N.; Lawrence, N.D. Gaussian Processes for Big Data. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, Bellevue, WA, USA, 11–15 August 2013; pp. 282–290.
- Suits, D.B. Use of dummy variables in regression equations. J. Am. Stat. Assoc. 1957, 52, 548–551.
- Romanillos, G.; Gutiérrez, J. Cyclists do better. Analyzing urban cycling operating speeds and accessibility. Int. J. Sustain. Transp. 2020, 14, 448–464.
- Lee, I.; Julie, C. Formalizing the HRM and firm performance link: The S-curve hypothesis. Int. J. Hum. Resour. Manag. 2020, 1–32.
- Liu, H.; Zhu, D.; Chao, C. A hybrid framework for forecasting PM2.5 concentrations using multi-step deterministic and probabilistic strategy. Air Qual. Atmos. Health 2019, 12, 785–795.
- Wu, H.; Hui, L.; Zhu, D. PM2.5 concentrations forecasting using a new multi-objective feature selection and ensemble framework. Atmos. Pollut. Res. 2020, 11, 1187–1198.
- Liu, Y.; Wang, J.; Zhao, X.; Wang, J.; Wang, X.; Hou, L.; Bai, Z. Characteristics, Secondary Formation and Regional Contributions of PM2.5 Pollution in Jinan during Winter. Atmosphere 2020, 11, 273.
Previous research compared by independent variables (Location through Power Plant) and dependent variable (PM)
Complexity | Research | Methodology | Location | Time | CO | NO2 | O3 | SO2 | Temperature | Rainfall | Wind Direction | Wind Speed | Topographic | Traffic Volume | Ultraviolet | Power Plant | PM |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Linear Model | Chudnovsky et al. [7] | AOD Retrieval + Regression | 🗸 | 🗸 | 🗸 | 🗸 | PM2.5 |
Linear Model | Garcia et al. [8] | Generalized Linear Model | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | PM10 |
Linear Model | Zhang et al. [9] | Spatio-temporal Land-use Regression | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | PM2.5 |
Neural Network Model | Lal et al. [10] | Vanilla ANN | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | PM10 PM2.5 |
Neural Network Model | Lu et al. [11] | ANN + CPSO Algorithm | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | PM10 PM1 |
Neural Network Model | Zhou et al. [12] | Recurrent Fuzzy NN | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | PM2.5 |
Neural Network Model | Park et al. [13] | Vanilla ANN | 🗸 | 🗸 | PM10 |
Neural Network Model | Shtein et al. [14] | Ensemble Model | 🗸 | 🗸 | PM10 PM2.5 |
Neural Network Model | Zhao et al. [15] | LSTM-FC | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | PM2.5 |
Neural Network Model | Zamani et al. [16] | Random Forest + eXGB + Deep NN | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | PM2.5 |
Neural Network Model | Pak et al. [17] | CNN-LSTM | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | PM2.5 |
Nonlinear and Nonparametric Regression Model | Cheng et al. [18] | Gaussian Process Regression | 🗸 | 🗸 | PM2.5 |
Nonlinear and Nonparametric Regression Model | Reggente et al. [19] | Gaussian Process Regression | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | PM0.1 |
Nonlinear and Nonparametric Regression Model | Liu et al. [20] | Gaussian Process Regression | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | PM2.5 |
Nonlinear and Nonparametric Regression Model | Ours | Gaussian Process Regression | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 | PM10 PM2.5 |
Type | Variable | Unit |
---|---|---|
Location | Latitude | Degree |
Location | Longitude | Degree |
Time | Day of Year | Year/Month/Day |
Time | Hour of Day | h |
Meteorological Information | Temperature | °C |
Meteorological Information | Precipitation | mm/h |
Meteorological Information | Wind Direction | Categorical |
Meteorological Information | Wind Speed | m/s |
Topographic Information | Topographic Categories | Categorical |
Traffic Information | Agent Traffic Volume | Vehicles/h |
Air Quality Information | Sulfur Dioxide (SO2) | ppm |
Air Quality Information | Carbon Monoxide (CO) | ppm |
Air Quality Information | Nitrogen Dioxide (NO2) | ppm |
Air Quality Information | Ozone (O3) | ppm |
Ultraviolet Information | UVA Max | MJ/m² |
Ultraviolet Information | UVA Sum | MJ/m² |
Ultraviolet Information | UVB Max | kJ/m² |
Ultraviolet Information | UVB Sum | kJ/m² |
Power Plant | Usage of Thermal Power Plant | % |
Variables | Kernel Type 1 (Matérn) | Kernel Type 2 (RBF) | Kernel Type 3 (Matérn + RBF) |
---|---|---|---|
Latitude, Longitude | Matérn 3/2 (M32) | RBF | RBF |
Day of Year | Periodic | Periodic | Periodic |
Hour of Day | Periodic | Periodic | Periodic |
Temperature | Matérn 3/2 (M32) | RBF | RBF |
Precipitation | Matérn 3/2 (M32) | RBF | Matérn 3/2 (M32) |
Wind Direction | Matérn 3/2 (M32) | RBF | RBF |
Wind Speed | Matérn 3/2 (M32) | RBF | RBF |
Topographic Categories | Matérn 3/2 (M32) | RBF | RBF |
Agent Traffic Volume | Matérn 3/2 (M32) | RBF | RBF |
Sulfur Dioxide (SO2) | Matérn 3/2 (M32) | RBF | Matérn 3/2 (M32) |
Carbon Monoxide (CO) | Matérn 3/2 (M32) | RBF | Matérn 3/2 (M32) |
Nitrogen Dioxide (NO2) | Matérn 3/2 (M32) | RBF | Matérn 3/2 (M32) |
Ozone (O3) | Matérn 3/2 (M32) | RBF | Matérn 3/2 (M32) |
Ultraviolet | Matérn 3/2 (M32) | RBF | RBF |
Usage of Thermal Power Plant | Matérn 3/2 (M32) | RBF | RBF |
The Number of Missing Instances in Each Air Quality Variable (%) | Total Instances | ||||||
---|---|---|---|---|---|---|---|
Year | PM10 | PM2.5 | |||||
2018 | 46,079 (4.16%) | 55,436 (5.00%) | 71,456 (6.44%) | 50,039 (4.51%) | 68,782 (6.20%) | 131,734 (11.88%) | 1,108,992 |
2017 | 31,978 (3.00%) | 42,871 (4.02%) | 58,319 (5.46%) | 39,153 (3.67%) | 47,856 (4.48%) | 428,544 (40.15%) | 1,067,304 |
LV2 Code | LV2 Name | LVL Code | LVL Name | Dummy Variable |
---|---|---|---|---|
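110 | Residential Area | 1 | Urban Area | [1, 0, 0, 0] |
120 | Industrial Area | 1 | Urban Area | [1, 0, 0, 0] |
130 | Commercial Area | 1 | Urban Area | [1, 0, 0, 0] |
140 | Amusement Facility Area | 1 | Urban Area | [1, 0, 0, 0] |
150 | Traffic Area | 1 | Urban Area | [1, 0, 0, 0] |
160 | Public Facilities Area | 1 | Urban Area | [1, 0, 0, 0] |
610 | Mining Area | 1 | Urban Area | [1, 0, 0, 0] |
620 | Artificial Area | 1 | Urban Area | [1, 0, 0, 0] |
210 | Rice Paddy Area | 2 | Grassland Area | [0, 1, 0, 0] |
220 | Farming Area | 2 | Grassland Area | [0, 1, 0, 0] |
230 | House Farming Area | 2 | Grassland Area | [0, 1, 0, 0] |
240 | Orchard Area | 2 | Grassland Area | [0, 1, 0, 0] |
250 | Other Farming Area | 2 | Grassland Area | [0, 1, 0, 0] |
410 | Natural Grassland Area | 2 | Grassland Area | [0, 1, 0, 0] |
420 | Golf Course Area | 2 | Grassland Area | [0, 1, 0, 0] |
430 | Other Grassland Area | 2 | Grassland Area | [0, 1, 0, 0] |
310 | Broad-leaf Forest Area | 3 | Forest Area | [0, 0, 1, 0] |
320 | Coniferous Forest Area | 3 | Forest Area | [0, 0, 1, 0] |
330 | Mixed Forest Area | 3 | Forest Area | [0, 0, 1, 0] |
510 | Inland Wetland Area | 4 | Water Area | [0, 0, 0, 1] |
520 | Coastal Wetland Area | 4 | Water Area | [0, 0, 0, 1] |
710 | Fresh Water Area | 4 | Water Area | [0, 0, 0, 1] |
720 | Sea Water Area | 4 | Water Area | [0, 0, 0, 1] |
999 | Unknown Area | 5 | Unknown Area | [0, 0, 0, 0] |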
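The dummy-variable column above is a one-hot encoding of the four grouped topographic categories, with Unknown Area (999) as the all-zero reference level. Using a reference level keeps the four columns from being perfectly collinear with a regression intercept. A minimal Python sketch follows; the code-to-group mapping is copied from the table, while the names `TOPO_GROUPS` and `topo_dummy` and the choice to reject codes not listed in the table are assumptions.

```python
# LV2 land-cover codes grouped into the four topographic categories (from the table).
TOPO_GROUPS = {
    "Urban Area": (110, 120, 130, 140, 150, 160, 610, 620),
    "Grassland Area": (210, 220, 230, 240, 250, 410, 420, 430),
    "Forest Area": (310, 320, 330),
    "Water Area": (510, 520, 710, 720),
}
UNKNOWN_CODE = 999  # reference category, encoded as all zeros


def topo_dummy(lv2_code):
    """Return the 4-element dummy vector for an LV2 land-cover code."""
    if lv2_code == UNKNOWN_CODE:
        return [0] * len(TOPO_GROUPS)
    # Dict insertion order fixes the position of each category in the vector.
    for position, codes in enumerate(TOPO_GROUPS.values()):
        if lv2_code in codes:
            vector = [0] * len(TOPO_GROUPS)
            vector[position] = 1
            return vector
    # Assumption: codes missing from the table are treated as input errors.
    raise ValueError(f"unrecognized LV2 land-cover code: {lv2_code}")
```

For instance, `topo_dummy(420)` (Golf Course Area) returns `[0, 1, 0, 0]`, the Grassland Area vector.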
Model | Model Specification | PM10 RMSE (μg/m³) | PM10 IOA | PM2.5 RMSE (μg/m³) | PM2.5 IOA |
---|---|---|---|---|---|
Linear Regression | | 22.19 ± 4.65 (14.43) | 0.56 ± 0.05 (0.15) | 22.04 ± 5.08 (15.78) | 0.55 ± 0.05 (0.15) |
| | | 26.17 ± 5.07 (15.73) | 0.26 ± 0.03 (0.08) | 29.30 ± 6.63 (20.57) | 0.21 ± 0.05 (0.16) |
| | | 26.03 ± 4.75 (14.73) | 0.29 ± 0.03 (0.08) | 28.41 ± 6.29 (19.53) | 0.22 ± 0.05 (0.15) |
| | | 26.44 ± 4.36 (13.54) | 0.29 ± 0.03 (0.09) | 28.09 ± 5.85 (18.14) | 0.23 ± 0.05 (0.14) |
| | | 25.80 ± 5.01 (15.56) | 0.26 ± 0.03 (0.09) | 27.59 ± 5.78 (17.94) | 0.21 ± 0.04 (0.12) |
| | | 25.92 ± 5.00 (15.53) | 0.27 ± 0.03 (0.09) | 27.65 ± 5.75 (17.84) | 0.22 ± 0.04 (0.12) |
| | | 26.06 ± 4.99 (15.49) | 0.28 ± 0.03 (0.09) | 27.71 ± 5.72 (17.75) | 0.23 ± 0.04 (0.12) |
| | | 33.40 ± 4.46 (13.84) | 0.30 ± 0.04 (0.13) | 29.05 ± 6.15 (19.1) | 0.30 ± 0.06 (0.18) |
| | | 45.78 ± 6.95 (21.57) | 0.33 ± 0.04 (0.13) | 28.11 ± 5.83 (18.1) | 0.30 ± 0.05 (0.17) |
| | | 51.92 ± 7.45 (23.13) | 0.35 ± 0.04 (0.11) | 32.09 ± 6.56 (20.35) | 0.32 ± 0.05 (0.15) |
| | | 45.65 ± 7.01 (21.77) | 0.15 ± 0.02 (0.05) | 46.08 ± 7.02 (21.8) | 0.10 ± 0.01 (0.04) |
| | | 45.55 ± 7.01 (21.74) | 0.20 ± 0.02 (0.06) | 45.73 ± 6.99 (21.68) | 0.14 ± 0.02 (0.06) |
| | | 45.89 ± 6.98 (21.67) | 0.22 ± 0.02 (0.06) | 45.55 ± 6.95 (21.58) | 0.16 ± 0.02 (0.06) |
| | | 23.89 ± 4.55 (14.12) | 0.53 ± 0.05 (0.17) | 29.60 ± 4.75 (14.73) | 0.41 ± 0.05 (0.15) |
| | | 23.56 ± 4.96 (15.4) | 0.56 ± 0.05 (0.16) | 31.11 ± 5.40 (16.75) | 0.42 ± 0.05 (0.15) |
| | | 21.04 ± 4.66 (14.47) | 0.59 ± 0.05 (0.15) | 25.60 ± 4.97 (15.41) | 0.49 ± 0.05 (0.14) |
| | | 21.12 ± 4.65 (14.44) | 0.59 ± 0.05 (0.15) | 25.56 ± 4.96 (15.38) | 0.49 ± 0.05 (0.14) |
| | | 21.20 ± 4.65 (14.42) | 0.59 ± 0.05 (0.15) | 25.52 ± 4.95 (15.36) | 0.49 ± 0.05 (0.14) |
| | | 47.79 ± 10.26 (31.84) | 0.43 ± 0.05 (0.17) | 46.62 ± 6.19 (19.22) | 0.31 ± 0.04 (0.13) |
| | | 53.82 ± 14.01 (43.47) | 0.46 ± 0.04 (0.13) | 54.28 ± 7.86 (24.38) | 0.33 ± 0.05 (0.15) |
| | | 60.49 ± 13.31 (41.3) | 0.45 ± 0.04 (0.12) | 55.35 ± 14.15 (43.9) | 0.35 ± 0.05 (0.14) |
| | | 50.65 ± 7.26 (22.53) | 0.19 ± 0.02 (0.06) | 44.93 ± 6.54 (20.3) | 0.16 ± 0.02 (0.05) |
| | | 46.55 ± 7.07 (21.94) | 0.23 ± 0.02 (0.06) | 43.54 ± 7.18 (22.27) | 0.20 ± 0.03 (0.08) |
| | | 45.86 ± 7.44 (23.1) | 0.25 ± 0.02 (0.07) | 45.65 ± 6.89 (21.38) | 0.19 ± 0.02 (0.06) |
Gaussian Process Regression | GPR (Matérn) | 21.10 ± 4.29 (13.33) | 0.61 ± 0.05 (13.93) | 21.96 ± 4.97 (15.42) | 0.58 ± 0.05 (0.15) |
Gaussian Process Regression | GPR (RBF) | 21.13 ± 4.28 (13.29) | 0.59 ± 0.05 (13.78) | 19.16 ± 4.71 (14.63) | 0.61 ± 0.04 (0.14) |
Gaussian Process Regression | GPR (Matérn + RBF) | 20.97 ± 4.40 (13.67) | 0.60 ± 0.05 (13.92) | 21.92 ± 4.89 (15.16) | 0.57 ± 0.05 (0.15) |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Jang, J.; Shin, S.; Lee, H.; Moon, I.-C. Forecasting the Concentration of Particulate Matter in the Seoul Metropolitan Area Using a Gaussian Process Model. Sensors 2020, 20, 3845. https://doi.org/10.3390/s20143845
Chicago/Turabian Style: Jang, JoonHo, Seungjae Shin, Hyunjin Lee, and Il-Chul Moon. 2020. "Forecasting the Concentration of Particulate Matter in the Seoul Metropolitan Area Using a Gaussian Process Model" Sensors 20, no. 14: 3845. https://doi.org/10.3390/s20143845