Forecasting Daily of Surface Ozone Concentration in the Grand Casablanca Region Using Parametric and Nonparametric Statistical Models
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area and Data Collection
2.2. Modelling Approach
2.3. Model Evaluation
- The coefficient of determination, denoted R². This statistic (Equation (1)) measures the proportion of the variance in the response variable that is predictable from the explanatory variables, and so indicates the goodness of fit of a model. It ranges from 0 to 1: the closer its value is to 1, the better the model.
- The Root Mean Squared Error (RMSE) is the standard deviation of the residuals (prediction errors). It is computed according to Equation (2). The smaller the value of this criterion, the better the goodness of fit of the model.

To assess the predictive capacity of the models, we use the RMSE criterion calculated from the observed data of summer 2015 (Equation (3)). The best predictive model is the one with the smallest value of this criterion.

In the same way, the RMSE of prediction based on the actual forecasted meteorological data set for 2015 is defined in Equation (4).
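As an illustration of the two criteria above, the following minimal sketch computes R² and RMSE with their standard textbook definitions: R² as one minus the ratio of the residual sum of squares to the total sum of squares, and RMSE as the square root of the mean squared residual. The function names, the 1/n divisor in the RMSE, and the sample values are assumptions made for this example and are not taken from the article.

```python
import math
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_observed) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, using a 1/n divisor (an assumption)."""
    n = len(observed)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n)


# Made-up ozone concentrations in µg/m³, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(round(r_squared(observed, predicted), 3))  # 0.948
print(round(rmse(observed, predicted), 3))       # 2.55
```

The same `rmse` function can be applied to a held-out period, which is how the external criteria of Equations (3) and (4) differ from the internal one: only the data passed in change.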
3. Results
3.1. Data Preparation
3.2. Internal Validation: Goodness of Fit
3.3. External Validation: Performance Evaluation
- The RMSE on observed data, obtained by testing the models on the observed meteorological data.
- The RMSE on forecasted data, obtained by testing the models on the forecasted meteorological data.
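The two external-validation settings above can be sketched as follows: one already-fitted model is scored twice, once with observed meteorological inputs and once with forecasted inputs for the same days. The linear model, its coefficients, and all numbers below are hypothetical and serve only to show the mechanics; they are not the article's model or data.

```python
import math
from typing import List, Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error with a 1/n divisor."""
    n = len(observed)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n)


def predict(intercept: float, coefs: Sequence[float],
            rows: Sequence[Sequence[float]]) -> List[float]:
    """Apply a fitted linear model to each row of predictor values."""
    return [intercept + sum(c * x for c, x in zip(coefs, row)) for row in rows]


# Hypothetical fitted model: ozone = 5 + 1.0 * TMPMAX + 0.5 * previous-day ozone.
intercept, coefs = 5.0, [1.0, 0.5]

# Made-up test days: measured ozone, then [TMPMAX, previous-day ozone] inputs.
ozone_measured = [52.0, 58.0, 55.0]
met_observed = [[25.0, 40.0], [28.0, 54.0], [26.0, 48.0]]
met_forecast = [[24.0, 40.0], [30.0, 54.0], [26.5, 48.0]]

rmse_observed = rmse(ozone_measured, predict(intercept, coefs, met_observed))
rmse_forecast = rmse(ozone_measured, predict(intercept, coefs, met_forecast))

print(round(rmse_observed, 3))  # 1.633
print(round(rmse_forecast, 3))  # 2.901
```

Scoring the same model on both input sets separates the error of the statistical model itself from the additional error introduced by the meteorological forecast.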
3.4. Selected Forecast Model
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
GCR | Grand Casablanca Region |
GDM | General Directorate of Meteorology |
ALADIN | Limited Area Dynamic Adaptation International Development |
GADM | Global Administrative Areas |
MLR | Multiple Linear Regression |
SPLS | Sparse Partial Least Squares |
LASSO | Least Absolute Shrinkage and Selection Operator |
CART | Classification and Regression Tree |
RF | Random Forest |
VIF | Variance Inflation Factor |
Appendix A. Parametric Models
Appendix A.1. Multiple Linear Regression Model (MLR Model)
Appendix A.2. Sparse PLS Regression (SPLS)
Appendix A.3. Lasso Regression
Appendix B. Nonparametric Models
Appendix B.1. Classification and Regression Tree (CART)
Appendix B.2. Bagging
Appendix B.3. Random Forests (RF)
Appendix C. Comparison Table of Parametric and Nonparametric Models
Appendix C.1. Training Period: (2014–2015) (Case 1)
Models/Criteria | R² | RMSE | RMSE (observed 2015 data) | RMSE (forecasted 2015 data) | Nb Variables |
---|---|---|---|---|---|
SPLS | 0.827 | 9.299 | 12.77 | 13.98 | 8 |
Lasso | 0.801 | 9.233 | 12.77 | 13.63 | 14 |
CART | 0.803 | 9.909 | 13.69 | 13.81 | 24 |
Bagging | 0.802 | 8.704 | 13.44 | 14.04 | 24 |
RF | 0.774 | 9.579 | 16.80 | 17.40 | 24 |
Appendix C.2. Training Period: (2013–2015) (Case 2)
Models/Criteria | R² | RMSE | RMSE (observed 2015 data) | RMSE (forecasted 2015 data) | Nb Variables |
---|---|---|---|---|---|
SPLS | 0.761 | 11.40 | 9.595 | 8.92 | 6 |
Lasso | 0.751 | 10.94 | 9.151 | 7.63 | 13 |
CART | 0.757 | 11.48 | 12.41 | 12.62 | 24 |
Bagging | 0.734 | 10.30 | 10.39 | 10.19 | 24 |
RF | 0.701 | 11.78 | 10.21 | 8.40 | 24 |
Appendix D
Test | Hypothesis | p-Value |
---|---|---|
Normality (Shapiro–Wilk) | Residuals normality | 0.049 |
Homoscedasticity (studentized Breusch–Pagan) | Homoscedasticity | 0.8361 |
Autocorrelation (Durbin–Watson) | Autocorrelation = 0 | 0.314 |
Linearity (Harvey–Collier) | Nonlinear relation | 0.004 |
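To make the autocorrelation row of the table above more concrete, here is a minimal sketch of the Durbin–Watson statistic, computed with its standard definition: the sum of squared successive differences of the residuals divided by the sum of squared residuals. The function name and the residual values are assumptions for illustration. The sketch returns only the statistic, not the p-value reported in the table, which requires the statistic's sampling distribution.

```python
from typing import Sequence


def durbin_watson(residuals: Sequence[float]) -> float:
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residual series, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0]  # strong negative autocorrelation
constant = [1.0, 1.0, 1.0, 1.0]       # strong positive autocorrelation

print(durbin_watson(alternating))  # 3.0
print(durbin_watson(constant))     # 0.0
```

The statistic lies between 0 and 4, and values close to 2 are consistent with no first-order autocorrelation in the residuals.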
Abbreviation | Variable | Unit |
---|---|---|
TMPMAX | Maximal temperature | °C |
TMPMIN | Minimal temperature | °C |
TMPMOY | Average temperature | °C |
RRQUOT | Total precipitation | mm |
DRINSQ | Sunshine duration | h |
HUMREL06h | Relative humidity at 6 h | % |
HUMREL12h | Relative humidity at 12 h | % |
HUMREL18h | Relative humidity at 18 h | % |
PRESTN06h | Pressure at the station level at 6 h | hPa |
PRESTN12h | Pressure at the station level at 12 h | hPa |
PRESTN18h | Pressure at the station level at 18 h | hPa |
FFVM06h | Wind force at 6 h | m/s |
FFVM12h | Wind force at 12 h | m/s |
FFVM18h | Wind force at 18 h | m/s |
DDVM06h | Wind direction at 6 h | Degree |
DDVM12h | Wind direction at 12 h | Degree |
DDVM18h | Wind direction at 18 h | Degree |
Vx06 | Horizontal wind at 6 h | m/s |
Vx12 | Horizontal wind at 12 h | m/s |
Vx18 | Horizontal wind at 18 h | m/s |
Vy06 | Vertical wind at 6 h | m/s |
Vy12 | Vertical wind at 12 h | m/s |
Vy18 | Vertical wind at 18 h | m/s |
 | O3 concentrations of previous day | µg/m³ |
 | Ozone concentrations | µg/m³ |
Variable | Min | Max | Mean | St. Dev | VIF |
---|---|---|---|---|---|
TMPMAX | 16.2 | 37.5 | 24.5 | 3.09 | 4104.12 |
TMPMIN | 8.20 | 23.50 | 18.35 | 3.02 | 3888.51 |
TMPMOY | 12.40 | 29.90 | 21.45 | 2.88 | 14178.12 |
RRQUOT | 0.00 | 19.30 | 0.39 | 1.98 | 1.78 |
DRINSQ | 0.00 | 13.30 | 9.72 | 2.79 | 1.66 |
HUMREL06h | 50.00 | 100.0 | 87.42 | 8.00 | 2.03 |
HUMREL12h | 34.00 | 95.00 | 68.32 | 8.78 | 2.15 |
HUMREL18h | 28.00 | 97.00 | 75.66 | 9.66 | 2.13 |
PRESTN06h | 997.7 | 1017.3 | 1008.2 | 2.97 | 16.07 |
PRESTN12h | 997.7 | 1016.5 | 1008.9 | 2.91 | 46.49 |
PRESTN18h | 999 | 1016 | 1008 | 2.88 | 18.12 |
FFVM06h | 0.00 | 4.00 | 1.55 | 0.80 | 1.58 |
FFVM12h | 0.00 | 6.00 | 3.58 | 0.98 | 3.17 |
FFVM18h | 0.00 | 7.00 | 3.46 | 1.04 | 2.79 |
DDVM06h | 0.00 | 360.0 | 176.4 | 117.87 | 1.64 |
DDVM12h | 0.00 | 360.0 | 227.3 | 141.63 | 2.65 |
DDVM18h | 0.00 | 360.0 | 189.2 | 152.21 | 2.77 |
Vx06 | −2.95 | 3.46 | −0.05 | 1.06 | 2.71 |
Vx12 | −5.91 | 3.94 | −0.59 | 1.98 | 4.48 |
Vx18 | −5.91 | 4.50 | −0.10 | 1.84 | 5.21 |
Vy06 | −4.00 | 4.00 | 0.08 | 1.38 | 1.86 |
Vy12 | −3.06 | 6.00 | 2.75 | 1.39 | 4.25 |
Vy18 | −5.36 | 6.00 | 2.79 | 1.36 | 4.50 |
O3 concentrations of previous day | 10.00 | 130.0 | 52.83 | 25.66 | 1.08 |
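The VIF column of the table above can be read through the standard definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors. The sketch below implements that definition in plain Python. The helper names and the temperature values are hypothetical, and the toy mean temperature is deliberately built as roughly the average of the maximum and minimum, which is one plausible reason (an assumption, not stated in the source) for the very large VIF values of the three temperature variables.

```python
from typing import Dict, List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def regression_r2(target: Sequence[float], others: List[Sequence[float]]) -> float:
    """R^2 of an ordinary least squares fit of target on others plus an intercept."""
    n = len(target)
    design = [[1.0] + [col[i] for col in others] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y for row, y in zip(design, target)) for p in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    mean_target = sum(target) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(target, fitted))
    ss_tot = sum((y - mean_target) ** 2 for y in target)
    return 1.0 - ss_res / ss_tot


def vif(columns: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Variance inflation factor of each predictor: 1 / (1 - R_j^2)."""
    result = {}
    for name, col in columns.items():
        others = [c for other, c in columns.items() if other != name]
        result[name] = 1.0 / (1.0 - regression_r2(col, others))
    return result


# Made-up temperatures (°C); TMPMOY is close to the average of the other two.
toy = {
    "TMPMAX": [24.1, 26.3, 22.8, 29.5, 25.0, 27.2],
    "TMPMIN": [17.9, 19.2, 16.5, 20.1, 18.8, 18.0],
    "TMPMOY": [21.05, 22.70, 19.70, 24.75, 21.90, 22.60],
}
for name, value in vif(toy).items():
    print(name, round(value, 1))
```

With predictors this collinear, the VIF values come out very large, which is the kind of situation that motivates variable selection or shrinkage methods such as SPLS and Lasso.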
Models/Criteria | R² | RMSE | RMSE (observed 2015 data) | RMSE (forecasted 2015 data) | Nb Variables |
---|---|---|---|---|---|
SPLS | 0.857 | 9.576 | 11.89 | 13.61 | 7 |
Lasso | 0.828 | 9.555 | 11.58 | 13.02 | 12 |
CART | 0.852 | 9.523 | 14.16 | 13.83 | 24 |
Bagging | 0.831 | 9.342 | 12.87 | 12.65 | 24 |
RF | 0.771 | 9.914 | 13.36 | 12.85 | 24 |
Selected model | 0.856 | 9.60 | 11.78 | 12.55 | 5 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Oufdou, H.; Bellanger, L.; Bergam, A.; Khomsi, K. Forecasting Daily of Surface Ozone Concentration in the Grand Casablanca Region Using Parametric and Nonparametric Statistical Models. Atmosphere 2021, 12, 666. https://doi.org/10.3390/atmos12060666