Analysis and Prediction of PM2.5 Pollution in Madrid: The Use of Prophet–Long Short-Term Memory Hybrid Models
Abstract
1. Introduction
The main contributions of this paper are:
- Development of a hybrid Prophet–LSTM model for PM2.5 prediction.
- Application of the model in a complex urban environment such as Madrid.
- Integration of meteorological and air quality data for enhanced prediction accuracy.
- Identification of the gap in the recent literature on PM2.5 prediction.
- Comparative evaluation with other predictive models demonstrating its superior performance.
2. State of the Art
2.1. Relevance of PM2.5 in Scientific Studies
2.2. European Regulations on PM2.5
- Annual limit: $C_{\text{annual}} = 25\ \mu\text{g/m}^3$. A plan is set to reduce this limit progressively to align with the WHO guidelines: $C_{\text{annual,reduced}} = 10\ \mu\text{g/m}^3$.
- Daily limit: $C_{\text{daily}} = 50\ \mu\text{g/m}^3$. To control acute exposure, the directive allows no more than 35 exceedances of this limit per year: $N_{\text{exceedances}} \le 35$, where $N_{\text{exceedances}}$ is the number of times the daily limit is exceeded in a year.
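The limits quoted above translate directly into a compliance check. The following is a minimal sketch, assuming a list of daily mean concentrations in µg/m³ for one year; the function name `check_limits` and the sample values are hypothetical and do not come from the study.

```python
from statistics import mean

ANNUAL_LIMIT = 25.0      # µg/m³, annual limit quoted above
DAILY_LIMIT = 50.0       # µg/m³, daily limit quoted above
MAX_EXCEEDANCES = 35     # permitted exceedances of the daily limit per year


def check_limits(daily_means):
    """Return the annual mean, the exceedance count, and two compliance flags."""
    annual_mean = mean(daily_means)
    n_exceedances = sum(1 for c in daily_means if c > DAILY_LIMIT)
    return {
        "annual_mean": annual_mean,
        "n_exceedances": n_exceedances,
        "annual_ok": annual_mean <= ANNUAL_LIMIT,
        "daily_ok": n_exceedances <= MAX_EXCEEDANCES,
    }


# Hypothetical year: 325 days at 12 µg/m³ and 40 days at 60 µg/m³.
sample_year = [12.0] * 325 + [60.0] * 40
report = check_limits(sample_year)
print(report)
```

In this invented example the annual mean stays below the annual limit while the daily limit is exceeded on 40 days, so the two criteria can disagree and both need to be checked.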
2.3. Measures Implemented by the Madrid City Council to Reduce PM2.5 Levels
2.4. Focus on Predictive Models for PM2.5
3. Methodology
- Flexibility and Structure: CRISP-DM provides a flexible framework that can be adapted to different types of data mining projects while maintaining a clear structure throughout its phases. This allows us to effectively address the diverse needs and challenges that may arise in an air quality analysis project.
- Comprehensive Understanding of Business and Data: The initial stages of CRISP-DM, Business Understanding and Data Understanding, ensure that the project’s objectives are aligned with business needs and that the data are thoroughly understood before proceeding to modeling. This is crucial for an accurate and relevant analysis of air quality data.
- Data Preparation and Modeling: The methodology emphasizes rigorous data preparation, including data cleaning, construction, integration, and formatting. This meticulous approach ensures that the data are in the best possible shape for modeling. The modeling phase allows the application of various modeling techniques, ensuring the selection of the most appropriate one for the project’s specific data.
- Effective Evaluation and Deployment: The evaluation phase of CRISP-DM ensures that the developed model meets business objectives before implementation. The deployment phase facilitates the integration of the model into the real environment, ensuring that the analysis results are used for informed decision making.
- Wide Acceptance and Support: CRISP-DM is a widely accepted and used methodology in the industry, providing access to a wide range of resources, tools, and communities of practice. This facilitates the implementation and continuous support of the project.
3.1. Business Understanding
- Conducting an Initial Analysis and Understanding of the Downloaded Data: Verify the integrity of the downloaded data by checking its completeness and confirming that the recorded values are within expected ranges. This step includes removing outliers and records with significant missing data. Subsequently, integrate the air quality data from different years into a single structured dataset. This preparation process facilitates the joint analysis and comparison of measurements over time.
- Performing Descriptive, Temporal, and Spatial Trend Analyses of PM2.5: Conduct a descriptive analysis to understand the initial distribution and characteristics of the collected PM2.5 data. Additionally, analyze how PM2.5 levels vary over time and across different districts of Madrid, identifying seasonal patterns and differences between urban and suburban areas. This allows for the implementation of more effective control measures tailored to the specific needs of each district.
- Developing Predictive Models for PM2.5: Build predictive models to estimate PM2.5 levels for future dates and under different emissions scenarios. Predictive models are valuable tools for planning and decision making, enabling authorities to anticipate high pollution episodes and take proactive actions.
- Evaluating the Effectiveness of Current Implemented Measures: Assess the effectiveness of the measures currently implemented to reduce PM2.5 levels, informed by the analysis results. Ensuring that these measures are effective and based on solid scientific data will optimize mitigation efforts.
- Communicating Results to Stakeholders: Present the analysis results clearly and comprehensibly to decision makers in the Madrid City Council, as well as to other stakeholders. Effective communication of the results is essential to ensure that the analysis conclusions and recommendations are understood and applied.
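The first objective in the list above (integrity checks followed by integration of the yearly files) can be sketched as follows. This is an illustration only: the record layout, the plausible range of 0 to 500 µg/m³, and the function names are assumptions rather than details reported in the study.

```python
from datetime import date

# Assumed plausible range for daily PM2.5 means in µg/m³ (hypothetical bounds).
PM25_MIN, PM25_MAX = 0.0, 500.0


def clean_records(records):
    """Drop records with a missing value or a value outside the assumed range."""
    return [
        r for r in records
        if r["pm25"] is not None and PM25_MIN <= r["pm25"] <= PM25_MAX
    ]


def integrate_years(yearly_records):
    """Merge cleaned per-year record lists into one list sorted by date and station."""
    merged = []
    for records in yearly_records:
        merged.extend(clean_records(records))
    return sorted(merged, key=lambda r: (r["day"], r["station"]))


# Hypothetical sample data for two years.
data_2022 = [
    {"day": date(2022, 1, 2), "station": "A", "pm25": 14.0},
    {"day": date(2022, 1, 1), "station": "A", "pm25": None},    # missing value
]
data_2023 = [
    {"day": date(2023, 1, 1), "station": "A", "pm25": 9.5},
    {"day": date(2023, 1, 2), "station": "A", "pm25": -3.0},    # out of range
]

dataset = integrate_years([data_2023, data_2022])
print(len(dataset), [r["day"].isoformat() for r in dataset])
```

Sorting after the merge means the order in which the yearly files are supplied does not affect the resulting dataset.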
3.2. Data Understanding
- Urban background: Representative of the general urban population’s exposure.
- Traffic: Located in such a way that the pollution level is mainly influenced by emissions from a nearby street or road, while avoiding measuring very small microenvironments in the immediate vicinity.
- Suburban: Located on the outskirts of the city, where the highest ozone levels are found.
Simulation Environment
3.3. Data Preparation
3.4. Modeling
3.4.1. Data Analysis Methodology
- Obtain the time-series data for PM2.5.
- Split the time series into training and testing sets.
- Apply the Prophet model to the time series:
- Decompose the series into trend, seasonality, and special events components.
- Obtain the residuals from Prophet’s prediction.
- Use the residuals as inputs for the LSTM model:
- Configure the LSTM network layers (input, LSTM, and output layers).
- Train the LSTM model using Prophet’s residuals.
- Combine Prophet’s predictions with LSTM’s residual predictions:
- Predict trend and seasonality with Prophet.
- Predict the residuals using LSTM.
- Sum the results to obtain the final prediction.
- Evaluate the model performance using error metrics (e.g., MSE, RMSE).
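The residual-hybrid structure of the steps above can be illustrated with a small self-contained sketch. To keep it runnable with NumPy alone, two stand-ins are swapped in and should not be read as the authors' implementation: a least-squares fit of a linear trend plus one weekly Fourier pair takes the place of Prophet, and a linear autoregression on the residuals takes the place of the LSTM. The synthetic series, the lag count, and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily series (hypothetical, not the Madrid data): linear trend,
# weekly cycle, a slower 30-day oscillation, and noise.
t = np.arange(400, dtype=float)
y = (10 + 0.01 * t + 3 * np.sin(2 * np.pi * t / 7)
     + 2 * np.sin(2 * np.pi * t / 30) + rng.normal(0, 0.3, t.size))

n_train, n_lags = 300, 7


def design(t):
    """Stage-1 features: intercept, linear trend, one weekly Fourier pair."""
    w = 2 * np.pi * t / 7
    return np.column_stack([np.ones_like(t), t, np.sin(w), np.cos(w)])


# Stage 1 (stand-in for Prophet): fit trend + seasonality on the training split.
beta, *_ = np.linalg.lstsq(design(t[:n_train]), y[:n_train], rcond=None)
stage1 = design(t) @ beta
resid = y - stage1                      # residuals passed to stage 2


def lagged(r, n_lags):
    """Rows of n_lags past residuals paired with the next residual."""
    X = np.column_stack([r[i:len(r) - n_lags + i] for i in range(n_lags)])
    return X, r[n_lags:]


# Stage 2 (stand-in for the LSTM): linear autoregression on training residuals.
X_train, r_train = lagged(resid[:n_train], n_lags)
phi, *_ = np.linalg.lstsq(X_train, r_train, rcond=None)

# One-step-ahead residual forecasts for the test split (each forecast uses the
# observed residuals of the preceding n_lags days), then sum both stages.
X_all, _ = lagged(resid, n_lags)
resid_hat = X_all[n_train - n_lags:] @ phi
hybrid = stage1[n_train:] + resid_hat


def rmse(a, b):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


rmse_stage1 = rmse(y[n_train:], stage1[n_train:])
rmse_hybrid = rmse(y[n_train:], hybrid)
print(f"stage 1 only: {rmse_stage1:.3f}  hybrid: {rmse_hybrid:.3f}")
```

The 30-day oscillation is deliberately left out of the first stage so that the residuals carry structure for the second stage to learn, which mirrors the division of labor between the two models in the hybrid.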
Implementation of the Prophet–LSTM Hybrid Model for the Prediction of PM2.5 Levels
- Trend ($T(t)$), which can be linear.
- Seasonality ($S(t)$), which is modeled using Fourier terms.
- Special events ($H(t)$), which are modeled as additive effects.
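For reference, the three components are commonly written as shown below. These are standard forms given as an assumption, since the paper's own notation is not reproduced here; the symbols $k$, $m$, $N$, $P$, $a_n$, $b_n$, $\kappa_i$, $D_i$, and $\varepsilon_t$ are introduced only for this sketch. The trend is shown in its simplest form without changepoints.

```latex
\begin{align}
y(t) &= T(t) + S(t) + H(t) + \varepsilon_t \\
T(t) &= k\,t + m \\
S(t) &= \sum_{n=1}^{N} \left[ a_n \cos\!\left(\frac{2\pi n t}{P}\right)
        + b_n \sin\!\left(\frac{2\pi n t}{P}\right) \right] \\
H(t) &= \sum_{i} \kappa_i \,\mathbf{1}\!\left[\, t \in D_i \,\right]
\end{align}
```

Here $k$ is the growth rate, $m$ the offset, $P$ the seasonal period (for example 7 for weekly or 365.25 for yearly patterns in daily data), $N$ the number of Fourier terms, and $D_i$ the set of dates belonging to special event $i$ with effect $\kappa_i$.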
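- Forget Gate ($f_t$): decides which parts of the previous cell state are discarded.
- Input Gate ($i_t$): decides which new information is written to the cell state.
- Memory Cell Candidates ($\tilde{C}_t$): candidate values that may be added to the cell state.
- Memory Cell Update ($C_t$): combines the retained previous state with the gated candidates.
- Output Gate ($o_t$): decides which parts of the cell state are exposed.
- Hidden State ($h_t$): the output of the cell at time step $t$.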
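The six quantities listed above follow the standard LSTM formulation, reproduced here as a reference sketch. The weight matrices $W_\ast$, the biases $b_\ast$, the input $x_t$, and the concatenation $[h_{t-1}, x_t]$ are conventional notation assumed for this sketch and may differ from the paper's.

```latex
\begin{align}
f_t &= \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\!\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\!\left(W_C\,[h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh\!\left(C_t\right)
\end{align}
```

Here $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication.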
3.4.2. Data Analysis
- Measurement date.
- PM2.5 levels.
- Monitoring station.
- Madrid district.
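A record with these four fields, together with a simple per-district aggregation of the kind used in descriptive analysis, might look like the sketch below. The class name, field names, station codes, and values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class Measurement:
    day: date        # measurement date
    pm25: float      # PM2.5 level in µg/m³
    station: str     # monitoring station (invented codes below)
    district: str    # Madrid district


def mean_by_district(rows):
    """Average PM2.5 per district over all supplied measurements."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.district].append(row.pm25)
    return {district: mean(values) for district, values in groups.items()}


# Hypothetical rows; station codes and values are invented for illustration.
rows = [
    Measurement(date(2023, 1, 1), 12.0, "ST-01", "Centro"),
    Measurement(date(2023, 1, 2), 18.0, "ST-01", "Centro"),
    Measurement(date(2023, 1, 1), 9.0, "ST-02", "Retiro"),
]
print(mean_by_district(rows))
```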
Descriptive Analysis
Trend Analysis
Spatial Analysis
Predictive Analysis
- The Prophet part makes predictions by taking into account trends and repeating patterns (like seasonal or daily changes) in pollution levels over time. In other words, Prophet handles the changes that follow regular patterns.
- The LSTM part focuses on capturing variations or fluctuations that do not follow predictable patterns. This component tries to detect more complex changes, like unexpected spikes or nonlinear relationships, which the Prophet model alone might miss.
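The way the two parts combine can be written compactly as follows. The notation is an assumption made for this sketch: $\hat{y}_{\text{Prophet}}(t)$ denotes the Prophet forecast, $r(t)$ the residual it leaves, $\hat{r}_{\text{LSTM}}(t)$ the LSTM forecast of that residual, and $n$ the number of evaluated time steps.

```latex
\begin{align}
r(t) &= y(t) - \hat{y}_{\text{Prophet}}(t) \\
\hat{y}(t) &= \hat{y}_{\text{Prophet}}(t) + \hat{r}_{\text{LSTM}}(t) \\
\text{RMSE} &= \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( y(t) - \hat{y}(t) \right)^{2}}
\end{align}
```

Because the final forecast is a plain sum, any structure the LSTM recovers from the residuals reduces the error of the Prophet forecast directly.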
4. Conclusions
5. Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Title | Authors | Year |
---|---|---|
Short-term impact of particulate matter (PM2.5) on respiratory mortality in Madrid | Guaita, R., Pichiule, M., Mate, T., Linares, C., Diaz, J. | 2011 |
Spatial and temporal variations in PM10 and PM2.5 across the Madrid metropolitan area in 1999–2008 | Salvador, P., Artíñano, B., Viana, M.M., … González-Fernández, I., Alonsoa, R. | 2011 |
Short-term effect of fine particulate matter (PM2.5) on daily mortality due to diseases of the circulatory system in Madrid (Spain) | Maté, T., Guaita, R., Pichiule, M., Linares, C., Díaz, J. | 2010 |
Short-term effect of PM2.5 on daily hospital admissions in Madrid (2003–2005) | Linares, C., Díaz, J. | 2010 |
Short-term impact of particulate matter (PM2.5) on daily mortality among the over-75 age group in Madrid (Spain) | Jiménez, E., Linares, C., Rodríguez, L.F., Bleda, M.J., Díaz, J. | 2009 |
Impact of particulate matter with diameter of less than 2.5 microns [PM2.5] on daily hospital admissions in 0–10-year olds in Madrid, Spain [2003–2005] | Linares, C., Díaz, J. | 2009 |
Influence of traffic on the PM10 and PM2.5 urban aerosol fractions in Madrid (Spain) | Artíñano, B., Salvador, P., Alonso, D.G., Querol, X., Alastuey, A. | 2004 |
Anthropogenic and natural influence on the PM10 and PM2.5 aerosol in Madrid (Spain). Analysis of high-concentration episodes | Artíñano, B., Salvador, P., Alonso, D.G., Querol, X., Alastuey, A. | 2003 |
European Policies on PM2.5 | Measures by the Madrid City Council to Reduce PM2.5 |
---|---|
PM2.5 Limits: Directive 2008/50/EC sets an annual limit of 25 µg/m3, with a plan to progressively reduce it to 10 µg/m3. It also sets a daily limit of 50 µg/m3, not to be exceeded more than 35 times a year. | Plan A: The Madrid City Council implemented the Air Quality and Climate Change Plan of Madrid, known as Plan A, which includes various measures to improve air quality. |
Air Quality Plans: Member states must develop and implement air quality plans with specific measures to reduce PM2.5 emissions, such as traffic restrictions, industrial emission controls, and promotion of clean energy. | Madrid Central: Creation of a low-emission zone that restricts access to polluting vehicles in the city center, allowing only electric, hybrid, and clean environmental label vehicles. |
Update and Monitoring: Directive (EU) 2024/825 introduces stricter regulations and new obligations for member states regarding the assessment and management of air quality, incorporating new pollutants and adjusting evaluation and monitoring methods. | Fleet Renewal: Incentives for the acquisition of electric vehicles, installation of charging points, and modernization of the public transport fleet with electric and compressed natural gas (CNG) buses. |
National Transposition: In Spain, Directive 2008/50/EC has been transposed through Royal Decree 102/2011, which adapts European regulations to the national context. | Expansion of Green Areas: Increasing green spaces and reforesting the city to mitigate air pollution and improve the environment for residents. |
 | Reduction of Heating Emissions: Policies to promote more efficient and less polluting heating systems in buildings, with grant programs and technical advice. |
 | Monitoring and Evaluation: Installation of new measurement stations and updating existing ones to improve the surveillance of PM2.5 levels and other pollutants. |
Stages | CRISP-DM | KDD | SEMMA |
---|---|---|---|
Business Understanding | Understand business objectives and requirements from a data perspective | Understand high-level business objectives | Identify business objectives and needs |
Data Understanding | Collect initial data, describe data, explore data, and verify data quality | Collect data and perform preliminary analysis | Collect data and perform exploratory analysis |
Data Preparation | Select data, clean data, construct data, integrate data, and format data | Data cleaning and transformation | Data preprocessing and transformation |
Modeling | Select modeling techniques, design test, build models, and evaluate models | Develop, test, and refine models | Create, validate, and evaluate models |
Evaluation | Evaluate results, review process, and determine next steps | Evaluate models and outcomes | Assess model performance and review process |
Deployment | Plan deployment, monitor and maintain models, and generate final report | Implement solutions and continuous monitoring | Deploy models and monitor results |
Hybrid Models | Strengths | Limitations |
---|---|---|
SARIMA-LSTM | Captures explicit seasonality and nonlinear patterns. | Complex to implement and tune; requires many parameters. |
Prophet–LSTM | Easy to use; robust to missing data and outliers. | Less accurate in complex nonlinear dynamics. |
ETS-LSTM | Captures error, trend, and seasonality components; flexible. | Similar to SARIMA in complexity; less common in combination with LSTM. |
Techniques | Strengths | Limitations |
---|---|---|
Temporal Convolutional Networks (TCNs) | Excellent for capturing long-term dependencies; scalable. | Require large amounts of data and are computationally intensive. |
XGBoost | High performance, handles missing data well, and robust to overfitting. | Can be slow to train on very large datasets and complex to tune. |
Dynamic Time Warping (DTW) | Effective for time-series alignment and similarity measurement. | Computationally expensive and less effective with noisy data. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cáceres-Tello, J.; Galán-Hernández, J.J. Analysis and Prediction of PM2.5 Pollution in Madrid: The Use of Prophet–Long Short-Term Memory Hybrid Models. AppliedMath 2024, 4, 1428-1452. https://doi.org/10.3390/appliedmath4040076