Article

Analysis and Prediction of PM2.5 Pollution in Madrid: The Use of Prophet–Long Short-Term Memory Hybrid Models

by Jesús Cáceres-Tello * and José Javier Galán-Hernández *
Faculty of Statistical Studies, Complutense University of Madrid, 28040 Madrid, Spain
* Authors to whom correspondence should be addressed.
AppliedMath 2024, 4(4), 1428-1452; https://doi.org/10.3390/appliedmath4040076
Submission received: 22 September 2024 / Revised: 14 November 2024 / Accepted: 16 November 2024 / Published: 25 November 2024

Abstract
Particulate matter smaller than 2.5 μm (PM2.5) in Madrid is a critical concern due to its impacts on public health. This study employs advanced methodologies, including the CRISP-DM model and hybrid Prophet–Long Short-Term Memory (LSTM), to analyze historical data from monitoring stations and predict future PM2.5 levels. The results reveal a decreasing trend in PM2.5 levels from 2019 to mid-2024, suggesting the effectiveness of policies implemented by the Madrid City Council. However, the observed interannual fluctuations and peaks indicate the need for continuous policy adjustments to address specific events and seasonal variations. The comparison of local policies and those of the European Union underscores the importance of greater coherence and alignment to optimize the outcomes. Predictions made with the Prophet–LSTM model provide a solid foundation for planning and decision making, enabling urban managers to design more effective strategies. This study not only provides a detailed understanding of pollution patterns, but also emphasizes the need for adaptive environmental policies and citizen participation to improve air quality. The findings of this work can be of great assistance to environmental policymakers, providing a basis for future research and actions to improve air quality in Madrid. The hybrid Prophet–LSTM model effectively captured both seasonal trends and pollution spikes in PM2.5 levels. The predictions indicated a general downward trend in PM2.5 concentrations across most districts in Madrid, with significant reductions observed in areas such as Chamartín and Arganzuela. This hybrid approach improves the accuracy of long-term PM2.5 predictions by effectively capturing both short-term and long-term dependencies, making it a robust solution for air quality management in complex urban environments, like Madrid. The results suggest that the environmental policies implemented by the Madrid City Council are having a positive impact on air quality.

1. Introduction

The impact of pollutants on human health is a significant area of concern, particularly in urban environments. According to the latest update of the World Health Organization (WHO) air quality guidelines, air pollution is associated with a variety of adverse health effects, including respiratory and cardiovascular diseases [1]. This analysis focuses on identifying the most hazardous substances present in Madrid’s air pollution data, justifying their danger with recent academic references. The substances analyzed include particulate matter 2.5 μm (PM2.5), particulate matter 10 μm (PM10), nitrogen dioxide (NO2), nitrogen oxides (NOx), tropospheric ozone (O3), benzene, and carbon monoxide (CO).
Fine particles, such as PM2.5 and PM10, are recognized as extremely dangerous due to their ability to penetrate deeply into the lungs and enter the circulatory system. According to the WHO, exposure to PM2.5 is associated with approximately 4.2 million premature deaths annually due to cardiovascular and respiratory diseases [2,3]. Brook et al. (2020) also indicate that these particles are strongly associated with cardiovascular and respiratory diseases [4].
Recent studies on high-dimensional time-series forecasting using deep learning, such as the one by Hu et al. (2024), have shown the importance of advanced neural network models for pollution prediction [5].
Nitrogen dioxide (NO2) and nitrogen oxides (NOx) are pollutant gases that also present serious health risks. The WHO estimates that exposure to NO2 significantly contributes to the incidence of childhood asthma and increases the risk of mortality from respiratory and cardiovascular diseases [6]. Faustini et al. (2020) found a significant association between NO2 exposure and an increase in mortality from respiratory diseases [7]. Additionally, Mills et al. (2021) reported that NOx exposure is associated with an increase in hospitalizations for heart diseases [8].
Tropospheric ozone (O3) is a secondary pollutant formed by photochemical reactions between other pollutants in the presence of sunlight. The WHO indicates that exposure to high levels of ozone is related to an increase in mortality from respiratory diseases, especially in individuals with pre-existing conditions, such as asthma [9]. Jerrett et al. (2021) found that long-term exposure to ozone is associated with increased mortality from respiratory diseases [10]. Furthermore, Turner et al. (2022) suggest that ozone can increase the risk of premature mortality, even at relatively low exposure levels [11].
Benzene is an aromatic hydrocarbon known to be carcinogenic. The WHO classifies benzene as a human carcinogen and estimates that prolonged exposure can lead to leukemia and other hematological diseases [12]. Smith (2020) reported that even low levels of benzene exposure can increase the risk of developing leukemia [13].
Carbon monoxide (CO) is a colorless and odorless gas that can be fatal at high concentrations as it impedes the transport of oxygen in the body. According to the WHO, exposure to CO can cause severe health effects, including death by intoxication [14]. A study by Weaver (2020) also highlights the mortality associated with CO exposure, particularly in enclosed environments [15].
To assess the danger of these pollutants, we compiled a table classifying each substance according to its hazard index, based on adverse health effects documented in recent studies and data provided by the WHO. The hazard index is presented on a scale from one to ten, with ten indicating the highest hazard.
Figure 1 illustrates a horizontal bar graph showing the hazard index of each pollutant to visualize the relative danger of these substances.
In the following sections, we review the most relevant scientific publications related to the city of Madrid and PM2.5 concentrations, highlighting the most influential studies and providing a detailed analysis of their content. Additionally, we conduct a temporal analysis of the number of publications per year and identify the countries that have conducted the most research in this area.
We examine the current European Union regulations for PM2.5 particles, highlighting the standards and limits established to protect public health. This analysis includes an evaluation of the measures implemented by the Madrid City Council to mitigate PM2.5 levels. We compare European policies and local initiatives, assessing their effectiveness and coherence.
In the Methodology Section, we present a detailed study to support the selection of CRISP-DM as the most comprehensive methodological model for this research. We thoroughly analyze each phase of the CRISP-DM model. Additionally, we justify the choice of the Prophet–LSTM model as the most suitable data analysis methodology for this study, highlighting its robustness and ease of use.
We design the CRISP-DM modeling phase using Prophet–LSTM, covering the following types of analysis: descriptive, trend, spatial, and predictive. We present the results of these analyses, demonstrating how this methodological approach can provide a comprehensive view of air quality in Madrid and forecast future pollution scenarios. Finally, the article concludes with a synthesis of the most significant findings and outlines possible future research directions to advance the understanding and mitigation of PM2.5 air pollution.
The main contributions of this paper are:
  • Development of a hybrid Prophet–LSTM model for PM2.5 prediction.
  • Application of the model in a complex urban environment, like Madrid.
  • Integration of meteorological and air quality data for enhanced prediction accuracy.
  • Identification of the gap in the recent literature on PM2.5 prediction.
  • Comparative evaluation with other predictive models demonstrating its superior performance.

2. State of the Art

Particle pollution, especially PM2.5, has been the subject of intense study due to its harmful effects on human health and the environment. In this context, Madrid, as one of the main European cities, has been the focus of numerous scientific studies. This state of the art presents a comprehensive review of the most relevant publications on PM2.5 concentrations in the city of Madrid, highlighting the most impactful studies and providing a detailed analysis of their findings. Additionally, we conduct a temporal analysis of the publications per year and identify the most active countries in this research.
Next, we examine the current European Union regulations for PM2.5 particles, highlighting the standards and limits established for public health protection. This analysis includes an evaluation of the measures implemented by the Madrid City Council to reduce PM2.5 levels, comparing the effectiveness and coherence of European policies with local initiatives.

2.1. Relevance of PM2.5 in Scientific Studies

In the city of Madrid, numerous scientific studies have explored the impact of the pollutant PM2.5 on health and the environment. To quantify the importance of these particles in academic research, we conducted a search in the Scopus database for articles whose titles included the words “Madrid” and “PM2.5.” The search, carried out in July 2024, using the query: TITLE (“madrid” AND (“pm 2.5” OR “pm2.5”)), identified eight relevant articles.
Table 1 shows a series of scientific studies on the impact of PM2.5 in Madrid, grouped into several key themes.
Firstly, several studies investigate the impact of PM2.5 on public health. For example, some research explores how these particles affect respiratory mortality and daily hospital admissions. Other studies focus on daily mortality related to circulatory system diseases and vulnerable groups, such as people over 75 years old and children under 10 years old.
Secondly, there are investigations that analyze the spatial and temporal variations in PM2.5 in the metropolitan region of Madrid. These works address variations in PM10 and PM2.5 over a decade and examine the influence of anthropogenic and natural factors, as well as the role of traffic on urban aerosol fractions.
These studies reflect a continuous interest in understanding both the health effects and distribution patterns of PM2.5 in Madrid, highlighting the importance of this pollutant for the scientific community and public health.
Researchers have not conducted specific studies on the PM2.5 pollutant in Madrid that evaluate compliance with European regulations at the regional or district level. This gap highlights the need to address this crucial and currently uncovered point through this study.
The search conducted shows that the available articles on the selected terms are limited to publications up to 2011. This indicates that we are dealing with an innovative topic, as no recent records have been found in the past 10 years, suggesting there is a wide space for new research and contributions in this field. The absence of current studies highlights the relevance of our research, which aims to fill this gap and open new lines of inquiry in an area with significant potential for development.
The Scopus search conducted in July 2024 (Figure 2a,b) confirms a lack of recent studies on PM2.5 in the Community of Madrid. Although no recent articles specifically include both “Madrid” and “PM2.5” in the title, there is a clear increase in scientific interest in PM2.5, recognizing its link to mortality and severe lung conditions. This rise reflects the growing awareness of PM2.5’s dangers and the need to mitigate its health impacts.
Interest in PM2.5 research is not uniform globally. Ironically, the countries leading in PM2.5 research, China and the United States, are also among the highest polluters. The abundance of publications from these countries, shown in Figure 2c, highlights the urgent need to address air pollution in regions where high PM2.5 levels pose significant health risks [16].
For all these reasons, this work aims to present a comprehensive analysis of PM2.5 in the city of Madrid, covering daily data from 2019 to 2023, as well as the available data from 2024 (up to June of this year). This analysis examines both the citywide level and specific districts where researchers have collected data on this substance.

2.2. European Regulations on PM2.5

Current European directives on PM2.5 levels in cities are primarily established in Directive 2008/50/EC, which authorities have amended several times to align with the latest recommendations from the World Health Organization (WHO). This directive sets clear limits for the concentration of PM2.5 in ambient air to protect public health and the environment.
Directive 2008/50/EC specifies an annual limit value for PM2.5 particles at 25 micrograms per cubic meter (µg/m3). However, authorities plan a progressive reduction in this limit to 10 µg/m3, aligning with the WHO guidelines that recommend stricter limits [17,18,19]. In addition to the annual limit, the directive specifies a daily limit value of 50 µg/m3, which authorities restrict to no more than 35 exceedances per year. This value aims to control acute exposures to high pollution levels [17].
Let $C_{\text{annual}}$ be the annual limit for PM2.5 particles and $C_{\text{daily}}$ be the daily limit for PM2.5 particles. According to Directive 2008/50/EC, the following limits are established:
  • Annual limit:
    • $C_{\text{annual}} = 25\ \mu\text{g/m}^3$
    • A progressive reduction in this limit is planned to align with the WHO guidelines: $C_{\text{annual,reduced}} = 10\ \mu\text{g/m}^3$
  • Daily limit:
    • $C_{\text{daily}} = 50\ \mu\text{g/m}^3$
    • To control acute exposure, the daily limit may be exceeded no more than 35 times per year: $N_{\text{exceedances}} \leq 35$, where $N_{\text{exceedances}}$ is the number of times the daily limit is exceeded in a year.
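The limits listed above can be checked against one year of daily station data with a few lines of Python. The following is a minimal sketch, not the procedure used in this study: the function name `check_compliance`, the constant names, and the convention that `None` marks a missing day are all assumptions made for illustration.

```python
from statistics import mean

ANNUAL_LIMIT = 25.0    # µg/m3, annual limit value quoted in the text
DAILY_LIMIT = 50.0     # µg/m3, daily limit value quoted in the text
MAX_EXCEEDANCES = 35   # permitted exceedances of the daily limit per year


def check_compliance(daily_values):
    """Summarize one year of daily PM2.5 means (µg/m3) against the limits.

    `daily_values` is an iterable of floats; None marks a missing day
    (an assumed convention, not taken from the source data).
    """
    valid = [v for v in daily_values if v is not None]
    if not valid:
        raise ValueError("no valid daily values")
    annual_mean = mean(valid)
    # Count the days on which the daily limit was exceeded.
    n_exceedances = sum(1 for v in valid if v > DAILY_LIMIT)
    return {
        "annual_mean": annual_mean,
        "n_exceedances": n_exceedances,
        "annual_ok": annual_mean <= ANNUAL_LIMIT,
        "daily_ok": n_exceedances <= MAX_EXCEEDANCES,
    }
```

A year can satisfy the annual limit and still fail the exceedance rule, which is why the two flags are reported separately.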
To ensure compliance with these limit values, member states must develop and implement air quality plans that include specific measures to reduce PM2.5 emissions. These measures can include restrictions on vehicular traffic, industrial emissions controls, and the promotion of cleaner energy sources [17,18]. Directive (EU) 2024/825, which amends previous directives, introduces stricter regulations and new obligations for member states regarding the assessment and management of air quality. This directive also incorporates new pollutants and adjusts evaluation and monitoring methods [18].
In Spain, for example, national legislation transposes these European directives through Royal Decree 102/2011 on the improvement of air quality, adapting European regulations to the national context and establishing specific procedures for monitoring and controlling atmospheric pollution [17].

2.3. Measures Implemented by the Madrid City Council to Reduce PM2.5 Levels

In this context, the Madrid City Council has implemented various measures to combat pollution, with special attention to reducing PM2.5. These actions are part of the Air Quality and Climate Change Plan of Madrid, known as Plan A [19], which sets out a series of initiatives to improve air quality in the city.
One of the main measures has been the creation of the low-emission zone, known as Madrid Central. This area restricts access to polluting vehicles in the city center, allowing only electric and hybrid vehicles and those with cleaner environmental labels to circulate. This initiative has shown a significant reduction in concentrations of PM2.5 and other pollutants in the affected area [20].
Additionally, the City Council has promoted the renewal of the vehicle fleet toward more sustainable models. Incentives have been provided for the acquisition of electric vehicles and the installation of charging points, as well as the modernization of the public transport fleet with electric and compressed natural gas (CNG) buses. These actions directly contribute to reducing PM2.5 emissions and improving air quality throughout the city [19].
Another important measure is the expansion of urban green areas and reforestation of the city. Increasing green spaces helps to mitigate air pollution and provides a healthier environment for residents. The City Council has also launched education and awareness programs about the importance of air quality and sustainable practices among citizens [21].
Furthermore, policies have been implemented to reduce heating emissions in buildings by promoting the use of more efficient and less polluting systems. Grant programs and technical advice support the transition to more sustainable heating systems in homes and public buildings [22].
The Madrid City Council has also strengthened the monitoring and evaluation of air quality by installing new measurement stations and updating existing ones. This allows better surveillance of PM2.5 levels and other pollutants, facilitating informed decision-making to improve air quality [23].
The following table (Table 2) provides a comparative view of both strategies, highlighting how local efforts align with and complement the European regulatory frameworks. This comparison highlights the synergy between regulations at different levels and their contribution to improving air quality and public health.

2.4. Focus on Predictive Models for PM2.5

Given the growing concern about air quality and its impact on public health, several models have been developed to predict PM2.5 pollution levels. While policy measures are crucial for managing environmental issues, the focus of this study is on advancing predictive models that can accurately forecast PM2.5 levels in urban environments. The hybrid Prophet–LSTM model proposed in this work aims to address the limitations of traditional approaches by combining the strengths of statistical forecasting and machine learning. This section reviews existing methodologies, highlighting their contributions to and limitations in air quality prediction.

3. Methodology

At this stage of the study, the question arises as to which methodology is most suitable for analyzing PM2.5 levels in Madrid’s air quality. The answer to this question is crucial to ensure the accuracy and effectiveness of our research. It is essential to have a robust and flexible methodology that not only allows efficient project management, but also provides specific techniques for data analysis, such as predictive models, to estimate PM2.5 levels in different districts or throughout the city of Madrid as accurately as possible in the future.
After a thorough analysis of the methodologies most commonly used in data science by the scientific community (Table 3), we decided to employ the CRISP-DM methodology. This choice was based on several key reasons that ensure an effective and structured implementation of the air quality data analysis in Madrid. The CRISP-DM methodology is widely recognized for its ability to systematically manage data science projects, facilitating both the understanding of the problem and the generation of significant results from the available data.
The main reasons for choosing the CRISP-DM methodology are as follows:
  • Flexibility and Structure: CRISP-DM provides a flexible framework that can be adapted to different types of data mining projects while maintaining a clear structure throughout its phases. This allows us to effectively address the diverse needs and challenges that may arise in an air quality analysis project.
  • Comprehensive Understanding of Business and Data: The initial stages of CRISP-DM, Business Understanding and Data Understanding, ensure that the project’s objectives are aligned with business needs and that the data are thoroughly understood before proceeding to modeling. This is crucial for an accurate and relevant analysis of air quality data.
  • Data Preparation and Modeling: The methodology emphasizes rigorous data preparation, including data cleaning, construction, integration, and formatting. This meticulous approach ensures that the data are in the best possible shape for modeling. The modeling phase allows the application of various modeling techniques, ensuring the selection of the most appropriate one for the project’s specific data.
  • Effective Evaluation and Deployment: The evaluation phase of CRISP-DM ensures that the developed model meets business objectives before implementation. The deployment phase facilitates the integration of the model into the real environment, ensuring that the analysis results are used for informed decision making.
  • Wide Acceptance and Support: CRISP-DM is a widely accepted and used methodology in the industry, providing access to a wide range of resources, tools, and communities of practice. This facilitates the implementation and continuous support of the project.
The CRISP-DM methodology not only offers a robust and flexible approach to project management, but also ensures that all stages of data analysis are addressed in a structured and effective manner, guaranteeing reliable and useful results. In its modeling phase, we use descriptive analysis, trend analysis, spatial analysis, and predictive analysis, which we discuss in Section 3.4: Modeling.
Next, we detail each of the phases of the CRISP-DM model we used (Figure 3).

3.1. Business Understanding

The first step in adapting the CRISP-DM model to our research is to clearly establish the analysis objectives. This step is crucial as it defines the direction and focus of the project, ensuring that all subsequent activities are aligned with the expected outcomes. In the context of our research on the influence of PM2.5 on air quality in Madrid, the analysis objectives are as follows:
  • Conducting an Initial Analysis and Understanding of the Downloaded Data: Verify the integrity of the downloaded data by checking its completeness and confirming that the recorded values are within expected ranges. This step includes removing outliers and records with significant missing data. Subsequently, integrate the air quality data from different years into a single structured dataset. This preparation process facilitates the joint analysis and comparison of measurements over time.
  • Performing a Descriptive and Temporal and Spatial Trend Analysis of PM2.5: Conduct a descriptive analysis to understand the initial distribution and characteristics of the collected PM2.5 data. Additionally, analyze how PM2.5 levels vary over time and across different districts of Madrid, identifying seasonal patterns and differences between urban and suburban areas. This allows for the implementation of more effective control measures tailored to the specific needs of each district.
  • Developing Predictive Models for PM2.5: Build predictive models to estimate PM2.5 levels for future dates and under different emissions scenarios. Predictive models are valuable tools for planning and decision making, enabling authorities to anticipate high pollution episodes and take proactive actions.
  • Evaluating the Effectiveness of Current Implemented Measures: Use the analysis results to assess the effectiveness of the measures currently implemented to reduce PM2.5 levels. Ensuring that implemented measures are effective and based on solid scientific data will optimize mitigation efforts.
  • Communicating Results to Stakeholders: Present the analysis results clearly and comprehensibly to decision makers in the Madrid City Council, as well as to other stakeholders. Effective communication of the results is essential to ensure that the analysis conclusions and recommendations are understood and applied.

3.2. Data Understanding

The second step in adapting the CRISP-DM model to our research is data acquisition. This phase involves collecting, examining, and understanding the necessary data for the analysis.
For our study on the impact of PM2.5 on air quality in Madrid, the data were obtained from the Open Data Portal of the Madrid City Council (https://datos.madrid.es/portal/site/egob) (accessed on 14 July 2024) [24] and had to meet certain quality criteria to ensure the accuracy and relevance of the analysis.
The primary and only data source used in this study was the open data portal of the Madrid City Council. This platform provides access to a wide variety of datasets related to air quality, including measurements of different atmospheric pollutants taken at various monitoring stations located throughout the city.
Twenty-four automatic remote stations (Figure 4) collect essential information for atmospheric monitoring using analyzers necessary for the accurate measurement of gas and particle levels [25]. The remote stations managed by the Madrid City Council are of different types:
  • Urban background: Representative of the general urban population’s exposure.
  • Traffic: Located in such a way that the pollution level is mainly influenced by emissions from a nearby street or road, while avoiding measuring very small microenvironments in the immediate vicinity.
  • Suburban: Located on the outskirts of the city, where the highest ozone levels are found.
Of the 24 existing automatic remote stations, the stations that specifically measured PM2.5, the focus of this study, covered only 7 of the 21 districts in the city of Madrid. These districts are Chamartín, Chamberí, Moncloa-Aravaca, Salamanca, Arganzuela, Carabanchel, and Hortaleza.
The use of official and publicly accessible data ensures transparency and the possibility for other entities interested in air quality in Madrid to replicate the study.
Air quality data were downloaded and collected from the open data portal of the Madrid City Council, ensuring continuous coverage from January 2019 to June 2024. This collection included daily measurements of PM2.5 and other relevant pollutants [26,27].

Simulation Environment

To ensure the reproducibility and performance of the proposed models, all simulations and analyses were conducted on a dedicated machine with the following specifications: Intel Core i7-9700K CPU, 32 GB RAM, and an NVIDIA GTX 1080 GPU. The deep learning models, including the Prophet–LSTM hybrid model, were implemented using Python (version 3.12) with the support of libraries such as TensorFlow for LSTM and the Prophet library for trend analysis. Data preprocessing and analysis were handled using Pandas and NumPy libraries. Visualization of results was performed using Matplotlib and Seaborn.
In terms of software, the machine operated on Windows 10 (64-bit). The choice of hardware ensured that the large datasets, including time-series data for PM2.5 measurements from multiple stations across several years, were processed efficiently without bottlenecks in performance. The use of a GPU enabled faster training times for the LSTM model, which benefits from the parallel processing of data.
The simulation environment was carefully designed to ensure that the models could handle the high dimensionality of the time-series data and provide accurate and timely predictions. Given the stochastic nature of deep learning models, particularly LSTMs, we conducted multiple runs of the experiments to ensure the robustness and consistency of the results.
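One simple way to report the consistency of repeated runs is to compute an error metric for each run and then its mean and spread. The sketch below uses RMSE as an illustrative choice; the function names, the metric, and the idea of passing one list of predictions per run are assumptions made here, not details taken from the experiments.

```python
from math import sqrt
from statistics import mean, stdev


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("sequences must be non-empty and of equal length")
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def summarize_runs(y_true, runs):
    """Score every run against the same targets and report mean and spread.

    `runs` holds one list of predictions per training run, for example
    one per random seed (a hypothetical setup).
    """
    scores = [rmse(y_true, preds) for preds in runs]
    # The sample standard deviation needs at least two runs.
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return {"scores": scores, "mean": mean(scores), "std": spread}
```

A small standard deviation relative to the mean suggests that the reported accuracy does not hinge on one favorable initialization.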
The data processing and model training for the PM2.5 prediction involved several key stages, amounting to approximately 9 h in total. The initial data download and preprocessing, including cleaning and integration from multiple stations, took around 2 h. The training of the Prophet–LSTM hybrid model was the most time-consuming, requiring 4 to 5 h due to the LSTM’s complexity, despite using a GPU to expedite the process. Evaluating the model’s performance and fine-tuning hyperparameters took an additional 2 h, followed by the generation of visualizations and graphs, which required about 1 h. Overall, the workflow was optimized to efficiently handle large datasets and produce accurate predictions.

3.3. Data Preparation

Data preparation is a crucial phase that guarantees the reliability and utility of the data for subsequent analysis. First, we verified the integrity of the downloaded data, checking for missing records and ensuring that the measurements were within expected ranges. This step included the removal of outliers and records with significant missing data, ensuring that the dataset was both accurate and complete [28,29].
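The integrity check described above can be sketched as a range filter followed by an outlier rule. The thresholds and the interquartile-range (IQR) fence used below are assumptions chosen for illustration, since the text does not state which outlier criterion was applied; the record layout with `date`, `station`, and `pm25` keys is likewise hypothetical.

```python
from statistics import quantiles


def clean_records(records, lower=0.0, upper=500.0, iqr_factor=3.0):
    """Drop missing, out-of-range, and extreme PM2.5 records.

    `records` is a list of dicts with a 'pm25' entry (float or None).
    `lower`, `upper`, and `iqr_factor` are illustrative defaults only.
    """
    # Step 1: remove missing values and physically implausible readings.
    in_range = [
        r for r in records
        if r["pm25"] is not None and lower <= r["pm25"] <= upper
    ]
    if len(in_range) < 4:
        return in_range  # too few points to estimate quartiles
    # Step 2: remove values above an upper IQR fence.
    q1, _, q3 = quantiles([r["pm25"] for r in in_range], n=4)
    fence = q3 + iqr_factor * (q3 - q1)
    return [r for r in in_range if r["pm25"] <= fence]
```

Only an upper fence is applied because low concentrations are plausible readings, whereas isolated very high values are the ones most likely to reflect sensor faults; genuine pollution episodes can also be high, so the factor should be chosen conservatively.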
Next, air quality data from different years were integrated into a single dataset, allowing for a joint analysis of measurements over time. This integration process facilitated comparison and the identification of trends in PM2.5 levels and other pollutants [26,27].
With the simulation environment set up, the next step focused on acquiring and processing the necessary data for analysis. This process involved several stages, from the initial download of raw data to their final transformation into a structured format suitable for analysis, as illustrated in Figure 5 below. Ensuring data quality and consistency throughout the analysis pipeline was paramount, as described in the following steps.
Finally, the processed data were stored in a structured and accessible format to facilitate their use in subsequent phases of the analysis. The data were organized by date and monitoring station, allowing for efficient management and quick access to relevant information [30].
The following graphic (Figure 5) illustrates the data acquisition and transformation process, showing the various stages from the initial download to the final integration of the data.
We downloaded data from the Open Data Portal of the Madrid City Council. The original data appeared in various formats: CSV, XLSX, and TXT. We chose the CSV format due to its simplicity and efficiency in both reading and writing files [30]. The columns included information about the province, municipality, station, magnitude, sampling point, year, month, and multiple days of the month (D01 to D31), among other fields. This initial format contained the raw data directly collected by the air quality monitoring stations.
After downloading the data, we performed an initial transformation to ensure their validity and reorganized the columns. We removed validation columns and others such as province, municipality, and sampling point, as the station column contained similar information to the sampling point.
In the second transformation, we developed a Python program to read each of these transformed CSV files and reorganize their columns to retain only the relevant data in the precise arrangement for accurate reading and reliable graph generation.
The results of this second transformation were new CSV files with columns for date (in yy-mm-dd format), short code of the atmospheric monitoring station, neighborhood name, district name, and each of the substances analyzed. For stations that did not measure certain substances, their values appeared as empty strings. The algorithm excluded records with missing data or outliers to ensure the reliability and accuracy of the final datasets.
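The reshaping from one row per station and month (columns D01 to D31) to one record per day can be sketched as follows. This is an illustrative reconstruction rather than the program used in the study: the English column names `station`, `year`, and `month`, the function name `wide_to_long`, and the sample data are assumptions.

```python
import csv
import io
from datetime import date


def wide_to_long(rows):
    """Turn one-row-per-month records (D01..D31) into one record per day."""
    out = []
    for row in rows:
        year, month = int(row["year"]), int(row["month"])
        for day in range(1, 32):
            raw = (row.get(f"D{day:02d}") or "").strip()
            if not raw:
                continue  # missing measurement: skip the record
            try:
                when = date(year, month, day)
            except ValueError:
                continue  # the column exists but the calendar day does not
            out.append({
                "date": when.strftime("%y-%m-%d"),  # yy-mm-dd, as in the text
                "station": row["station"],
                "value": float(raw),
            })
    return out


# Hypothetical two-line file: station 8, February 2019, a few day columns.
SAMPLE = "station,year,month,D01,D02,D30,D31\n8,2019,2,12,,7,9\n"
long_rows = wide_to_long(csv.DictReader(io.StringIO(SAMPLE)))
```

In the sample, D02 is empty and D30 and D31 do not exist in February, so only the record for the first day survives.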
We used the Python programming language for data analysis and processing, and the CSV file format for storage and transformation. These tools were chosen for their ability to handle large volumes of information and perform complex analyses efficiently. Several recent studies support the use of Python, highlighting its versatility and power in data science. For example, Van Rossum et al. emphasize Python’s flexibility and its wide range of specialized libraries, such as Pandas and NumPy, which facilitate data manipulation and analysis [31,32]. Additionally, Peng notes that using CSV allows for structured and accessible data management, essential for reproducibility and transparency in scientific research [33]. According to Smith and Jones (2021), the CSV format is particularly suitable for data analysis due to its simplicity and efficiency in reading and writing files. Unlike the XLSX format, which can be more complex and less efficient in processing speed, CSV allows for faster handling of large data volumes. Furthermore, the CSV format is widely compatible with various data analysis tools and programming languages, such as Python, facilitating its integration into data science workflows [33]. These combined tools provide a solid foundation for data preparation and analysis in data science projects like this one.

3.4. Modeling

After data acquisition and preparation, the next phase in this adaptation of the CRISP-DM model was data analysis. To do this, we first evaluated the appropriate methodology (Section 3.4.1), then we used it to carry out the different analyses presented in Section 3.4.2.

3.4.1. Data Analysis Methodology

In this section, we evaluate various advanced data analysis techniques that can be used for time-series modeling and prediction. We consider hybrid models, such as SARIMA-LSTM, PROPHET-LSTM, and ETS-LSTM, as well as other advanced data analysis techniques, like Temporal Convolutional Networks (TCNs), XGBoost, and Dynamic Time Warping (DARTS). Table 4 provides a detailed comparison of these hybrid models, while Table 5 presents a comparison of specific techniques, with the aim of identifying the most suitable ones for the study of air quality in Madrid.
The PROPHET–LSTM model was selected for this study considering several key advantages that made it particularly suitable for the analysis of air quality data in the city of Madrid:
  • Ease of Use and Flexibility: PROPHET is known for its ease of use, allowing intuitive adjustments to handle seasonality and holiday effects. This feature is particularly useful in the context of air quality, where seasonal patterns can have a significant impact.
  • Robustness to Missing Data and Outliers: One of PROPHET’s strengths is its robustness to missing data or outliers. Given that air quality datasets may present these irregularities, PROPHET’s ability to handle these challenges is crucial to maintaining the integrity of the analysis.
  • Efficiency in Handling Complex Seasonality: PROPHET is effective in capturing annual, weekly, and daily seasonality. Integration with LSTM enhances this capability, allowing for the capture of complex nonlinear patterns that are common in air quality data. This results in a more accurate and reliable model.
  • Resources and Expertise: The combination of PROPHET and LSTM requires less tuning compared to other techniques, such as SARIMA-LSTM or ETS-LSTM. This feature simplifies implementation and is especially advantageous in resource-limited environments, where time and tuning capacity are important constraints.
The hybrid Prophet–LSTM model is implemented in two main phases. First, the Prophet model decomposes the time series into three key components: trend, seasonality, and special events. The residuals from this decomposition are then used as inputs for the LSTM model, which has the capability to capture long-term dependencies that the Prophet model alone cannot. Finally, the trend and seasonality predictions from Prophet are combined with the residual predictions from LSTM to obtain the final forecast. These phases ensure that the model effectively captures both the short-term patterns and long-term dependencies within the time-series data. The steps of the Prophet–LSTM model can be summarized as follows:
  • Obtain the time-series data for PM2.5.
  • Split the time series into training and testing sets.
  • Apply the Prophet model to the time series:
    • Decompose the series into trend, seasonality, and special events components.
    • Obtain the residuals from Prophet’s prediction.
  • Use the residuals as inputs for the LSTM model:
    • Configure the LSTM network layers (input, LSTM, and output layers).
    • Train the LSTM model using Prophet’s residuals.
  • Combine Prophet’s predictions with LSTM’s residual predictions:
    • Predict trend and seasonality with Prophet.
    • Predict the residuals using LSTM.
    • Sum the results to obtain the final prediction.
  • Evaluate the model performance using error metrics (e.g., MSE, RMSE).
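The steps above can be sketched end to end in a few lines. The following is a minimal illustration of the residual-hybrid structure only, not the implementation used in this study: a least-squares linear trend plus a day-of-week profile stands in for Prophet, a first-order autoregression stands in for the LSTM, and the series is synthetic. All function names and numeric values are assumptions made for the example.

```python
import math
import random


def fit_linear_trend(y):
    """Least-squares fit of y[t] = k + m * t for t = 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    m = sxy / sxx
    return y_mean - m * t_mean, m


def fit_weekly_profile(detrended):
    """Mean detrended value for each day-of-week slot (t mod 7)."""
    sums, counts = [0.0] * 7, [0] * 7
    for t, v in enumerate(detrended):
        sums[t % 7] += v
        counts[t % 7] += 1
    return [s / c for s, c in zip(sums, counts)]


def fit_ar1(residuals):
    """Least-squares coefficient phi in r[t] ~ phi * r[t-1]."""
    num = sum(a * b for a, b in zip(residuals[1:], residuals[:-1]))
    den = sum(b * b for b in residuals[:-1])
    return num / den if den else 0.0


def hybrid_forecast(train, horizon):
    # Structural model (stand-in for Prophet): trend + seasonality.
    k, m = fit_linear_trend(train)
    weekly = fit_weekly_profile([v - (k + m * t) for t, v in enumerate(train)])

    def structural(t):
        return k + m * t + weekly[t % 7]

    # Residuals left over after the structural fit.
    residuals = [v - structural(t) for t, v in enumerate(train)]
    # Residual model (stand-in for the LSTM).
    phi = fit_ar1(residuals)
    # Forecast both parts and sum them.
    forecast, resid = [], residuals[-1]
    for h in range(horizon):
        resid = phi * resid
        forecast.append(structural(len(train) + h) + resid)
    return forecast


def rmse(actual, predicted):
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Synthetic daily series: slow decline + weekly cycle + noise (illustrative).
rng = random.Random(0)
series = [12.0 - 0.01 * t + 2.0 * math.sin(2 * math.pi * t / 7) + rng.gauss(0, 0.5)
          for t in range(200)]
split = int(len(series) * 0.8)
train, test = series[:split], series[split:]
forecast = hybrid_forecast(train, len(test))
print("RMSE on held-out data:", round(rmse(test, forecast), 3))
```

Replacing the two stand-ins with a fitted Prophet model and a trained LSTM keeps the same data flow: fit the structural model, compute residuals, fit the residual model, and add the two forecasts.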
The PROPHET–LSTM methodology not only offers a robust and flexible approach for air quality data analysis, but also facilitates its implementation in real-world scenarios, ensuring reliable and efficient results. This combination of techniques effectively addresses the challenges associated with the variability and complexity of environmental data. In the next section, we apply this analysis to demonstrate its effectiveness in a real context.

Implementation of the Prophet–LSTM Hybrid Model for the Prediction of PM2.5 Levels

The Prophet–LSTM hybrid model was used to predict PM2.5 levels in Madrid, combining Prophet’s capabilities to handle seasonal components and special events with LSTM’s ability to capture long-term dependencies in time series data [34,35,36].
First, the Prophet model decomposes the PM2.5 level’s time series (y(t)) into trend (T(t)), seasonality (S(t)), and special event (H(t)) components, along with an error term (ϵ(t)):
$$y(t) = T(t) + S(t) + H(t) + \epsilon(t)$$
where:
  1. Trend (T(t)) can be linear:
$$T(t) = k + m t$$
or logistic:
$$T(t) = \frac{C}{1 + e^{-k(t - m)}}$$
  2. Seasonality (S(t)) is modeled using Fourier terms:
$$S(t) = \sum_{k=1}^{K} \left[ a_k \cos\left( \frac{2 \pi k t}{T} \right) + b_k \sin\left( \frac{2 \pi k t}{T} \right) \right]$$
  3. Special events (H(t)) are modeled as additive effects:
$$H(t) = \sum_{j=1}^{J} c_j I_j(t)$$
After applying Prophet, the residuals (ϵ(t)) are obtained:
$$\epsilon(t) = y(t) - \left( T(t) + S(t) + H(t) \right)$$
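To make the component formulas concrete, the sketch below evaluates each term directly and then forms one residual. It is an illustration under stated assumptions: the coefficient values, the event window, and the observed value are hypothetical, and the functions mirror the equations rather than the internals of the Prophet library.

```python
import math


def linear_trend(t, k, m):
    """T(t) = k + m t."""
    return k + m * t


def logistic_trend(t, C, k, m):
    """T(t) = C / (1 + exp(-k (t - m))), with capacity C and midpoint m."""
    return C / (1.0 + math.exp(-k * (t - m)))


def fourier_seasonality(t, period, a, b):
    """S(t) as a truncated Fourier series with K = len(a) harmonics."""
    return sum(
        a_k * math.cos(2 * math.pi * k * t / period)
        + b_k * math.sin(2 * math.pi * k * t / period)
        for k, (a_k, b_k) in enumerate(zip(a, b), start=1)
    )


def event_effect(t, events):
    """H(t): sum of c_j over the events whose indicator I_j(t) is 1."""
    return sum(c_j for days, c_j in events if t in days)


# Hypothetical coefficients, for illustration only.
a, b = [1.5, 0.3], [-0.8, 0.1]
events = [({10, 11}, 4.0)]  # one event active on days 10 and 11, effect +4.0


def structural(t):
    return (linear_trend(t, 12.0, -0.01)
            + fourier_seasonality(t, 365.25, a, b)
            + event_effect(t, events))


y_observed = 15.2                      # hypothetical measurement on day 10
residual = y_observed - structural(10)  # epsilon(t) = y(t) - (T + S + H)
print(round(residual, 4))
```

The residual computed in the last step is the quantity that the hybrid model hands to the LSTM.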
These residuals are used as the input for the LSTM model, which captures long-term dependencies in the data. The LSTM architecture includes memory cells and input (i_t), forget (f_t), and output (o_t) gates:
  1. Forget Gate (f_t):
$$f_t = \sigma\left( W_f \cdot [h_{t-1}, x_t] + b_f \right)$$
  2. Input Gate (i_t):
$$i_t = \sigma\left( W_i \cdot [h_{t-1}, x_t] + b_i \right)$$
  3. Memory Cell Candidates ($\tilde{C}_t$):
$$\tilde{C}_t = \tanh\left( W_C \cdot [h_{t-1}, x_t] + b_C \right)$$
  4. Memory Cell Update (C_t):
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
  5. Output Gate (o_t):
$$o_t = \sigma\left( W_o \cdot [h_{t-1}, x_t] + b_o \right)$$
  6. Hidden State (h_t):
$$h_t = o_t \odot \tanh(C_t)$$
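The six gate equations can be traced by hand with a single-unit cell. The sketch below assumes a hidden size of one and a scalar input, so each weight matrix reduces to a pair of weights (one for h_{t-1}, one for x_t) plus a bias; the parameter values are arbitrary and are not taken from the trained model in this study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step for a single hidden unit and scalar input.

    params maps each gate name to (w_h, w_x, bias), the scalar analogue
    of W . [h_{t-1}, x_t] + b.
    """
    def affine(name):
        w_h, w_x, bias = params[name]
        return w_h * h_prev + w_x * x_t + bias

    f_t = sigmoid(affine("f"))            # forget gate
    i_t = sigmoid(affine("i"))            # input gate
    c_tilde = math.tanh(affine("c"))      # memory cell candidate
    c_t = f_t * c_prev + i_t * c_tilde    # memory cell update
    o_t = sigmoid(affine("o"))            # output gate
    h_t = o_t * math.tanh(c_t)            # hidden state
    return h_t, c_t


# Arbitrary illustrative parameters (hypothetical, untrained).
params = {
    "f": (0.5, 0.1, 1.0),
    "i": (0.4, -0.3, 0.0),
    "c": (0.2, 0.7, 0.0),
    "o": (0.3, 0.6, 0.0),
}

h, c = 0.0, 0.0
for eps in [0.8, -0.2, 0.5, 1.1]:  # a short, made-up residual sequence
    h, c = lstm_step(eps, h, c, params)
print(round(h, 4), round(c, 4))
```

Because the cell state is updated additively (f_t times the previous state plus a gated candidate), information can persist across many steps when the forget gate stays close to one.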
To predict future PM2.5 levels, Prophet is used to forecast the future trend, seasonality, and special event components ($\hat{y}_{\text{Prophet}}(t)$):
$$\hat{y}_{\text{Prophet}}(t) = \hat{T}(t) + \hat{S}(t) + \hat{H}(t)$$
Then, the LSTM model, trained on the historical residuals (ϵ(t)), is used to predict the future residuals ($\hat{\epsilon}(t)$):
$$\hat{\epsilon}(t) = \mathrm{LSTM}(\epsilon(t))$$
Finally, the predictions from Prophet and LSTM are combined to obtain the final prediction:
$$\hat{y}(t) = \hat{y}_{\text{Prophet}}(t) + \hat{\epsilon}(t)$$
This approach captures both seasonal patterns and special events, as well as long-term dependencies in the data, providing a more accurate and robust prediction of future PM2.5 levels in Madrid [37,38,39].
The LSTM architecture addresses the vanishing gradient problem commonly encountered in deep networks [40]. This problem occurs when gradients shrink as they propagate back through time, which can hinder the training of very deep models.
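Before an LSTM can be trained on the Prophet residuals, the residual series has to be framed as input and target pairs. A common way to do this is a sliding window, sketched below. The window length of three and the residual values are assumptions for illustration; the article does not state the window size used.

```python
def make_windows(residuals, window):
    """Turn a residual series into (input window, next value) pairs."""
    inputs, targets = [], []
    for start in range(len(residuals) - window):
        inputs.append(residuals[start:start + window])
        targets.append(residuals[start + window])
    return inputs, targets


residuals = [0.4, -0.1, 0.3, 0.9, -0.5, 0.2]  # made-up residual values
X, y = make_windows(residuals, window=3)
for window_values, target in zip(X, y):
    print(window_values, "->", target)
```

Each row of `X` holds consecutive past residuals and the matching entry of `y` is the residual that immediately follows, so a series of length n yields n minus the window length training pairs.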

3.4.2. Data Analysis

Data analysis in this study focused on extracting and evaluating the relationships and connections within the air quality data. In our case, this involved identifying pollution patterns and the interconnections between different monitoring stations in Madrid. This process allowed us to focus on the most relevant aspects of the data and prepare them for detailed analysis.
The variables selected for analysis include:
  • Measurement date.
  • PM2.5 levels.
  • Monitoring station.
  • Madrid district.
Data analysis was carried out by applying statistical methods (descriptive, trend, and correlation analysis) together with machine learning techniques, specifically time-series models such as PROPHET–LSTM, to interpret the data and extract meaningful information. This study used various analysis techniques to better understand the influence of PM2.5 on air quality in Madrid, thereby facilitating a deeper and more accurate understanding of the factors affecting pollution in the city.
The input data used for the experiments included daily PM2.5 measurements collected from seven air quality monitoring stations in Madrid, covering the period from January 2019 to June 2024. The dataset contained over 8000 records, with the following key variables: date, time, PM2.5 levels, temperature, and wind speed. The model was trained using 80% of the data, with the remaining 20% reserved for testing and validation. We also used publicly available meteorological data to improve the model’s predictive capabilities [36,39].
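For time-series data, an 80/20 split is usually made chronologically so that the test period lies strictly after the training period. The sketch below illustrates that convention; whether the split in this study was chronological is an assumption, and the records are synthetic.

```python
from datetime import date, timedelta


def chronological_split(records, train_fraction=0.8):
    """Sort (date, value) records by date and cut them once, without shuffling."""
    ordered = sorted(records, key=lambda record: record[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Ten synthetic daily records starting on 1 January 2019 (illustrative values).
start = date(2019, 1, 1)
records = [(start + timedelta(days=i), 10.0 + (i % 7)) for i in range(10)]

train, test = chronological_split(records)
print(len(train), "training records up to", train[-1][0])
print(len(test), "test records from", test[0][0])
```

Keeping the cut in time order prevents information from the forecast period leaking into training, which a random split would allow.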

Descriptive Analysis

As discussed in Section 3.4.1, the use of the Prophet model allows the time series to be decomposed into its main components: trend (T(t)), seasonality (S(t)), and special events (H(t)). This provides a detailed understanding of the variations and underlying patterns in the historical PM2.5 data.
The inclusion of the residuals ϵ(t) and their analysis with the LSTM model allows for the identification and modeling of long-term dependencies that are not captured by the main components, offering a more comprehensive view of the time series dynamics.
Descriptive analysis provides a summary of the main characteristics of the data. This includes calculations of basic statistics, such as the mean, median, standard deviation, and ranges of PM2.5 levels.
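The yearly statistics named above can be computed with the Python standard library alone. The following sketch groups (date, value) records by year and reports the mean, median, sample standard deviation, minimum, and maximum; the measurement values are invented for the example and are not taken from the Madrid dataset.

```python
import statistics
from collections import defaultdict
from datetime import date


def yearly_summary(records):
    """Basic descriptive statistics of PM2.5 values, grouped by calendar year."""
    by_year = defaultdict(list)
    for day, value in records:
        by_year[day.year].append(value)
    return {
        year: {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            # Sample standard deviation needs at least two observations.
            "std": statistics.stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        }
        for year, values in sorted(by_year.items())
    }


# Invented measurements in µg/m³, for illustration only.
records = [
    (date(2019, 1, 5), 10.0), (date(2019, 6, 5), 12.0), (date(2019, 11, 5), 14.0),
    (date(2020, 2, 1), 9.0), (date(2020, 8, 1), 11.0),
]
for year, stats in yearly_summary(records).items():
    print(year, stats)
```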
In Figure 6, we can see that both the mean and median levels of PM2.5 show a downward trend from 2019 to 2024, suggesting that the policies implemented to improve air quality in Madrid might be effective. This reduction in average PM2.5 levels indicates a general improvement in air quality during this period.
The standard deviation and maximum values reveal that, although the overall trend is downward, there are still periods with elevated PM2.5 levels. This suggests the presence of pollution episodes, possibly due to temporary factors such as unfavorable weather conditions, temporary increases in vehicle or industrial emissions, among others.
On the other hand, the minimum values remain low and stable over the years, which is a positive sign. This indicates that, at least during some periods of the year, pollution levels stay at minimum levels, reflecting moments of good air quality.
The maximum values show considerable variability, with some years displaying significantly high peaks. These peaks suggest episodes of high pollution during certain periods, highlighting the need to continue monitoring and managing these episodes to reduce their impact.
Although general improvements in air quality are observed, the data also underline the importance of continuing to implement and adjust policies to manage episodes of high pollution and ensure continuous improvement in PM2.5 levels in Madrid.

Trend Analysis

In Section 3.4.1, it was also mentioned that the trend component (T(t)) in the Prophet model, whether linear or logistic, facilitates the analysis of long-term trends in PM2.5 levels. The following formula:
$$T(t) = k + m t$$
or
$$T(t) = \frac{C}{1 + e^{-k(t - m)}}$$
allows for the identification and quantification of changes over time, providing a solid basis for evaluating the effectiveness of environmental policies and measures.
Trends over time are analyzed to identify changes in PM2.5 levels. This helps detect seasonal patterns and long-term variations that may be influenced by external factors, such as changes in environmental policy or climatic events. The following graph shows the trends in PM2.5 levels in the districts where air quality stations measure PM2.5 levels: Salamanca, Moncloa-Aravaca, Chamberí, Arganzuela, Chamartín, Carabanchel, and Hortaleza.
In most districts of Madrid, a general downward trend in PM2.5 levels is observed from 2019 to 2023 (Figure 7). This pattern suggests that the policies and measures implemented to improve air quality are having a positive effect. However, throughout the studied period, there are significant fluctuations in PM2.5 levels, attributable to previously mentioned factors, such as weather conditions or variations in traffic, among others.
The district of Salamanca shows a decreasing trend in PM2.5 levels, especially from 2022 onward. Despite a peak in 2021, levels significantly decrease in the following years. Moncloa-Aravaca also presents a downward trend, with a notable reduction in 2023. This district experiences greater stability in PM2.5 levels compared to other districts, with fewer year-to-year fluctuations. Chamberí, after an increase in levels during 2021, shows a downward trend, with a constant decrease in PM2.5 levels from 2022 onward. Although the trend is less pronounced than in other districts, it remains positive.
Arganzuela has a similar pattern, with a downward trend and a notable reduction in 2023. This district experiences a peak in 2021, but shows a significant decrease in the following years. Chamartín shows a more pronounced downward trend after a peak in 2021. PM2.5 levels decrease steadily, indicating a significant improvement in air quality in this district.
Carabanchel, however, shows an increase in PM2.5 levels from early 2021 to late 2023. This indicates a worsening in air quality, which is concerning and suggests the need for additional measures to control pollution. Finally, in the district of Hortaleza, PM2.5 measurements did not begin until 2021, so there are no data available for the years prior to 2021. This district shows a general downward trend, although less pronounced compared to other districts. However, a slight increase in PM2.5 levels is observed from early 2022 to late 2023, which may indicate the need for closer monitoring in this district.
The overall graph (Figure 8) of the evolution of PM2.5 levels in the city of Madrid reflects the trends observed in individual districts. In general, there is a downward trend in PM2.5 levels from 2022 to 2024, suggesting the effectiveness of the environmental policies implemented in the city. PM2.5 levels show considerable variability over time, with pronounced peaks during certain periods that may be related to specific events, such as temporary increases in industrial emissions or adverse weather conditions.
These pollution peaks, although less frequent in recent years, underline the need to maintain and strengthen pollution control measures to prevent regressions in air quality. Moreover, comparing PM2.5 levels with the limits established by the WHO and the EU highlights the importance of continuing to advance toward stricter targets to protect public health.
Next, specific graphs for each monitoring station are displayed over the study period, reflecting the temporal evolution of PM2.5 levels. These graphs provide a more detailed understanding of how air pollution varies in different parts of the city and help identify areas that may require more specific interventions. The data provided by each station are crucial for evaluating the effectiveness of implemented policies and for planning future actions to mitigate pollution.
The effectiveness of environmental policies is reflected by the decreasing trend observed in most districts (Figure 9). This suggests that the policies and measures implemented to reduce air pollution in Madrid are effective. However, the increases observed in Carabanchel and, to a lesser extent, in Hortaleza indicate the need to reinforce and adjust these policies to address specific problems in certain districts.
Pollution episodes are evident through the year-to-year fluctuations, indicating that, despite the overall downward trend, there are still periods of high pollution that need to be addressed. These peaks could have resulted from specific events, such as temporary increases in emissions or adverse weather conditions.
The variability between districts shows that, although the general trend is similar, the magnitude and pace of the decrease vary. This may be due to differences in emissions sources, traffic density, and other local factors. The situation in Carabanchel and Hortaleza suggests that some districts may be more affected by specific local factors that require particular attention.

Spatial Analysis

The spatial analysis of PM2.5 levels in Madrid (Figure 10) provides a clear view of the geographical distribution of pollution and allows for the identification of priority areas for the implementation of control measures. The results suggest that current policies are having a positive impact, but a continuous effort is necessary to address emissions sources and improve air quality in all districts.
The detailed observations and generated maps serve as valuable tools for decision-makers, allowing for more effective and focused planning in the areas that need it most. Over the years, some districts, such as Chamberí, Chamartín, and Arganzuela, have consistently shown relatively high levels of PM2.5. This could be related to factors such as heavy vehicular traffic and high population density in these areas.
A general downward trend in PM2.5 levels is observed in most districts of Madrid from 2019 to 2024. This trend suggests the effectiveness of the policies implemented by the Madrid City Council to improve air quality, such as the creation of low-emission zones, the promotion of public transportation, and the restriction of traffic in critical areas. The results underline the need to continue and strengthen these measures to achieve sustained improvement in air quality.
The spatial variability of PM2.5 levels is evident in the maps. While some districts show significant improvement in PM2.5 levels, others show more moderate variations. Hortaleza, for example, began showing PM2.5 data in 2021, and although there have been improvements, the comparison with other districts highlights the importance of implementing specific measures tailored to the characteristics of each area.
The observed reductions in PM2.5 levels may be associated with pollution control initiatives. These results demonstrate that, although there have been significant advances, challenges remain in districts such as Chamartín and Arganzuela, which require continuous attention to maintain and improve their air quality.
As seen, the spatial analysis provides a comprehensive view of the improvements and challenges in air quality in Madrid up to 2024, emphasizing the importance of adaptive environmental policies focused on the districts with the highest PM2.5 levels. These maps and observations highlight the need for strategic planning and the implementation of measures tailored to the specific needs of each district to achieve continuous and sustained improvement in air quality in the city.

Predictive Analysis

As mentioned in Section 3.4.1, combining Prophet and LSTM in the hybrid model allows for more accurate predictions of future PM2.5 levels. Prophet handles seasonal components and special events, while LSTM captures long-term dependencies in the residuals.
To evaluate the performance of our proposed hybrid Prophet–LSTM model, we compared it with several well-known forecasting methods, including SARIMA-LSTM and ETS-LSTM, across key metrics, such as prediction accuracy, computational efficiency, and robustness to missing data. Our model outperformed these alternatives in terms of Mean Squared Error (MSE) and Mean Absolute Error (MAE), showing a 15% improvement in prediction accuracy for PM2.5 levels. This comparison highlights the advantage of combining seasonal decomposition with long-term dependencies to achieve more reliable predictions in highly variable urban environments.
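The error metrics used in this comparison are straightforward to compute. The sketch below defines MSE, MAE, and a relative improvement measure, and applies them to made-up forecasts. The numbers are illustrative only and do not reproduce the results reported here; the article does not state how the 15% figure was calculated, so the relative-improvement formula is an assumption.

```python
def mse(actual, predicted):
    """Mean Squared Error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_improvement(baseline_error, model_error):
    """Fractional error reduction of a model relative to a baseline."""
    return (baseline_error - model_error) / baseline_error


# Made-up observations and forecasts in µg/m³.
actual = [10.0, 12.0, 9.0, 11.0]
hybrid_forecast = [10.5, 11.5, 9.5, 10.5]
baseline_forecast = [11.0, 11.0, 10.0, 12.0]

print("hybrid MSE:", mse(actual, hybrid_forecast))
print("hybrid MAE:", mae(actual, hybrid_forecast))
print("MAE improvement over baseline:",
      relative_improvement(mae(actual, baseline_forecast), mae(actual, hybrid_forecast)))
```

MSE penalizes large errors more heavily than MAE, so reporting both helps show whether a model's advantage comes from typical days or from pollution spikes.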
The final prediction formula:
$$\hat{y}(t) = \hat{y}_{\text{Prophet}}(t) + \hat{\epsilon}(t)$$
expresses the hybrid Prophet–LSTM model used to estimate PM2.5 pollution levels in the air. Essentially, the model combines two different approaches to make a more accurate prediction.
  • The Prophet part ($\hat{y}_{\text{Prophet}}(t)$) makes predictions by taking into account trends and repeating patterns (like seasonal or daily changes) in pollution levels over time. In other words, Prophet handles the changes that follow regular patterns.
  • The LSTM part ($\hat{\epsilon}(t)$) focuses on capturing variations or fluctuations that do not follow predictable patterns. This component tries to detect more complex changes, like unexpected spikes or nonlinear relationships, which the Prophet model alone might miss.
By combining both models, the formula provides a prediction that includes both regular patterns and unexpected variations. This leads to a more accurate estimate of PM2.5 pollution levels in the air.
As discussed in previous sections, the hybrid Prophet–LSTM model was ultimately chosen for generating predictive graphs to anticipate future PM2.5 levels based on historical data (Figure 11).
Combining the Prophet model, known for its ability to handle trends and seasonality, with LSTM networks, which are efficient at capturing long-term dependencies in time series, offers a robust and accurate approach for predicting atmospheric pollutants. This model is expected to provide reliable and detailed predictions, as well as a confidence interval to assess the uncertainty associated with the predictions.
Data visualization is an integral part of the analysis as it allows interpreting and communicating the findings effectively. Below are the different graphs showing PM2.5 levels in the seven districts of Madrid that measure this substance over this period.
Using Prophet–LSTM has enabled detailed predictions of PM2.5 levels throughout the city of Madrid and its individual districts. The general trend observed is a decrease in PM2.5 levels over the years, reflecting the efforts and effectiveness of the implemented environmental policies. However, it is important to note the presence of peaks in the data, possibly due to seasonal variations or specific events that temporarily affect air quality.
As shown in Figure 12, in the Salamanca district, the predictions show a continuous decrease in PM2.5 levels, with a confidence interval suggesting a significant reduction in pollution in the coming years. Moncloa-Aravaca presents a downward trend in PM2.5 levels, with less interannual variability compared to other districts, suggesting that the implemented environmental policies are effective in this district.
Chamartín also shows a downward trend, although with greater variability in PM2.5 levels. The confidence interval is wider, indicating greater uncertainty in the predictions for this district. Chamberí presents a similar trend to Chamartín, with a gradual decrease in PM2.5 levels but notable interannual variability. It is important to continue monitoring this district to adjust policies as necessary.
Arganzuela indicates a decrease in PM2.5 levels over time, although with some peaks reflecting seasonal variations or specific events. The confidence interval is relatively narrow, suggesting a more precise prediction for this district. Carabanchel, meanwhile, shows a downward trend in PM2.5 levels, with a moderate confidence interval. This district shows an improvement in air quality, although measures should continue to be implemented to maintain this trend.
Hortaleza presents a downward trend in PM2.5 levels, although with a smaller decrease compared to other districts. The confidence interval is wider, indicating greater uncertainty in the predictions for this district.
The effectiveness of environmental policies is reflected in the decreasing trend observed in most districts. This suggests that the policies and measures implemented to reduce air pollution in Madrid are effective. However, the increases observed in certain districts indicate the need to reinforce and adjust these policies to address specific problems.
Pollution episodes are evident through the year-to-year fluctuations, indicating that, despite the overall downward trend, there are still periods of high pollution that need to be addressed. These peaks could result from specific events, such as temporary increases in emissions or adverse weather conditions. The variability between districts shows that, although the general trend is similar, the magnitude and pace of the decrease vary. This may be due to differences in emission sources, traffic density, and other local factors. The situation in certain districts suggests that some may be more affected by specific local factors that require particular attention.

4. Conclusions

This study provides a view of PM2.5 pollution in Madrid, offering valuable insights that can guide future research and environmental policies. By utilizing advanced methodologies, including the CRISP-DM model and a hybrid Prophet–LSTM model, we achieved an understanding of the temporal and spatial patterns of PM2.5 concentrations in the city.
The hybrid Prophet–LSTM model proved effective in capturing both long-term trends and seasonal fluctuations, as well as the spikes observed in PM2.5 levels. By combining Prophet’s capability to handle time series with multiple seasonal components and special events with LSTM’s ability to capture nonlinear dependencies, this model effectively addresses the complexities of air pollution forecasting.
Data analysis revealed a general decreasing trend in PM2.5 levels across most districts of Madrid from 2019 to mid-2024, a finding that suggests the positive impact of the Madrid City Council’s policies and measures aimed at reducing air pollution. However, year-to-year fluctuations and occasional pollution spikes underscore the importance of refining these policies to address specific events and seasonal variations. The identification of seasonality and recurring cycles using functions like the Partial Autocorrelation Function (PACF) is critical for tailoring policies to address these patterns.
The use of real data collected from monitoring stations was crucial to the accuracy and validity of this study, underscoring the importance of expanding the network of monitoring stations to other districts. This expansion would allow for more comprehensive control of pollutants, especially PM2.5, and would facilitate the development of more effective and targeted policies across different areas of the city.
Comparative evaluations of the hybrid SARIMA-LSTM, Prophet–LSTM, and ETS-LSTM models highlighted the superiority of the Prophet–LSTM model. This model demonstrated robustness in handling both nonlinearity and seasonality and performed best in predicting PM2.5 levels based on key metrics, such as the Mean Squared Error (MSE) and Mean Absolute Error (MAE).
The predictions generated by the Prophet–LSTM model are powerful tools for anticipating future pollution scenarios. These predictions, along with confidence intervals, provide a solid foundation for planning and decision-making, empowering policymakers and urban managers to design effective and targeted strategies to mitigate pollution.
This study not only details the evolution of PM2.5 levels in Madrid, but also emphasizes the significance of employing advanced methodologies for air quality analysis and forecasting. The insights gained here are valuable to the scientific and academic communities, laying a foundation for future research and underscoring the need for adaptive, specific environmental policies. The continuous improvement of air quality in Madrid relies on the ongoing implementation and evaluation of these strategies, ensuring a healthier environment for its residents.

5. Future Work

This study on PM2.5 levels in Madrid opens several lines of research and development to improve air quality management. A key area is the expansion of the network of monitoring stations, especially in districts with limited coverage, facilitating the implementation of more specific policies.
Integrating meteorological and traffic data into predictive models can provide a more complete view of air pollution dynamics. Developing advanced predictive models with machine learning techniques will improve prediction accuracy and allow for anticipating high pollution events.
Implementing a continuous evaluation system for environmental policies, using real-time data, can help maintain effective and adaptable strategies in any scenario. Additionally, promoting citizen participation and environmental education will increase public awareness and encourage behaviors that contribute to improving air quality.
Establishing national and international collaborations will enrich the knowledge base and allow for the development of more effective global strategies. Finally, developing advanced data visualization tools will facilitate the interpretation and use of air quality information, benefiting policymakers, the scientific community, and the public.
These future lines of work will contribute to a better understanding and management of air pollution in Madrid and can serve as models for other cities facing similar challenges. The continuous adaptation of these strategies will enable a more effective response to pollution and a sustained improvement in the quality of life of Madrid’s residents.

Author Contributions

Methodology, J.C.-T. and J.J.G.-H.; Software, J.C.-T. and J.J.G.-H.; Validation, J.J.G.-H.; Formal analysis, J.J.G.-H.; Investigation, J.C.-T. and J.J.G.-H.; Writing—review & editing, J.J.G.-H.; Supervision, J.J.G.-H.; Project administration, J.C.-T. and J.J.G.-H.; Funding acquisition, J.J.G.-H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used in this study are publicly available on the open data platform of the Madrid City Council at https://datos.madrid.es/portal/site/egob. All datasets analyzed or referenced in this article can be accessed freely through this portal.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. World Health Organization (WHO). Air Quality Guidelines: Global Update 2023; WHO Regional Office for Europe: Copenhagen, Denmark, 2023.
  2. World Health Organization (WHO). Ambient (Outdoor) Air Pollution. 2021. Available online: https://www.who.int/news-room/fact-sheets/detail/ambient-(outdoor)-air-quality-and-health (accessed on 2 July 2024).
  3. World Health Organization (WHO). Global Health Observatory Data Repository. 2021. Available online: https://www.who.int/data/gho/data/themes/topics/topic-details/GHO/ambient-air-pollution (accessed on 2 July 2024).
  4. Brook, R.D.; Rajagopalan, S.; Pope, C.A., III; Brook, J.R.; Bhatnagar, A.; Diez-Roux, A.V.; Holguin, F.; Hong, Y.; Luepker, R.V.; Mittleman, M.A.; et al. Particulate Matter Air Pollution and Cardiovascular Disease: An Update to the Scientific Statement From the American Heart Association. Circulation 2020, 141, 2331–2378.
  5. Hu, J.; Jia, Y.; Jia, Z.-H.; He, C.-B.; Shi, F.; Huang, X.-H. Prediction of PM2.5 Concentration Based on Deep Learning for High-Dimensional Time Series. Appl. Sci. 2024, 14, 8745.
  6. World Health Organization (WHO). Air Quality Guidelines: Global Update 2021. Particulate Matter, Ozone, Nitrogen Dioxide, and Sulfur Dioxide; WHO Regional Office for Europe: Copenhagen, Denmark, 2021.
  7. Faustini, A.; Rapp, R.; Forastiere, F. Nitrogen Dioxide and Mortality: Review and Meta-Analysis of Long-term Studies. Eur. Respir. J. 2020, 56, 744–753.
  8. Mills, I.C.; Atkinson, R.W.; Kang, S.; Walton, H.; Anderson, H.R. Quantitative Systematic Review of the Associations between Short-term Exposure to Nitrogen Dioxide and Mortality and Hospital Admissions. BMJ Open 2021, 5, e006946.
  9. World Health Organization (WHO). Health Risks of Ozone from Long-Range Transboundary Air Pollution; WHO: Geneva, Switzerland, 2021.
  10. Jerrett, M.; Burnett, R.T.; Pope, C.A., III; Ito, K.; Thurston, G.; Krewski, D.; Shi, Y.; Calle, E.; Thun, M. Long-term Ozone Exposure and Mortality. N. Engl. J. Med. 2021, 384, 1085–1095.
  11. Turner, M.C.; Jerrett, M.; Pope, C.A., III; Krewski, D.; Gapstur, S.M.; Diver, W.R.; Beckerman, B.S.; Marshall, J.D.; Su, J.; Crouse, D.L.; et al. Long-term Ozone Exposure and Mortality in a Large Prospective Study. Am. J. Respir. Crit. Care Med. 2022, 203, 1134–1142.
  12. International Agency for Research on Cancer (IARC). IARC Monographs on the Evaluation of Carcinogenic Risks to Humans. Volume 100F: Chemical Agents and Related Occupations; International Agency for Research on Cancer: Lyon, France, 2020.
  13. Smith, M.T. Advances in Understanding Benzene Health Effects and Susceptibility. Annu. Rev. Public Health 2020, 41, 133–148.
  14. World Health Organization (WHO). Carbon Monoxide. 2021. Available online: https://www.who.int/publications/i/item/9241540737 (accessed on 12 July 2024).
  15. Weaver, L.K. Clinical Practice. Carbon Monoxide Poisoning. N. Engl. J. Med. 2020, 382, 1217–1225.
  16. IQAir. 2023 World Air Quality Report. 2023. Available online: https://www.iqair.com/sg/newsroom/waqr-2023-pr (accessed on 18 July 2024).
  17. Normativa Europea. Ministerio para la Transición Ecológica y el Reto Demográfico. Available online: https://www.miteco.gob.es/content/dam/miteco/images/es/Cap2_Marco%20legal_tcm30-187880.pdf (accessed on 18 July 2024).
  18. Directiva (UE) 2024/825 del Parlamento Europeo y del Consejo, de 28 de Febrero de 2024. Boletín Oficial del Estado. Available online: https://www.boe.es/doue/2024/825 (accessed on 18 July 2024).
  19. Ayuntamiento de Madrid. Plan de Calidad del Aire y Cambio Climático (Plan A). Available online: https://transparencia.madrid.es/portales/transparencia/es/Transparencia-por-sectores/Medio-ambiente/Aire/Plan-de-calidad-del-aire-y-cambio-climatico-Plan-A-2017-2020/?vgnextfmt=default&vgnextoid=fab664457127f510VgnVCM1000001d4a900aRCRD&vgnextchannel=33d9508929a56510VgnVCM1000008a4a900aRCRD (accessed on 22 July 2024).
  20. Ayuntamiento de Madrid. Ordenanza de Movilidad Sostenible. Available online: https://sede.madrid.es/FrameWork/generacionPDF/ANM2023_152.pdf?idNormativa=de1d9bdbdfd8d810VgnVCM2000001f4a900aRCRD&nombreFichero=ANM2023_152&cacheKey=10 (accessed on 24 July 2024).
  21. Ayuntamiento de Madrid. Carta de Servicios de Arbolado Urbano. Available online: https://www.madrid.es/portales/munimadrid/es/Inicio/Medio-ambiente/Parques-y-jardines/Cartas-de-servicios/Carta-de-Servicios-de-Arbolado-Urbano/?vgnextfmt=default&vgnextoid=85f4e1d27fd5d610VgnVCM1000001d4a900aRCRD&vgnextchannel=c99679ed268fe410VgnVCM1000000b205a0aRCRD (accessed on 18 July 2024).
  22. Ayuntamiento de Madrid. Políticas de Reducción de Emisiones de Calefacción. Available online: https://sede.madrid.es/FrameWork/generacionPDF/boam9608_1135.pdf?numeroPublicacion=9608&idSeccion=317a7f14fddae810VgnVCM2000001f4a900aRCRD&nombreFichero=boam9608_1135&cacheKey=88&guid=40f9cd1f5dd9e810VgnVCM1000001d4a900aRCRD (accessed on 14 July 2024).
  23. Ayuntamiento de Madrid. Red de Estaciones de Vigilancia de Calidad del Aire. Available online: https://airedemadrid.madrid.es/portales/calidadaire/es/Bases-de-datos-y-publicaciones/Bases-de-datos-de-calidad-del-aire/En-tiempo-real/?vgnextfmt=default&vgnextchannel=650a89e859517710VgnVCM1000001d4a900aRCRD (accessed on 14 July 2024).
  24. Ayuntamiento de Madrid. Portal de Datos Abiertos. Available online: https://datos.madrid.es/portal/site/egob (accessed on 14 July 2024).
  25. Mertins, K.; Heisig, P.; Vorbeck, J. Knowledge Management: Concepts and Best Practices; Springer: Berlin/Heidelberg, Germany, 2021.
  26. Mainka, A.; Żak, M. Synergistic or Antagonistic Health Effects of Long- and Short-Term Exposure to Ambient NO2 and PM2.5: A Review. Int. J. Environ. Res. Public Health 2022, 19, 14079.
  27. Reche, C.; Tobias, A.; Viana, M. Vehicular Traffic in Urban Areas: Health Burden and Influence of Sustainable Urban Planning and Mobility. Atmosphere 2022, 13, 598.
  28. Wang, R.Y.; Strong, D.M. Beyond accuracy: What data quality means to data consumers. J. Manag. Inf. Syst. 1996, 12, 5–33.
  29. Kim, W.; Choi, B.; Hong, E.K.; Kim, S.K.; Lee, D. A taxonomy of dirty data. Data Min. Knowl. Discov. 2003, 7, 81–99.
  30. Smith, A.; Jones, B. The Use of CSV in Data Science Workflows. J. Data Sci. 2021, 19, 1–12.
  31. McKinney, W. Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010; pp. 51–56.
  32. Van Rossum, G.; Drake, F.L. Python 3 Reference Manual; CreateSpace: Scotts Valley, CA, USA, 2009.
  33. Peng, R.D. Reproducible Research in Computational Science. Science 2011, 334, 1226–1227.
  34. Smith, J.; Brown, L. Efficiency of Data Formats: A Comparative Study of CSV and XLSX in Data Processing. J. Data Sci. 2021, 15, 123–135.
  35. Ahmed, K.; Smith, A. The Role of CSV in Data Analysis Workflows. Data Sci. J. 2020, 14, 78–90.
  36. Taylor, S.J.; Letham, B. Forecasting at scale. Am. Stat. 2018, 72, 37–45.
  37. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
  38. Brownlee, J. Long Short-Term Memory Networks with Python: Develop Sequence Prediction Models with Deep Learning; Machine Learning Mastery: Vermont, VIC, Australia, 2017.
  39. Sezer, O.B.; Gudelek, M.U.; Ozbayoglu, A.M. Financial Time Series Forecasting with Deep Learning: A Systematic Literature Review: 2005–2019. Appl. Soft Comput. 2020, 90, 106181.
  40. Abuqaddom, I.; Mahafzah, B.A.; Faris, H. Oriented stochastic loss descent algorithm to train very deep multi-layer neural networks without vanishing gradients. Knowl. Based Syst. 2021, 230, 107391.
Figure 1. Hazard of pollutant substances according to their health impact.
Figure 2. (a) Number of publications related to “Madrid” and “PM2.5” per year. (b) Documents per year. (c) Documents by country or territory. Source: Scopus.
Figure 3. Adaptation of the CRISP-DM model to our research.
Figure 4. (a) Map of atmospheric monitoring stations in the city of Madrid. (b) Madrid districts that measure PM2.5.
Figure 5. Stages of the data download and preprocessing process.
Figure 6. Descriptive statistics of PM2.5 levels in the City of Madrid (2019–2024).
Figure 7. Trends in PM2.5 levels in Madrid by district (2019–2023).
Figure 8. Evolution of PM2.5 levels in the City of Madrid (2019–2024).
Figure 9. Evolution of PM2.5 levels by district with the limit levels of the WHO, the EU, and Madrid City Council.
Figure 10. PM2.5 levels in Madrid by district (2019–2024).
Figure 11. Predictions of PM2.5 in Madrid using the Prophet–LSTM model.
Figure 12. Predictions using Prophet–LSTM by districts.
Table 1. Scientific articles concerning “Madrid” and “PM2.5”.

| Title | Authors | Year |
|---|---|---|
| Short-term impact of particulate matter (PM2.5) on respiratory mortality in Madrid | Guaita, R., Pichiule, M., Mate, T., Linares, C., Diaz, J. | 2011 |
| Spatial and temporal variations in PM10 and PM2.5 across the Madrid metropolitan area in 1999–2008 | Salvador, P., Artíñano, B., Viana, M.M., … González-Fernández, I., Alonso, R. | 2011 |
| Short-term effect of fine particulate matter (PM2.5) on daily mortality due to diseases of the circulatory system in Madrid (Spain) | Maté, T., Guaita, R., Pichiule, M., Linares, C., Díaz, J. | 2010 |
| Short-term effect of PM2.5 on daily hospital admissions in Madrid (2003–2005) | Linares, C., Díaz, J. | 2010 |
| Short-term impact of particulate matter (PM2.5) on daily mortality among the over-75 age group in Madrid (Spain) | Jiménez, E., Linares, C., Rodríguez, L.F., Bleda, M.J., Díaz, J. | 2009 |
| Impact of particulate matter with diameter of less than 2.5 microns [PM2.5] on daily hospital admissions in 0–10-year olds in Madrid, Spain [2003–2005] | Linares, C., Díaz, J. | 2009 |
| Influence of traffic on the PM10 and PM2.5 urban aerosol fractions in Madrid (Spain) | Artíñano, B., Salvador, P., Alonso, D.G., Querol, X., Alastuey, A. | 2004 |
| Anthropogenic and natural influence on the PM10 and PM2.5 aerosol in Madrid (Spain). Analysis of high-concentration episodes | Artíñano, B., Salvador, P., Alonso, D.G., Querol, X., Alastuey, A. | 2003 |
Table 2. Comparative analysis of European policies and Madrid City Council measures on PM2.5 levels.

| European Policies on PM2.5 | Measures by the Madrid City Council to Reduce PM2.5 |
|---|---|
| PM2.5 Limits: Directive 2008/50/EC sets an annual limit of 25 µg/m³, with a plan to progressively reduce it to 10 µg/m³. It also sets a daily limit of 50 µg/m³, not to be exceeded more than 35 times a year. | Plan A: The Madrid City Council implemented the Air Quality and Climate Change Plan of Madrid, known as Plan A, which includes various measures to improve air quality. |
| Air Quality Plans: Member states must develop and implement air quality plans with specific measures to reduce PM2.5 emissions, such as traffic restrictions, industrial emission controls, and promotion of clean energy. | Madrid Central: Creation of a low-emission zone that restricts access to polluting vehicles in the city center, allowing only electric, hybrid, and clean environmental label vehicles. |
| Update and Monitoring: Directive (EU) 2024/825 introduces stricter regulations and new obligations for member states regarding the assessment and management of air quality, incorporating new pollutants and adjusting evaluation and monitoring methods. | Fleet Renewal: Incentives for the acquisition of electric vehicles, installation of charging points, and modernization of the public transport fleet with electric and compressed natural gas (CNG) buses. |
| National Transposition: In Spain, Directive 2008/50/EC has been transposed through Royal Decree 102/2011, which adapts European regulations to the national context. | Expansion of Green Areas: Increasing green spaces and reforesting the city to mitigate air pollution and improve the environment for residents. |
| | Reduction of Heating Emissions: Policies to promote more efficient and less polluting heating systems in buildings, with grant programs and technical advice. |
| | Monitoring and Evaluation: Installation of new measurement stations and updating existing ones to improve the surveillance of PM2.5 levels and other pollutants. |
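
The limit values quoted in the first row of Table 2 can be checked against a series of daily means with a few lines of Python. The sketch below is only illustrative: the function name `check_pm25_limits`, its return keys, and the sample values are assumptions made for this example and are not part of the study's pipeline; the thresholds are simply the figures stated in the table.

```python
from statistics import mean

# Thresholds as quoted in Table 2 (µg/m³); change them if the regulation differs.
ANNUAL_LIMIT = 25.0
DAILY_LIMIT = 50.0
MAX_DAILY_EXCEEDANCES = 35


def check_pm25_limits(daily_means):
    """Summarise one year of daily mean concentrations against the limits.

    `daily_means` is a hypothetical list of daily averages in µg/m³.
    """
    annual_mean = mean(daily_means)
    # Count the days whose mean is strictly above the daily limit.
    exceedances = sum(1 for value in daily_means if value > DAILY_LIMIT)
    return {
        "annual_mean": annual_mean,
        "annual_ok": annual_mean <= ANNUAL_LIMIT,
        "daily_exceedances": exceedances,
        "daily_ok": exceedances <= MAX_DAILY_EXCEEDANCES,
    }


# Made-up sample: 360 ordinary days at 10 µg/m³ and 5 episode days at 60 µg/m³.
sample = [10.0] * 360 + [60.0] * 5
report = check_pm25_limits(sample)
```

With real data, `daily_means` would presumably come from one station or district for one calendar year, so that both checks refer to the same averaging period.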
Table 3. Comparison of methodology stages based on Azevedo (2008).

| Stages | CRISP-DM | KDD | SEMMA |
|---|---|---|---|
| Business Understanding | Understand business objectives and requirements from a data perspective | Understand high-level business objectives | Identify business objectives and needs |
| Data Understanding | Collect initial data, describe data, explore data, and verify data quality | Collect data and perform preliminary analysis | Collect data and perform exploratory analysis |
| Data Preparation | Select data, clean data, construct data, integrate data, and format data | Data cleaning and transformation | Data preprocessing and transformation |
| Modeling | Select modeling techniques, design test, build models, and evaluate models | Develop, test, and refine models | Create, validate, and evaluate models |
| Evaluation | Evaluate results, review process, and determine next steps | Evaluate models and outcomes | Assess model performance and review process |
| Deployment | Plan deployment, monitor and maintain models, and generate final report | Implement solutions and continuous monitoring | Deploy models and monitor results |
Table 4. Comparison of hybrid models for data analysis.

| Hybrid Models | Strengths | Limitations |
|---|---|---|
| SARIMA-LSTM | Captures explicit seasonality and nonlinear patterns. | Complex to implement and tune; requires many parameters. |
| PROPHET–LSTM | Easy to use; robust to missing data and outliers. | Less accurate in complex nonlinear dynamics. |
| ETS-LSTM | Flexible; captures errors, trends, and seasonality. | Similar to SARIMA in complexity and less common in combination with LSTM. |
Table 5. Comparison of specific techniques for data analysis.

| Techniques | Strengths | Limitations |
|---|---|---|
| Temporal Convolutional Networks (TCNs) | Scalable; excellent for capturing long-term dependencies. | Requires large amounts of data and is computationally intensive. |
| XGBoost | High performance, handles missing data well, and robust to overfitting. | Can be slow to train on very large datasets and is complex to tune. |
| Dynamic Time Warping (DTW) | Effective for time-series alignment and similarity measurement. | Computationally expensive and less effective with noisy data. |
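
To make the Dynamic Time Warping row above concrete, a minimal pure-Python sketch of the classic dynamic-programming formulation follows. It is not taken from the study: the function name `dtw_distance`, the absolute-difference local cost, and the short example series are assumptions chosen for illustration. The nested loops show where the quadratic cost mentioned under "Limitations" comes from.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance with absolute difference as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical series: the second repeats one value, so warping aligns them exactly.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted_level = dtw_distance([0, 0], [1, 1])
```

Because every pair of positions is visited once, the running time grows with the product of the two series lengths, which is why windowed or approximate variants are often preferred for long series.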
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cáceres-Tello, J.; Galán-Hernández, J.J. Analysis and Prediction of PM2.5 Pollution in Madrid: The Use of Prophet–Long Short-Term Memory Hybrid Models. AppliedMath 2024, 4, 1428-1452. https://doi.org/10.3390/appliedmath4040076
