Article

Modeling Air Pollution in Metropolitan Lima: A Statistical and Artificial Neural Network Approach

by Miguel Angel Solis Teran 1, Felipe Leite Coelho da Silva 2, Elías A. Torres Armas 3, Natalí Carbo-Bustinza 4 and Javier Linkolk López-Gonzales 5,*

1 Facultad de Ingeniería y Arquitectura, Universidad Peruana Unión, Lima 15468, Peru
2 Department of Mathematics, Federal Rural University of Rio de Janeiro, Seropédica 23890-000, Brazil
3 Instituto de Investigación de Estudios Estadísticos y Control de Calidad, Universidad Nacional Toribio Rodríguez de Mendoza, Chachapoyas 01001, Peru
4 E.P. Ingeniería Ambiental, Universidad Peruana Unión, Lima, Peru
5 Escuela de Posgrado, Universidad Peruana Unión, Lima 15468, Peru
* Author to whom correspondence should be addressed.
Environments 2025, 12(6), 196; https://doi.org/10.3390/environments12060196
Submission received: 23 April 2025 / Revised: 27 May 2025 / Accepted: 4 June 2025 / Published: 10 June 2025
(This article belongs to the Special Issue Air Pollution in Urban and Industrial Areas III)

Abstract

Particulate matter is a mixture of fine dust and tiny liquid droplets suspended in the air. PM10 is a pollutant composed of particles smaller than 10 µm. These particles are harmful to the respiratory system. Air quality in Lima, the capital of the Republic of Peru, and its surrounding region has been investigated in recent years. In this context, statistical analyses of PM10 data with forecast models can contribute to planning actions that can improve air quality. The objective of this work is to perform a statistical analysis of the available PM10 data and evaluate the quality of classical time series models and neural networks for short-term forecasting. This study demonstrates that classical time series models, particularly ARIMA and SSA, achieve lower average forecast errors than LSTM across stations SMP, CRB, and ATE. This finding suggests that for data with seasonal patterns and relatively short time series, traditional models may be more efficient and robust. Although neural networks have the potential to capture more complex relationships and long-term dependencies, their performance may be limited by hyperparameter settings and intrinsic data characteristics.

1. Introduction

Currently, the generation of air pollutants is increasing significantly, predominantly in the form of gases and suspended particles small enough to endanger human health. There is epidemiological evidence that particulate matter is associated with risks of cardiovascular and respiratory mortality [1,2]. According to Sánchez [3], it is crucial to understand this phenomenon from a spatiotemporal framework by studying elements that provide measurable data to explain the interaction between matter and energy in the environment. Similarly, Mahmud [4] stresses the importance of examining different elements within a dynamic field, which changes rapidly due to the variability of its spatial and temporal components, thus allowing objective, explainable, and understandable research in the environmental field.
According to [5,6], one way to model suspended pollutants is the spatial and temporal analysis of the effects of PM10, together with the influence of meteorological variables such as temperature and wind on air quality. The authors of [7] highlight that these particles are a concern because of their size, particularly their persistence in and penetration of the respiratory tract. In this context, ref. [8] obtains favorable predictions using singular spectrum analysis. This approach is considered a reliable way to build a spatiotemporal picture of PM10, a pollutant with a great impact on exposed populations.
Singular spectral analysis (SSA) has proven to be an effective tool in the prediction of air pollutants such as PM10 due to its ability to decompose complex time series into simpler components, thereby facilitating the identification of underlying patterns and trends in air quality data. According to study [9], the combination of SSA with machine learning techniques significantly improves the accuracy in prediction of airborne pollutants. Furthermore, the integrated SSA-ARIMA approach described in [10] enables multi-day forecasts with high confidence, applicable also to PM10. For its part, study [11] presents hybrid deep learning models that, combined with SSA, offer high performance in the long-term prediction of pollutants such as PM10 and PM2.5, highlighting the value of advanced techniques in the evaluation and control of air quality.
Two studies, by Bodor et al. (2020) in Transylvania [12] and by Shikhovtsev et al. (2023) in the Southern Baikal Region [13], agree in highlighting the seasonal and spatial variability of PM10 concentrations, demonstrating the influence of both local climatic conditions and long-range transport on the dynamics of this atmospheric pollutant. In addition, these studies attempt to move from the sensory perception of pollutants towards a current, agile, and dynamic quantification system that can be modeled and understood through statistical processes. On the other hand, ref. [14] states that spatial components vary frequently and are subject to meteorological factors that can generate white noise. However, ref. [15] treats a time series as a dynamic system, defined as a linear combination of various oscillators; projecting the series onto principal components in the time-frequency domain makes it possible to establish dominant oscillation modes and associate them with events involving PM10 particulates. Ref. [16] mentions that singular spectrum analysis (SSA), together with autoregressive models, is a fundamental part of the probabilistic prediction of energy processes. In [17,18], combining SSA with other methods that help to decompose the data improved the spectral analysis of PM10, considering its proportions and effects on the environment.
Finally, it is worth pointing out that few studies on suspended particulate pollutants have been carried out in Peru. The most recent work on particles smaller than 10 microns [19,20] analyzed the spatial and temporal distribution of PM10 concentration in Metropolitan Lima in the period 2015–2017 and found levels above quality standards; however, no specific spectral studies have been carried out to identify the frequencies needed to understand pollution within atmospheric patterns. In [21], the authors note that concentrations exhibit non-linear behavior and fluctuate strongly on space-time scales, which makes it possible to visualize and describe widespread pollution and trace its persistence. In that sense, the objective of this research is to model and forecast the hourly concentration of PM10 using artificial neural networks and traditional models.

2. Methodology

In this study, PM10 data from Peru were used. The National Meteorological and Hydrological Service (SENAMHI) uses particle samplers to collect PM10. These devices are part of the Automatic Air Quality Monitoring Network that SENAMHI operates to monitor air quality. The samplers are semi-manual and use 47 mm filters to capture airborne particles. This analysis considered five monitoring stations located in the province of Lima. The study area comprises the province of Lima, capital of the Republic of Peru (Figure 1). Lima is located at approximately 77° W and 12° S in South America. The data used cover the period 2017–2018.
To evaluate the prediction accuracy of the models, the PM10 time series was divided into 10 training and test sets. From the first defined training set, a forecast of seven days ahead was carried out. Then, the training set received one more observation, and again the forecast was performed seven days ahead. Following this methodology, the steps mentioned above were carried out 10 times. In this work, the classical time series models were used, such as the Box–Jenkins and exponential smoothing models. Also, autoregressive neural network models (NNARs), multilayer perceptrons (MLPs), and long short-term memory (LSTM) models were used. Furthermore, the naive models, TBATS, dynamic linear model (DLM), and singular spectrum analysis (SSA) were adopted.
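The expanding-window evaluation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the naive forecaster stands in for any of the models, and placing the last test window at the end of the series is an assumption, since the article does not state where the first training set ends.

```python
from typing import Callable, List, Sequence, Tuple


def rolling_origin_splits(
    n_obs: int, horizon: int = 7, n_origins: int = 10
) -> List[Tuple[range, range]]:
    """Return (train_idx, test_idx) pairs; each training set gains one observation."""
    # Assumption: the final test window ends at the last observation.
    first_train_end = n_obs - horizon - (n_origins - 1)
    if first_train_end < 1:
        raise ValueError("series too short for the requested horizon and origins")
    splits = []
    for k in range(n_origins):
        train_end = first_train_end + k
        splits.append((range(0, train_end), range(train_end, train_end + horizon)))
    return splits


def naive_forecast(train: Sequence[float], horizon: int) -> List[float]:
    """Naive model: repeat the last observed value over the whole horizon."""
    return [train[-1]] * horizon


def evaluate(
    series: Sequence[float],
    forecaster: Callable[[Sequence[float], int], List[float]],
    horizon: int = 7,
    n_origins: int = 10,
) -> List[float]:
    """Return one RMSE per forecast origin."""
    errors = []
    for train_idx, test_idx in rolling_origin_splits(len(series), horizon, n_origins):
        train = [series[i] for i in train_idx]
        actual = [series[i] for i in test_idx]
        predicted = forecaster(train, horizon)
        mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / horizon
        errors.append(mse ** 0.5)
    return errors
```

Averaging the list returned by `evaluate` gives a per-model summary comparable in spirit to the averages reported in the results tables.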

2.1. Box–Jenkins Models

The Box–Jenkins [22] methodology is widely used in analyzing parametric time series models. This methodology includes fitting autoregressive integrated moving average models, ARIMA$(p, d, q)$, to a data set using the autocorrelation functions between observations.
The model used was seasonal ARIMA (SARIMA), which incorporates the seasonality component of the data. The structure of the seasonal ARIMA model of order $(p, d, q) \times (P, D, Q)_s$ is given by
$$\phi(B)\,\Phi(B^s)\,\Delta^d \Delta_s^D Z_t = \theta(B)\,\Theta(B^s)\,a_t$$
where $a_t$ is white noise; $\phi(B)$ is the autoregressive operator of order $p$; $\theta(B)$ is the moving average operator of order $q$; $\Phi(B^s)$ is the seasonal autoregressive operator of order $P$; $\Theta(B^s)$ is the seasonal moving average operator of order $Q$; $\Delta^d$ is the simple difference operator of order $d$; $\Delta_s^D$ is the seasonal difference operator of order $D$; and $s$ is the number of observations per year (the seasonal period). The Box–Jenkins model was obtained using the algorithm proposed by Hyndman and Khandakar [23].
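To make the two difference operators concrete, the sketch below applies the seasonal difference $(1 - B^s)$ and the simple difference $(1 - B)$ to a toy series. The helper names and the weekly period $s = 7$ are assumptions chosen for illustration; the article does not report the period used.

```python
from typing import List, Sequence


def difference(z: Sequence[float], lag: int = 1) -> List[float]:
    """Apply (1 - B^lag): return z[t] - z[t - lag] for every valid t."""
    return [z[t] - z[t - lag] for t in range(lag, len(z))]


def sarima_differencing(z: Sequence[float], d: int, D: int, s: int) -> List[float]:
    """Apply the seasonal difference D times, then the simple difference d times."""
    out = list(z)
    for _ in range(D):
        out = difference(out, lag=s)
    for _ in range(d):
        out = difference(out, lag=1)
    return out


# Toy series: linear trend plus a repeating 7-step pattern (illustrative values only).
pattern = [0, 3, 1, 4, 1, 5, 9]
toy = [2 * t + pattern[t % 7] for t in range(28)]

# One seasonal difference removes the pattern and leaves the constant 2 * 7 = 14;
# one further simple difference removes that constant.
stationary = sarima_differencing(toy, d=1, D=1, s=7)
```

Each seasonal difference shortens the series by $s$ observations and each simple difference by one, so the ARMA part of the model is fitted to a slightly shorter series.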

2.2. Exponential Smoothing Method

The exponential smoothing method was proposed in the 1950s in [24,25,26]. In this method, forecasts are generated from a weighted average of past observations, with the weight of observations decreasing exponentially as the observations age. Hyndman et al. [27] proposed a classification that depends on the time series’ error, trend, and seasonality. The error can be additive (A) or multiplicative (M); the trend can be additive (A), additive with damping ($A_d$) or, in the case of non-existence, there can be none (N); in turn, seasonality can be additive (A), multiplicative (M), or there can be none (N). Therefore, each exponential smoothing model used in this work can be classified as ETS (Error, Trend, Seasonality). Furthermore, the ETS algorithm proposed by Hyndman [28] was used to adjust the exponential smoothing models.
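The exponentially decaying weights can be seen in the simplest member of the family, simple exponential smoothing, which gives the point forecast of an ETS(A,N,N) model. The sketch below is illustrative only: the smoothing parameter `alpha` and the initial level `level0` are supplied by hand here, whereas an ETS fitting routine would estimate them.

```python
from typing import List, Sequence


def simple_exponential_smoothing(y: Sequence[float], alpha: float, level0: float) -> float:
    """Return the final level, which is the flat forecast for every future step."""
    level = level0
    for obs in y:
        # New level: weighted average of the newest observation and the old level.
        level = alpha * obs + (1 - alpha) * level
    return level


def smoothing_weights(alpha: float, n: int) -> List[float]:
    """Weights on the n most recent observations, newest first."""
    return [alpha * (1 - alpha) ** j for j in range(n)]


y = [10.0, 12.0, 11.0, 13.0]  # illustrative values, not PM10 data
forecast = simple_exponential_smoothing(y, alpha=0.5, level0=10.0)

# The same forecast written as an explicit weighted average of past observations,
# plus the remaining weight on the initial level.
w = smoothing_weights(0.5, len(y))
explicit = sum(wj * obs for wj, obs in zip(w, reversed(y))) + (1 - 0.5) ** len(y) * 10.0
```

Both `forecast` and `explicit` give the same number, which shows that the recursion is just a compact way of computing the weighted average described in the text.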

2.3. Neural Network Autoregression

The Neural Network Autoregression (NNAR) model constitutes a non-linear generalization of classic autoregression models, in which the relationship between the lagged values of a time series and its current value is modeled using a feedforward neural network with one hidden layer. This paper uses the algorithm proposed in [23]. Formally, an NNAR$(p, k)$ model can be expressed as
$$y_t = f(y_{t-1}, y_{t-2}, \ldots, y_{t-p}) + \varepsilon_t$$
where $f(\cdot)$ represents a non-linear function approximated by a neural network with $k$ neurons in the hidden layer and $\varepsilon_t$ is an error term with zero mean and constant variance. The neural network is trained using optimization algorithms such as backpropagation with gradient descent, minimizing a loss function typically based on the mean squared error [27,29]. The NNAR architecture is particularly useful in contexts where temporal dynamics present non-linear, seasonal, or long-range behaviors that are not captured efficiently by linear models such as ARIMA. Furthermore, its implementation maintains the traditional autoregressive structure, which facilitates comparison and interpretation within the time series modeling framework [30].
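As a rough illustration of the NNAR structure, the sketch below builds the lagged inputs for a given $p$ and evaluates a one-hidden-layer network on them. The function names are hypothetical, the weights are passed in rather than trained, and the logistic hidden activation with a linear output is an assumption made for this example.

```python
import math
from typing import List, Sequence, Tuple


def lag_matrix(y: Sequence[float], p: int) -> Tuple[List[List[float]], List[float]]:
    """Build inputs (y[t-1], ..., y[t-p]) and targets y[t] for t = p, ..., len(y) - 1."""
    inputs = [[y[t - j] for j in range(1, p + 1)] for t in range(p, len(y))]
    targets = [y[t] for t in range(p, len(y))]
    return inputs, targets


def nnar_predict(
    lags: Sequence[float],
    W1: Sequence[Sequence[float]],  # k rows, one per hidden neuron, each of length p
    b1: Sequence[float],            # k hidden biases
    w2: Sequence[float],            # k output weights
    b2: float,                      # output bias
) -> float:
    """Evaluate f(y[t-1], ..., y[t-p]) with k logistic hidden units and a linear output."""
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, lags)) + b)))
        for row, b in zip(W1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2
```

Training would adjust `W1`, `b1`, `w2`, and `b2` to minimize the squared error between `nnar_predict(inputs[i], ...)` and `targets[i]`.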

2.4. Multi-Layer Perceptron

The MLP model, or feedforward neural network, is a mathematical function that maps a set of input values to output values. According to [31], the main objective of a feedforward neural network is to approximate some function $f^*$ by defining a mapping $Y = f(x; \theta)$ and learning the value of the parameter $\theta$ that gives the best function approximation. The network consists of at least three layers of nodes: an input layer, one or more hidden layers, and an output layer. Each node (except those in the input layer) applies a non-linear activation function, which allows the model to capture complex and non-linear relationships between the input and output variables. Formally, the MLP output is defined as
$$\hat{y} = f^{(L)}\left(W^{(L)} f^{(L-1)}\left(\cdots f^{(1)}\left(W^{(1)} x + b^{(1)}\right) \cdots\right) + b^{(L)}\right)$$
where $L$ is the number of layers, $W^{(l)}$ and $b^{(l)}$ are the weights and biases of layer $l$, and $f^{(l)}$ is the corresponding activation function. Commonly employed activation functions include ReLU, sigmoid, and tanh [31,32]. MLP training is carried out using a backpropagation algorithm in combination with optimization techniques such as stochastic gradient descent or more sophisticated variants such as Adam. Due to their flexibility and universal approximation capacity [33], MLPs are used as fundamental blocks in more complex deep learning architectures.
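The nested composition in the MLP formula is simply a loop over layers. The following sketch evaluates it for an arbitrary number of layers; the helper names and the small hand-picked weights are assumptions for illustration, and no training is shown.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[Sequence[Sequence[float]], Sequence[float], Callable[[Vector], Vector]]


def relu(v: Vector) -> Vector:
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in v]


def identity(v: Vector) -> Vector:
    """Linear (identity) activation, common for a regression output layer."""
    return list(v)


def affine(W: Sequence[Sequence[float]], x: Vector, b: Sequence[float]) -> Vector:
    """Compute W x + b for one layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def mlp_forward(x: Vector, layers: Sequence[Layer]) -> Vector:
    """Apply a = f(W a + b) layer by layer, starting from the input x."""
    a = list(x)
    for W, b, f in layers:
        a = f(affine(W, a, b))
    return a


# Two-layer example with hand-picked weights (illustrative only).
layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0], relu),  # hidden layer, 2 units
    ([[1.0, 1.0]], [0.5], identity),                 # output layer, 1 unit
]
y_hat = mlp_forward([2.0, 0.5], layers)
```

With these weights the network returns the absolute difference of its two inputs plus 0.5, a function that a purely linear model could not represent.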

2.5. Long Short-Term Memory

Long short-term memory (LSTM) networks are a variant of recurrent neural networks (RNNs) that effectively address the vanishing gradient problem when modeling long-term dependencies in data sequences [34]. Their architecture incorporates a memory cell and gating mechanisms to regulate the flow of information. The behavior of an LSTM unit can be expressed in compact form as
$$\begin{pmatrix} f_t \\ i_t \\ o_t \\ \tilde{C}_t \end{pmatrix} = \begin{pmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{pmatrix} \left( W \begin{pmatrix} h_{t-1} \\ x_t \end{pmatrix} + b \right), \qquad C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \qquad h_t = o_t \odot \tanh(C_t)$$
where $x_t$ is the input at time $t$, $h_{t-1}$ is the previous hidden state, $C_t$ is the state of the cell, $f_t$, $i_t$, and $o_t$ are the forget, input, and output gates, $\tilde{C}_t$ is the candidate cell state, $\sigma$ denotes the sigmoid function, and $\odot$ represents the element-wise product. LSTMs are widely used in time series prediction tasks due to their ability to capture complex patterns and long-range dependencies [35,36].
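A single LSTM time step following the compact form above can be sketched in plain Python. The function name and the layout of `W` (four stacked blocks of rows for the forget, input, output, and candidate parts) are assumptions of this example; deep learning libraries organize these weights differently.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(
    x_t: Sequence[float],
    h_prev: Sequence[float],
    c_prev: Sequence[float],
    W: Sequence[Sequence[float]],  # 4n rows, each of length n + m
    b: Sequence[float],            # 4n biases
) -> Tuple[List[float], List[float]]:
    """Compute (h_t, C_t) from x_t, h_{t-1}, and C_{t-1} for n hidden units."""
    n = len(h_prev)
    z = list(h_prev) + list(x_t)  # stacked vector (h_{t-1}, x_t)
    pre = [sum(w * v for w, v in zip(row, z)) + bi for row, bi in zip(W, b)]
    f = [sigmoid(a) for a in pre[0:n]]                   # forget gate
    i = [sigmoid(a) for a in pre[n:2 * n]]               # input gate
    o = [sigmoid(a) for a in pre[2 * n:3 * n]]           # output gate
    c_tilde = [math.tanh(a) for a in pre[3 * n:4 * n]]   # candidate cell state
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c
```

Running `lstm_step` repeatedly over a sequence, feeding each returned `(h, c)` into the next call, gives the recurrent pass that a forecasting model would follow with an output layer.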

2.6. Dynamic Linear Model

Dynamic Linear Models (DLMs) form a class of Bayesian state-space models used for time series with structures that evolve over time [37]. Their formulation is based on two normal conditional distributions: $y_t \mid \theta_t \sim N(F_t \theta_t, V_t)$ for the observation model and $\theta_t \mid \theta_{t-1} \sim N(G_t \theta_{t-1}, W_t)$ for the evolution model. Here, $y_t$ is the observation at time $t$, $\theta_t$ is the latent state vector, $F_t$ is the regression vector, $G_t$ is the transition matrix, and $V_t$ and $W_t$ are the variance matrices associated with the observation and evolution errors, respectively. This structure allows changes in trend, seasonality, and other unobserved components to be captured dynamically, with estimation based on the Kalman filter and sequential Bayesian methods [38].
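For intuition, the Kalman filter recursion can be written out for the simplest DLM, a scalar local-level model with $F_t = G_t = 1$. This is a sketch under stated assumptions: the variances `V` and `W` are treated as known constants and the function name is hypothetical; the article does not describe which DLM structure or variance estimates were used.

```python
from typing import List, Sequence, Tuple


def local_level_filter(
    y: Sequence[float], V: float, W: float, m0: float, C0: float
) -> Tuple[List[float], List[float]]:
    """Kalman filter for y_t = theta_t + v_t, theta_t = theta_{t-1} + w_t."""
    m, C = m0, C0
    means, variances = [], []
    for obs in y:
        a, R = m, C + W          # prior mean and variance of theta_t
        f, Q = a, R + V          # one-step-ahead forecast mean and variance
        K = R / Q                # Kalman gain
        m = a + K * (obs - f)    # posterior mean after seeing obs
        C = R - K * R            # posterior variance
        means.append(m)
        variances.append(C)
    return means, variances
```

A larger `W` relative to `V` makes the gain approach one, so the filtered level tracks the observations closely; a smaller `W` yields a smoother level.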

3. Results

In this section, an exploratory analysis is presented for PM10 data from meteorological stations in Peru. Then, the time series models are used to obtain short-term forecasting of PM10.

3.1. Exploratory Analysis

Figure 2 shows the PM10 concentration trajectories of the monitoring stations in Peru. The HCH station presented the highest PM10 concentration, followed by the ATE station. On the other hand, the CDM station had the lowest daily PM10 concentrations (Figure 3). The highest daily PM10 concentration value occurred in April 2018 at the HCH meteorological station. Figure 3 shows the boxplots for the hourly and daily PM10 data. It can be seen in this figure that both data sets contain outliers. Figure 4 shows the daily time series of PM10 and the limits of the boxplot. This figure shows that the outliers occur in different months of the year.
Table 1 provides the main statistical measures of daily PM10 concentration for the meteorological stations considered in this study. It can be seen in this table that the HCH station has the highest average concentration of PM10 and also greater variability, and the ATE station has the lowest average PM10. Hourly temperature and PM10 data showed no strong correlation at any of the stations analyzed (Figure 5). The HCH station presented the highest correlation among the stations, approximately 0.4, which is still a low value. Figure 6a presents the spatial correlation between the weather stations considered in Peru. This figure shows that the highest correlation (0.63) occurred between the SMP and ATE stations. On the other hand, the spatial correlation between monitoring stations for the temperature variable is greater than 0.70 (Figure 6b). Therefore, there is a low spatial correlation between some air quality monitoring stations for the PM10 variable compared to the temperature variable.
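Observations beyond the boxplot limits can be flagged with a few lines of code. The sketch below assumes the common 1.5 × IQR whisker rule and Python's default quartile convention; the article does not state which rule or quartile definition its boxplots use, and the sample values are made up for illustration.

```python
import statistics
from typing import List, Sequence, Tuple


def boxplot_limits(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) limits: Q1 - k * IQR and Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return the observations that fall outside the boxplot limits."""
    lower, upper = boxplot_limits(values, k)
    return [v for v in values if v < lower or v > upper]


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 100]  # toy data, not PM10 measurements
outliers = flag_outliers(sample)
```

Keeping the dates alongside the flagged values makes it possible to check, as the text does, in which months the outliers occur.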
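The station-to-station correlations discussed above are pairwise Pearson coefficients computed on aligned series. A minimal sketch follows; the function names are hypothetical, the station labels "A" and "B" and their values are placeholders, and any handling of missing hours is left out.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(
    series_by_station: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], float]:
    """Pairwise correlations keyed by (station, station)."""
    names = list(series_by_station)
    return {
        (a, b): pearson(series_by_station[a], series_by_station[b])
        for a in names
        for b in names
    }


# Placeholder series for two hypothetical stations (not measured data).
toy = {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.0, 6.0, 8.0]}
corr = correlation_matrix(toy)
```

The same function applies to PM10 and temperature series alike, which is what allows the two spatial correlation panels to be compared directly.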

3.2. Model Performance Evaluation for PM10 Forecasting in Lima

Table 2 presents the RMSE and sMAPE accuracy measures for the forecast models in each test set and the average. Based on the average values of the RMSE and sMAPE metrics, the Box–Jenkins model presents the best prediction accuracy for PM10 data of SMP station. In this station, the models present a symmetric mean absolute percentage error measure lower than 10%, except the DLM model. In Table 3, the LSTM model presents the best predictive capacity for the data from the HCH station. For the CRB station data, the TBATS model provides better prediction results (see Table 4). Table 5 shows that the SSA model has the best prediction results for the ATE station data.
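For reference, the two accuracy measures can be computed as in the sketch below. The sMAPE variant shown, with the sum of absolute values in the denominator and a result in percent, is one common definition and is an assumption here, since the article does not give its formula.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def smape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent (assumed variant)."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        # Treat a 0/0 term as zero error to avoid division by zero.
        total += 0.0 if denom == 0 else 2.0 * abs(a - p) / denom
    return 100.0 * total / len(actual)
```

RMSE is expressed in the units of the series, while sMAPE is scale-free, which is why the two are reported together when comparing stations with different PM10 levels.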
As noted in the Introduction, few studies on suspended particulate pollutants have been carried out in Peru. The most recent study of particles smaller than 10 microns, by Espinoza [19], analyzed the spatial and temporal distribution of PM10 concentration in Metropolitan Lima in the period 2015–2017 and found levels above quality standards; however, no specific spectral studies have been carried out to identify the frequencies needed to understand pollution within atmospheric patterns. The superior performance of classical models such as ARIMA compared to LSTM neural networks raises a relevant discussion about the suitability of modeling tools to the characteristics of the data. Although neural networks are designed to capture nonlinear relationships and long-term dependencies, their effectiveness depends on appropriate hyperparameter settings, a large amount of training data, and complex patterns in the time series. In this study, the available data show seasonal patterns and more predictable dependencies, characteristics that classical models handle efficiently. This highlights that the simplicity and adaptability of classical models can be advantages in scenarios with data limitations or well-defined patterns.
Analysis of PM10 concentrations at different monitoring stations in Metropolitan Lima reveals significant spatial heterogeneity, with Huachipa (HCH) recording the highest levels. This finding may be related to specific emission sources, such as industries and heavy traffic, underlining the need for targeted local policies. Furthermore, the observed temporal fluctuations, influenced by factors such as seasons and meteorological changes, indicate the importance of considering spatiotemporal dynamics when planning interventions.
The results of this study partially align with international research highlighting the usefulness of advanced tools such as SSA and hybrid models to predict pollutants. However, the lower effectiveness of neural networks in this case highlights the importance of adapting models to local particularities. This includes not only adjusting the techniques used, but also expanding the database to improve predictive capacity. Likewise, a more robust integration of factors such as environmental policies, urban growth, and human behavior could enrich the models and improve mitigation strategies. In this sense, the study raises the need for an interdisciplinary approach to effectively address the challenges of air pollution in Metropolitan Lima and other similar cities.

3.3. Limitations and Future Work

Despite the progress made in modeling air pollution in Metropolitan Lima using statistical approaches and artificial neural networks, the study has some limitations that should be considered. First, the study used a small number of monitoring stations, which limits the robustness of the models and the spatial representativeness of the results. Furthermore, the restricted geographic coverage prevents a broader assessment of pollutant behavior throughout the metropolitan area. Finally, the exclusion of relevant meteorological variables such as humidity and precipitation could have affected the accuracy of the estimates, given that these conditions directly influence the dispersion and concentration of air pollutants. In order to improve the accuracy and applicability of air pollution models in Metropolitan Lima, it is recommended to expand the network of monitoring stations to achieve greater spatial representativeness and better geographic coverage. This would allow for more accurate capture of local variations in pollution levels. It is also suggested that additional meteorological variables, such as relative humidity, precipitation, and atmospheric pressure, be incorporated, as these can significantly influence pollutant dynamics. The inclusion of these factors could enhance the predictive capacity of the models, especially in approaches based on artificial neural networks. Finally, the possibility of applying deep learning techniques with more complex architectures and larger data sets is proposed to explore deeper nonlinear patterns and improve the generalizability of the results.

4. Conclusions

This study demonstrates that classical time series models, particularly ARIMA and SSA, achieve lower average forecast errors than LSTM across stations SMP, CRB, and ATE. This finding suggests that for data with seasonal patterns and relatively short time series, traditional models may be more efficient and robust. Although neural networks have the potential to capture more complex relationships and long-term dependencies, their performance may be limited by hyperparameter settings and intrinsic data characteristics.
The high PM10 concentrations at HCH station may be driven by nearby industrial sources and dense vehicular traffic, warranting targeted monitoring and policy intervention. These results have important implications for public health, since PM10 particles are directly linked to respiratory and cardiovascular diseases. In addition, the levels recorded at several stations exceed international air quality standards, highlighting the urgent need to implement pollution mitigation policies in Metropolitan Lima. The use of advanced tools such as singular spectral analysis (SSA) and neural networks, together with classical models, provides valuable insight for air pollutant prediction. However, the results underline the importance of adapting these approaches to local conditions and increasing temporal and spatial resolution of measurements to enhance model reliability. Future research could explore hybrid models that combine the best of classical and modern approaches, as well as incorporate exogenous variables such as wind speed, relative humidity, precipitation, and traffic intensity to capture pollutant transport and accumulation dynamics. Furthermore, the findings may extend to other scenarios, such as those addressed in [39,40,41,42]. This will strengthen environmental management strategies and reduce the impact of pollution on public health.

Author Contributions

Conceptualization, M.A.S.T., F.L.C.d.S., E.A.T.A., N.C.-B. and J.L.L.-G.; methodology, M.A.S.T., F.L.C.d.S., E.A.T.A., N.C.-B. and J.L.L.-G.; software, M.A.S.T., F.L.C.d.S. and E.A.T.A.; validation, N.C.-B.; formal analysis, F.L.C.d.S. and J.L.L.-G.; investigation, M.A.S.T.; resources, F.L.C.d.S. and J.L.L.-G.; data curation, F.L.C.d.S. and J.L.L.-G.; writing—original draft preparation, M.A.S.T., F.L.C.d.S., E.A.T.A., N.C.-B. and J.L.L.-G.; writing—review and editing, M.A.S.T., F.L.C.d.S., E.A.T.A., N.C.-B. and J.L.L.-G.; visualization, N.C.-B. and J.L.L.-G.; supervision, N.C.-B. and J.L.L.-G.; project administration, N.C.-B. and J.L.L.-G.; funding acquisition, E.A.T.A. and J.L.L.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data sets are available in the repository https://www.senamhi.gob.pe/site/descarga-datos/, accessed on 15 January 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hoek, G.; Krishnan, R.M.; Beelen, R.; Peters, A.; Ostro, B.; Brunekreef, B.; Kaufman, J.D. Long-term air pollution exposure and cardio-respiratory mortality: A review. Environ. Health 2013, 12, 43. [Google Scholar] [CrossRef] [PubMed]
  2. Chen, J.; Hoek, G. Long-term exposure to PM and all-cause and cause-specific mortality: A systematic review and meta-analysis. Environ. Int. 2020, 143, 105974. [Google Scholar] [CrossRef] [PubMed]
  3. Sánchez-Balseca, J.; Pérez-Foguet, A. Spatio-temporal air pollution modelling using a compositional approach. Heliyon 2020, 6, e04794. [Google Scholar] [CrossRef]
  4. Mahmud, H.; Shobnom, K.; Ali, M.R.; Muntakim, N.; Kulsum, U.; Baroi, D.S.; Ahmed, Z.; Rahman, M.M.; Hassan, M.Z. Micro-environmental dynamics of particulate (PM2.5 and PM10) air pollution in Rajshahi City: A spatiotemporal analysis. Manag. Environ. Qual. Int. J. 2024, 35, 1773–1797. [Google Scholar] [CrossRef]
  5. Akdi, Y.; Gölveren, E.; Ünlü, K.D.; Yücel, M.E. Modeling and forecasting of monthly PM2.5 emission of Paris by periodogram-based time series methodology. Environ. Monit. Assess. 2021, 193, 622. [Google Scholar] [CrossRef]
  6. Leng, S.; Gao, X.; Pei, T.; Zhang, G.; Chen, L.; Chen, X.; He, C.; He, D.; Li, X.; Lin, C.; et al. Tempo-Spatial Processes and Modelling of Environmental Pollutants. In The Geographical Sciences During 1986–2015: From the Classics to the Frontiers; Springer: Singapore, 2017; pp. 367–390. [Google Scholar]
  7. Thompson, J.E. Airborne particulate matter: Human exposure and health effects. J. Occup. Environ. Med. 2018, 60, 392–423. [Google Scholar] [CrossRef]
  8. Cekim, H.O. Forecasting PM10 concentrations using time series models: A case of the most polluted cities in Turkey. Environ. Sci. Pollut. Res. 2020, 27, 25612–25624. [Google Scholar] [CrossRef] [PubMed]
  9. Performance Analysis of Machine Learning Singular Spectrum Analysis for Forecasting Air Contamination. In Proceedings of the 2023 International Conference on System, Computation, Automation and Networking (ICSCAN), Puducherry, India, 17–18 November 2023. [CrossRef]
  10. Kumar, U. An integrated SSA-ARIMA approach to make multiple day ahead forecasts for the daily maximum ambient O3 concentration. Aerosol Air Qual. Res. 2015, 15, 208–219. [Google Scholar] [CrossRef]
  11. Cai, J.; Gu, C.; Fang, K.; Wang, L.; Lv, M. Long-term PM2.5 Concentration Prediction Using Hybrid Deep Learning Model. In Proceedings of the 2023 4th International Conference on Computers and Artificial Intelligence Technology (CAIT), Macau, Macao, 13–15 December 2023. [Google Scholar] [CrossRef]
  12. Bodor, Z.; Bodor, K.; Keresztesi, Á.; Szép, R. Major air pollutants seasonal variation analysis and long-range transport of PM10 in an urban environment with specific climate condition in Transylvania (Romania). Environ. Sci. Pollut. Res. 2020, 27, 38181–38199. [Google Scholar] [CrossRef]
  13. Shikhovtsev, M.Y.; Obolkin, V.; Khodzher, T.; Molozhnikova, Y.V. Variability of the ground concentration of particulate matter PM1–PM10 in the air basin of the Southern Baikal Region. Atmos. Ocean. Opt. 2023, 36, 655–662. [Google Scholar] [CrossRef]
  14. Li, B.; Rodell, M. Spatial variability and its scale dependency of observed and modeled soil moisture over different climate regions. Hydrol. Earth Syst. Sci. 2013, 17, 1177–1188. [Google Scholar] [CrossRef]
  15. Rojas, I.; Valenzuela, O.; Rojas, F.; Guillen, A.; Herrera, L.; Pomares, H.; Marquez, L.; Pasadas, M. Soft-computing techniques and ARMA model for time series prediction. Neurocomputing 2008, 71, 519–537. [Google Scholar] [CrossRef]
  16. Aguilar, S.; Castro Souza, R.; Pessanha, J.F.; Cyrino Oliveira, F.L. Hybrid methodology for modeling short-term wind power generation using conditional Kernel density estimation and singular spectrum analysis. Dyna 2017, 84, 145–154. [Google Scholar] [CrossRef]
  17. López-Gonzales, J.L.; Salas, R.; Velandia, D.; Canas Rodrigues, P. Air quality prediction based on singular spectrum analysis and artificial neural networks. Entropy 2024, 26, 1062. [Google Scholar] [CrossRef]
  18. Espinosa, F.; Bartolomé, A.B.; Hernández, P.V.; Rodriguez-Sanchez, M. Contribution of singular spectral analysis to forecasting and anomalies detection of indoors air quality. Sensors 2022, 22, 3054. [Google Scholar] [CrossRef]
  19. Espinoza Guillen, J.A. Evaluación Espacial y Temporal del Material Particulado PM10 y PM2.5 en Lima Metropolitana para el Periodo 2015–2017. Bachelor’s Thesis, Universidad Nacional Agraria La Molina, Lima, Peru, 2018. [Google Scholar]
  20. Silva, J.; Rojas, J.; Norabuena, M.; Molina, C.; Toro, R.A.; Leiva-Guzmán, M.A. Particulate matter levels in a South American megacity: The metropolitan area of Lima-Callao, Peru. Environ. Monit. Assess. 2017, 189, 635. [Google Scholar] [CrossRef]
  21. Aceves-Fernandez, M.A.; Pedraza-Ortega, J.C.; Sotomayor-Olmedo, A.; Ramos-Arreguín, J.M.; Vargas-Soto, J.E.; Tovar-Arriaga, S. Analysis of key features of non-linear behaviour using recurrence quantification. Case study: Urban airborne pollution at Mexico City. Environ. Model. Assess. 2014, 19, 139–152.
  22. Box, G.; Jenkins, G. Time Series Analysis: Forecasting and Control; Holden-Day series in time series analysis and digital processing; Holden-Day: Clarendon, Australia, 1970.
  23. Hyndman, R.J.; Khandakar, Y. Automatic time series forecasting: The forecast package for R. J. Stat. Softw. 2008, 27, 1–22.
  24. Brown, R.G. Statistical Forecasting for Inventory Control; McGraw-Hill: New York, NY, USA, 1959.
  25. Holt, C.C. Forecasting seasonals and trends by exponentially weighted moving averages. Int. J. Forecast. 2004, 20, 5–10.
  26. Winters, P.R. Forecasting sales by exponentially weighted moving averages. Manag. Sci. 1960, 6, 324–342.
  27. Hyndman, R.; Athanasopoulos, G. Forecasting: Principles and Practice, 2nd ed.; OTexts: Melbourne, Australia, 2018.
  28. Hyndman, R.; Koehler, A.; Ord, K.; Snyder, R. Forecasting with Exponential Smoothing: The State Space Approach; Springer: Berlin/Heidelberg, Germany, 2008.
  29. Zhang, G.P.; Patuwo, B.E.; Hu, M.Y. Forecasting with artificial neural networks: The state of the art. Int. J. Forecast. 1998, 14, 35–62.
  30. Crone, S.F.; Kourentzes, N. Feature selection for time series prediction – A combined filter and wrapper approach for neural networks. Neurocomputing 2010, 73, 1923–1936.
  31. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
  32. Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; Volume 15, pp. 315–323.
  33. Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366.
  34. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  35. Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to forget: Continual prediction with LSTM. Neural Comput. 2000, 12, 2451–2471.
  36. Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 2222–2232.
  37. West, M.; Harrison, J. Bayesian Forecasting and Dynamic Models, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 1997.
  38. Petris, G. An R Package for Dynamic Linear Models. J. Stat. Softw. 2010, 36, 1–16.
  39. Gonzales, S.M.; Iftikhar, H.; López-Gonzales, J.L. Analysis and forecasting of electricity prices using an improved time series ensemble approach: An application to the Peruvian electricity market. AIMS Math. 2024, 9, 21952–21971.
  40. Iftikhar, H.; Gonzales, S.M.; Zywiołek, J.; López-Gonzales, J.L. Electricity demand forecasting using a novel time series ensemble technique. IEEE Access 2024, 12, 88963–88975.
  41. da Silva, K.L.S.; López-Gonzales, J.L.; Turpo-Chaparro, J.E.; Tocto-Cano, E.; Rodrigues, P.C. Spatio-temporal visualization and forecasting of PM10 in the Brazilian state of Minas Gerais. Sci. Rep. 2023, 13, 3269.
  42. Cruz, A.R.H.D.L.; Ayuque, R.F.O.; Cruz, R.W.H.D.L.; Lopez-Gonzales, J.L.; Gioda, A. Air quality biomonitoring of trace elements in the metropolitan area of Huancayo, Peru using transplanted Tillandsia capillaris as a biomonitor. An. Acad. Bras. Ciências 2020, 92, e20180813.
Figure 1. Map with study area and locations of the Lima air quality monitoring stations: Ate (ATE), Campo de Marte (CDM), Carabayllo (CRB), Huachipa (HCH), and San Martin de Porres (SMP). Scale 1 cm/7 km.
Figure 2. Average daily PM10 concentration between 2017 and 2018 at the monitoring stations in Peru.
Figure 3. Boxplots of hourly (a) and daily (b) PM10 concentrations at five monitoring stations located in Peru. This figure presents a comparative analysis of PM10 levels recorded at stations SMP, HCH, CRB, CDM, and ATE. In panel (a), the boxplots reflect the distribution of hourly concentrations, revealing high temporal variability and the presence of numerous outliers, especially at HCH and SMP, suggesting specific episodes of intense pollution. In panel (b), the data aggregated at the daily level show a more contained dispersion, although high maximum values are still observed at some stations, again highlighting HCH and SMP as critical areas. This type of visualization makes it possible to identify differences in pollution patterns between stations, which is key to establishing local air quality management strategies.
Figure 4. Daily PM10 time series with boxplot boundaries.
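The caption of Figure 4 does not define the boxplot boundaries. Assuming they are the conventional Tukey whisker limits (Q1 − 1.5·IQR and Q3 + 1.5·IQR), a minimal Python sketch of how such limits could be computed for one station follows. The function names and the sample values are hypothetical and are not taken from the study.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) boxplot limits: Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: quartiles use linear interpolation over the sample
    # ("inclusive" method); other quartile conventions shift the limits slightly.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the observations that fall outside the Tukey limits."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical daily PM10 values (illustrative only, not study data).
daily_pm10 = [10, 20, 30, 40, 50, 500]
print(tukey_fences(daily_pm10))
print(find_outliers(daily_pm10))
```

Observations flagged this way correspond to the points drawn beyond the whiskers in a standard boxplot.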
Figure 5. Scatterplot between temperature and PM10 at each meteorological station (SMP, HCH, CRB, CDM, ATE) in Peru.
Figure 6. Spatial correlation between weather stations for PM10 (a) and temperature (b) in Peru.
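Figure 6 reports correlations between stations without naming the coefficient in its caption. Assuming pairwise Pearson correlation of the daily series, a between-station matrix could be sketched in Python as below. The helper names and the short series are hypothetical, chosen only to illustrate the calculation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length and at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(series):
    """Build a symmetric station-by-station correlation matrix."""
    names = list(series)
    # Start with 1.0 everywhere; off-diagonal entries are overwritten below.
    matrix = {a: {b: 1.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        r = pearson(series[a], series[b])
        matrix[a][b] = matrix[b][a] = r
    return matrix


# Hypothetical daily PM10 values per station (illustrative only).
pm10 = {
    "SMP": [80, 90, 85, 100],
    "HCH": [120, 140, 130, 160],
    "CRB": [50, 45, 48, 40],
}
corr = correlation_matrix(pm10)
for station, row in corr.items():
    print(station, {other: round(r, 3) for other, r in row.items()})
```

The same function applies to temperature series for panel (b), provided each station's series covers the same dates.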
Table 1. Statistical measures for daily PM10 (μg·m⁻³) concentration from weather stations.

| Measures | SMP | HCH | CRB | CDM | ATE |
|---|---|---|---|---|---|
| Minimum | 26.01 | 22.22 | 17.84 | 25.12 | 41.26 |
| 1st Qu. | 72.89 | 86.21 | 39.38 | 44.35 | 99.59 |
| Median | 85.29 | 126.61 | 46.40 | 51.46 | 118.19 |
| Mean | 86.05 | 130.03 | 48.69 | 52.30 | 121.56 |
| 3rd Qu. | 97.69 | 166.56 | 54.63 | 57.51 | 138.72 |
| Maximum | 161.97 | 435.15 | 128.44 | 136.16 | 280.72 |
| Standard deviation | 20.88 | 58.63 | 14.50 | 12.25 | 33.29 |
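Table 2. Forecasting performance of the nine models applied to the PM10 (μg·m⁻³) data of the SMP station via RMSE and sMAPE metrics.

| Metric | Sets | ETS | ARIMA | TBATS | NNAR | MLP | DLM | SSA | NAÏVE | LSTM |
|---|---|---|---|---|---|---|---|---|---|---|
| RMSE | 1 | 8.073 | 12.131 | 9.991 | 8.804 | 10.914 | 39.730 | 8.081 | 15.614 | 9.683 |
| RMSE | 2 | 11.862 | 13.691 | 12.284 | 13.873 | 13.460 | 41.701 | 13.232 | 12.470 | 12.661 |
| RMSE | 3 | 15.951 | 15.123 | 14.930 | 19.771 | 17.941 | 13.320 | 18.412 | 15.572 | 18.391 |
| RMSE | 4 | 18.141 | 16.362 | 16.954 | 22.173 | 20.874 | 13.701 | 21.701 | 17.232 | 20.704 |
| RMSE | 5 | 20.410 | 15.962 | 17.771 | 25.403 | 23.171 | 38.762 | 23.761 | 22.962 | 24.494 |
| RMSE | 6 | 24.390 | 15.511 | 18.872 | 28.314 | 26.591 | 82.103 | 24.872 | 37.531 | 25.174 |
| RMSE | 7 | 23.804 | 15.341 | 18.063 | 26.942 | 24.394 | 15.911 | 25.822 | 21.823 | 21.444 |
| RMSE | 8 | 21.773 | 15.001 | 17.473 | 24.881 | 23.640 | 27.771 | 26.223 | 15.383 | 24.580 |
| RMSE | 9 | 16.913 | 12.884 | 14.763 | 18.961 | 20.190 | 66.444 | 25.323 | 10.751 | 24.394 |
| RMSE | 10 | 13.833 | 14.052 | 15.020 | 14.691 | 20.084 | 48.243 | 20.801 | 14.054 | 23.583 |
| RMSE | Average | 17.514 | 14.603 | 15.611 | 20.383 | 20.120 | 38.774 | 20.823 | 18.343 | 20.511 |
| sMAPE (%) | 1 | 1.871 | 2.801 | 2.030 | 2.193 | 2.954 | 9.184 | 1.853 | 4.150 | 2.621 |
| sMAPE (%) | 2 | 2.714 | 3.643 | 2.951 | 3.094 | 3.091 | 17.170 | 3.113 | 2.981 | 2.964 |
| sMAPE (%) | 3 | 4.130 | 4.213 | 3.981 | 4.950 | 4.444 | 3.691 | 4.633 | 4.080 | 4.701 |
| sMAPE (%) | 4 | 5.240 | 5.041 | 5.034 | 6.260 | 5.691 | 4.194 | 5.833 | 5.121 | 5.611 |
| sMAPE (%) | 5 | 6.234 | 5.081 | 5.543 | 7.573 | 6.901 | 10.322 | 7.032 | 6.890 | 7.292 |
| sMAPE (%) | 6 | 7.560 | 5.022 | 6.091 | 8.863 | 8.363 | 18.440 | 7.621 | 11.152 | 7.802 |
| sMAPE (%) | 7 | 7.484 | 5.044 | 5.731 | 8.440 | 7.603 | 4.954 | 8.011 | 6.853 | 6.734 |
| sMAPE (%) | 8 | 6.921 | 4.982 | 5.652 | 8.011 | 7.580 | 10.091 | 8.001 | 5.092 | 7.804 |
| sMAPE (%) | 9 | 5.564 | 4.423 | 4.981 | 6.232 | 6.512 | 33.884 | 7.920 | 3.071 | 7.743 |
| sMAPE (%) | 10 | 4.732 | 4.554 | 4.910 | 4.891 | 6.274 | 30.032 | 6.742 | 4.474 | 7.473 |
| sMAPE (%) | Average | 5.242 | 4.480 | 4.693 | 6.054 | 5.942 | 14.190 | 6.071 | 5.382 | 6.074 |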
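The two accuracy metrics reported in Table 2 can be illustrated with a short Python sketch. RMSE is the standard root mean squared error. For sMAPE several variants exist, and the exact formula used in the study is not shown in this section, so the version below (mean of 2|y − ŷ| / (|y| + |ŷ|), in percent) is an assumption. Function names and sample values are hypothetical.

```python
from math import sqrt


def _check(actual, forecast):
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(actual, forecast):
    """Root mean squared error, in the units of the data."""
    _check(actual, forecast)
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return sqrt(sum(squared) / len(squared))


def smape(actual, forecast):
    """Symmetric MAPE in percent (assumed variant with the factor 2)."""
    _check(actual, forecast)
    terms = []
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        # Treat a 0/0 term as zero error to avoid division by zero.
        terms.append(0.0 if denom == 0 else 2 * abs(a - f) / denom)
    return 100 * sum(terms) / len(terms)


# Hypothetical PM10 observations and forecasts (illustrative only).
observed = [100, 100]
predicted = [110, 90]
print(rmse(observed, predicted))
print(smape(observed, predicted))
```

RMSE penalizes large misses more heavily, whereas sMAPE is scale-free, which makes it easier to compare stations with very different concentration levels.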
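Table 3. Forecasting performance of the nine models applied to the PM10 (μg·m⁻³) data of the HCH station via RMSE and sMAPE metrics.

| Metric | Sets | ETS | ARIMA | TBATS | NNAR | MLP | DLM | SSA | NAÏVE | LSTM |
|---|---|---|---|---|---|---|---|---|---|---|
| RMSE | 1 | 75.224 | 41.614 | 37.843 | 44.831 | 44.084 | 41.332 | 47.854 | 56.384 | 21.922 |
| RMSE | 2 | 61.504 | 42.791 | 39.084 | 46.531 | 40.504 | 62.163 | 58.983 | 36.132 | 21.102 |
| RMSE | 3 | 43.873 | 37.374 | 34.541 | 36.793 | 30.843 | 82.413 | 57.581 | 22.631 | 20.662 |
| RMSE | 4 | 19.683 | 23.664 | 20.934 | 36.603 | 15.354 | 167.442 | 43.674 | 33.824 | 15.742 |
| RMSE | 5 | 24.712 | 26.954 | 25.173 | 41.804 | 17.403 | 18.761 | 31.742 | 16.421 | 20.302 |
| RMSE | 6 | 24.451 | 22.964 | 22.123 | 50.734 | 16.931 | 17.262 | 18.214 | 17.161 | 16.104 |
| RMSE | 7 | 42.721 | 30.093 | 28.904 | 49.704 | 21.153 | 98.882 | 14.402 | 32.391 | 17.152 |
| RMSE | 8 | 53.264 | 33.733 | 32.884 | 65.842 | 29.553 | 77.753 | 11.163 | 39.013 | 17.994 |
| RMSE | 9 | 26.482 | 22.543 | 21.043 | 23.404 | 17.013 | 107.383 | 9.194 | 12.692 | 8.743 |
| RMSE | 10 | 25.843 | 24.072 | 23.392 | 36.624 | 18.894 | 11.324 | 13.321 | 12.403 | 11.953 |
| RMSE | Average | 39.773 | 30.582 | 28.594 | 43.281 | 25.173 | 68.472 | 30.613 | 27.902 | 17.161 |
| sMAPE (%) | 1 | 9.963 | 5.761 | 5.214 | 6.123 | 5.942 | 5.404 | 6.564 | 7.741 | 3.102 |
| sMAPE (%) | 2 | 8.724 | 6.183 | 5.612 | 6.621 | 5.701 | 11.803 | 8.443 | 5.101 | 2.924 |
| sMAPE (%) | 3 | 6.453 | 5.584 | 5.121 | 5.462 | 4.513 | 19.271 | 8.374 | 3.583 | 2.944 |
| sMAPE (%) | 4 | 2.963 | 3.103 | 2.943 | 4.663 | 2.312 | 39.583 | 6.472 | 5.754 | 2.383 |
| sMAPE (%) | 5 | 3.874 | 3.932 | 3.732 | 5.614 | 2.912 | 2.981 | 4.721 | 2.334 | 2.591 |
| sMAPE (%) | 6 | 4.071 | 3.393 | 3.521 | 6.902 | 2.521 | 2.553 | 2.613 | 2.423 | 2.324 |
| sMAPE (%) | 7 | 6.451 | 4.553 | 4.513 | 7.541 | 3.394 | 12.284 | 2.354 | 5.034 | 2.302 |
| sMAPE (%) | 8 | 8.374 | 5.542 | 5.451 | 9.841 | 4.931 | 11.164 | 1.471 | 6.463 | 2.953 |
| sMAPE (%) | 9 | 3.942 | 3.604 | 3.392 | 3.463 | 2.861 | 31.184 | 1.194 | 1.982 | 1.371 |
| sMAPE (%) | 10 | 4.021 | 3.874 | 3.731 | 5.381 | 2.984 | 1.771 | 1.751 | 2.183 | 1.532 |
| sMAPE (%) | Average | 5.881 | 4.553 | 4.324 | 6.161 | 3.814 | 13.804 | 4.393 | 4.261 | 2.442 |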
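Table 4. Forecasting performance of the nine models applied to the PM10 (μg·m⁻³) data of the CRB station via RMSE and sMAPE metrics.

| Metric | Sets | ETS | ARIMA | TBATS | NNAR | MLP | DLM | SSA | NAÏVE | LSTM |
|---|---|---|---|---|---|---|---|---|---|---|
| RMSE | 1 | 5.614 | 5.294 | 5.302 | 7.401 | 7.904 | 20.104 | 5.671 | 9.631 | 7.751 |
| RMSE | 2 | 6.302 | 6.642 | 6.504 | 7.814 | 7.574 | 40.282 | 6.641 | 8.911 | 9.224 |
| RMSE | 3 | 7.121 | 7.072 | 6.884 | 9.431 | 9.102 | 8.134 | 7.754 | 7.102 | 11.162 |
| RMSE | 4 | 7.284 | 7.042 | 6.841 | 10.421 | 9.344 | 12.491 | 7.754 | 7.521 | 12.614 |
| RMSE | 5 | 7.744 | 7.262 | 6.882 | 11.304 | 10.024 | 12.194 | 8.292 | 8.214 | 12.284 |
| RMSE | 6 | 10.312 | 8.872 | 8.574 | 13.274 | 12.654 | 38.324 | 8.861 | 16.021 | 14.074 |
| RMSE | 7 | 9.571 | 7.962 | 6.431 | 11.934 | 11.464 | 8.861 | 9.194 | 8.112 | 13.574 |
| RMSE | 8 | 9.181 | 8.192 | 7.854 | 12.114 | 11.764 | 6.972 | 9.504 | 8.042 | 15.804 |
| RMSE | 9 | 4.332 | 4.182 | 3.504 | 6.964 | 6.874 | 45.434 | 8.214 | 8.401 | 11.964 |
| RMSE | 10 | 4.384 | 5.424 | 5.094 | 9.294 | 8.764 | 11.741 | 6.934 | 5.542 | 12.574 |
| RMSE | Average | 7.181 | 6.792 | 6.384 | 9.994 | 9.544 | 20.454 | 7.884 | 8.754 | 12.104 |
| sMAPE (%) | 1 | 2.854 | 2.134 | 2.114 | 3.864 | 4.024 | 9.374 | 2.661 | 4.741 | 3.961 |
| sMAPE (%) | 2 | 2.454 | 3.444 | 3.214 | 3.314 | 3.094 | 35.394 | 2.991 | 5.264 | 4.484 |
| sMAPE (%) | 3 | 3.404 | 3.694 | 3.604 | 4.494 | 4.154 | 4.354 | 3.584 | 3.884 | 5.454 |
| sMAPE (%) | 4 | 3.534 | 3.534 | 3.484 | 5.284 | 4.624 | 5.944 | 3.654 | 3.484 | 6.284 |
| sMAPE (%) | 5 | 3.954 | 3.823 | 3.654 | 6.164 | 5.294 | 6.402 | 4.191 | 4.251 | 6.463 |
| sMAPE (%) | 6 | 5.752 | 5.042 | 4.854 | 7.543 | 7.171 | 16.314 | 4.754 | 8.874 | 7.874 |
| sMAPE (%) | 7 | 5.334 | 4.322 | 3.604 | 6.704 | 6.421 | 5.542 | 4.953 | 4.351 | 7.653 |
| sMAPE (%) | 8 | 5.261 | 4.641 | 4.314 | 7.132 | 6.954 | 3.851 | 5.054 | 4.501 | 8.924 |
| sMAPE (%) | 9 | 2.444 | 2.222 | 1.974 | 3.941 | 3.844 | 39.854 | 4.544 | 5.741 | 6.891 |
| sMAPE (%) | 10 | 2.604 | 3.062 | 2.904 | 5.002 | 4.391 | 9.141 | 3.964 | 3.291 | 7.221 |
| sMAPE (%) | Average | 3.764 | 3.591 | 3.374 | 5.341 | 4.994 | 13.614 | 4.031 | 4.841 | 6.523 |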
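Table 5. Forecasting performance of the nine models applied to the PM10 (μg·m⁻³) data of the ATE station via RMSE and sMAPE metrics.

| Metric | Sets | ETS | ARIMA | TBATS | NNAR | MLP | DLM | SSA | NAÏVE | LSTM |
|---|---|---|---|---|---|---|---|---|---|---|
| RMSE | 1 | 13.831 | 13.782 | 14.802 | 12.973 | 12.321 | 15.751 | 15.474 | 12.301 | 12.622 |
| RMSE | 2 | 12.881 | 13.231 | 15.252 | 12.763 | 12.612 | 31.471 | 15.642 | 14.322 | 12.992 |
| RMSE | 3 | 19.191 | 17.412 | 16.023 | 18.601 | 16.634 | 29.031 | 15.974 | 17.692 | 16.041 |
| RMSE | 4 | 25.511 | 22.161 | 19.132 | 23.151 | 21.721 | 32.961 | 17.302 | 23.602 | 19.822 |
| RMSE | 5 | 20.733 | 18.922 | 18.991 | 19.453 | 20.123 | 60.593 | 16.252 | 20.951 | 18.602 |
| RMSE | 6 | 21.412 | 19.352 | 18.251 | 19.771 | 19.973 | 25.042 | 15.791 | 19.183 | 21.363 |
| RMSE | 7 | 22.854 | 22.073 | 21.512 | 22.803 | 21.896 | 27.723 | 20.553 | 22.222 | 23.534 |
| RMSE | 8 | 26.202 | 23.584 | 22.763 | 23.602 | 26.193 | 101.282 | 21.364 | 37.993 | 21.614 |
| RMSE | 9 | 23.114 | 20.401 | 20.432 | 19.653 | 21.254 | 62.064 | 20.903 | 21.144 | 21.902 |
| RMSE | 10 | 20.712 | 19.264 | 19.779 | 19.883 | 19.585 | 107.774 | 21.863 | 31.865 | 21.674 |
| RMSE | Average | 20.643 | 19.024 | 18.693 | 19.265 | 19.235 | 49.375 | 18.114 | 22.124 | 19.013 |
| sMAPE (%) | 1 | 2.403 | 2.262 | 2.324 | 2.083 | 1.883 | 2.422 | 2.172 | 1.873 | 1.952 |
| sMAPE (%) | 2 | 2.135 | 2.064 | 2.393 | 1.965 | 2.013 | 6.143 | 2.197 | 2.164 | 1.902 |
| sMAPE (%) | 3 | 3.342 | 3.042 | 2.553 | 3.086 | 2.713 | 4.674 | 2.485 | 3.052 | 2.594 |
| sMAPE (%) | 4 | 4.802 | 4.193 | 3.373 | 4.314 | 4.025 | 5.893 | 3.091 | 4.506 | 3.484 |
| sMAPE (%) | 5 | 3.403 | 3.104 | 3.422 | 3.213 | 3.432 | 17.782 | 2.903 | 4.113 | 3.004 |
| sMAPE (%) | 6 | 3.812 | 3.401 | 3.104 | 3.564 | 3.562 | 4.613 | 2.682 | 3.324 | 3.903 |
| sMAPE (%) | 7 | 4.362 | 4.123 | 3.873 | 4.214 | 4.083 | 5.162 | 3.613 | 4.146 | 4.553 |
| sMAPE (%) | 8 | 4.864 | 4.521 | 4.424 | 4.613 | 5.013 | 15.074 | 3.736 | 6.443 | 4.071 |
| sMAPE (%) | 9 | 4.423 | 3.723 | 3.792 | 3.674 | 4.093 | 14.294 | 3.885 | 3.944 | 4.092 |
| sMAPE (%) | 10 | 3.774 | 3.602 | 3.673 | 3.724 | 3.564 | 33.742 | 4.013 | 5.803 | 3.963 |
| sMAPE (%) | Average | 3.733 | 3.404 | 3.295 | 3.443 | 3.441 | 10.983 | 3.076 | 3.933 | 3.352 |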
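To compare the models across stations, the Average RMSE rows of Tables 2–5 can be ranked per station. The Python sketch below transcribes those averages as printed and sorts the models from lowest to highest error. The variable and function names are hypothetical, and "NAIVE" stands for the NAÏVE column.

```python
MODELS = ["ETS", "ARIMA", "TBATS", "NNAR", "MLP", "DLM", "SSA", "NAIVE", "LSTM"]

# Average RMSE per station, copied from the "Average" rows of Tables 2-5.
AVG_RMSE = {
    "SMP": [17.514, 14.603, 15.611, 20.383, 20.120, 38.774, 20.823, 18.343, 20.511],
    "HCH": [39.773, 30.582, 28.594, 43.281, 25.173, 68.472, 30.613, 27.902, 17.161],
    "CRB": [7.181, 6.792, 6.384, 9.994, 9.544, 20.454, 7.884, 8.754, 12.104],
    "ATE": [20.643, 19.024, 18.693, 19.265, 19.235, 49.375, 18.114, 22.124, 19.013],
}


def rank_models(values):
    """Return model names ordered from lowest to highest error."""
    return [model for _, model in sorted(zip(values, MODELS))]


# Lowest average RMSE per station.
best = {station: rank_models(values)[0] for station, values in AVG_RMSE.items()}

for station, values in AVG_RMSE.items():
    print(station, "->", ", ".join(rank_models(values)[:3]))
```

Ranking on the averages hides set-to-set variability, so the per-set rows remain the better guide to how stable each model is.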
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Solis Teran, M.A.; Leite Coelho da Silva, F.; Torres Armas, E.A.; Carbo-Bustinza, N.; López-Gonzales, J.L. Modeling Air Pollution in Metropolitan Lima: A Statistical and Artificial Neural Network Approach. Environments 2025, 12, 196. https://doi.org/10.3390/environments12060196
