Evaluating Machine Learning Models for Particulate Matter Prediction Under Climate Change Scenarios in Brazilian Capitals

Bonifácio, Alicia da Silva; Tavella, Ronan Adler; Brum, Rodrigo de Lima; Silveira, Gustavo de Oliveira; Fernandes, Ronabson Cardoso; Scursone, Gabriel Fuscald; Machado, Ricardo Arend; Adamatti, Diana Francisca; da Silva Júnior, Flavio Manoel Rodrigues

doi:10.3390/atmos16091052

by Alicia da Silva Bonifácio ¹,*, Ronan Adler Tavella ²,³,*, Rodrigo de Lima Brum ¹, Gustavo de Oliveira Silveira ¹, Ronabson Cardoso Fernandes ¹, Gabriel Fuscald Scursone ¹, Ricardo Arend Machado ⁴, Diana Francisca Adamatti ⁴ and Flavio Manoel Rodrigues da Silva Júnior ⁵,*

¹ Faculty of Medicine, Federal University of Rio Grande, Rio Grande 96200-190, Brazil

² Institute of Environmental, Chemical and Pharmaceutical Sciences, Federal University of São Paulo, Diadema 09972-270, Brazil

³ ARIES, Antimicrobial Resistance Institute of São Paulo, São Paulo 04039-001, Brazil

⁴ Center for Computational Science, University of Rio Grande, Rio Grande 96201-900, Brazil

⁵ Institute of Biological and Health Sciences, Federal University of Alagoas, Maceió 57073-620, Brazil

* Authors to whom correspondence should be addressed.

Atmosphere 2025, 16(9), 1052; https://doi.org/10.3390/atmos16091052

Submission received: 31 July 2025 / Revised: 29 August 2025 / Accepted: 3 September 2025 / Published: 5 September 2025

(This article belongs to the Special Issue Modeling and Monitoring of Air Quality: From Data to Predictions)


Abstract

Air pollution, particularly particulate matter (PM₁, PM_2.5, and PM₁₀), poses a significant environmental health risk globally. This study evaluates the predictive performance of three machine learning algorithms, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Random Forest (RF), for forecasting particulate matter concentrations in four Brazilian cities (Porto Alegre, Recife, Goiânia, and Belém), which share similar demographic and urbanization characteristics but differ in geographic and climatic conditions. Using data from the Copernicus Atmosphere Monitoring Service, daily concentrations of PM₁, PM_2.5, and PM₁₀ were modeled based on meteorological variables, including air temperature, relative humidity, wind speed, atmospheric pressure, and accumulated precipitation. The models were tested under two climate change scenarios (+2 °C and +4 °C temperature increases). The results indicate that RF consistently outperformed the other models, achieving low RMSE values of around 0.3 µg/m³ across all cities, regardless of their geographic and climatic differences. KNN showed stable performance under moderate temperature increases (+2 °C) but exhibited higher errors under more extreme warming, while SVM demonstrated higher sensitivity to temperature changes, leading to greater variability in bivariate contexts. However, in multivariate contexts, SVM performed better, as accounting for the combined influence of multiple meteorological variables improved its predictions. These findings underscore the importance of selecting suitable machine learning models, with RF proving to be the most robust approach for particulate matter prediction across diverse environmental contexts. This study contributes valuable insights for the development of region-specific air quality management strategies in the face of climate change.

Keywords:

air pollution; air quality; Brazil; climate change; machine learning; meteorological variables; particulate matter; predictive modeling

1. Introduction

Air pollution poses a severe threat to public health and economies worldwide [1]. Particulate matter (PM₁₀, PM_2.5, and PM₁, i.e., particles with aerodynamic diameters of 10, 2.5, and 1 µm or less) is a leading environmental risk factor, contributing to an estimated 8.1 million premature deaths annually worldwide [2,3]. The burden is especially high in low- and middle-income regions, and the economic costs are staggering: recent estimates value the global welfare losses due to air pollution at roughly USD 8 trillion in 2019 (≈6% of global GDP) [4,5]. Brazil is no exception to this trend [6]. Air pollution causes a substantial number of deaths in Brazil each year; one analysis attributed approximately 326,000 deaths to ambient pollution over the 2019–2021 period alone [7,8]. Beyond mortality, exposure to elevated particulate levels increases hospitalizations and chronic health conditions, imposing significant public health and productivity costs nationally [9,10,11,12,13,14,15,16,17,18,19].

Climate change is expected to further complicate air quality management by altering the atmospheric processes that govern pollutant concentrations [20,21,22,23]. Rising temperatures and shifting meteorological patterns can influence the formation, dispersion, and removal of particulate matter [24,25,26,27,28]. For example, a warmer climate is likely to increase the frequency of heatwaves and droughts, conditions that exacerbate wildfires and dust storms, in turn elevating ambient PM levels [20,24]. Likewise, changes in atmospheric circulation can lead to pollutant accumulation episodes, while altered precipitation regimes affect the wet deposition of aerosols [24,27]. The net effect of climate change on particulate pollution is complex and region-specific, as multiple factors can act in opposing directions [21,29,30]. Nevertheless, scientific assessments consistently warn that, in many regions, climate-driven changes in weather are poised to worsen air quality by increasing concentrations of particulate matter and other pollutants [20,21]. This linkage between climate and air pollution dynamics motivates the need to develop reliable predictive tools that can account for evolving meteorological conditions.

In recent years, data-driven approaches, particularly machine learning (ML) techniques, have become increasingly important for modeling and forecasting air quality under changing climatic conditions [31,32,33,34,35,36,37,38,39]. Traditional deterministic models require many simplifying assumptions and often struggle to generalize across different geographies, especially as the domain size or complexity grows [31,37,39]. ML offers a complementary approach by learning patterns directly from historical environmental data, capturing nonlinear relationships between meteorological variables, emissions, and pollutant concentrations without the need to explicitly model every atmospheric process [38,39]. A wide range of ML methods have been applied to particulate matter prediction, from linear and polynomial regressions to deep neural networks [31,32,33,34,35,36,37,38,39,40], often achieving higher accuracy than conventional techniques when the data are sufficiently rich [34,35,40]. Among these, three algorithms stand out for their frequent use in environmental modeling: Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Random Forests (RF) [35,40]. These models have gained popularity in air pollution studies due to their robust predictive performance and ability to handle the inherent complexity and noise in atmospheric datasets. However, because each algorithm learns in a fundamentally different way, their predictive capabilities and sensitivities may vary under future climate scenarios, an important consideration that warrants detailed investigation.

SVM, KNN, and RF represent diverse machine learning architectures, each with distinct strengths that can influence predictive performance in air quality applications [33,35,37,40]. SVM is a kernel-based learning method that finds an optimal hyperplane to separate (or regress) data in a transformed feature space. This allows SVMs to model complex nonlinear relationships in air quality data with relatively high accuracy, and they have been successfully used to forecast pollutant concentrations under various conditions [37,40,41,42,43]. SVMs are generally resistant to overfitting when appropriately regularized and can handle noisy input, though their performance depends on choosing suitable kernel functions and parameters [40]. KNN, in contrast, is an instance-based algorithm that makes predictions based on the nearest observed cases in the feature space [40,44,45]. Its appeal lies in its simplicity and interpretability; for example, KNN can estimate tomorrow’s PM concentrations by looking at past days with similar meteorological conditions, and it requires no explicit training phase [37,40,44,45]. However, KNN can become computationally expensive as the dataset grows, and its accuracy may degrade with high-dimensional inputs or if the chosen number of neighbors (K) is suboptimal [37,40]. RF is an ensemble tree-based method that constructs a “forest” of decision trees and averages their outputs [34,35,37,40]. By aggregating many de-correlated trees, RF tends to achieve strong predictive accuracy and is less sensitive to outlier noise in the data. It can naturally model complex interactions and provides measures of variable importance, which is advantageous in environmental interpretation [35,38,40]. Nonetheless, RF models can occasionally be too flexible and, if not properly tuned, may overfit subtle fluctuations in training data or exhibit instability with small changes in the input set [34,35,37,38,40].
Given these differences, it is not immediately clear which algorithm will perform best for projecting particulate matter under novel climate conditions; each may respond differently to the non-stationary relationships induced by a warming climate. Comparative evaluations of multiple ML models in this context remain limited in the literature, underscoring the need for systematic analysis.
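To make the kernel idea in the SVM description concrete, the following minimal Python sketch computes the radial basis function (RBF) kernel, k(x, z) = exp(−γ‖x − z‖²), between hypothetical standardized meteorological days. The day values and γ values are illustrative assumptions, not data from this study; the sketch only shows that similar days receive a kernel value near 1, dissimilar days a value near 0, and that a larger γ makes this decay faster.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Hypothetical standardized days: (temperature, relative humidity, wind speed).
day_a = (0.2, -0.5, 1.0)
day_b = (0.3, -0.4, 0.9)   # meteorologically close to day_a
day_c = (-1.5, 1.2, -0.8)  # meteorologically distant from day_a

for gamma in (0.2, 1.0):
    near = rbf_kernel(day_a, day_b, gamma)
    far = rbf_kernel(day_a, day_c, gamma)
    print(f"gamma={gamma}: near={near:.3f}, far={far:.4f}")
```

In a support vector regressor, these pairwise kernel values stand in for an explicit nonlinear transformation of the inputs, which is what lets the model fit nonlinear relationships.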

Brazil provides a compelling setting in which to investigate these predictive approaches, owing to its diverse climates and varied drivers of air pollution. The country’s vast territory encompasses distinct meteorological regimes, from humid tropical rainforests in the Amazon, to semi-arid interiors, to densely populated temperate zones in the southeast, each of which influences pollutant behavior in different ways [30]. In addition, PM in Brazil arises from multiple sources, including vehicular traffic, industrial emissions, biomass burning and wildfires, and natural emissions, which act individually or in combination depending on the region [6,8,46]. Such heterogeneity in pollution sources and climate conditions means that an algorithm tuned to one city or region may not generalize directly to another. It also explains why evaluating predictive models in the Brazilian context is particularly important: the information gained can guide region-specific air quality management strategies in a country facing both rapid urbanization and escalating environmental change.

In this study, we address the above knowledge gaps by assessing and comparing the performance of SVM, KNN, and RF models for predicting particulate matter concentrations under future climate change scenarios in four distinct Brazilian cities. To our knowledge, this is the first work to systematically examine these machine learning approaches for air quality projection in Brazil’s diverse climatic settings under warming scenarios. The main goal of the research is to determine how each model responds to climate-driven shifts in meteorological inputs and to identify which method (or methods) offer the most reliable and sensitive predictions of PM₁₀, PM_2.5, and PM₁ levels as temperatures rise.

2. Materials and Methods

2.1. Study Area

For this investigation, four Brazilian capitals, Porto Alegre, Recife, Goiânia, and Belém, were selected due to their similar urban contexts, characterized by comparable population sizes, population densities, and a high degree of urbanization. These cities are all marked by significant challenges in urban management, environmental policy, and air quality control, making them ideal candidates for a comparative analysis of particulate matter concentrations. However, despite these common urban characteristics, the cities differ substantially in their geographic and climatic contexts (Figure 1). These geographic and climatic differences play an important role in shaping the meteorological patterns that influence the dispersion and behavior of air pollutants. Therefore, these cities present a unique opportunity to assess how diverse climatic conditions affect particulate matter concentrations and, consequently, the performance of machine learning models in predicting air quality under varying environmental scenarios.

Porto Alegre, the capital of the state of Rio Grande do Sul, has approximately 1.33 million inhabitants within its municipal boundaries and around 4.3 million in its metropolitan region, making it the fifth most populous in the country [47]. According to the Köppen–Geiger climate classification [48], it has a humid subtropical climate (Cfa) and is characterized by well-distributed rainfall throughout the year, with no defined dry season. Located south of the Tropic of Capricorn and in a continental area, the city exhibits significant thermal variability.

Recife, the capital of the state of Pernambuco, has about 1.49 million inhabitants and 3.73 million in its metropolitan region, ranking as the seventh largest in Brazil [49]. With a territorial area of approximately 218.84 km², the city has a high population density (6803.60 inhabitants/km²), which poses considerable challenges for environmental and urban management [50,51]. The prevailing climate is tropical with a dry season (Aw), according to the Köppen–Geiger classification [48], with average annual temperatures around 23.2 °C. The rainy season occurs from October to April, while the dry season extends from May to September, with an average annual precipitation of approximately 1300 mm.

Goiânia, the capital of the state of Goiás, was founded in 1933 and is one of the most recently established capitals in the country. According to estimates from the Brazilian Institute of Geography and Statistics, the city has approximately 1.49 million inhabitants [52]. It is located in the interior of Brazil, far from the coastal influence that affects the other cities in this study. Its climate, typical of Central Brazil, is classified as tropical with a dry season (Aw) according to Köppen–Geiger [48], with average temperatures above 18 °C in all months of the year and a dry period lasting 4 to 5 months [53]. The region’s climatic characteristics are shaped by atmospheric systems of equatorial and tropical origin, as well as occasional incursions of polar air masses, resulting in two well-defined periods: dry and rainy seasons [54].

Belém, the capital of the state of Pará, has an estimated population of 1.33 million inhabitants [55]. According to the Köppen–Geiger classification, the city’s climate is tropical rainforest (Af) [48], and it is strongly influenced by the Intertropical Convergence Zone (ITCZ). There is no true dry month, as all months have an average precipitation above 60 mm. However, there is a distinction between the wetter season (December to May) and the less rainy season (June to November). Average temperatures remain stable throughout the year, at around 26.5 °C. The predominant vegetation is humid tropical forest, typical of the Amazon region [56].

2.2. Sampling Procedure and Monitoring Period

The data used in this study were obtained from the Copernicus Atmosphere Monitoring Service (CAMS), an Earth observation program of the European Union coordinated by the European Centre for Medium-Range Weather Forecasts (ECMWF). CAMS provides operational air quality forecasts with high spatial resolution (0.1°) and vertical resolution (up to seven levels, from the surface to 5000 m) through an atmospheric modeling system composed of nine distinct chemistry-transport models [57]. These models represent the state of the art in multiscale simulations of atmospheric composition and operate jointly as an ensemble, with the median of the forecasts demonstrating superior performance compared to individual models [58].

The extracted information covers the period from 2019 to 2023 and comprises daily time series with spatial resolution appropriate for urban analyses, including both meteorological and air quality variables. To ensure data quality control, all extractions were performed at regular intervals by trained personnel. Concentrations were reported in µg/m³ and organized by day, year, and season for analysis. A total of 1826 daily records were collected for each city, corresponding to the full study period, with no missing data reported. Outliers were identified through visual inspection and statistical checks: values exceeding three standard deviations from the mean were verified and, when confirmed as inconsistent, replaced with the median value of the respective week to maintain data integrity. In all cities, the values replaced using this procedure did not exceed 0.3% of the dataset. The independent variables analyzed were mean air temperature, relative humidity, wind speed, atmospheric pressure, and accumulated precipitation, selected for their importance in modulating the dispersion and chemical transformation processes of atmospheric pollutants. The dependent variables were the daily concentrations of particulate matter: PM₁, PM_2.5, and PM₁₀.
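The outlier rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure the authors ran: the “respective week” is taken to be the ISO calendar week, the replacement median is computed from the non-flagged values of that week, and every flagged value is treated as confirmed, whereas in the study each flagged value was verified before being replaced. The function name is hypothetical.

```python
import statistics
from datetime import date, timedelta


def replace_outliers(dates, values, n_sd=3.0):
    """Replace values beyond mean +/- n_sd * SD with the median of their week.

    Assumptions: ISO calendar weeks, medians computed from non-flagged values,
    and all flagged values treated as confirmed inconsistent.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    flagged = {i for i, v in enumerate(values) if abs(v - mean) > n_sd * sd}

    # Collect the non-flagged values per ISO (year, week).
    weekly = {}
    for i, (d, v) in enumerate(zip(dates, values)):
        if i not in flagged:
            weekly.setdefault(d.isocalendar()[:2], []).append(v)

    cleaned = list(values)
    for i in flagged:
        week_values = weekly.get(dates[i].isocalendar()[:2])
        if week_values:  # keep the value if its whole week was flagged
            cleaned[i] = statistics.median(week_values)
    return cleaned, sorted(flagged)


# Usage with a hypothetical four-week PM series containing one spike.
dates = [date(2019, 1, 1) + timedelta(days=i) for i in range(28)]
values = [10.0 + (i % 3) for i in range(28)]
values[10] = 80.0
cleaned, flagged = replace_outliers(dates, values)
print(flagged, cleaned[10])  # [10] 10.5
```

Filling with a weekly median rather than the overall mean keeps the replacement close to the local (seasonal) level of the series.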

Although CAMS data are model-based, numerous studies have evaluated their reliability and uncertainty in both global and regional contexts. Inness et al. [59] assessed the global CAMS reanalysis against AERONET observations of aerosol optical depth (AOD), reporting a mean bias of −0.013 ± 0.087 for the South America region. Santos et al. [60] validated PM₁₀ concentrations from CAMS against surface measurements in southern Brazil, showing strong agreement (R² = 0.831; RMSE = 2.256). Similarly, Paiva et al. [61] confirmed that CAMS adequately represents near-surface air quality conditions in the metropolitan area of São Paulo, and Júnior et al. [62] demonstrated high correlations and low errors for AOD between CAMS and AERONET data across Brazil, capturing both seasonal and interannual variability. Furthermore, Mejía et al. [63] highlighted the robust spatiotemporal representation of pollutants in Ecuador from satellite data, including during critical periods such as the COVID-19 pandemic. Tavella et al. [64] also demonstrated the consistency of simulated meteorological and pollutant data under temperature rise scenarios with CAMS data, highlighting their potential for climate impact studies. Collectively, these evaluations support the use of CAMS as a reliable source of both air quality and meteorological data in regions with sparse monitoring networks.

Despite inherent limitations associated with the use of reanalysis and satellite-based data, this approach is well-suited for this study due to the insufficient coverage of active air quality monitoring stations in the selected cities. Notably, Belém and Goiânia lack any active air quality monitoring stations, and Porto Alegre monitors only PM₁₀, with no data available for finer particulate matter [65]. Furthermore, the overall air quality monitoring network in Brazil is sparse, with only 1.6% of municipalities equipped with active monitoring stations [65]. A significant proportion of these stations (41%) are operated by private entities, often without public access to the data, which limits transparency and accessibility [66]. In this setting, satellite-based and reanalysis data provide a valuable and reliable alternative, offering a comprehensive tool for air quality assessment in regions where ground-based monitoring is non-existent or insufficient. Moreover, the World Health Organization endorses the use of satellite and reanalysis data as valid substitutes in regions with limited ground-based observations [67].

2.3. Simulation of Predictive Scenarios and Data Analysis

The methodology of this study was structured with the objective of evaluating the predictive performance of different machine learning algorithms in simulating concentrations of particulate matter (PM₁, PM_2.5, and PM₁₀) under distinct climate scenarios and meteorological configurations. The approach included bivariate modeling, simulation of temperature-increase scenarios, and performance analysis across models. While this method simplifies complex atmospheric interactions, it was adopted to effectively isolate the direct impact of temperature changes on model performance, which aligns with the comparative goal of this study.

The analyses were conducted using STATISTICA v.13.2 software, applying a supervised learning approach for all models. The Support Vector Machine (SVM) model was implemented with a radial basis function (RBF) kernel. The kernel’s gamma parameter was defined as the ratio of the number of dependent variables (continuous dependent variables) to the number of independent variables (continuous predictors) in the model. A v-fold cross-validation was applied using a 10-fold strategy with a fixed seed (Seed = 1000), and a grid search was performed for the regularization parameter C, testing values from 1 to 10 in increments of 1. The K-Nearest Neighbors (KNN) model was implemented with the Euclidean distance metric, standardized input variables, and neighbors’ responses combined using a uniform mean. For this model, the range of nearest neighbors (k) was set from 1 to 5, with an increment of 1, and a v-fold cross-validation was applied with parameters set to a v-value of 10 and a seed value of 1000.
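For readers working outside STATISTICA, the KNN configuration above can be approximated by the minimal Python sketch below: inputs standardized with training-fold statistics, Euclidean distance, a uniform mean of the k nearest responses, and k chosen from 1 to 5 by 10-fold cross-validated RMSE with seed 1000. The folds are drawn with Python’s `random.Random`, so they will not match STATISTICA’s folds even with the same seed, and all helper names are hypothetical. The `svm_gamma` helper only encodes the gamma rule stated in the text (for example, 1/5 = 0.2 for one PM response and five meteorological predictors, and 1.0 in a bivariate model); fitting the SVM itself over the C grid is not reproduced here.

```python
import math
import random


def fit_scaler(rows):
    """Per-column (mean, SD) estimated on training rows only."""
    params = []
    for col in zip(*rows):
        m = sum(col) / len(col)
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        params.append((m, sd or 1.0))  # guard against constant columns
    return params


def apply_scaler(rows, params):
    return [[(v - m) / sd for v, (m, sd) in zip(row, params)] for row in rows]


def knn_predict(train_x, train_y, x, k):
    """Uniform mean of the k nearest training responses (Euclidean distance)."""
    nearest = sorted(zip((math.dist(tx, x) for tx in train_x), train_y))[:k]
    return sum(ty for _, ty in nearest) / k


def cv_rmse(X, y, k, v=10, seed=1000):
    """v-fold cross-validated RMSE of KNN with a fixed shuffling seed."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    sq_err = 0.0
    for f in range(v):
        test_idx = idx[f::v]
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        params = fit_scaler([X[i] for i in train_idx])
        train_x = apply_scaler([X[i] for i in train_idx], params)
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            x = apply_scaler([X[i]], params)[0]
            sq_err += (knn_predict(train_x, train_y, x, k) - y[i]) ** 2
    return math.sqrt(sq_err / len(X))


def select_k(X, y, k_values=range(1, 6)):
    """Grid search over k; returns the best k and the RMSE for every k."""
    scores = {k: cv_rmse(X, y, k) for k in k_values}
    return min(scores, key=scores.get), scores


def svm_gamma(n_dependent, n_predictors):
    """Gamma rule from the text: dependent variables / predictors."""
    return n_dependent / n_predictors
```

Fitting the scaler inside each fold, rather than once on the full series, keeps the held-out days from influencing the standardization.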

The Random Forest module in STATISTICA is a full implementation of the algorithm developed by Breiman [68] and was used here to predict a continuous dependent variable from the meteorological predictors. The model was implemented using the default settings provided by STATISTICA v.13.2, with the following parameters: the number of decision trees (n_estimators) was set to 100; the maximum number of features considered in each split (max_features) was set to 1 (equivalent to the square root of the total number of predictors); the maximum depth of the tree (max_depth) was limited to 10 levels; and the maximum number of nodes (max_nodes) was set to 100. Furthermore, a 50% subsampling ratio was used, bootstrap sampling was applied, and the Gini index was adopted as the node splitting criterion. Stopping criteria included a minimum of 5 cases per split and a minimum of 5 observations per terminal node. As with the previous models, a v-fold cross-validation was applied with parameters set to a v-value of 10 and a seed value of 1000.
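The sketch below is a minimal pure-Python random forest for a continuous response, written to mirror the settings listed above (100 trees, one candidate feature per split, a depth limit of 10, at least 5 cases per split and per leaf, and a bootstrap draw of 50% of the records per tree). It is not STATISTICA’s implementation: splits here minimize the within-node sum of squared errors, the usual criterion for regression trees, the max_nodes limit is not enforced, and all function names are hypothetical.

```python
import random
import statistics


def sse(values):
    """Within-node sum of squared errors (regression impurity)."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def grow(X, y, depth, rng, max_depth, max_features, min_split, min_leaf):
    """Grow one regression tree; a leaf stores the mean response."""
    if depth >= max_depth or len(y) < min_split:
        return statistics.fmean(y)
    n_features = len(X[0])
    features = rng.sample(range(n_features), min(max_features, n_features))
    best = None  # (impurity, feature, threshold)
    for f in features:
        for t in sorted({row[f] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no split satisfies the leaf-size rule
        return statistics.fmean(y)
    _, f, t = best
    left_i = [i for i, row in enumerate(X) if row[f] <= t]
    right_i = [i for i, row in enumerate(X) if row[f] > t]
    params = (rng, max_depth, max_features, min_split, min_leaf)
    return (f, t,
            grow([X[i] for i in left_i], [y[i] for i in left_i], depth + 1, *params),
            grow([X[i] for i in right_i], [y[i] for i in right_i], depth + 1, *params))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=100, sample_frac=0.5, max_depth=10,
               max_features=1, min_split=5, min_leaf=5, seed=1000):
    """Bagged trees: each one sees a bootstrap draw of sample_frac * n records."""
    rng = random.Random(seed)
    m = max(1, int(sample_frac * len(X)))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(m)]  # with replacement
        forest.append(grow([X[i] for i in idx], [y[i] for i in idx], 0, rng,
                           max_depth, max_features, min_split, min_leaf))
    return forest


def predict_forest(forest, x):
    """Forest prediction is the average of the individual tree predictions."""
    return statistics.fmean([predict_tree(tree, x) for tree in forest])
```

Averaging over trees grown on different resamples is what makes the ensemble less sensitive to noise than any single tree.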

The division of the datasets into training and testing sets was performed chronologically, covering five years of daily data (1826 records). For the training phase, all original observed data were used to fit the models, ensuring that 100% of the historical variability was incorporated. The testing phase employed a corresponding set of 1826 daily records, adjusted to simulate the temperature increases by +2 °C and +4 °C for the projected scenarios. This approach allowed the models to learn from the complete historical dataset while being evaluated on a directly comparable set reflecting the modified climate conditions, preserving the temporal structure inherent to the time-series data. Initially, the original data (baseline scenario without temperature increase) were maintained as the reference. Subsequently, two new scenarios were created by simulating an increase in daily mean temperature by +2 °C and +4 °C, based on projections from the Intergovernmental Panel on Climate Change (IPCC). The remaining meteorological variables were kept as in the original scenario and modeled considering the temperature increase in the new scenarios, using cross-validation in all analyses to prevent overfitting and evaluate model performance.
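The scenario construction described above amounts to a uniform additive shift of the daily mean temperature. A minimal Python sketch, assuming each daily record is a dictionary and using hypothetical key names (`temperature`, `humidity`) and scenario labels, is:

```python
SCENARIO_DELTAS = {"baseline": 0.0, "+2C": 2.0, "+4C": 4.0}


def make_scenarios(records, temp_key="temperature"):
    """One copy of the daily records per scenario, with temperature shifted.

    All other variables are copied unchanged; the original records are not modified.
    """
    return {
        name: [{**rec, temp_key: rec[temp_key] + delta} for rec in records]
        for name, delta in SCENARIO_DELTAS.items()
    }


records = [
    {"date": "2019-01-01", "temperature": 25.0, "humidity": 80.0},
    {"date": "2019-01-02", "temperature": 26.5, "humidity": 78.0},
]
scenarios = make_scenarios(records)
print(scenarios["+4C"][1]["temperature"])  # 30.5
```

Because every day is shifted by the same amount, the scenarios preserve the observed day-to-day and seasonal variability; only the temperature level changes.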

Predictions were carried out in two steps. (i) Preliminary modeling of meteorological variables: before analyzing the pollutants, a “meteorological prediction” step was conducted, in which the variables (wind speed, relative humidity, temperature, pressure, and precipitation) were modeled under the simulated warming scenarios. This step aimed to characterize the direct impact of temperature increase on the individual behavior of these variables. (ii) Modeling of atmospheric pollutants: the meteorological variables were then used as predictors in the modeling of atmospheric pollutants (PM₁, PM_2.5, and PM₁₀). This stage was designed to assess individually the influence of each meteorological variable on pollutant concentrations under the three climate scenarios.
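Step (i) can be illustrated with a minimal Python sketch. The text does not state exactly how the non-temperature variables were re-estimated, so this example swaps in an ordinary least-squares slope of each variable on temperature as a stand-in for the ML models, and keeps each day’s residual so that only the temperature-driven component changes. The function names are hypothetical; the projected series would then serve, one variable at a time, as predictors in step (ii).

```python
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def project_meteorology(temps, met, delta):
    """Step (i) sketch: shift temperature by delta and adjust the other variables.

    met maps a variable name to its daily series (aligned with temps). With a
    linear fit v = a + b*T + e and the residual e kept, v' = v + b*delta.
    """
    projected = {"temperature": [t + delta for t in temps]}
    for name, values in met.items():
        b = ols_slope(temps, values)
        projected[name] = [v + b * delta for v in values]
    return projected


def bivariate_designs(projected):
    """Step (ii) input: one single-predictor design matrix per variable."""
    return {name: [[v] for v in values] for name, values in projected.items()}
```

Keeping the residuals means each projected series retains its observed day-to-day noise rather than collapsing onto the fitted line.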

To compare the performance of the machine learning models, the Root Mean Square Error (RMSE) was adopted. This metric quantifies the deviations between predicted and observed values, being especially sensitive to large errors and widely used in predictive environmental studies [69,70,71]. RMSE provides a direct evaluation of model accuracy across different scenarios and combinations of meteorological variables. Importantly, the RMSE was calculated using the results from the testing datasets for each model individually, allowing an unbiased assessment of predictive performance. The resulting RMSE values were organized into individual heatmaps for each analyzed city.
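For reference, the metric is RMSE = √((1/n) Σᵢ (yᵢ − ŷᵢ)²), where yᵢ are the observed and ŷᵢ the predicted concentrations over the n test days. The minimal Python sketch below computes it and arranges the values as model-by-scenario rows of the kind shown in each city’s heatmap; the dictionary layout, scenario labels, and numbers are illustrative assumptions, not results from this study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def heatmap_rows(results):
    """Map {(model, scenario): (observed, predicted)} to {model: {scenario: RMSE}}."""
    table = {}
    for (model, scenario), (obs, pred) in results.items():
        table.setdefault(model, {})[scenario] = round(rmse(obs, pred), 3)
    return table


# Usage with made-up numbers.
results = {
    ("RF", "+2C"): ([10.0, 12.0], [10.0, 12.0]),
    ("KNN", "+2C"): ([0.0, 0.0], [3.0, 4.0]),
}
print(heatmap_rows(results))  # {'RF': {'+2C': 0.0}, 'KNN': {'+2C': 3.536}}
```

Squaring the residuals before averaging is what makes RMSE weight large errors more heavily than small ones.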

3. Results

Our results demonstrate important differences among the three machine learning algorithms evaluated (SVM, KNN, and RF) when predicting the concentrations of atmospheric pollutants PM₁, PM_2.5, and PM₁₀ under future temperature-increase scenarios (+2 °C and +4 °C). Figure 2, Figure 3, Figure 4 and Figure 5 present the RMSE values for the four cities studied, for each model tested.

To provide context for the RMSE values obtained from the machine learning models, we first present the observed mean concentrations of particulate matter (PM₁, PM_2.5, and PM₁₀) at the monitoring stations during the baseline period (2019–2023). In Belém, annual mean concentrations were approximately 10.8 µg/m³ (PM₁), 15.9 µg/m³ (PM_2.5), and 21.6 µg/m³ (PM₁₀), whereas in Goiânia, the annual means were 9.8 µg/m³ (PM₁), 11.5 µg/m³ (PM_2.5), and 16.3 µg/m³ (PM₁₀). Porto Alegre exhibited mean concentrations of 11.2 µg/m³ (PM₁), 14.7 µg/m³ (PM_2.5), and 20.4 µg/m³ (PM₁₀), while Recife showed 9.5 µg/m³ (PM₁), 12.1 µg/m³ (PM_2.5), and 18.0 µg/m³ (PM₁₀). Further details, including observed and predicted concentrations for all stations and datasets used in the models, are provided in Table S1 (Supplementary Materials). Observed values correspond to the whole dataset for the study period (2019–2023), while predicted values for the “complete” and other subsets correspond to the trained data.

The seasonal data reveal marked fluctuations across cities. In Belém, PM_2.5 ranged from 12.5 µg/m³ in winter to 20.2 µg/m³ in summer, while in Goiânia, PM₁₀ varied from 13.8 µg/m³ in autumn to 20.9 µg/m³ in winter. In Porto Alegre, PM₁ ranged from 6.84 µg/m³ in summer to 15.15 µg/m³ in winter, while in Recife, PM_2.5 varied between 9.32 µg/m³ in winter and 14.63 µg/m³ in summer. These differences underscore the importance of considering seasonality when evaluating air quality and interpreting model performance. Further information, including detailed seasonal averages for all pollutants and cities, is provided in Table S2 (Supplementary Materials).

The Random Forest model consistently showed the best predictive performance, with notably low average RMSE values of around 0.3 µg/m³ across the four cities analyzed, and it remained robust under the temperature-increase scenarios (+2 °C and +4 °C). In contrast, the KNN model demonstrated intermediate performance, with average RMSE values between 0.5 and 0.6 µg/m³ under the baseline scenario. Although its error increased slightly (by 0.15 µg/m³) in the projected scenarios, the model remained relatively stable, suggesting that it may be a viable alternative.

On the other hand, the SVM model exhibited certain weaknesses under the simulated climate change scenarios. Although it initially presented acceptable error levels—around µg/m³—these values increased as temperature rose. This behavior, characterized by high sensitivity to temperature increase, suggests that SVM may not be suitable for forecasting under future climate conditions involving higher warming.

A more detailed analysis, conducted individually for each meteorological variable using a bivariate approach, complemented these findings. Atmospheric pressure and relative humidity stood out as the most reliable predictors, especially when applied to the Random Forest model, yielding average errors below 1 µg/m³ even under the most extreme scenario (+4 °C). This result indicates that these variables exhibit low sensitivity to simulated climate changes. Precipitation also showed reasonable stability, suggesting its effective contribution to the wet removal of airborne particles. In contrast, temperature and wind speed were highly sensitive to projected warming scenarios. Notably, in the SVM model, wind alone led to errors exceeding 20 µg/m³, while in KNN these errors ranged between 5 and 6 µg/m³. Only the Random Forest model was able to reduce these errors to below 3 µg/m³, highlighting its effectiveness and robustness even when dealing with meteorological variables more sensitive to climate change.

Among the four cities analyzed, Porto Alegre and Goiânia exhibited the largest absolute fluctuations in RMSE values, a result that may be related to the more direct impact of temperature changes on atmospheric variables. In Belém, an increase in RMSE was observed only in the SVM model and for the PM₁₀ pollutant, while the other models continued to provide stable results. Recife, on the other hand, showed the most consistent and least sensitive behavior across all models and scenarios, likely due to the low variability of the meteorological input data.

In summary, this study demonstrated that the Random Forest algorithm is the most suitable and robust for predicting future particulate matter concentrations under climate warming scenarios. KNN can be used with caution in intermediate-level analyses, whereas SVM requires closer monitoring due to its sensitivity to climate-related instabilities.

Detailed information, including the observed and predicted concentrations of the analyzed pollutants based on both bivariate analyses and the full multivariate analysis using meteorological variables, is presented in Table S1 in the Supplementary Materials. This table includes results for all three machine learning models applied across the four selected cities.

4. Discussion

Machine learning (ML) algorithms stand out for their ability to recognize patterns and perform specific tasks, rather than simply storing training data. Compared with traditional statistical techniques, such as regression models, ML demonstrates a higher capacity for generalization when making predictions on new datasets, a valuable advantage of these approaches [72]. Several studies have been conducted in Brazil using ML algorithms as tools in predictive systems, focusing on various urban contexts [30,73,74,75,76].

When comparing the models in temperature-increase scenarios, the heatmaps show that Random Forest (RF) consistently provides the most reliable performance in the majority of cases. For both +2 °C and +4 °C scenarios, RF maintains consistently low RMSE values, close to zero, demonstrating its robustness against the simulated climate changes. A similar result was observed in the study by Kim et al. (2022) [77] in Seoul, South Korea, which evaluated the effectiveness of tree-based ML algorithms for predicting particulate matter (PM₁₀ and PM_2.5), highlighting the performance of RF. Using meteorological data from the LDAPS system and observations from 2018 to 2021, RF demonstrated a high capacity for generalization, with low bias and RMSE values, along with high R² coefficients. Compared to the deterministic CMAQ model, RF showed a 21% lower RMSE and a 0.20-point higher R², maintaining strong performance even with high pollutant concentrations (R² between 0.89 and 0.97). These results further reinforce the reliability of RF as an efficient and trustworthy alternative for air quality forecasting in complex urban contexts.

In our study, beyond the differences in urban and climatic settings among the cities analyzed, the ML models were tested to understand how atmospheric pollutants respond to specific meteorological variables and to temperature-increase scenarios. The KNN model showed intermediate performance: RMSE values were low in the original and +2 °C scenarios but tended to increase as scenario complexity grew. This limitation was most pronounced in the +4 °C scenario, in which the simulated atmospheric conditions shift away from typical meteorological patterns, compromising the model's reliability [78]. Similarly, Boateng et al. [40] noted that while KNN is intuitive and efficient when data are well distributed and complexity is low, its lack of an explicit training phase makes it vulnerable in more dynamic scenarios. Still, studies such as Evitania et al. [79] show that KNN can achieve satisfactory results, reaching up to 93.94% accuracy in predicting atmospheric pollution levels from parameters such as PM₁₀, CO, NO₂, SO₂, and O₃, which demonstrates its potential as a predictive tool, especially in more stable contexts.
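The point made by Boateng et al. about KNN's lack of an explicit training phase can be seen in a minimal from-scratch regressor. The class name `KNNRegressor` and the toy temperature/PM data are hypothetical, and this is not the implementation used in the study: fitting only stores the data, and each prediction is the mean target of the k nearest stored samples.

```python
import math


class KNNRegressor:
    """Hypothetical minimal k-nearest-neighbors regressor (illustrative only)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # No parameters are learned: "training" just stores the samples.
        self.X = [list(row) for row in X]
        self.y = list(y)
        return self

    def predict_one(self, x):
        # Rank stored samples by Euclidean distance to the query point.
        ranked = sorted((math.dist(x, row), target) for row, target in zip(self.X, self.y))
        nearest = ranked[: self.k]
        # The prediction is the mean target of the k nearest neighbors.
        return sum(target for _, target in nearest) / len(nearest)

    def predict(self, X):
        return [self.predict_one(x) for x in X]


# Toy data (hypothetical): PM rises linearly with temperature between 20 and 30 °C.
temps = [[t] for t in range(20, 31)]
pm = [0.5 * t for t in range(20, 31)]
knn = KNNRegressor(k=3).fit(temps, pm)
print(knn.predict([[25]]))  # [12.5], inside the observed range
print(knn.predict([[34]]))  # [14.5], although the linear trend would give 17.0
```

Because every prediction is an average of stored targets, it can never exceed the largest value seen during fitting; queries far outside the observed meteorological range are answered with the nearest edge cases, which is one plausible mechanism behind higher KNN errors under +4 °C.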

The Support Vector Machine (SVM) model, on the other hand, showed greater variability in errors across the tested scenarios, including the original scenario, suggesting a higher sensitivity to the relationship between pollutants and meteorological variables. Although this model is recognized in the literature for its ability to model nonlinear relationships in environmental data [41,42,43], it performed poorly in the analyzed scenarios, and its instability in bivariate modeling compromised the consistency of predictions. However, in multivariate contexts SVM is an effective predictive model, with robust performance reported in several studies of environmental modeling and air quality forecasting [73,74,75,76,80,81].
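Assuming that "bivariate" denotes a model relating one pollutant to a single meteorological variable and "multivariate" a model using several variables jointly, the contrast can be sketched with scikit-learn's `SVR`. The data are synthetic and, by construction, depend on all three predictors, so the output illustrates the mechanics of the comparison rather than reproducing the study's results. Input standardization is included because RBF-kernel SVMs are sensitive to feature scale.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic predictors (assumed): temperature, relative humidity, wind speed.
rng = np.random.default_rng(1)
n = 800
X_all = np.column_stack([
    rng.normal(24, 4, n),
    rng.uniform(40, 95, n),
    rng.gamma(2.0, 1.5, n),
])
# Hypothetical PM2.5 that depends on all three predictors.
pm25 = 20 + 0.5 * X_all[:, 0] - 0.1 * X_all[:, 1] - 2.0 * X_all[:, 2] + rng.normal(0, 1, n)
idx_train, idx_test = train_test_split(np.arange(n), test_size=0.25, random_state=0)


def svr_rmse(columns):
    """Fit a scaled RBF-kernel SVR on the chosen columns and return test RMSE."""
    X = X_all[:, columns]
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
    model.fit(X[idx_train], pm25[idx_train])
    residuals = pm25[idx_test] - model.predict(X[idx_test])
    return float(np.sqrt(np.mean(residuals ** 2)))


bivariate = svr_rmse([0])            # PM2.5 vs temperature only
multivariate = svr_rmse([0, 1, 2])   # temperature, humidity and wind together
print(f"bivariate RMSE: {bivariate:.2f}, multivariate RMSE: {multivariate:.2f}")
```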

Overall, the results of this study largely reflect the structural characteristics of each ML architecture employed. The superior performance of Random Forest (RF) is consistent with its ensemble of decision trees, which allows it to capture nonlinear interactions between meteorological variables and particulate matter concentrations robustly. As highlighted by Méndez et al. [37], RF tends to be less sensitive to noise and outliers in the data and, by aggregating multiple decision trees, provides more stable and accurate modeling, even under non-stationary conditions such as those induced by climate warming. RF's ability to generate variable importance measures also supports a richer interpretation, highlighting which variables remain most relevant under the +2 °C and +4 °C scenarios while RMSE values stay low.
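The variable importance measures mentioned above are exposed in scikit-learn's `RandomForestRegressor` through the `feature_importances_` attribute, which holds impurity-based importances normalized to sum to one. The sketch below assumes synthetic data with hypothetical coefficients (wind speed dominant; pressure and precipitation unused by construction); it is not the study's importance analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 1000
feature_names = ["temperature", "relative_humidity", "wind_speed", "pressure", "precipitation"]
X = np.column_stack([
    rng.normal(24, 4, n),        # °C
    rng.uniform(40, 95, n),      # %
    rng.gamma(2.0, 1.5, n),      # m/s
    rng.normal(1013, 5, n),      # hPa (unused by the target below)
    rng.exponential(3.0, n),     # mm (unused by the target below)
])
# Hypothetical PM10 response: wind speed dominates, humidity and temperature matter less.
pm10 = 40 - 4.0 * X[:, 2] - 0.2 * X[:, 1] + 0.3 * X[:, 0] + rng.normal(0, 1, n)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, pm10)
ranking = sorted(zip(feature_names, rf.feature_importances_), key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name:>18}: {score:.3f}")
```

Impurity-based importances can favor continuous, high-cardinality predictors, so `sklearn.inspection.permutation_importance` computed on held-out data is a common cross-check.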

The weaker performance of SVM may be related to its dependence on the choice of kernel function and regularization parameters. Although Boateng et al. [40] note that SVMs are generally resistant to overfitting and handle noise effectively, these advantages appear to have been outweighed by the model's strong sensitivity to changes in the input data, particularly in the bivariate analyses. This suggests that SVM is not the most suitable model for bivariate analyses in the context of climate change. More broadly, the results indicate that model architecture directly influences a model's ability to generalize and its stability under climate warming scenarios.
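The dependence on kernel and regularization settings can be made concrete by refitting an RBF-kernel SVR over a small grid of the regularization parameter `C` and the kernel width `gamma`. The grid values, the synthetic nonlinear temperature/PM relationship and the use of scikit-learn are assumptions for illustration; the study's own SVM settings are not reproduced here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(2)
n = 600
temp = rng.normal(24, 4, n).reshape(-1, 1)
# Hypothetical nonlinear temperature-PM relationship with noise.
pm = 15 + 3 * np.sin(temp[:, 0] / 3) + rng.normal(0, 0.5, n)
X_tr, X_te, y_tr, y_te = train_test_split(temp, pm, test_size=0.25, random_state=0)

scores = {}
for C in (0.1, 10.0, 100.0):            # regularization strength (assumed grid)
    for gamma in (0.01, 1.0, 100.0):    # RBF kernel width (assumed grid)
        model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=C, gamma=gamma))
        model.fit(X_tr, y_tr)
        resid = y_te - model.predict(X_te)
        scores[(C, gamma)] = float(np.sqrt(np.mean(resid ** 2)))

# Print configurations from lowest to highest test RMSE.
for (C, gamma), value in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"C={C:>7}, gamma={gamma:>6}: RMSE={value:.3f}")
```

In practice these two parameters are usually tuned jointly, for example with `GridSearchCV` combined with a time-aware splitter such as `TimeSeriesSplit` for daily series.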

Despite the promising results, this study has some limitations that should be considered when interpreting the findings. First, the analysis focused on three classical machine learning algorithms, which limits the comparison to a narrow range of predictive approaches. Hybrid models and combinations of algorithms, which could enhance prediction accuracy and robustness in more complex scenarios, were not explored. Additionally, although the +2 °C and +4 °C temperature-increase scenarios are plausible climate change projections, the associated meteorological variables may not fully capture the intricate atmospheric interactions expected in the future. Future studies should therefore expand the range of machine learning models to include more diverse approaches, which could improve performance across varying contexts, and should explore combinations of multiple models to enhance robustness, especially under complex and dynamic conditions. Ultimately, implementing these models on a national scale, across the broad variety of urban and climatic conditions in Brazil, would provide more comprehensive insights and support the development of more effective, region-specific air quality management strategies. Similar approaches have been applied successfully in other contexts; for example, Zhang et al. [82] showed how machine learning models can disentangle the effects of specific policies, such as traffic controls, on air pollution levels in a major urban area, highlighting the practical utility of these techniques for policy evaluation and environmental management.
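Model combination can take many forms; one of the simplest, shown here only as an illustration, averages the predictions of the three algorithms with scikit-learn's `VotingRegressor`. The synthetic data and hyperparameters are assumptions, not values from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(3)
n = 1000
X = np.column_stack([
    rng.normal(24, 4, n),      # temperature (°C)
    rng.uniform(40, 95, n),    # relative humidity (%)
    rng.gamma(2.0, 1.5, n),    # wind speed (m/s)
])
# Hypothetical particulate matter response plus noise.
y = 18 + 0.3 * X[:, 0] - 0.08 * X[:, 1] - 1.5 * X[:, 2] + rng.normal(0, 1, n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

base_models = [
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=10))),
    ("svr", make_pipeline(StandardScaler(), SVR(C=10.0))),
]


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Test RMSE of each model on its own.
scores = {name: rmse(y_te, model.fit(X_tr, y_tr).predict(X_te)) for name, model in base_models}
# VotingRegressor refits clones of the members and averages their predictions.
ensemble = VotingRegressor(estimators=base_models).fit(X_tr, y_tr)
scores["ensemble"] = rmse(y_te, ensemble.predict(X_te))
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Because the ensemble prediction is an unweighted mean, the triangle inequality guarantees that its RMSE is no larger than the average RMSE of its members, although it can still be worse than the best single model. Weighted averaging (the `weights` argument) or stacking (`StackingRegressor`) are common refinements.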

5. Conclusions

This study highlights the promising potential of machine learning (ML) algorithms for predicting particulate matter concentrations in Brazilian cities under future climate scenarios. Among the models evaluated, Random Forest (RF) consistently performed best, with low RMSE values of around 0.3 µg/m³ across all cities under both the +2 °C and +4 °C temperature increases. This result underscores the robustness of RF in complex urban environments, making it a reliable tool for air quality forecasting under climate change. In contrast, K-Nearest Neighbors (KNN) performed well in simpler scenarios but struggled as complexity increased, while Support Vector Machine (SVM) was more sensitive to temperature changes, resulting in greater variability in its predictions.

These findings reinforce the importance of using advanced ML models, particularly RF, for more accurate and reliable air quality predictions. The demonstrated ability of these models to handle the inherent complexity and variability in atmospheric data lays the groundwork for future efforts to implement them on a national scale. By incorporating diverse urban and climatic conditions across Brazil, such models could play a pivotal role in shaping region-specific, data-driven air quality management strategies, ultimately contributing to more effective public health policies and environmental protection in the context of a warming climate.

Supplementary Materials

The following supporting information can be downloaded at the following address: https://www.mdpi.com/article/10.3390/atmos16091052/s1, Table S1: Observed and predicted concentrations of the analyzed pollutants (PM₁, PM_2.5, and PM₁₀) based on bivariate analyses and the full multivariate analysis using meteorological variables. Results are presented for the three machine learning models employed in the study (Support Vector Machine–SVM, K-Nearest Neighbors–KNN, and Random Forest–RF) across the four selected cities; Table S2: Seasonal averages of particulate matter concentrations (in µg/m³) recorded in the cities of Belém (PA), Porto Alegre (RS), Goiânia (GO), and Recife (PE), during the period 2019 to 2023, organized by season (summer, autumn, winter, and spring), representing means calculated from the original observed data.

Author Contributions

Conceptualization, A.d.S.B., R.A.T., and F.M.R.d.S.J.; methodology, A.d.S.B., R.A.T., R.d.L.B., G.d.O.S., R.C.F., G.F.S., and F.M.R.d.S.J.; software, A.d.S.B., R.d.L.B., G.d.O.S., R.C.F., G.F.S., and R.A.M.; validation, R.A.T., R.d.L.B., G.d.O.S., R.C.F., G.F.S., R.A.M., and D.F.A.; formal analysis, A.d.S.B., R.A.T., R.d.L.B., G.d.O.S., R.C.F., G.F.S., and F.M.R.d.S.J.; investigation, A.d.S.B., R.C.F., G.F.S., and F.M.R.d.S.J.; resources, D.F.A. and F.M.R.d.S.J.; data curation, A.d.S.B., R.A.T., R.d.L.B., G.d.O.S., R.C.F., G.F.S., and F.M.R.d.S.J.; writing—original draft preparation, A.d.S.B., R.A.T., R.d.L.B., G.d.O.S., and F.M.R.d.S.J.; writing—review and editing, A.d.S.B., R.A.T., and F.M.R.d.S.J.; visualization, A.d.S.B., R.A.T., R.A.M., D.F.A., and F.M.R.d.S.J.; supervision, F.M.R.d.S.J.; project administration, F.M.R.d.S.J.; funding acquisition, F.M.R.d.S.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001, Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Grant 2024/02579-0, Conselho Nacional de Desenvolvimento Científico e Tecnológico, Grants 307791/2023-8 and 444528/2023-7, and Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul (FAPERGS), Grants 21/2551-0001981-6 and 23/2551-0002130-2.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study are publicly available through the Copernicus Atmosphere Monitoring Service (CAMS) at https://atmosphere.copernicus.eu. The dataset includes daily time series of meteorological and air quality variables from 2019 to 2023, which are accessible for further analysis and research. The raw air pollution data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors would like to thank Universidade Federal do Rio Grande (FURG), Universidade Federal de São Paulo (UNIFESP), and Universidade Federal de Alagoas (UFAL) for their support in providing data and network assistance throughout the research. We also thank Leopoldo Silva for his work in developing the map of the geographic locations of the cities.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ML	Machine Learning
SVM	Support Vector Machine
KNN	K-Nearest Neighbors
RF	Random Forest
CAMS	Copernicus Atmosphere Monitoring Service
IPCC	Intergovernmental Panel on Climate Change
RMSE	Root-Mean-Square Error
ECMWF	European Centre for Medium-Range Weather Forecasts
CMAQ	Community Multiscale Air Quality Modeling System
PM	Particulate Matter
CO	Carbon Monoxide
NO₂	Nitrogen Dioxide
SO₂	Sulfur Dioxide
O₃	Ozone
ITCZ	Intertropical Convergence Zone

References

1. World Health Organization. Air Pollution. 2023. Available online: https://www.who.int/health-topics/air-pollution (accessed on 24 July 2025).
2. Brauer, M.; Roth, G.A.; Aravkin, A.Y.; Zheng, P.; Abate, K.H.; Abate, Y.H.; Abbafati, C.; Abbasgholizadeh, R.; Abbasi, M.A.; Abbasian, M.; et al. Global burden and strength of evidence for 88 risk factors in 204 countries and 811 subnational locations, 1990–2021: A systematic analysis for the Global Burden of Disease Study 2021. Lancet 2024, 403, 2162–2203.
3. Health Effects Institute. State of Global Air 2024. Special Report; Health Effects Institute: Boston, MA, USA, 2024; Available online: https://www.stateofglobalair.org/resources/report/state-global-air-report-2024 (accessed on 24 July 2025).
4. World Bank. The Global Health Cost of PM2.5 Air Pollution: A Case for Action Beyond 2021. World Bank Group. 2021. Available online: https://openknowledge.worldbank.org/entities/publication/c96ee144-4a4b-5164-ad79-74c051179eee (accessed on 24 July 2025).
5. Im, U.; Brandt, J.; Geels, C.; Hansen, K.M.; Christensen, J.H.; Andersen, M.S.; Solazzo, E.; Kioutsioukis, I.; Alyuz, U.; Balzarini, A.; et al. Assessment and economic valuation of air pollution impacts on human health over Europe and the United States as calculated by a multi-model ensemble in the framework of AQMEII3. Atmos. Chem. Phys. 2018, 18, 5967–5989.
6. Tavella, R.A.; de Moura, F.R.; Miraglia, S.G.E.K.; da Silva Júnior, F.M.R. A New Dawn for Air Quality in Brazil. Lancet Planet. Health 2024, 8, e717–e718.
7. Brasil. Painel da Poluição Atmosférica e Saúde Humana. Secretaria de Vigilância em Saúde e Ambiente, Ministério da Saúde. 2024. Available online: https://www.gov.br/saude/pt-br/composicao/svsa/saude-ambiental/vigiar (accessed on 24 July 2025).
8. Buralli, R.J.; Connerton, P. Air pollution, health and regulations in Brazil: Are we progressing? Cad. De Saúde Pública 2025, 41, e00172924.
9. Manisalidis, I.; Stavropoulou, E.; Stavropoulos, A.; Bezirtzoglou, E. Environmental and health impacts of air pollution: A review. Front. Public Health 2020, 8, 14.
10. Feng, W.; Li, H.; Wang, S.; Van Halm-Lutterodt, N.; An, J.; Liu, Y.; Liu, M.; Wang, X.; Guo, X. Short-term PM10 and emergency department admissions for selective cardiovascular and respiratory diseases in Beijing, China. Sci. Total Environ. 2019, 657, 213–221.
11. Ab Manan, N.; Aizuddin, A.N.; Hod, R. Effect of air pollution and hospital admission: A systematic review. Ann. Glob. Health 2018, 84, 670.
12. Bălă, G.P.; Râjnoveanu, R.M.; Tudorache, E.; Motișan, R.; Oancea, C. Air pollution exposure—The (in) visible risk factor for respiratory diseases. Environ. Sci. Pollut. Res. 2021, 28, 19615–19628.
13. Niu, Z.; Liu, F.; Yu, H.; Wu, S.; Xiang, H. Association between exposure to ambient air pollution and hospital admission, incidence, and mortality of stroke: An updated systematic review and meta-analysis of more than 23 million participants. Environ. Health Prev. Med. 2021, 26, 15.
14. Karimi, B.; Shokrinezhad, B. Air pollution and mortality among infant and children under five years: A systematic review and meta-analysis. Atmos. Pollut. Res. 2020, 11, 61–70.
15. Liu, S.; Wang, L.; Zhou, L.; Li, W.; Pu, X.; Jiang, J.; Chen, Y.; Zhang, L.; Qiu, H. Differential effects of fine and coarse particulate matter on hospitalizations for ischemic heart disease: A population-based time-series analysis in Southwestern China. Atmos. Environ. 2020, 224, 117366.
16. Tavella, R.A.; Penteado, J.O.; de Lima Brum, R.; da Silva Bonifácio, A.; San Martin, M.C.; Saes-Silva, E.; Brum, A.N.; Buffarini, R.; Filho, W.L.F.C.; Adamatti, D.F.; et al. An exploratory study on the association between air pollution and health problems (ICD-10) with an emphasis on respiratory diseases. Atmos. Pollut. Res. 2025, 16, 102377.
17. Liu, C.; Chen, R.; Sera, F.; Vicedo-Cabrera, A.M.; Guo, Y.; Tong, S.; Coelho, M.S.Z.S.; Saldiva, P.H.N.; Lavigne, E.; Matus, P.; et al. Ambient particulate air pollution and daily mortality in 652 cities. N. Engl. J. Med. 2019, 381, 705–715.
18. Chen, R.; Yin, P.; Meng, X.; Wang, L.; Liu, C.; Niu, Y.; Liu, Y.; Liu, J.; Qi, J.; You, J.; et al. Associations between coarse particulate matter air pollution and cause-specific mortality: A nationwide analysis in 272 Chinese cities. Environ. Health Perspect. 2019, 127, 017008.
19. Burnett, R.; Chen, H.; Szyszkowicz, M.; Fann, N.; Hubbell, B.; Pope, C.A., III; Apte, J.S.; Brauer, M.; Cohen, A.; Weichenthal, S.; et al. Global estimates of mortality associated with long-term exposure to outdoor fine particulate matter. Proc. Natl. Acad. Sci. USA 2018, 115, 9592–9597.
20. United States Environmental Protection Agency. Climate Change Impacts on Air Quality. 2025. Available online: https://www.epa.gov/climateimpacts/climate-change-impacts-air-quality (accessed on 24 July 2025).
21. World Health Organization. Climate Impacts of Air Pollution. Air quality, Energy and Health. 2025. Available online: https://www.who.int/teams/environment-climate-change-and-health/air-quality-energy-and-health/health-impacts/climate-impacts-of-air-pollution (accessed on 24 July 2025).
22. Kinney, P.L. Interactions of climate change, air pollution, and human health. Curr. Environ. Health Rep. 2018, 5, 179–186.
23. Williams, M. Tackling climate change: What is the impact on air pollution? Carbon Manag. 2012, 3, 511–519.
24. Pinho-Gomes, A.C.; Roaf, E.; Fuller, G.; Fowler, D.; Lewis, A.; ApSimon, H.; Noakes, C.; Johnstone, P.; Holgate, S. Air pollution and climate change. Lancet Planet. Health 2023, 7, e727–e728.
25. Arshad, K.; Hussain, N.; Ashraf, M.H.; Saleem, M.Z. Air pollution and climate change as grand challenges to sustainability. Sci. Total Environ. 2024, 928, 172370.
26. Lou, J.; Wu, Y.; Liu, P.; Kota, S.H.; Huang, L. Health effects of climate change through temperature and air pollution. Curr. Pollut. Rep. 2019, 5, 144–158.
27. Orru, H.; Ebi, K.L.; Forsberg, B. The interplay of climate change and air pollution on health. Curr. Environ. Health Rep. 2017, 4, 504–513.
28. Tran, H.M.; Tsai, F.J.; Lee, Y.L.; Chang, J.H.; Chang, L.T.; Chang, T.Y.; Chung, K.F.; Kuo, H.-P.; Lee, K.-Y.; Chuang, K.-J.; et al. The impact of air pollution on respiratory diseases in an era of climate change: A review of the current evidence. Sci. Total Environ. 2023, 898, 166340.
29. Kaur, R.; Pandey, P. Air pollution, climate change, and human health in Indian cities: A brief review. Front. Sustain. Cities 2021, 3, 705131.
30. Tavella, R.A.; Scursone, G.F.; dos Santos da Silva, L.; Nadaleti, W.C.; Adamatti, D.F.; El Khouri Miraglia, S.G.; da Silva, F.M.R., Jr. Predicting air pollution changes due to temperature increases in two Brazilian capitals using machine learning–a necessary perspective for a climate resilient health future. Int. J. Environ. Health Res. 2025, 1–15.
31. Rybarczyk, Y.; Zalakeviciute, R. Machine learning approaches for outdoor air quality modelling: A systematic review. Appl. Sci. 2018, 8, 2570.
32. Kang, G.K.; Gao, J.Z.; Chiao, S.; Lu, S.; Xie, G. Air quality prediction: Big data and machine learning approaches. Int. J. Environ. Sci. Dev. 2018, 9, 8–16.
33. Masih, A. Machine learning algorithms in air quality modeling. Glob. J. Environ. Sci. Manag. (GJESM) 2019, 5, 515–534.
34. Baklanov, A.; Zhang, Y. Advances in air quality modeling and forecasting. Glob. Transit. 2020, 2, 261–270.
35. Liang, Y.C.; Maimury, Y.; Chen, A.H.L.; Juarez, J.R.C. Machine learning-based prediction of air quality. Appl. Sci. 2020, 10, 9151.
36. Subramaniam, S.; Raju, N.; Ganesan, A.; Rajavel, N.; Chenniappan, M.; Prakash, C.; Pramanik, A.; Basak, A.K.; Dixit, S. Artificial intelligence technologies for forecasting air pollution and human health: A narrative review. Sustainability 2022, 14, 9951.
37. Méndez, M.; Merayo, M.G.; Núñez, M. Machine learning algorithms to forecast air quality: A survey. Artif. Intell. Rev. 2023, 56, 10031–10066.
38. Ravindiran, G.; Hayder, G.; Kanagarathinam, K.; Alagumalai, A.; Sonne, C. Air quality prediction by machine learning models: A predictive study on the Indian coastal city of Visakhapatnam. Chemosphere 2023, 338, 139518.
39. Mohammadi, F.; Teiri, H.; Hajizadeh, Y.; Abdolahnejad, A.; Ebrahimi, A. Prediction of atmospheric PM2.5 level by machine learning techniques in Isfahan, Iran. Sci. Rep. 2024, 14, 2109.
40. Boateng, E.Y.; Otoo, J.; Abaye, D.A. Basic tenets of classification algorithms K-nearest-neighbor, support vector machine, random forest and neural network: A review. J. Data Anal. Inf. Process. 2020, 8, 341–357.
41. Singh, S.; Suthar, G.; Kulshreshtha, N.M.; Brighu, U.; Bezbaruah, A.N.; Gupta, A.B. A Futuristic Approach to Subsurface-Constructed Wetland Design for the South-East Asian Region Using Machine Learning. ACS EST Water 2024, 4, 4061–4074.
42. Singh, S.; Suthar, G.; Bhushan Gupta, A.; Bezbaruah, A.N. Machine Learning Approach for Predicting Perfluorooctanesulfonate Rejection in Efficient Nanofiltration Treatment and Removal. ACS EST Water 2025, 5, 1216–1228.
43. Suthar, G.; Kaul, N.; Khandelwal, S.; Singh, S. Predicting land surface temperature and examining its relationship with air pollution and urban parameters in Bengaluru: A machine learning approach. Urban Clim. 2024, 53, 101830.
44. Baran, B. Air quality Index prediction in besiktas district by artificial neural networks and k nearest neighbors. Mühendislik Bilim. Ve Tasarım Derg. 2021, 9, 52–63.
45. Tella, A.; Balogun, A.L.; Adebisi, N.; Abdullah, S. Spatial assessment of PM10 hotspots using random forest, K-nearest neighbour and Naïve Bayes. Atmos. Pollut. Res. 2021, 12, 101202.
46. Squizzato, R.; Nogueira, T.; Martins, L.D.; Martins, J.A.; Astolfo, R.; Machado, C.B.; Andrade, M.d.F.; de Freitas, E.D. Beyond megacities: Tracking air pollution from urban areas and biomass burning in Brazil. npj Clim. Atmos. Sci. 2021, 4, 17.
47. Instituto Brasileiro de Geografia e Estatística. Cidades e Estados—Porto Alegre. 2022. Available online: https://www.ibge.gov.br/cidades-e-estados/rs/porto-alegre.html (accessed on 24 July 2025).
48. Beck, H.E.; Zimmermann, N.E.; McVicar, T.R.; Vergopolan, N.; Berg, A.; Wood, E.F. Present and future Köppen-Geiger climate classification maps at 1-km resolution. Sci. Data 2018, 5, 180214.
49. Instituto Brasileiro de Geografia e Estatística. Cidades e Estados—Recife. 2022. Available online: https://www.ibge.gov.br/cidades-e-estados/pe/recife.html (accessed on 24 July 2025).
50. Lima, R.F.D.; Aparecido, L.E.D.O.; Torsoni, G.B.; Rolim, G.D.S. Climate change assessment in Brazil: Utilizing the Köppen-Geiger (1936) climate classification. Rev. Bras. Meteorol. 2023, 38, e38230001.
51. Fonseca, A.F.; Rodrigues, D.T.; Gonçalves, W.A.; Cabral, J.B., Jr.; de Souza, D.O.; e Silva, C.S. Probability Of Sub-Hourly Extreme Precipitation Events In Recife, Brazil. J. South Am. Earth Sci. 2025, 164, 105670.
52. Instituto Brasileiro de Geografia e Estatística. Cidades e Estados—Goiânia. Available online: https://www.ibge.gov.br/cidades-e-estados/go/goiania.html (accessed on 24 July 2025).
53. Seibt, T.C.; Lins, G.A.; Rodrigues, M.G.; de Almeida, J.R. The Threat Of Global Dimming And The Pollution Of Atmospherich Air Case Study: Goiânia-Goiás-Brazil. Rev. Int. Ciências 2013, 3, 27–39.
54. Nascimento, D.; Lima, L.V.; Cruz, V. Episódios e gênese dos eventos climáticos extremos em Goiânia-GO/Episodes and genesis of extreme climate events in Goiânia-GO. Cad. Geogr. 2019, 29, 583–608.
55. Instituto Brasileiro de Geografia e Estatística. Cidades e Estados—Belém. 2022. Available online: https://www.ibge.gov.br/cidades-e-estados/pa/belem.html (accessed on 24 July 2025).
56. Moraes, B.C.; Sodré, G.R.C.; Cardoso, A.C.D.; Silva, A.R., Jr. Crescimento urbano e suas implicações para o tempo e clima da região metropolitana de Belém do Pará. Rev. Bras. Geogr. Física 2022, 15, 2045–2060.
57. Schneider, R.; Masselot, P.; Vicedo-Cabrera, A.M.; Sera, F.; Blangiardo, M.; Forlani, C.; Douros, J.; Jorba, O.; Adani, M.; Kouznetsov, R.; et al. Differential impact of government lockdown policies on reducing air pollution levels and related mortality in Europe. Sci. Rep. 2022, 12, 726.
58. Casciaro, G.; Cavaiola, M.; Mazzino, A. Calibrating the CAMS European multi-model air quality forecasts for regional air pollution monitoring. Atmos. Environ. 2022, 287, 119259.
59. Inness, A.; Ades, M.; Agustí-Panareda, A.; Barré, J.; Benedictow, A.; Blechschmidt, A.M.; Dominguez, J.J.; Engelen, R.; Eskes, H.; Flemming, J.; et al. The CAMS reanalysis of atmospheric composition. Atmos. Chem. Phys. 2019, 19, 3515–3556.
60. Santos, J.E.K.; Tavella, R.A.; de Lima Brum, R.; Ramires, P.F.; da Silva, L.D.S.; Filho, W.L.F.C.; Nadaleti, W.C.; Correa, E.K.; da Silva, F.M.R., Jr. PM2.5/PM10 ratios in southernmost Brazilian cities and its relation with economic contexts and meteorological factors. Environ. Monit. Assess. 2025, 197, 191.
61. Paiva, M.S.; Franco, M.A.; Rizzo, L.V. Evaluation of near-surface atmospheric composition reanalysis data in the metropolis of São Paulo, Brazil. J. South. Hemisph. Earth Syst. Sci. 2025, 75, ES24041.
62. Pedreira, A.L., Jr.; Curado, L.F.A.; Palácios, R.D.S.; dos Santos, L.O.F.; Querino, C.A.S.; Querino, J.K.A.D.S.; Rodrigues, T.R.; Marques, J.B. Evaluation of Aerosol Optical Depth (AOD) estimated by Copernicus Atmosphere Monitoring Service (CAMS) in Brazil. Theor. Appl. Climatol. 2025, 156, 116.
63. Mejía, D.; Faican, G.; Zalakeviciute, R.; Matovelle, C.; Bonilla, S.; Sobrino, J.A. Spatio-temporal evaluation of air pollution using ground-based and satellite data during COVID-19 in Ecuador. Heliyon 2024, 10, e28152.
64. Tavella, R.A.; das Neves, D.F.; Silveira, G.D.O.; Vieira de Azevedo, G.M.G.; Brum, R.D.L.; Bonifácio, A.D.S.; Machado, R.A.; Brum, L.W.; Buffarini, R.; Adamatti, D.F.; et al. The relationship between surface meteorological variables and air pollutants in simulated temperature increase scenarios in a medium-sized industrial city. Atmosphere 2025, 16, 363.
65. Instituto de Energia e Meio Ambiente. Plataforma da Qualidade do Ar. 2024. Available online: https://energiaeambiente.org.br/qualidadedoar (accessed on 24 July 2025).
66. Vormittag, E.D.M.P.D.A.; Cirqueira, S.S.R.; Wicher, H.; Saldiva, P.H.N. Análise do monitoramento da qualidade do ar no Brasil. Estudos Avançados 2021, 35, 7–30.
67. World Health Organization. WHO Global Air Quality Guidelines: Particulate Matter (PM2.5 and PM10), Ozone, Nitrogen Dioxide, Sulfur Dioxide and Carbon Monoxide; World Health Organization: Geneva, Switzerland, 2021.
68. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
69. Ahmad, M.; Kamiński, P.; Olczak, P.; Alam, M.; Iqbal, M.J.; Ahmad, F.; Sasui, S.; Khan, B.J. Development of prediction models for shear strength of rockfill material using machine learning techniques. Appl. Sci. 2021, 11, 6167.
70. Palanichamy, N.; Haw, S.C.; Murugan, R.; Govindasamy, K. Machine learning methods to predict particulate matter PM2.5. F1000Research 2022, 11, 406.
71. Guo, J.; Liu, Y.; Zou, Q.; Ye, L.; Zhu, S.; Zhang, H. Study on optimization and combination strategy of multiple daily runoff prediction models coupled with physical mechanism and LSTM. J. Hydrol. 2023, 624, 129969.
72. Yaghoubi, E.; Yaghoubi, E.; Khamees, A.; Vakili, A.H. A systematic review and meta-analysis of artificial neural network, machine learning, deep learning, and ensemble learning approaches in field of geotechnical engineering. Neural Comput. Appl. 2024, 36, 12655–12699. Available online: https://link.springer.com/article/10.1007/s00521-024-09893-7 (accessed on 24 July 2025).
73. da Silva, F.M.R., Jr. "New Normal": The Dynamics of Air Pollutants on the Interruption–Recovery Pattern Related to the COVID-19 Pandemic in Recife, Northeastern Brazil. Aerosol Sci. Eng. 2022, 6, 316–322.
74. Tavella, R.A.; El Koury Santos, J.; de Moura, F.R.; da Silva, F.M.R., Jr. Better understanding the behavior of air pollutants at shutdown times–results of a short full lockdown. Int. J. Environ. Health Res. 2023, 33, 1525–1532.
75. Brum, R.D.L.; Tavella, R.A.; Ramires, P.F.; Santos, J.E.K.; Klein, R.D.; Da Silva, F.M.R., Jr. Ozone and PM2.5 behavior in small cities in southern Brazil. Vittalle-Rev. Ciências Saúde 2023, 35, 62–72.
76. da Silva Bonifácio, A.; de Lima Brum, R.; Tavella, R.A.; They, N.H.; Nadaleti, W.C.; Coronas, M.V.; Saes-Silva, E.; Brum, A.N.; Buffarini, R.; Filho, W.L.F.C.; et al. Health impact assessment of air pollutants in simulated temperature scenarios in the largest coal mining region of Brazil. Case Stud. Chem. Environ. Eng. 2024, 10, 100923.
77. Kim, B.Y.; Lim, Y.K.; Cha, J.W. Short-term prediction of particulate matter (PM10 and PM2.5) in Seoul, South Korea using tree-based machine learning algorithms. Atmos. Pollut. Res. 2022, 13, 101547.
78. Alhathloul, S.H.; Mishra, A.K.; Khan, A.A. Low visibility event prediction using random forest and K-nearest neighbor methods. Theor. Appl. Climatol. 2024, 155, 1289–1300.
79. Evitania, C.G. Implementation of the K-Nearest Neighbor Algorithm to Predict Air Pollution. Inf. Technol. Syst. 2023, 1, 45–54.
80. Leong, W.C.; Kelani, R.O.; Ahmad, Z. Prediction of air pollution index (API) using support vector machine (SVM). J. Environ. Chem. Eng. 2020, 8, 103208.
81. Bhuvaneshwari, K.S.; Uma, J.; Venkatachalam, K.; Masud, M.; Abouhawwash, M.; Logeswaran, T. Gaussian Support Vector Machine Algorithm Based Air Pollution Prediction. Comput. Mater. Contin. 2022, 71, 683–695.
82. Zhang, F.; Yan, R.; Ye, X.; Fei, L.; Zhu, Y.; Chen, X.; Zhu, S.; Qi, B.; Xu, D.; Li, W. Machine Learning and Causal Inference for Disentangling Air Pollution Reduction During the Asian Games in Megacity Hangzhou. Environ. Pollut. 2025, 382, 126775.

Figure 1. Geographic locations of the four Brazilian capitals selected for the study: Porto Alegre, Recife, Goiânia, and Belém.

Figure 2. Heatmaps of RMSE generated from each modeling approach in Porto Alegre.

Figure 3. Heatmaps of RMSE generated from each modeling approach in Recife.

Figure 4. Heatmaps of RMSE generated from each modeling approach in Goiânia.

Figure 5. Heatmaps of RMSE generated from each modeling approach in Belém.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Bonifácio, A.d.S.; Tavella, R.A.; Brum, R.d.L.; Silveira, G.d.O.; Fernandes, R.C.; Scursone, G.F.; Machado, R.A.; Adamatti, D.F.; da Silva Júnior, F.M.R. Evaluating Machine Learning Models for Particulate Matter Prediction Under Climate Change Scenarios in Brazilian Capitals. Atmosphere 2025, 16, 1052. https://doi.org/10.3390/atmos16091052

AMA Style

Bonifácio AdS, Tavella RA, Brum RdL, Silveira GdO, Fernandes RC, Scursone GF, Machado RA, Adamatti DF, da Silva Júnior FMR. Evaluating Machine Learning Models for Particulate Matter Prediction Under Climate Change Scenarios in Brazilian Capitals. Atmosphere. 2025; 16(9):1052. https://doi.org/10.3390/atmos16091052

Chicago/Turabian Style

Bonifácio, Alicia da Silva, Ronan Adler Tavella, Rodrigo de Lima Brum, Gustavo de Oliveira Silveira, Ronabson Cardoso Fernandes, Gabriel Fuscald Scursone, Ricardo Arend Machado, Diana Francisca Adamatti, and Flavio Manoel Rodrigues da Silva Júnior. 2025. "Evaluating Machine Learning Models for Particulate Matter Prediction Under Climate Change Scenarios in Brazilian Capitals" Atmosphere 16, no. 9: 1052. https://doi.org/10.3390/atmos16091052

APA Style

Bonifácio, A. d. S., Tavella, R. A., Brum, R. d. L., Silveira, G. d. O., Fernandes, R. C., Scursone, G. F., Machado, R. A., Adamatti, D. F., & da Silva Júnior, F. M. R. (2025). Evaluating Machine Learning Models for Particulate Matter Prediction Under Climate Change Scenarios in Brazilian Capitals. Atmosphere, 16(9), 1052. https://doi.org/10.3390/atmos16091052

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
