From Detection to Solution: A Review of Machine Learning in PM2.5 Sensing and Sustainable Green Mitigation Approaches (2021–2025)

Arpita Adhikari; Chaudhery Mustansar Hussain

doi:10.3390/pr13072207

¹ Department of Electronics and Communication Engineering, Techno Main Salt Lake, Kolkata 700091, India

² Department of Chemistry and Environmental Science, New Jersey Institute of Technology, Newark, NJ 07102, USA

* Author to whom correspondence should be addressed.

Processes 2025, 13(7), 2207; https://doi.org/10.3390/pr13072207

This article belongs to the Special Issue Environmental Protection and Remediation Processes

Abstract

Particulate matter 2.5 (PM_2.5) pollution poses severe threats to public health, ecosystems, and urban sustainability. With increasing industrialization and urban sprawl, accurate pollutant monitoring and effective mitigation of PM_2.5 have become global priorities. Recent advancements in machine learning (ML) have revolutionized PM_2.5 sensing by enabling high-accuracy predictions and scalable solutions through data-driven approaches. Meanwhile, sustainable green technologies, such as urban greening, phytoremediation, and smart air purification systems, offer eco-friendly, long-term strategies to reduce PM_2.5 levels. This review, covering research publications from 2021 to 2025, systematically explores the integration of ML models with conventional sensor networks to enhance pollution forecasting, pollutant source attribution, and intelligent pollutant monitoring. The paper also highlights the convergence of ML and green technologies, including nature-based solutions and AI-driven environmental planning, to support comprehensive air quality management. In addition, the study critically examines integrated policy frameworks and lifecycle-based assessments that enable equitable, sector-specific mitigation strategies across industrial, transportation, energy, and urban planning domains. By bridging the gap between cutting-edge technology and sustainable practices, this study provides a comprehensive roadmap for researchers to combat PM_2.5 pollution.

Keywords: pollution forecasting; sensor calibration; low-cost sensors; source attribution; nature-based solutions; phytoremediation; urban greening; IoT integration

1. Introduction

Particulate matter [1], classified based on aerodynamic diameter [2], is a critical indicator of air quality with significant implications for environmental integrity and human health. Among its subtypes, PM₁₀ (particles with diameters ≤ 10 μm) and PM_2.5 (particles with diameters ≤ 2.5 μm) are the most extensively studied due to their prevalence and harmful effects [3]. While both can penetrate the respiratory tract, PM_2.5, due to its smaller size, exhibits a greater capacity to infiltrate deep into the alveolar regions of the lungs. This results in more severe physiological impacts, including cardiopulmonary disorders, systemic inflammation, and increased mortality rates [4]. Unlike coarser particulates, PM_2.5 has a longer residence time in the atmosphere and a greater surface area-to-volume ratio, facilitating the adsorption of toxic heavy metals and organic compounds, which magnifies its toxicity [5]. The World Health Organization (WHO) has consistently emphasized PM_2.5 as a major contributor to the global disease burden, linking it to chronic respiratory diseases, stroke, and ischemic heart conditions [6]. Additionally, PM_2.5 contributes to atmospheric haze, alters radiative forcing, and disrupts ecosystem functions, necessitating urgent intervention. Elevated PM_2.5 levels have been implicated in the acceleration of climate-related feedback loops by influencing cloud condensation nuclei formation and radiative forcing [7]. From an ecological perspective, PM_2.5 has been associated with decreased photosynthetic activity in plants, soil degradation, and long-term atmospheric changes, making it a more insidious and challenging pollutant relative to PM₁₀ [8]. The escalating prevalence of PM_2.5, driven by industrial emissions, vehicular exhaust, and biomass combustion, underlines the urgent need for advanced detection and mitigation strategies.

Timely and accurate detection of PM_2.5 is a prerequisite for effective intervention strategies aimed at reducing exposure and minimizing health risks [9]. Conventional monitoring systems, though scientifically rigorous, often suffer from limitations in spatial resolution, cost-effectiveness, and scalability. In densely populated and pollution-prone urban environments, these shortcomings hinder real-time decision-making and the formulation of proactive mitigation policies [10]. Furthermore, the dynamic nature of PM_2.5, driven by meteorological conditions and heterogeneous emission sources, necessitates continuous data acquisition and adaptive modeling to capture temporal variations [11]. ML has emerged as a transformative approach to environmental monitoring, revolutionizing the way PM_2.5 data are collected, interpreted, and forecasted. Unlike deterministic models that rely on predefined assumptions, ML models excel in identifying complex, nonlinear relationships within high-dimensional datasets [12]. Techniques such as Random Forests, Support Vector Regression, Artificial Neural Networks (ANN), and deep learning frameworks like Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) have demonstrated strong performance in modeling PM_2.5 concentrations across diverse temporal and spatial scales [13]. These methods not only improve prediction accuracy but also support rapid model adaptation to evolving pollution patterns influenced by urbanization, industrial activities, and meteorological variability.

Recent advancements in ML have extended beyond forecasting to include sensor calibration, anomaly detection, and source attribution. ML-driven calibration techniques significantly enhance the reliability of low-cost air quality sensors by correcting for drift, temperature fluctuations, and humidity interference [14]. Ensemble learning and hybrid modeling approaches that integrate classical time-series techniques with neural networks have further improved the interpretability and robustness of forecasts, particularly in regions with complex pollution dynamics [15].

Moreover, the integration of ML with Internet of Things (IoT) platforms [16] has enabled the development of responsive, scalable air quality monitoring systems capable of providing near real-time alerts and long-term trend analysis [17]. Such systems are especially valuable in low- and middle-income countries, where traditional infrastructure is limited, but air quality challenges are severe. Research in this domain continues to expand, with growing attention to incorporating spatiotemporal attention mechanisms, data decomposition techniques and model optimization strategies to better capture localized pollution episodes and regional pollutant transport [18]. These developments have not only improved the scientific understanding of PM_2.5 behavior but also informed evidence-based policymaking and public health interventions. As the field matures, interdisciplinary collaborations between atmospheric scientists, data scientists, and policymakers will be critical to translating ML-driven insights into actionable outcomes.

Mitigating PM_2.5 pollution requires a multidisciplinary approach that combines advanced technology with ecological strategies. While ML and IoT-based sensors have greatly improved PM_2.5 detection, these tools must be paired with effective mitigation measures. Immediate solutions like air filtration and emission controls provide short-term relief [19], but must be integrated into broader, sustainable policies such as urban greening [20]. The dual imperative of enhancing detection fidelity and implementing mitigation at scale represents a cornerstone in combating PM_2.5-related health inequities and environmental degradation.

This review synthesizes recent advances from 2021 to 2025 in PM_2.5 monitoring through the integration of ML and sustainable mitigation strategies. The central aim is to systematically consolidate the fragmented yet rapidly evolving bodies of research on ML-enhanced PM_2.5 sensing, encompassing sensor calibration, forecasting, and source attribution, alongside sustainable mitigation interventions. By critically examining the interface between these traditionally disparate domains, the review elucidates their potential for synergistic integration. It further delineates a comprehensive conceptual framework illustrating that the convergence of advanced ML techniques with ecologically grounded practices can enable holistic and adaptive air quality management (Figure 1).

Figure 1. Machine learning in PM_2.5 management.

2. Review Methodology

This review adopts a systematically rigorous approach to literature synthesis, adhering to the methodological framework outlined by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [21] to ensure transparency, reproducibility, and methodological integrity. The bibliographic database Scopus was utilized due to its comprehensive indexing of peer-reviewed scientific literature across disciplines pertinent to atmospheric science, sensor engineering, and computational modeling. A refined search strategy was implemented using the following controlled vocabulary terms and keywords: PM_2.5, sensor, detection, calibration, prediction, forecasting, nowcasting, machine learning, green, sustainable mitigation strategies, nature-based, technology-based, health risks, economic costs, mortality. These terms were constrained to the article title field to enhance topical relevance and precision. The dataset was further filtered by restricting the search to documents categorized as journal articles, written in English, and published between 2021 and 2025. The inclusion criteria were rigorously formulated to identify studies centered on the calibration of low-cost PM_2.5 sensors, as well as the application of advanced machine learning techniques for particulate matter management. Additionally, the criteria incorporated research addressing sustainable mitigation approaches, encompassing both nature-based interventions and technologically driven solutions. Articles failing to meet these criteria, including duplicate records or studies lacking a direct focus on PM_2.5 or ML-driven sensor applications, were excluded. Relevance screening was performed through a tiered review of article titles, abstracts, and full texts to ensure methodological alignment with the review’s objectives.

3. PM_2.5 Pollution: Source Complexity and Societal Impact

3.1. Major Anthropogenic and Natural Sources of PM_2.5

The origin of PM_2.5 particles is multifaceted, arising from a combination of anthropogenic activities and natural phenomena. Among the anthropogenic contributors, combustion-related emissions—particularly from vehicular exhaust [22], coal-fired power plants [23], industrial manufacturing [24], and biomass burning [25]—constitute the most dominant and persistent sources. Traffic-related emissions contribute 25% to 40% of urban PM_2.5 in developed regions, while agricultural burning accounts for 30% to 50% in developing rural areas [26]. In urban centers, traffic congestion and unregulated diesel usage result in localized PM_2.5 hotspots [27], while in rural areas, crop residue burning [28] and household solid fuel combustion [29] exacerbate background concentrations. Black carbon (BC), a byproduct of incomplete combustion, represents 5% to 15% of PM_2.5 mass in megacities, with elevated concentrations near industrial zones [26]. Secondary particulate formation through atmospheric chemical reactions—such as the transformation of sulfur dioxide and nitrogen oxides into sulfates and nitrates—further amplifies PM_2.5 loads [30].

Natural sources, although less controllable, also contribute significantly; these include dust storms, volcanic activity, wildfires, and sea spray, with variable intensity across geoclimatic zones [31]. The complex interplay between these sources, mediated by meteorological variables, such as wind speed, humidity, and solar radiation, creates a dynamic and spatially heterogeneous PM_2.5 distribution [32] that challenges both detection and regulation efforts. Notably, source apportionment studies demonstrated that approximately half of PM_2.5-related mortality burdens in China were attributable to transboundary pollution from external regions [33], highlighting the need for coordinated regional mitigation strategies. Understanding the relative contributions and transformation mechanisms of these sources remains critical for developing targeted mitigation strategies and accurate source apportionment models. Table 1 presents the major sources of PM_2.5 along with their respective contributions to atmospheric concentrations.

Table 1. Major Sources of PM_2.5 and their Atmospheric Contributions.

3.2. Health Risks and Socioeconomic Burden

The serious health effects of prolonged exposure to PM_2.5 are well-documented, encompassing both acute and chronic conditions that extend across physiological systems. Short-term exposure to PM_2.5 contributed to 2.08% of total global deaths annually, with higher fractions in eastern Asia (3.21%) and urban areas (2.30%), indicating acute risks in densely populated and polluted regions [34]. At the cardiopulmonary level, PM_2.5 has been implicated in increased incidence of asthma [35], chronic obstructive pulmonary disease (COPD) [36], ischemic heart disease [37], and premature mortality [38]. Epidemiological evidence confirms that oxidative stress, systemic inflammation, and DNA damage constitute primary mechanisms linking PM_2.5 to cardiopulmonary and neurological disorders [39]. PM_2.5-induced immunotoxicity and genotoxicity may disrupt cellular homeostasis, accelerating carcinogenesis and metabolic dysfunction [39].

Fine particulate matter’s ability to penetrate deep pulmonary alveoli and translocate into the circulatory system underpins its heightened toxicity relative to coarser particulates [40]. Meta analyses confirm that long-term exposure to specific PM_2.5 components, including BC, nitrates (NO₃⁻), and nickel (Ni), exhibits nonlinear concentration response relationships with mortality, with Ni demonstrating the highest toxicity per unit mass [26]. Neuroinflammatory pathways triggered by PM_2.5 penetration have been associated with neurodegenerative diseases such as Alzheimer’s [41] and Parkinson’s [42].

The global economic burden of PM_2.5-related mortality reached $8.1 trillion in 2019 (6.1% of global GDP), with lower-middle-income countries experiencing the highest death rates (964 per 100,000 population) compared to high-income nations (622 per 100,000) [43]. Advanced econometric analyses reveal that lagged PM_2.5 exposure increases mortality rates, with GDP per capita inversely correlated to health impacts; for instance, a $10,000 rise in GDP reduces deaths by 20–110 per 100,000 [43]. From a socioeconomic perspective, the burden is equally profound: healthcare expenditures surge in polluted regions [44], and vulnerable populations—particularly those in low-income urban settlements—suffer disproportionately [45]. Per capita health costs from combustion sources averaged $147 globally (range: $0.01 in Marshall Islands to $660 in Hungary), with population aging driving 43% of excess costs in China ($72 per capita) and PM_2.5 concentration contributing $40 per capita in India [46]. The annual economic costs of PM_2.5 pollution in China alone reached approximately 127 billion USD, with significant cross-regional inequalities in both health impacts and abatement responsibilities [33].

Recent modeling studies estimated that a 1 μg/m³ reduction in PM_2.5 concentrations from priority emission sources could yield health benefits valued at 76–153 million USD in downwind regions [33], demonstrating the economic rationale for targeted pollution control. Globally, PM_2.5 contributes to approximately 4.1 million premature deaths annually, with low- and middle-income countries experiencing 65% of this burden due to higher exposure levels and limited healthcare infrastructure [26]. PM_2.5 exposure drives substantial economic losses through healthcare costs and reduced productivity, with estimated annual damages exceeding $150 billion in heavily affected regions [26]. These findings emphasize the urgent need for component-specific air quality regulations, particularly targeting BC and metals such as Ni, which pose disproportionate health risks relative to their atmospheric concentrations.

4. Key Machine Learning Approaches for PM_2.5 Monitoring

ML has transformed PM_2.5 monitoring by enabling precise prediction, sensor calibration, and fine-scale pollution mapping. This section critically reviews major ML paradigms—from supervised to emerging hybrid models—addressing the nonlinear complexities of particulate pollution. Capitalizing on heterogeneous data and advanced algorithms, these methods surpass traditional approaches, providing scalable solutions for urban air quality management. Recent advancements highlight the transformative role of graph-based architectures and attention mechanisms in enhancing the spatiotemporal resolution and predictive accuracy of pollution modeling. Concurrently, emerging paradigms, such as physics-informed learning, offer a robust framework for integrating domain-specific physical laws with data-driven approaches, thereby fostering a more coherent and physically consistent representation of atmospheric processes. Each ML technique is evaluated for its distinct strengths and functional roles in PM_2.5 analysis and environmental sensing.

4.1. Supervised Learning

Supervised learning [47] has become the cornerstone of PM_2.5 prediction, where labeled datasets train models to establish precise relationships between pollution levels and influencing factors. The growing emphasis on model interpretability has led to increased adoption of explainable AI (XAI) techniques like SHAP and LIME, which are especially important for policy-relevant applications. Supervised approaches work best when historical air quality data are available, enabling accurate forecasting and sensor calibration through regression and classification tasks. The work of Shahriar et al. [48] in Bangladeshi cities demonstrated that CatBoost’s Gradient Boosting framework can outperform hybrid ARIMA-ANN models in handling tropical urban pollution patterns. Meanwhile, Choojam et al. [49] showed that ARIMA-ANN-REG hybrids with residual optimization achieve superior performance in Thailand by combining statistical and neural network approaches. Ensemble methods like Random Forest and Gradient Boosting have proven highly effective, as seen in Kim et al.’s [50] work, where LightGBM achieved a 21% lower RMSE than traditional chemical transport models.
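The gradient boosting principle behind libraries such as CatBoost and LightGBM can be sketched in a few lines. The example below is a from-scratch toy using one-split decision stumps and squared loss. It is not the implementation used in any of the cited studies, and the wind speed and PM_2.5 values are hypothetical numbers chosen only for illustration.

```python
from math import sqrt


def fit_stump(x, residuals):
    """Return (threshold, left_value, right_value) for the best single split on x."""
    # Splitting at the maximum value would leave the right side empty, so skip it.
    candidates = sorted(set(x))[:-1]
    if not candidates:
        raise ValueError("need at least two distinct feature values")
    best = None
    for threshold in candidates:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


def fit_gradient_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Fit stumps to the residuals, i.e. the negative gradient of squared loss."""
    base = sum(y) / len(y)  # initial prediction: the mean target
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * predict_stump(stump, xi)
                       for xi, pi in zip(x, predictions)]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, xi):
    boost = sum(predict_stump(s, xi) for s in model["stumps"])
    return model["base"] + model["learning_rate"] * boost


def rmse(predicted, observed):
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


# Hypothetical training data: wind speed (m/s) versus PM2.5 (ug/m3).
wind_speed = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0]
pm25 = [80.0, 75.0, 70.0, 55.0, 40.0, 30.0, 25.0, 22.0]

model = fit_gradient_boosting(wind_speed, pm25)
mean_pm25 = sum(pm25) / len(pm25)
baseline_rmse = rmse([mean_pm25] * len(pm25), pm25)
boosted_rmse = rmse([predict(model, w) for w in wind_speed], pm25)
print(f"mean-only RMSE: {baseline_rmse:.2f}, boosted RMSE: {boosted_rmse:.2f}")
```

The comparison here is on training data only. A real evaluation would hold out a test period, and production libraries add multi-feature trees, regularization, and categorical handling that this sketch omits.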

Neural networks, especially hybrid architectures like GNN-LSTM [51], have shown remarkable success in capturing complex spatiotemporal patterns. Bera et al.’s [52] ANN implementation in Kolkata during COVID-19 lockdowns further validated neural networks’ superiority over linear regression for capturing nonlinear atmospheric responses to abrupt emission changes. The strength of supervised learning lies in its ability to incorporate diverse input features—from meteorological parameters to urban morphology—while maintaining computational efficiency for operational deployment. Supervised models offer high accuracy and well-defined training protocols, but are heavily dependent on large, labeled datasets, limiting their applicability in regions with sparse monitoring infrastructure.

4.2. Semi-Supervised Learning

Bridging the gap between data scarcity and model accuracy, semi-supervised [53] approaches leverage limited labeled PM_2.5 measurements alongside abundant unlabeled environmental data. Emerging federated learning frameworks [54] now enable collaborative model training across distributed monitoring networks while preserving data privacy—a crucial development for multinational air quality initiatives. This paradigm is especially relevant for expanding monitoring capabilities in resource-constrained regions where reference-grade sensors are sparse. Paluang et al.’s [55] MLP-ANN model for Northern Thailand exemplifies this paradigm, effectively combining limited ground monitoring with satellite-derived aerosol data and emission inventories to estimate biomass burning impacts. These methods can significantly expand the utility of existing datasets, as evidenced by Li et al.’s [51] work on unmonitored site prediction, where the model effectively generalized from limited labeled data to new locations. The growing availability of low-cost sensor networks and satellite data makes semi-supervised learning increasingly relevant for global-scale air quality monitoring. While semi-supervised methods reduce reliance on labeled data and improve generalizability, their performance is often sensitive to noise in the unlabeled dataset, and they require careful validation.

4.3. Unsupervised Learning

When labeled data is scarce, unsupervised methods [56] extract hidden patterns from PM_2.5 measurements by autonomously clustering similar pollution events or reducing feature dimensionality. Modern implementations increasingly combine traditional techniques like principal component analysis (PCA) with neural approaches such as auto-encoders for more powerful feature extraction [57]. These techniques prove invaluable for identifying novel emission sources and anomalous air quality patterns without pre-existing categorization. Huang and Qian [58] and Huang et al. [59] demonstrated the power of variational and empirical mode decomposition (VMD/EMD) as unsupervised preprocessing steps that enhance GRU network performance by isolating interpretable temporal components. Clustering algorithms like k-means and DBSCAN can identify distinct pollution profiles across urban areas, while dimensionality reduction methods (PCA, t-SNE) help visualize high-dimensional sensor data. These approaches complement supervised methods by providing insights into underlying data distributions and identifying novel features for predictive modeling. For instance, the wavelet decomposition in Karimian et al.’s [60] study effectively separated PM_2.5 time series into interpretable components before supervised modeling, demonstrating the synergy between unsupervised and supervised paradigms. Unsupervised learning is scalable and label-independent, but its outputs often lack clear interpretability and require post hoc validation for environmental relevance.
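As a rough illustration of how clustering separates pollution profiles without labels, the sketch below implements plain k-means (Lloyd's algorithm) on two-dimensional points. The data, the choice of features (daily mean and daily peak PM_2.5), and the fixed starting centroids are all assumptions made for this example and do not come from the reviewed studies.

```python
from math import dist  # available in Python 3.8+


def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical station-days described by (daily mean, daily peak) PM2.5 in ug/m3.
profiles = [(12.0, 20.0), (14.0, 22.0), (13.0, 19.0),
            (60.0, 95.0), (58.0, 90.0), (62.0, 99.0)]

# Fixed starting centroids keep the example deterministic.
labels, centroids = kmeans(profiles, initial_centroids=[profiles[0], profiles[3]])
print(labels)
```

In practice the initial centroids are usually chosen at random or with a seeding heuristic, features are standardized first, and the resulting clusters still need the post hoc validation noted above before they are read as distinct emission regimes.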

4.4. Reinforcement Learning

Reinforcement learning (RL) [61] introduces adaptive decision-making to air quality management, where algorithms optimize pollution mitigation strategies through continuous environmental interaction. Recent work has shown that deep RL can dynamically optimize sensor network configurations and forecasting system parameters in real time. The attention mechanisms in Guyu et al.’s [62] MGCGRU-SAN framework demonstrate that adaptive feature weighting, a key RL principle, can enhance prediction accuracy by dynamically prioritizing relevant spatiotemporal patterns. While still emerging in environmental applications, RL’s potential is highlighted by studies like Zhang et al.’s [63] attention-based models, which implicitly learn to prioritize spatial–temporal features, a capability that could be formally optimized through reinforcement paradigms. Despite its strength in dynamic environments and real-time learning, RL remains data- and computation-intensive and suffers from low interpretability compared to supervised methods.

4.5. Other Paradigms

Cutting-edge hybrid models combine the strengths of multiple ML paradigms with domain-specific knowledge of atmospheric science. By integrating physical constraints into data-driven architectures, these approaches enhance both the accuracy and interpretability of PM_2.5 prediction systems while supporting mitigation planning. Transfer learning enables knowledge sharing across geographical domains and temporal scales, dramatically reducing the data requirements for new monitoring locations. Meanwhile, physics-informed neural networks integrate atmospheric chemistry principles with data-driven approaches, addressing the interpretability challenges of pure ML models. These advanced techniques build upon the foundation laid by studies like that of Mohammadi et al. [66], whose ANN successfully captured nonlinear atmospheric relationships while maintaining physical plausibility.

Hybrid approaches like GNN-LSTM demonstrate that combining spatial topology (via graphs) with temporal dynamics (via LSTM) yields superior accuracy and scalability, particularly for multi-station prediction tasks. The integration of domain knowledge with ML flexibility represents the next frontier in air quality modeling. Three particularly promising directions are emerging. First, Graph Neural Networks (GNNs) now enable more sophisticated modeling of air pollution dispersion networks and multi-station relationships [64], as detailed in Li et al.’s [51] GNN-LSTM work. Second, physics-informed neural networks [65] integrate atmospheric chemistry principles with data-driven approaches, addressing interpretability challenges while maintaining the physical plausibility demonstrated in Mohammadi et al.’s [66] ANN study. Third, Transformer architectures are showing promise for capturing long-range temporal dependencies beyond what GRUs and LSTMs achieve [67], complementing existing temporal approaches like Huang and Qian’s [58] VMD-GRU framework.

These innovations expand upon successful hybrid architectures like Faraji et al.’s [68] 3D CNN-GRU model for Tehran and Guyu et al.’s [62] MGCGRU-SAN framework for regional prediction. Transfer learning [69] enables knowledge sharing across geographical domains and temporal scales, dramatically reducing data requirements for new monitoring locations. Emerging techniques, like federated learning [70], now allow privacy-preserving collaboration across monitoring networks, while XAI methods [71] build trust in model outputs for policy applications.

The integration of domain knowledge with ML flexibility represents the next frontier in air quality modeling, as exemplified by Zhang et al.’s [63] attention-based models and Karimian et al.’s [60] wavelet-XGBoost hybrid. These advanced techniques continue to build upon the foundation laid by studies like Mohammadi et al.’s [66], which demonstrated that hybrid approaches can maintain physical consistency while leveraging ML’s pattern recognition strengths.

However, hybrid and physics-informed models often involve increased complexity and reduced transparency, necessitating careful model governance for real-world deployment. Overall, each ML paradigm entails trade-offs among data availability, computational demands, interpretability, and scalability (Table 2). Their strategic integration—as exemplified by hybrid models—may yield the most robust solutions for diverse PM_2.5 monitoring goals, from real-time forecasting to long-term mitigation planning.

Table 2. Summary of Key Machine Learning Paradigms for PM_2.5 Monitoring.

5. Framework of Machine Learning Integration in PM_2.5 Management

The process of applying ML in PM_2.5 management involves several sequential and interlinked stages. Each stage plays a vital role in transforming raw environmental data into actionable intelligence for pollution mitigation.

5.1. Sensor Calibration

Accurate data acquisition forms the bedrock of any PM_2.5 management system. However, low-cost sensors often suffer from variability, environmental interference, and hardware drift. ML algorithms, such as linear regression, Support Vector Regression, and Gaussian Processes, are employed to calibrate these sensors by correlating them with reference-grade instruments. This process ensures consistent, reliable data quality necessary for downstream analytics.
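The simplest of the calibration methods listed above, linear regression against a co-located reference instrument, can be sketched as follows. This is a minimal single-feature example under stated assumptions: the paired readings are hypothetical, and real calibrations typically add temperature and humidity as further inputs.

```python
from statistics import mean


def fit_linear_calibration(raw, reference):
    """Ordinary least squares fit of reference = slope * raw + intercept."""
    if len(raw) != len(reference) or len(raw) < 2:
        raise ValueError("need at least two paired readings")
    raw_mean, ref_mean = mean(raw), mean(reference)
    sxx = sum((x - raw_mean) ** 2 for x in raw)
    if sxx == 0:
        raise ValueError("raw readings must not all be identical")
    sxy = sum((x - raw_mean) * (y - ref_mean) for x, y in zip(raw, reference))
    slope = sxy / sxx
    intercept = ref_mean - slope * raw_mean
    return slope, intercept


def apply_calibration(raw, slope, intercept):
    """Map raw low-cost readings onto the reference scale."""
    return [slope * x + intercept for x in raw]


# Hypothetical co-location data in ug/m3: the low-cost sensor reads high.
raw_readings = [10.0, 20.0, 30.0, 40.0, 50.0]
reference_readings = [7.0, 12.0, 17.0, 22.0, 27.0]

slope, intercept = fit_linear_calibration(raw_readings, reference_readings)
calibrated = apply_calibration(raw_readings, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The fitted slope and intercept are then applied to new raw readings from the same device. Nonlinear methods such as Support Vector Regression or Gaussian Processes follow the same fit-then-apply pattern with a more flexible mapping.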

5.2. Next-Generation Monitoring

After calibration, sensor networks are deployed to create dense spatial–temporal monitoring grids. ML facilitates real-time anomaly detection, spatial interpolation, and intelligent sensor placement through clustering algorithms (e.g., K-means), principal component analysis (PCA), and adaptive learning techniques. These tools enhance the granularity and scalability of air quality data collection in urban and semi-urban environments.
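To make the idea of real-time anomaly detection concrete, the sketch below flags readings that deviate sharply from the recent history of the same sensor. It uses a robust z-score based on the median and the median absolute deviation, which is a simple statistical baseline and not one of the learned methods named above. The window length, the threshold of 3.5, the minimum scale, and the readings themselves are illustrative assumptions.

```python
from statistics import median


def flag_anomalies(readings, window=5, threshold=3.5, min_scale=1.0):
    """Flag readings whose robust z-score against the preceding window is large."""
    flags = []
    for i, value in enumerate(readings):
        history = readings[max(0, i - window):i]
        if len(history) < window:
            flags.append(False)  # not enough history to judge yet
            continue
        center = median(history)
        mad = median([abs(h - center) for h in history])
        # Floor the scale so a perfectly flat window does not divide by zero.
        scale = max(mad, min_scale)
        robust_z = 0.6745 * abs(value - center) / scale
        flags.append(robust_z > threshold)
    return flags


# Hypothetical minute-level PM2.5 readings (ug/m3) with one spike.
readings = [12.0, 13.0, 12.0, 14.0, 13.0, 95.0, 13.0, 12.0]
flags = flag_anomalies(readings)
print(flags)
```

Because the median is insensitive to a single outlier, the spike does not contaminate the window used to judge the readings that follow it. A flagged value may still be a genuine pollution episode, so flags are best treated as prompts for cross-checking against neighboring sensors.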

5.3. Forecasting

Once robust data streams are established, ML is applied to forecast PM_2.5 concentrations. Time-series forecasting models such as ARIMA, Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and ensemble methods like Gradient Boosting or Random Forest Regressors are trained on historical PM_2.5 data along with meteorological and land-use parameters. These predictive models offer short-term air quality projections essential for public health advisories and pollution alerts.
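Most of the forecasting models listed above are trained on the same supervised framing of a time series, in which a window of past values predicts the next one. The sketch below builds such lagged samples and scores a persistence forecast (the next value equals the latest observation), a common baseline that any learned model should beat. The hourly values are hypothetical.

```python
from math import sqrt


def make_lagged_samples(series, n_lags):
    """Return (features, target) pairs: the previous n_lags values and the next value."""
    samples = []
    for t in range(n_lags, len(series)):
        samples.append((series[t - n_lags:t], series[t]))
    return samples


def rmse(predicted, observed):
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


# Hypothetical hourly PM2.5 concentrations (ug/m3).
hourly_pm25 = [30.0, 32.0, 35.0, 33.0, 31.0, 36.0]

samples = make_lagged_samples(hourly_pm25, n_lags=3)
targets = [target for _, target in samples]

# Persistence baseline: forecast the most recent value in each window.
persistence_forecast = [features[-1] for features, _ in samples]
baseline_rmse = rmse(persistence_forecast, targets)
print(f"{len(samples)} samples, persistence RMSE = {baseline_rmse:.2f}")
```

Meteorological and land-use variables would be appended to each feature window in the same way, and the split between training and test samples should respect time order to avoid leaking future information.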

5.4. Predictive Modeling

Beyond short-term forecasts, predictive modeling plays a strategic role in simulating long-term pollution trends and assessing the impact of various mitigation strategies. ML approaches, including Decision Trees, Support Vector Machines (SVM), XGBoost, and Deep Neural Networks, are used to model scenario-based outcomes under different intervention frameworks—such as traffic restrictions, industrial regulations, or green buffer zones.

5.5. PM_2.5 Management

The ultimate goal of the framework is to enable data-driven PM_2.5 management. Insights generated from ML-powered forecasting and modeling are integrated into policymaking, urban planning, and environmental governance. For example, ML outputs may inform zoning laws, adaptive traffic policies, or targeted mitigation in pollution hotspots. Thus, ML transforms raw environmental data into strategic intelligence that supports sustainable air quality improvement.

6. Machine Learning Applications in PM_2.5 Monitoring and Modeling

ML has emerged as a pivotal technological enabler in environmental sensing, transforming the landscape of air quality monitoring through its capacity to model nonlinear, high-dimensional, and temporally dynamic systems [72]. Traditional physical and statistical models, although foundational, often fail to capture the complex interplay between emission sources, meteorological factors, and atmospheric chemical transformations that influence PM_2.5 dynamics. ML overcomes these limitations by autonomously learning from historical and real-time data, thereby improving predictive accuracy and enabling adaptive learning in the presence of evolving pollution patterns [73]. As environmental datasets continue to grow in scale and heterogeneity—spanning sensor networks, remote sensing platforms, and meteorological stations—ML offers a scalable and cost-efficient alternative to conventional modeling techniques.

6.1. Machine Learning for Low-Cost PM_2.5 Sensor Calibration

The calibration of low-cost PM_2.5 sensors using ML has emerged as a critical solution to enhance measurement accuracy, particularly in regions with limited regulatory-grade monitoring infrastructure. Recent studies demonstrate that ML-driven calibration can effectively compensate for sensor inaccuracies caused by environmental interference, cross-sensitivities, and manufacturing variability. In one notable investigation, Kumar and Sahu [74] conducted a comprehensive evaluation of nine ML regression algorithms for calibrating Plantower PMS5003 low-cost sensors. Their findings revealed that tree-based ensemble models, such as Random Forest and Gradient Boosting, along with k-Nearest Neighbors (kNN), outperformed linear approaches, like MLR, Lasso, and Ridge, due to their superior ability to handle nonlinear sensor responses.

While linear models exhibited underfitting and regression trees suffered from overfitting, ensemble methods provided superior generalization, offering a robust framework for improving low-cost sensor reliability in air quality networks. Taking a different approach, Park et al. [75] explored the potential of temporal modeling through a HybridLSTM architecture that combines LSTM networks with Deep Neural Networks (DNNs) to capture time-dependent patterns in sensor data. Trained on PM_2.5, temperature, and humidity inputs, the model significantly outperformed traditional MLR and standalone DNNs in RMSE and R² metrics when validated against gravimetric reference measurements. This study highlighted the importance of integrating temporal features for dynamic calibration, though it also pointed to the need for broader validation across diverse geographic and seasonal conditions to ensure scalability. Extending the scope to large-scale deployments, Adong et al. [17] demonstrated the practical challenges of scaling calibration efforts by calibrating over 120 AirQo sensors across Sub-Saharan Africa using a Random Forest model (Figure 2). While cross-site validation affirmed the model’s general effectiveness, performance dips during short-term pollution spikes indicated a need to integrate meteorological variables and retrain periodically to maintain accuracy through seasonal fluctuations.
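The RMSE and R² metrics used to compare calibration models against reference measurements can be computed as in the following sketch. The reference, raw, and calibrated series are made-up numbers chosen only to show how the two scores behave.

```python
import math

def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)

def r_squared(reference, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot

# Hypothetical values in ug/m3: the raw sensor over-reads, the calibrated output is close.
reference = [10, 20, 30, 40]
raw = [14, 26, 38, 50]
calibrated = [11, 19, 31, 39]

for name, series in (("raw", raw), ("calibrated", calibrated)):
    print(f"{name}: RMSE={rmse(reference, series):.2f}, R2={r_squared(reference, series):.3f}")
```

A lower RMSE and an R² closer to 1 after calibration indicate that the model has removed most of the systematic sensor error.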

Figure 2. A scatter plot compares reference-grade BAM measurements with those from the low-cost PM sensor (AQ_88) in the test set, illustrating their PM_2.5 relationship both before and after calibration with the proposed random forest model [17].

Srisang et al. [76] identified unit-to-unit variability in low-cost sensors and emphasized the importance of device-specific calibration and inclusion of environmental factors to improve the robustness of large-scale IoT-based air quality monitoring systems. Complementing these ground-level efforts, ML calibration has also been applied to enhance satellite-derived and model-based PM_2.5 estimates. For instance, Qor-el-aine et al. [77] tackled the underestimation bias in Copernicus Atmospheric Monitoring Service (CAMS) data over Hungary using LightGBM, Random Forest, and MLR. Among these, LightGBM delivered the best calibration performance. However, the coarse spatial resolution (0.1° × 0.1°) of CAMS data limited its ability to capture finer urban-scale pollution variations. This finding reinforces the necessity for higher-resolution monitoring networks and advanced ML techniques capable of resolving sub-grid variability, which is crucial for effective and localized air quality management. Table 3 summarizes ML-based calibration methods for PM_2.5 sensors, including the reference instruments used and the performance metrics of the ML models.

Table 3. Comparative Analysis of ML-Based Calibration Methods for PM_2.5 Sensing.

6.2. Next-Generation PM_2.5 Monitoring Powered by Machine Learning

ML has significantly advanced PM_2.5 sensing beyond traditional monitoring, enabling high-resolution pollution mapping, source attribution, and dynamic exposure assessment. Recent studies leverage multi-source data, including satellite remote sensing, IoT networks, and human mobility patterns, to enhance spatial granularity and predictive accuracy in complex urban environments. One compelling example is the work by Ly et al. [78], which demonstrated the effectiveness of Random Forest models combined with Concentration Weighted Trajectory (CWT) and trajectory clustering analysis in disentangling local and transboundary PM_2.5 contributions in Hanoi. By integrating compact sensor data with meteorological normalization, the study revealed significant pollution inflows from northern industrial regions during winter haze episodes. Figure 3 shows five major 72-h backward trajectory clusters arriving at Hanoi, computed using the HYSPLIT model at 500 m AGL during high-pollution winter periods. The clusters originated from diverse regions including inland northeastern China, coastal marine areas, and northern Vietnam. Clusters No. 2 and No. 3, linked to the highest PM_2.5 levels, passed through heavily industrialized provinces, such as Hai Phong, Quang Ninh, and Hai Duong, with descending vertical velocities, indicating stagnant atmospheric conditions. In contrast, Clusters No. 4 and No. 5, which crossed less-industrialized areas in northern Vietnam, were associated with lower PM_2.5 concentrations. Additionally, partial dependence plots revealed that temperature inversions and humidity amplified localized emissions, offering researchers valuable insights into the seasonal drivers of pollution. This cluster-based analysis demonstrates the strength of ML in identifying dominant pollution pathways influenced by long-range transport and meteorology, which traditional methods often fail to capture.
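The Concentration Weighted Trajectory idea mentioned above assigns each grid cell the residence-time-weighted mean of the receptor concentrations carried by the trajectories that crossed it. The sketch below shows that calculation only; the trajectory endpoints, the 1° cell size, and the concentrations are hypothetical, and the low-count weighting function that CWT studies usually apply is omitted.

```python
import math
from collections import defaultdict

def cwt(trajectories, cell_size=1.0):
    """Concentration Weighted Trajectory field.

    trajectories: list of (receptor_concentration, [(lat, lon), ...]) where the
    endpoints are spaced at equal time steps, so each endpoint counts as one
    unit of residence time in the cell that contains it.
    """
    weighted_sum = defaultdict(float)
    residence = defaultdict(int)
    for concentration, endpoints in trajectories:
        for lat, lon in endpoints:
            cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
            weighted_sum[cell] += concentration
            residence[cell] += 1
    return {cell: weighted_sum[cell] / residence[cell] for cell in residence}

# Two hypothetical back-trajectories with the PM2.5 (ug/m3) observed on arrival.
trajectories = [
    (90.0, [(21.5, 106.5), (22.5, 107.5)]),
    (30.0, [(21.5, 106.2), (20.5, 105.5)]),
]
for cell, value in sorted(cwt(trajectories).items()):
    print(cell, value)
```

Cells crossed mainly by trajectories that arrive during high-concentration hours receive high CWT values, which is how likely source regions are flagged.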

Figure 3. Air-mass trajectory analysis includes (a) a spatial map of clustered trajectories and (b) the partial influence of these trajectories on PM_2.5 concentrations. Reprinted from [78].

In efforts to overcome gaps in ground-based monitoring infrastructure, Chai et al. [79] developed a hybrid XGBoost–Land Use Regression (LUR) model to enhance PM_2.5 estimation across China. By fusing satellite-derived aerosol optical depth (AOD) with urban landscape predictors, the model addressed the limitations of sparse ground networks. Seasonal stratification further reduced prediction errors to below 3%, while high-resolution outputs enabled the precise identification of pollution hotspots. These findings emphasize the potential of ML in democratizing access to air quality information, particularly in regions with limited monitoring infrastructure. Advancing exposure assessment, Yu et al. [13] introduced an AutoML-driven, mobility-based exposure (MBE) modeling framework that integrated low-cost sensor data with SafeGraph human mobility datasets. Their hybrid LUR model mapped PM_2.5 exposure risks by correlating pollution surges with high foot-traffic areas, such as transit hubs. This approach uncovered previously neglected exposure disparities and demonstrated that ML can synthesize heterogeneous data streams to inform targeted public health interventions.

Focusing on the interplay between urban form and seasonal pollution patterns, Lee et al. [80] employed Random Forest modeling with SHAP analysis to evaluate the influence of over 80 urban design variables on PM_2.5 concentrations across Seoul. During winter, traffic and industrial sources emerged as the dominant contributors. In the summer months, green spaces and favorable meteorological conditions were found to mitigate pollution levels, whereas, in autumn, road width and building density were identified as critical predictors of PM_2.5 variation. The model achieved high accuracy, with R² values ranging from 0.95 to 0.96, and its interpretability emphasized that IoT-enabled ML frameworks can support season-specific urban planning strategies—such as enhancing green infrastructure—to effectively reduce pollution.

Furthermore, Li et al. [81] enhanced real-time urban PM_2.5 mapping in Jinan by fusing mobile and fixed sensor data through a LightGBM model enriched with SHAP explainability. The model identified secondary inorganic aerosols as major pollution contributors and dynamically accounted for meteorological and traffic-related fluctuations. Despite challenges such as sensor drift, the API-integrated system proved to be scalable and cost-effective, providing a promising framework for adaptive and high-resolution air quality management in urban areas. A comparative overview of recent ML-based PM_2.5 sensing frameworks is provided in Table 4.

Table 4. Comparative Analysis of ML-Driven PM_2.5 Sensing Frameworks.

6.3. Machine Learning-Driven PM_2.5 Forecasting

ML has revolutionized PM_2.5 forecasting by enabling accurate predictions across multiple temporal scales, ranging from real-time alerts to long-term trend assessments. A notable contribution in this area is the study by Yang et al. [82], who conducted a comprehensive comparison of LSTM, CNN, and hybrid CNN-LSTM models for hourly PM_2.5 forecasting in Beijing. Their results indicated that the temporal autocorrelation of PM_2.5 was the most influential predictive factor, with co-pollutants and wind speed further improving model accuracy. The hybrid CNN-LSTM model outperformed others for short-term forecasts of less than 12 h by effectively capturing spatiotemporal patterns, while the standalone LSTM model demonstrated better generalization capabilities for longer lead times. These findings accentuate the importance of selecting forecasting architectures based on the specific temporal horizon, offering valuable guidance for optimizing air quality prediction systems in urban environments.
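Because temporal autocorrelation is reported as the dominant predictor, a quick check before choosing a forecasting model is the sample autocorrelation of the PM_2.5 series at the lags of interest. The following is a minimal sketch; the hourly values are invented for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance

# Hypothetical hourly PM2.5 readings (ug/m3).
hourly_pm25 = [35, 38, 42, 47, 51, 49, 44, 40, 37, 36, 39, 43]
for lag in (1, 3, 6):
    print(f"lag {lag} h: {autocorrelation(hourly_pm25, lag):+.2f}")
```

High values at short lags support using recent observations as inputs, while a rapid decay suggests that longer lead times need additional predictors such as meteorology.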

Expanding on hybrid architectures, Shahriar et al. [48] demonstrated the superiority of CatBoost over traditional statistical and hybrid models (ARIMA-ANN, ARIMA-SVM) for daily PM_2.5 forecasting in Bangladeshi megacities, highlighting Gradient Boosting’s efficacy in handling complex atmospheric dynamics in tropical urban environments. Similarly, Choojam et al. [49] advanced hybrid modeling through their ARIMA-ANN-REG framework, which integrated time-series analysis, neural networks, and regression layers to achieve superior performance in Thailand, while addressing the limitations of standalone ARIMA and ANN models through residual optimization. For non-stationary time-series challenges, decomposition-enhanced deep learning approaches have emerged as particularly effective.

Huang and Qian [58] introduced a self-weighted VMD-GRU model that adaptively prioritized error-prone subsequences during training, significantly improving prediction reliability for transient pollution events. This builds on earlier work by Huang et al. [59], where EMD-GRU hybrids mitigated phase-shift errors in Beijing’s PM_2.5 forecasts by decomposing raw data into stationary components before GRU processing, outperforming conventional deep learning methods. Zaini et al. [83] further validated the utility of signal preprocessing through their EEMD-LSTM model in urban Malaysia, which decomposed pollution data into intrinsic mode functions to enhance temporal pattern recognition for one-hour-ahead forecasts. These studies collectively highlight the transformative potential of combining decomposition techniques with deep learning to address non-stationarity and nonlinearity in air quality data.
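The decompose, forecast, and recombine pattern shared by these studies can be sketched without any deep learning library. In the example below a trailing moving average stands in for VMD, EMD, or EEMD, and two naive per-component forecasters stand in for the GRU or LSTM models; all of these substitutions are assumptions made to keep the illustration short.

```python
def moving_average(series, window):
    """Trailing moving average; early points average over the history available."""
    smoothed = []
    for t in range(len(series)):
        chunk = series[max(0, t - window + 1): t + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def decompose(series, window=3):
    """Split a series into a smooth component and the remaining residual."""
    trend = moving_average(series, window)
    residual = [x - s for x, s in zip(series, trend)]
    return trend, residual

def forecast_next(series, window=3):
    """Forecast each component separately, then add the component forecasts."""
    trend, residual = decompose(series, window)
    trend_forecast = trend[-1] + (trend[-1] - trend[-2])   # extend the last step
    recent = residual[-window:]
    residual_forecast = sum(recent) / len(recent)           # mean of recent residuals
    return trend_forecast + residual_forecast

pm25 = [10, 12, 14, 16, 18]  # hypothetical steadily rising series (ug/m3)
print(forecast_next(pm25))
```

The benefit of the pattern is that each component is simpler and closer to stationary than the raw series, so each forecaster has an easier task.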

Focusing on regional forecasting, Zhang et al. [63] developed a novel CNN model that integrates spatial–temporal attention mechanisms with residual learning to enhance prediction accuracy across the Yangtze River Delta (Figure 4). Their model dynamically adjusted the importance of pollution and meteorological patterns across cities, effectively capturing inter-city pollutant transport processes. The success of such architectures, alongside the aforementioned hybrid and decomposition-based approaches, highlights a paradigm shift toward multi-modal frameworks that balance temporal, spatial, and statistical learning for comprehensive air quality management.

Figure 4. (a) The average PM_2.5 prediction performance across three major cities and (b) the average improvement rate achieved by the proposed STA-ResCNN model for PM_2.5 prediction. Reprinted from [63].

In a different approach, Wood [84] explored the utility of supervised ML paired with optimized feature selection for forecasting PM_2.5 levels across fifteen cities in England. By prioritizing temporally derived trend features, the study reduced model input complexity while maintaining high predictive accuracy. Among the models tested, LASSO and Support Vector Regression emerged as the most effective, with LASSO standing out for real-time applications due to its computational efficiency. Additionally, the analysis revealed notable pollution declines during COVID-19 restriction periods, illustrating ML’s dual utility in both forecasting and retrospective trend analysis.
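LASSO reduces input complexity because its L1 penalty drives the coefficients of weak predictors exactly to zero. A minimal coordinate-descent sketch is shown below; the two features, the penalty value, and the data are hypothetical, the inputs are assumed to be centered (no intercept), and this is not the configuration used in [84].

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`, returning 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            z = 0.0
            for i in range(n):
                # Prediction from all features except j.
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n)
    return w

# Hypothetical centered features: column 0 is a strong trend feature, column 1 is weak.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.05, 1.95, -1.95, -2.05]          # equals 2.0 * x0 + 0.05 * x1
weights = lasso_coordinate_descent(X, y, alpha=0.1)
print(weights)
```

The weak feature is dropped entirely, while the strong feature keeps a slightly shrunken coefficient, which is the behavior that makes LASSO attractive for lightweight real-time models.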

Abuouelezz et al. [85] conducted a comparative evaluation of forecasting models tailored to different time horizons in the UAE. Their study found that SVR was most effective for short-term alert generation, while Facebook Prophet yielded superior performance in long-term trend prediction. In contrast, decision tree models consistently underperformed. These findings highlight the importance of selecting models based on specific forecasting horizons and suggest that hybrid combinations of models may offer optimal solutions for future air quality forecasting systems.

Collectively, these studies demonstrate that contemporary PM_2.5 forecasting benefits from three key methodological advances: (1) hybrid architectures that combine complementary modeling paradigms, (2) signal decomposition techniques that address non-stationarity, and (3) adaptive learning mechanisms that optimize feature weighting, all while maintaining computational efficiency for operational deployment. Table 5 summarizes ML-based PM_2.5 forecasting models by location, data period and performance pattern.

Table 5. Summary of ML-Based PM_2.5 Forecasting Models.

6.4. Machine Learning for Predictive PM_2.5 Modeling

ML has transformed PM_2.5 prediction by enabling accurate estimations at both monitored and unmonitored locations through innovative model architectures and the integration of multi-source data. Addressing the challenge of estimating PM_2.5 levels at unmonitored locations, Li et al. [51] developed a two-stage hybrid deep learning framework that combined GNN with Long Short-Term Memory (LSTM) networks. This approach allowed the model to effectively process heterogeneous sensor data while capturing intricate spatial relationships and temporal dependencies. The GNN-LSTM architecture produced optimized feature representations, which were further refined through a fully connected neural network, ultimately generating high-resolution hourly predictions across metropolitan regions. This nested methodology demonstrated a significant performance advantage over traditional spatial interpolation techniques, highlighting the efficacy of deep learning in bridging gaps within urban monitoring networks.

Expanding the geographical scope of such applications, Bera et al. [52] demonstrated the superiority of ANN over traditional linear regression for PM_2.5 forecasting in Kolkata during COVID-19 lockdowns, showcasing neural networks’ enhanced capacity to capture nonlinear atmospheric responses to abrupt emission changes. Similarly, Paluang et al. [55] developed an MLP-ANN model incorporating biomass burning emissions and satellite-derived aerosol data for Northern Thailand, establishing a template for air quality prediction in regions with recurrent agricultural fires.

Faraji et al. [68] advanced this paradigm through their 3D CNN-GRU hybrid, which simultaneously captured spatial correlations across Tehran’s monitoring network and temporal pollution patterns, demonstrating the value of integrated spatiotemporal architectures for megacity applications. Building on these spatial approaches, Guyu et al. [62] introduced the MGCGRU-SAN framework that combined graph convolutional networks with attention mechanisms to model both localized pollution dynamics and long-term temporal trends across the Beijing–Tianjin–Hebei region.

Further advancing operational forecasting capabilities, Kim et al. [50] demonstrated the practical superiority of tree-based ML algorithms through their implementation of a Light Gradient Boosting (LGB) model for Seoul. By incorporating meteorological forecast data from the Local Data Assimilation and Prediction System (LDAPS), the model significantly outperformed conventional chemical transport models (CTMs). It achieved a 21% reduction in root mean square error (RMSE) while maintaining robustness during severe pollution episodes (Figure 5). This study provided strong evidence that tree-based methods can serve as both accurate and computationally efficient alternatives to traditional physics-based models when supported by high-quality meteorological data.

Figure 5. Hourly time series of observed PM_2.5 concentrations alongside predictions generated by the LGB and ADAM models for the period 1–6 March 2019. Shaded regions surrounding each line represent the corresponding standard deviations [50].

Karimian et al. [60] contributed to this evolving field by introducing a Wavelet Transform (WT)-XGBoost hybrid framework, which first decomposed PM_2.5 time series into interpretable sub-components and then applied ensemble modeling. Paired with minimal redundancy maximal relevance (mRMR) feature selection, the model achieved a high degree of accuracy (R² = 0.90) by isolating dominant temporal patterns and eliminating redundant inputs. This integrated approach demonstrated that signal processing techniques can be effectively combined with ensemble learning to enhance prediction performance and model interpretability—an essential trait for operational forecasting applications. Collectively, these studies demonstrate three key advancements in ML-driven PM_2.5 prediction: (1) hybrid architectures that combine complementary spatial and temporal learning paradigms, (2) incorporation of non-traditional data sources like satellite observations and emission inventories, and (3) adaptive frameworks capable of responding to abrupt environmental changes, such as lockdowns or biomass burning events.

Demonstrating the strength of neural networks in understanding complex environmental interactions, Mohammadi et al. [66] used ANN to model nonlinear relationships between meteorological variables and PM_2.5 concentrations in Isfahan. Following a rigorous multicollinearity analysis to eliminate confounding inputs, their ANN achieved a prediction accuracy of 90.1% using optimized meteorological data. Furthermore, the team used spatial interpolation techniques to generate monthly pollution maps, translating ML predictions into actionable geospatial insights for urban planning. This study set a benchmark for harmonizing atmospheric science with ML-driven forecasting systems.
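Turning point predictions into a pollution map requires a spatial interpolation step. The text does not name the method used in [66], so the sketch below assumes inverse distance weighting (IDW), one common choice, with hypothetical station coordinates in a planar grid and a power parameter of 2.

```python
import math

def idw(stations, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from ((x, y), value) stations."""
    numerator = 0.0
    denominator = 0.0
    for location, value in stations:
        distance = math.dist(location, target)
        if distance == 0.0:
            return value  # target coincides with a station
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Hypothetical monthly mean PM2.5 (ug/m3) at three stations, coordinates in km.
stations = [((0.0, 0.0), 28.0), ((10.0, 0.0), 40.0), ((0.0, 10.0), 34.0)]
grid = [[round(idw(stations, (x, y)), 1) for x in (0.0, 5.0, 10.0)] for y in (0.0, 5.0, 10.0)]
for row in grid:
    print(row)
```

Evaluating the function on a regular grid, as in the last lines, yields the raster from which a monthly map can be drawn.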

Expanding the frontiers of spatial coverage, Makhdoomi et al. [86] pioneered the concept of virtual monitoring stations by applying a Gradient Boosting Regressor (GBR) to a seven-year multidimensional dataset. The model demonstrated outstanding performance (R² > 0.96), particularly in capturing extreme pollution events and complex atmospheric interactions. Notably, the model incorporated temporal patterns, including lockdown-related variations, to provide adaptive and responsive air quality forecasts. This study affirmed ensemble regression methods as powerful tools for creating intelligent virtual networks capable of enhancing real-time air quality prediction in data-scarce regions. Table 6 summarizes PM_2.5 prediction studies, highlighting the key ML approaches adopted in each study and their performance in terms of temporal and spatial resolution.

Table 6. Summary of PM_2.5 prediction studies, covering data characteristics, temporal scale and spatial resolutions.

7. Green and Sustainable Mitigation Strategies for PM_2.5 Management

The mitigation of PM_2.5 pollution requires a multifaceted approach that integrates nature-based solutions and technological innovations. Recent advances in pollution control have identified three critical intervention points: (1) source emission reduction through sustainable technologies, (2) atmospheric dispersion enhancement via urban planning, and (3) particulate capture through engineered and natural systems. While ML enhances detection and prediction capabilities, sustainable strategies are essential for long-term pollution reduction and improved air quality [87]. This section presents a comprehensive framework for PM_2.5 mitigation organized across four spatial scales: urban planning (macro-scale), neighborhood design (meso-scale), building-level interventions (micro-scale), and point-source control (nano-scale).

7.1. Nature-Based Solutions

Green spaces [88], blue spaces [89], and phytoremediation technologies [90] provide sustainable solutions for PM_2.5 mitigation through particulate capture and degradation. Certain plant species demonstrate particular effectiveness in improving both indoor and outdoor air quality through natural filtration processes [91]. These nature-based approaches deliver multiple benefits, combining pollution reduction with valuable ecosystem services when properly incorporated into planning strategies.

Emerging research highlights the critical role of green and blue space (GBS) design in mitigating PM_2.5 pollution, with effectiveness varying substantially based on vegetation type, spatial configuration, and environmental context. Comparative studies demonstrate that vegetated barriers, particularly tree-type fences, outperform solid walls by approximately 25% in PM_2.5 reduction through enhanced aerodynamic mixing and vertical air flow modification [92]. At the neighborhood scale, compact green spaces with high vegetation coverage (exceeding 85% within 300-m radii) and simplified geometric configurations (shape index of 1.2) demonstrate optimal performance for simultaneous PM_2.5 and CO₂ mitigation [93].
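To make the "shape index of 1.2" concrete, the sketch below computes a patch shape index under the assumption that [93] uses the common landscape-ecology definition with a square as the reference shape, 0.25 × perimeter / √area. The text does not state the formula, and the patch dimensions are hypothetical.

```python
import math

def shape_index(perimeter_m, area_m2):
    """Patch shape index with a square as reference: 0.25 * P / sqrt(A).

    A value of 1 means a perfectly compact (square) patch; larger values
    indicate more elongated or irregular outlines.
    """
    return 0.25 * perimeter_m / math.sqrt(area_m2)

square_park = shape_index(perimeter_m=400.0, area_m2=100.0 * 100.0)
elongated_park = shape_index(perimeter_m=1000.0, area_m2=400.0 * 100.0)
print(square_park, elongated_park)
```

Under this definition a rectangular park four times as long as it is wide scores 1.25, so a target near 1.2 corresponds to patches that are compact but not strictly square.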

Innovative indoor applications show that mechanically-assisted green walls incorporating species such as Episcia cupreata achieve particulate removal rates 445% higher than passive systems [94] (Figure 6).

Figure 6. (a) Passive green wall (PGW) and (b) active green wall (AGW) systems demonstrate PM_2.5 deposition patterns on Nephrolepis exaltata (L.) Schott leaf surfaces [94].

Muenrew et al. [95] demonstrated that carefully selected tropical plant species in controlled environments can achieve substantial PM_2.5 reductions (Figure 7), with Spathiphyllum cannifolium showing particular effectiveness (26% reduction) due to its unique leaf morphology featuring grooved surfaces and amphistomatic structure. Their research on smart green wall systems further revealed these vegetated barriers could enhance natural PM_2.5 removal rates by over 400% in indoor environments, highlighting the potential for targeted phytoremediation solutions in built environments.

Figure 7. The smart green wall system demonstrated a significant reduction in PM_2.5 concentrations compared to the control empty room condition [95].

Larger contiguous green patches (minimum 50 hectares) consistently show stronger particulate reduction capabilities compared to fragmented vegetation, as evidenced by studies in rapidly urbanizing Chinese cities [96,97]. The normalized difference vegetation index (NDVI) emerges as a key indicator, with high NDVI vegetation clusters proving particularly effective in urban cores [98]. Among vegetation types, deciduous trees demonstrate superior particulate deposition compared to bushland, lawn parks, street trees, and water areas, due to their dense canopy structures [99]. However, the most effective mitigation strategies combine green and blue infrastructure, with synergistic effects becoming particularly pronounced when green space coverage exceeds 40% and blue space elements are spaced approximately 200 m apart [100]. According to the study of Cao et al. [101], blue spaces demonstrate significant potential for PM_2.5 mitigation when designed with specific spatial parameters, particularly when maintaining patch contiguity values above 0.26 and ensuring inter-patch distances remain under 400 m.

These findings collectively emphasize the importance of context-specific GBS design, recommending the following: prioritized deployment of compact, high NDVI vegetation in dense urban areas; strategic integration of water features meeting specific spatial criteria; and implementation of active biofiltration systems in high-traffic indoor environments. The research highlights that optimal PM_2.5 mitigation requires careful consideration of spatial scale, seasonal variations, and local pollution characteristics when planning urban green and blue infrastructure.

7.2. Technological Interventions

Recent advances in AI-driven systems are transforming PM_2.5 management in both indoor and urban environments. Jeong et al. [102] demonstrated the efficacy of RL in underground subway systems, where a proximal policy optimization framework reduced ventilation energy use by 22% while maintaining PM_2.5 below safety thresholds. Their two-stage approach combined genetic algorithm-calibrated mechanistic modeling with real-time adaptive control, achieving 19% prediction accuracy for nonlinear indoor air quality dynamics. Complementing these indoor solutions, Wang et al. [103] quantified how green technology innovation (GTI) drives outdoor emission reductions across Chinese cities, revealing that each 1% increase in green patents correlated with 0.01% lower PM_2.5 emissions. Industrial-scale interventions continue to evolve through smart systems integration. Electrostatic precipitators now achieve high PM removal rates in power plants when paired with ML-optimized voltage modulation [104]. Similarly, IoT-enabled air purifiers leverage adaptive algorithms to balance indoor particulate removal with energy savings [105]. These technological interventions collectively demonstrate that data-driven approaches can transform urban air quality management when paired with appropriate infrastructure investments.

7.3. Policy and Implementation Frameworks

Air pollution mitigation requires a multifaceted approach integrating regulatory frameworks, economic incentives, behavioral interventions, and equity considerations across diverse contexts. Evidence from China demonstrates that sector-specific emission caps and low-carbon feedstock mandates (e.g., hydrogen adoption in aluminum production) significantly reduce industrial PM_2.5 and SO₂ emissions, while ML-informed policy modeling helps optimize these transitions [106]. Transportation policies must address both exhaust and non-exhaust sources, as shown in Bogotá, where over 50% of PM_2.5 emissions originated from vehicle wear and road dust. This necessitates paved road maintenance laws and Euro 6-equivalent standards, alongside subsidies for electric buses and hybrid fleets [107]. Energy infrastructure regulations are equally critical, with Quito’s experience proving that diesel generator use during blackouts sharply increases sulfur dioxide levels, demanding strict fuel quality standards (≤15 ppm sulfur) and renewable energy integration in emergency systems [108].

Urban design plays a pivotal role, where computational fluid dynamics reveal that grid-based street networks and varied building heights enhance PM_2.5 dispersion, informing zoning codes for aerodynamic cities [109]. Economic instruments, like carbon pricing and targeted investments in marginalized neighborhoods (e.g., road paving in Quito’s Guamaní district), can accelerate equitable transitions, while behavioral interventions must account for socioeconomic factors. China’s example shows pollution concern peaks at specific income and education thresholds, requiring tailored communication strategies [110]. Public participation through citizen science networks and crisis preparedness education further strengthens mitigation efforts, particularly when addressing systemic inequities like disproportionate exposure in low-income areas with poor ventilation. A comprehensive policy roadmap should combine these technological, economic and social approaches, employing lifecycle emission assessments, ML optimization, and meteorological integration to create sustainable, equitable air quality solutions across industrial, transportation, energy and urban planning sectors.

7.4. Integrated Assessment Approaches

Contemporary air quality management demands comprehensive integrated assessment approaches that combine life cycle analysis, source apportionment modeling, and spatially explicit cost–benefit evaluations to address complex pollution challenges. The life cycle assessment of polyhydroxyalkanoates production by Razak et al. [111] demonstrates the critical importance of evaluating environmental impacts across entire production chains, revealing that biological extraction methods reduced solvent waste but generated significant PM_2.5 emissions (4.59 mg PM_2.5 eq/kg polymer) through energy-intensive drying processes. This highlights the necessity of life cycle thinking to identify and optimize environmental hotspots in industrial systems while avoiding pollution displacement.

At the policy level, the UK Integrated Assessment Model (UKIAM) developed by ApSimon et al. [112] provides a robust framework for multi-sectoral source apportionment, integrating atmospheric dispersion models to evaluate contributions from domestic, industrial and transboundary sources while employing innovative metrics like population-weighted mean concentration to assess health impacts. The model’s scenario analysis capabilities, including evaluation of shipping emission controls and traffic management strategies, offer valuable insights for evidence-based decision-making aligned with international air quality standards.
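The population-weighted mean concentration weights each grid cell's concentration by the number of residents, so that pollution where people actually live counts most. A minimal sketch follows; the grid cells and their values are hypothetical and are not UKIAM outputs.

```python
def population_weighted_mean(cells):
    """Population-weighted mean concentration from (population, concentration) pairs."""
    total_population = sum(population for population, _ in cells)
    weighted_sum = sum(population * concentration for population, concentration in cells)
    return weighted_sum / total_population

# Hypothetical grid cells: (residents, annual mean PM2.5 in ug/m3).
cells = [(1000, 10.0), (3000, 20.0)]
simple_mean = sum(concentration for _, concentration in cells) / len(cells)
print(simple_mean, population_weighted_mean(cells))
```

Here the simple area mean is 15 µg/m³ while the population-weighted mean is 17.5 µg/m³, because the more polluted cell holds three quarters of the residents.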

Spatial equity considerations are equally crucial, as demonstrated in [113] through a high-resolution (5 km × 5 km) cost–benefit analysis of China’s clean heating transition, which achieved 80% PM_2.5 reductions but revealed that uniform policies disproportionately burdened rural populations. The ML-enhanced framework identified optimal intervention points and recommended targeted fiscal transfers to address spatial externalities, providing a model for equitable policy design in energy transitions.

These integrated approaches collectively emphasize three key principles for effective PM_2.5 management: (1) comprehensive life cycle assessment to prevent pollution transfer across production stages; (2) multi-sectoral modeling to optimize source-specific interventions; and (3) spatially differentiated implementation to ensure equitable outcomes. Future strategies should leverage emerging technologies like ML for predictive optimization while maintaining flexibility to address regional variations in emission sources, infrastructure, and socioeconomic conditions. The integration of these assessment methodologies offers a powerful paradigm for developing policies that simultaneously achieve air quality improvements, climate mitigation, and sustainable development goals across diverse contexts.

8. Discussion

The integration of machine learning with PM_2.5 monitoring and mitigation has achieved notable progress, yet critical challenges remain in addressing the complex spatiotemporal variability of particulate matter. Advanced architectures, including GNN-LSTM hybrids [51] and 3D CNNs [68], have demonstrated improved capability in capturing localized pollution dynamics, but significant limitations persist. Current models frequently fail to account for abrupt concentration changes during pollution episodes, micro-scale variations across urban terrain, and the effects of long-range transport that alter regional pollution baselines. This spatiotemporal variability undermines model generalizability, particularly when training data lacks sufficient geographic coverage or temporal duration. The interpretability challenge compounds these issues, as complex models that better handle variability often sacrifice the transparency needed for regulatory adoption. Meanwhile, sensor networks in developing regions face additional hurdles of data scarcity and calibration drift, further limiting their ability to capture fine-grained spatiotemporal patterns. These technical barriers reveal a pressing need for more robust frameworks that can adapt to both rapid temporal fluctuations and heterogeneous spatial distributions of PM_2.5, while maintaining practical utility for decision-makers.

9. Conclusions and Future Outlook

In synthesizing the critical advancements from 2021 to 2025, this review demonstrates that the synergistic integration of sophisticated ML methodologies with sustainable green mitigation strategies represents a highly promising paradigm for confronting the global PM_2.5 crisis. Machine learning has substantially improved detection capabilities: it raises the accuracy of low-cost sensor calibration, supports high-resolution spatiotemporal forecasting that accounts for urban microenvironments and regional transport patterns, and contributes to source attribution and predictive modeling. Collectively, these advancements help overcome the limitations of conventional monitoring and provide actionable intelligence for urban air quality management.

Future systems must prioritize dynamic architectures, such as self-adjusting graph networks and attention-based temporal models, to better capture the inherent spatiotemporal variability of PM_2.5 across different scales. Simultaneously, nature-based solutions have matured through systematic investigations into vegetation structure, the strategic planning of green and blue infrastructure, and the deployment of active biofiltration systems. In parallel, artificial intelligence-enabled technological interventions have advanced significantly. Together, these approaches offer scalable and environmentally sustainable strategies that mitigate particulate matter both at the source and at the point of human exposure. These mitigation strategies will need to incorporate spatiotemporal optimization algorithms to maximize their effectiveness against pollution patterns that vary dramatically by season, weather, and urban morphology.

However, several critical challenges remain: improving the interpretability of complex machine learning models, expanding sensor networks in regions with limited resources, and developing adaptive algorithms that maintain accuracy across diverse spatiotemporal contexts. Additionally, long-term ecological sustainability and spatial efficiency must be ensured, particularly for large green infrastructure systems that aim to convert predictive insights into practical mitigation actions.

Future progress will require stronger interdisciplinary collaboration that brings together atmospheric science, data engineering, urban ecology, and public policy to create solutions that are both technically robust against spatiotemporal variability and practically implementable across different geographic contexts. Ultimately, the pursuit of sustainable air quality necessitates a comprehensive integration of advanced technological innovations with ecologically restorative practices. This dual approach ensures that enhancements in pollutant detection, monitoring, and predictive modeling are effectively translated into actionable and durable interventions. Crucially, such strategies must not only address environmental degradation but also safeguard public health outcomes. Furthermore, these solutions should be adaptable and equitable, delivering consistent benefits across regions with varying patterns of pollution exposure and socioeconomic conditions.

Author Contributions

Conceptualization, C.M.H.; methodology, A.A.; writing—original draft preparation, A.A.; writing—review and editing, C.M.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Liu, Y.; Xu, F.; Liu, W.; Liu, X.; Wang, D. Characteristics, Sources, Exposure, and Health Effects of Heavy Metals in Atmospheric Particulate Matter. Curr. Pollut. Rep. 2025, 11, 16.
2. Titova, A.G.; Zanyatkin, I.A.; Volkova, A.G.; Nechaev, D.N.; Trusov, G.A. Epigenetic Markers of The Influence of Particulate Matter with Different Aerodynamic Diameters on Human Health: A Review. Ekol. Cheloveka Hum. Ecol. 2021, 28, 4–12.
3. Yadav, V.K.; Bijekar, S.; Gacem, A.; Alkahtani, A.M.; Yadav, K.K.; Alreshidi, M.A.; Kumar, P.; Ghosh, T.; Verma, R.K.; Mishra, S.; et al. The Impact of Fine Particulate Matters (PM₁₀, PM_2.5) from Incense Smokes on the Various Organ Systems: A Review of an Invisible Killer. Part. Part. Syst. Charact. 2024, 41, 2300157.
4. Afthab, M.; Hambo, S.; Kim, H.; Alhamad, A.; Harb, H. Particulate Matter-Induced Epigenetic Modifications and Lung Complications. Eur. Respir. Rev. 2024, 33, 240129.
5. Wang, J.; Zhang, S.; Qiu, X.; Li, K.; Li, J.; Ren, Y.; Zhu, C.; Zhang, X. Characteristics and Health Risks of PM_2.5-Bound Metals in a Central City of Northern China: A One-Year Observation Study. Aerosol Air Qual. Res. 2024, 24, 230165.
6. World Health Organization. WHO Global Air Quality Guidelines: Particulate Matter (PM_2.5 and PM₁₀), Ozone, Nitrogen Dioxide, Sulfur Dioxide and Carbon Monoxide. Available online: https://www.who.int/publications/i/item/9789240034228 (accessed on 27 May 2025).
7. Gao, D.; Zhao, B.; Wang, S.; Wang, Y.; Gaudet, B.; Zhu, Y.; Wang, X.; Shen, J.; Li, S.; He, Y.; et al. Increased Importance of Aerosol–Cloud Interactions for Surface PM_2.5 Pollution Relative to Aerosol–Radiation Interactions in China with the Anthropogenic Emission Reductions. Atmos. Chem. Phys. 2023, 23, 14359–14373.
8. Roy, A.; Mandal, M.; Das, S.; Popek, R.; Rakwal, R.; Agrawal, G.K.; Awasthi, A.; Sarkar, A. The Cellular Consequences of Particulate Matter Pollutants in Plants: Safeguarding the Harmonious Integration of Structure and Function. Sci. Total Environ. 2024, 914, 169763.
9. Kurniawati, S.; Santoso, M.; Nurhaini, F.F.; Atmodjo, D.P.D.; Lestiani, D.D.; Ramadhani, M.F.; Kusmartini, I.; Syahfitri, W.Y.N.; Damastuti, E.; Tursinah, R. Assessing Low-Cost Sensor for Characterizing Temporal Variation of PM_2.5 in Bandung, Indonesia. Kuwait J. Sci. 2025, 52, 100297.
10. Yang, Z.; Zdanski, C.; Farkas, D.; Bang, J.; Williams, H. Evaluation of Aerosol Optical Depth (AOD) and PM_2.5 Associations for Air Quality Assessment. Remote Sens. Appl. Soc. Environ. 2020, 20, 100396.
11. Chen, Z.; Chen, D.; Zhao, C.; Kwan, M.; Cai, J.; Zhuang, Y.; Zhao, B.; Wang, X.; Chen, B.; Yang, J.; et al. Influence of Meteorological Conditions on PM_2.5 Concentrations across China: A Review of Methodology and Mechanism. Environ. Int. 2020, 139, 105558.
12. Olawade, D.B.; Wada, O.Z.; Ige, A.O.; Egbewole, B.I.; Olojo, A.; Oladapo, B.I. Artificial Intelligence in Environmental Monitoring: Advancements, Challenges, and Future Directions. Hyg. Environ. Health Adv. 2024, 12, 100114.
13. Yu, M.; Zhang, S.; Zhang, K.; Yin, J.; Varela, M.; Miao, J. Developing High-Resolution PM_2.5 Exposure Models by Integrating Low-Cost Sensors, Automated Machine Learning, and Big Human Mobility Data. Front. Environ. Sci. 2023, 11, 1223160.
14. Tao, H.; Jawad, A.H.; Shather, A.H.; Al-Khafaji, Z.; Rashid, T.A.; Ali, M.; Al-Ansari, N.; Marhoon, H.A.; Shahid, S.; Yaseen, Z.M. Machine Learning Algorithms for High-Resolution Prediction of Spatiotemporal Distribution of Air Pollution from Meteorological and Soil Parameters. Environ. Int. 2023, 175, 107931.
15. Damkliang, K.; Chumnaul, J. Deep Learning and Statistical Approaches for Area-Based PM_2.5 Forecasting in Hat Yai, Thailand. J. Big Data 2025, 12, 36.
16. Popescu, S.M.; Mansoor, S.; Wani, O.A.; Kumar, S.S.; Sharma, V.; Sharma, A.; Arya, V.M.; Kirkham, M.B.; Hou, D.; Bolan, N.; et al. Artificial Intelligence and IoT Driven Technologies for Environmental Pollution Monitoring and Management. Front. Environ. Sci. 2024, 12, 1336088.
17. Adong, P.; Bainomugisha, E.; Okure, D.; Sserunjogi, R. Applying Machine Learning for Large Scale Field Calibration of Low-Cost PM_2.5 and PM₁₀ Air Pollution Sensors. Appl. AI Lett. 2022, 3, e76.
18. Tang, D.; Zhan, Y.; Yang, F. A Review of Machine Learning for Modeling Air Quality: Overlooked but Important Issues. Atmos. Res. 2024, 300, 107261.
19. Niu, Z.; He, Q.; Chen, C. A PM_2.5 Pollution-Level Adaptive Air Filtration System Based on Elastic Filters for Reducing Energy Consumption. J. Hazard. Mater. 2024, 478, 135546.
20. Li, L.; Zheng, M.; Zhang, J.; Li, C.; Ren, Y.; Jin, X.; Chen, J. Effects of Green Infrastructure on the Dispersion of PM_2.5 and Human Exposure on Urban Roads. Environ. Res. 2023, 223, 115493.
21. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. BMJ 2021, 372, n71.
22. Tomar, G.; Nagpure, A.S.; Kumar, V.; Jain, Y. High Resolution Vehicular Exhaust and Non-Exhaust Emission Analysis of Urban-Rural District of India. Sci. Total Environ. 2022, 805, 150255.
23. Tsai, C.-Y.; Chen, T.-F.; Chang, K.-H. Role of an Ultra-Large Coal-Fired Power Plant in PM_2.5 Pollution in Taiwan. Atmosphere 2024, 15, 56.
24. Lin, C.-H.; Lai, C.-H.; Hsieh, T.-H.; Tsai, C.-Y. Source Apportionment and Health Effects of Particle-Bound Metals in PM_2.5 near a Precision Metal Machining Factory. Air Qual. Atmos. Health 2022, 15, 605–617.
25. Zauli-Sajani, S.; Thunis, P.; Pisoni, E.; Bessagnet, B.; Monforti-Ferrario, F.; De Meij, A.; Pekar, F.; Vignati, E. Reducing Biomass Burning Is Key to Decrease PM_2.5 Exposure in European Cities. Sci. Rep. 2024, 14, 10210.
26. Chen, S.; Liu, D.; Huang, L.; Guo, C.; Gao, X.; Xu, Z.; Yang, Z.; Chen, Y.; Li, M.; Yang, J. Global Associations between Long-Term Exposure to PM_2.5 Constituents and Health: A Systematic Review and Meta-Analysis of Cohort Studies. J. Hazard. Mater. 2024, 474, 134715.
27. Khreis, H.; Sanchez, K.A.; Foster, M.; Burns, J.; Nieuwenhuijsen, M.J.; Jaikumar, R.; Ramani, T.; Zietsman, J. Urban Policy Interventions to Reduce Traffic-Related Emissions and Air Pollution: A Systematic Evidence Map. Environ. Int. 2023, 172, 107805.
28. Lan, R.; Eastham, S.D.; Liu, T.; Norford, L.K.; Barrett, S.R.H. Air Quality Impacts of Crop Residue Burning in India and Mitigation Alternatives. Nat. Commun. 2022, 13, 6537.
29. Lai, A.; Lee, M.; Carter, E.; Chan, Q.; Elliott, P.; Ezzati, M.; Kelly, F.; Yan, L.; Wu, Y.; Yang, X.; et al. Chemical Investigation of Household Solid Fuel Use and Outdoor Air Pollution Contributions to Personal PM_2.5 Exposures. Environ. Sci. Technol. 2021, 55, 15969–15979.
30. Guo, F.; Xie, S. Formation Mechanisms of Secondary Sulfate and Nitrate in PM_2.5. Prog. Chem. 2023, 35, 1313–1326.
31. Delbari, S.H.; Zare Shahne, M.; Hosseini, V. An Analysis of Primary Contributing Sources to the PM_2.5 Composition in a Port City in Canada Influenced by Traffic, Marine, and Wildfire Emissions. Atmos. Environ. 2024, 334, 120712.
32. Parra, J.C.; Gómez, M.; Salas, H.D.; Botero, B.A.; Piñeros, J.G.; Tavera, J.; Velásquez, M.P. Linking Meteorological Variables and Particulate Matter PM_2.5 in the Aburrá Valley, Colombia. Sustainability 2024, 16, 10250.
33. Liu, M.; Lei, Y.; Wang, X.; Xue, W.; Zhang, W.; Jiang, H.; Wang, J.; Bi, J. Source Contributions to PM_2.5-Related Mortality and Costs: Evidence for Emission Allocation and Compensation Strategies in China. Environ. Sci. Technol. 2023, 57, 4720–4731.
34. Yu, W.; Xu, R.; Ye, T.; Abramson, M.J.; Morawska, L.; Jalaludin, B.; Johnston, F.H.; Henderson, S.B.; Knibbs, L.D.; Morgan, G.G.; et al. Estimates of Global Mortality Burden Associated with Short-Term Exposure to Fine Particulate Matter (PM2·5). Lancet Planet. Health 2024, 8, e146–e155.
35. Ni, R.; Su, H.; Burnett, R.T.; Guo, Y.; Cheng, Y. Long-Term Exposure to PM_2.5 Has Significant Adverse Effects on Childhood and Adult Asthma: A Global Meta-Analysis and Health Impact Assessment. One Earth 2024, 7, 1953–1969.
36. Zhou, J.-X.; Peng, Z.-X.; Zheng, Z.-Y.; Ni, H.-G. Big Picture Thinking of Global PM_2.5-Related COPD: Spatiotemporal Trend, Driving Force, Minimal Burden and Economic Loss. J. Hazard. Mater. 2025, 488, 137321.
37. Montone, R.A.; Rinaldi, R.; Bonanni, A.; Severino, A.; Pedicino, D.; Crea, F.; Liuzzo, G. Impact of Air Pollution on Ischemic Heart Disease: Evidence, Mechanisms, Clinical Perspectives. Atherosclerosis 2023, 366, 22–31.
38. Saini, P.; Sharma, M. Cause and Age-Specific Premature Mortality Attributable to PM_2.5 Exposure: An Analysis for Million-Plus Indian Cities. Sci. Total Environ. 2020, 710, 135230.
39. Sangkham, S.; Phairuang, W.; Sherchan, S.P.; Pansakun, N.; Munkong, N.; Sarndhong, K.; Islam, A.; Sakunkoo, P. An Update on Adverse Health Effects from Exposure to PM_2.5. Environ. Adv. 2024, 18, 100603.
40. Lin, C.-H.; Lung, S.-C.C.; Chen, Y.-C.; Wang, L.-C. Pulmonary Toxicity of Actual Alveolar Deposition Concentrations of Ultrafine Particulate Matters in Human Normal Bronchial Epithelial Cell. Environ. Sci. Pollut. Res. 2021, 28, 50179–50187.
41. Thiankhaw, K.; Chattipakorn, N.; Chattipakorn, S.C. PM_2.5 Exposure in Association with AD-Related Neuropathology and Cognitive Outcomes. Environ. Pollut. 2022, 292, 118320.
42. Krzyzanowski, B.; Searles Nielsen, S.; Turner, J.R.; Racette, B.A. Fine Particulate Matter and Parkinson Disease Risk Among Medicare Beneficiaries. Neurology 2023, 101, e2058–e2067.
43. Lončar, D.; Tyack, N.B.; Krstić, V.; Paunković, J. Methods for Assessing the Impact of PM_2.5 Concentration on Mortality While Controlling for Socio-Economic Factors. Heliyon 2022, 8, e10729.
44. Xie, Y.; Zhong, H.; Weng, Z.; Guo, X.; Kim, S.E.; Wu, S. PM_2.5 Concentration Declining Saves Health Expenditure in China. Front. Environ. Sci. Eng. 2023, 17, 90.
45. Frazenburg, C.; Sepadi, M.M.; Chitakira, M. Investigating the Disproportionate Impacts of Air Pollution on Vulnerable Populations in South Africa: A Systematic Review. Atmosphere 2025, 16, 49.
46. Yin, H.; McDuffie, E.E.; Martin, R.V.; Brauer, M. Global Health Costs of Ambient PM2·5 from Combustion Sources: A Modelling Study Supporting Air Pollution Control Strategies. Lancet Planet. Health 2024, 8, e476–e488.
47. Tiwari, A. Chapter 2–Supervised Learning: From Theory to Applications. In Artificial Intelligence and Machine Learning for EDGE Computing; Pandey, R., Khatri, S.K., Singh, N.K., Verma, P., Eds.; Academic Press: Cambridge, MA, USA, 2022; pp. 23–32. ISBN 978-0-12-824054-0.
48. Shahriar, S.A.; Kayes, I.; Hasan, K.; Hasan, M.; Islam, R.; Awang, N.R.; Hamzah, Z.; Rak, A.E.; Salam, M.A. Potential of ARIMA-ANN, ARIMA-SVM, DT and CatBoost for Atmospheric PM_2.5 Forecasting in Bangladesh. Atmosphere 2021, 12, 100.
49. Choojam, S.; Chumnau, J.; Jetwanna, K.W.N. Accurate Model for Forecasting PM_2.5 Concentrations in Hat Yai, Songkhla, Thailand: The ARIMA-ANN-REG Hybrid Approach via AAR4PM. EnvironmentAsia 2024, 17, 115.
50. Kim, B.-Y.; Lim, Y.-K.; Cha, J.W. Short-Term Prediction of Particulate Matter (PM₁₀ and PM_2.5) in Seoul, South Korea Using Tree-Based Machine Learning Algorithms. Atmos. Pollut. Res. 2022, 13, 101547.
51. Li, J.; Crooks, J.; Murdock, J.; de Souza, P.; Hohsfield, K.; Obermann, B.; Stockman, T. A Nested Machine Learning Approach to Short-Term PM_2.5 Prediction in Metropolitan Areas Using PM_2.5 Data from Different Sensor Networks. Sci. Total Environ. 2023, 873, 162336.
52. Bera, B.; Bhattacharjee, S.; Sengupta, N.; Saha, S. PM_2.5 Concentration Prediction during COVID-19 Lockdown over Kolkata Metropolitan City, India Using MLR and ANN Models. Environ. Chall. 2021, 4, 100155.
53. Yang, X.; Song, Z.; King, I.; Xu, Z. A Survey on Deep Semi-Supervised Learning. IEEE Trans. Knowl. Data Eng. 2023, 35, 8934–8954.
54. Ramadan, M.N.A.; Ali, M.A.H.; Khoo, S.Y.; Alkhedher, M. SecureIoT-FL: A Federated Learning Framework for Privacy-Preserving Real-Time Environmental Monitoring in Industrial IoT Applications. Alex. Eng. J. 2025, 114, 681–701.
55. Paluang, P.; Thavorntam, W.; Phairuang, W. Application of Multilayer Perceptron Artificial Neural Network (MLP-ANN) Algorithm for PM_2.5 Mass Concentration Estimation during Open Biomass Burning Episodes in Thailand. Int. J. Geoinformatics 2024, 20, 28–42.
56. Zhang, M. Unsupervised Learning Algorithms in Big Data: An Overview; Atlantis Press: Dordrecht, The Netherlands, 2022; pp. 910–931.
57. Yadav, V.; Yadav, A.K.; Singh, V.; Singh, T. Artificial Neural Network an Innovative Approach in Air Pollutant Prediction for Environmental Applications: A Review. Results Eng. 2024, 22, 102305.
58. Huang, H.; Qian, C. Modeling PM_2.5 Forecast Using a Self-Weighted Ensemble GRU Network: Method Optimization and Evaluation. Ecol. Indic. 2023, 156, 111138.
59. Huang, G.; Li, X.; Zhang, B.; Ren, J. PM_2.5 Concentration Forecasting at Surface Monitoring Sites Using GRU Neural Network Based on Empirical Mode Decomposition. Sci. Total Environ. 2021, 768, 144516.
60. Karimian, H.; Li, Y.; Chen, Y.; Wang, Z. Evaluation of Different Machine Learning Approaches and Aerosol Optical Depth in PM_2.5 Prediction. Environ. Res. 2023, 216, 114465.
61. Shakya, A.K.; Pillai, G.; Chakrabarty, S. Reinforcement Learning Algorithms: A Brief Survey. Expert Syst. Appl. 2023, 231, 120495.
62. Guyu, Z.; Xiaoyuan, Y.; Jiansen, S.; Hongdou, H.; Qian, W. A PM_2.5 Spatiotemporal Prediction Model Based on Mixed Graph Convolutional GRU and Self-Attention Network. Environ. Pollut. 2025, 368, 125748.
63. Zhang, K.; Yang, X.; Cao, H.; Thé, J.; Tan, Z.; Yu, H. Multi-Step Forecast of PM_2.5 and PM₁₀ Concentrations Using Convolutional Neural Network Integrated with Spatial–Temporal Attention and Residual Learning. Environ. Int. 2023, 171, 107691.
64. Terroso-Saenz, F.; Morales-García, J.; Muñoz, A. Nationwide Air Pollution Forecasting with Heterogeneous Graph Neural Networks. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–19.
65. Gawlikowski, J.; Tassi, C.R.N.; Ali, M.; Lee, J.; Humt, M.; Feng, J.; Kruspe, A.; Triebel, R.; Jung, P.; Roscher, R.; et al. A Survey of Uncertainty in Deep Neural Networks. Artif. Intell. Rev. 2023, 56, 1513–1589.
66. Mohammadi, F.; Teiri, H.; Hajizadeh, Y.; Abdolahnejad, A.; Ebrahimi, A. Prediction of Atmospheric PM_2.5 Level by Machine Learning Techniques in Isfahan, Iran. Sci. Rep. 2024, 14, 2109.
67. Pan, P.; Malarvizhi, A.S.; Yang, C. Data Augmentation Strategies for Improved PM_2.5 Forecasting Using Transformer Architectures. Atmosphere 2025, 16, 127.
68. Faraji, M.; Nadi, S.; Ghaffarpasand, O.; Homayoni, S.; Downey, K. An Integrated 3D CNN-GRU Deep Learning Method for Short-Term Prediction of PM_2.5 Concentration in Urban Environment. Sci. Total Environ. 2022, 834, 155324.
69. Hosna, A.; Merry, E.; Gyalmo, J.; Alom, Z.; Aung, Z.; Azim, M.A. Transfer Learning: A Friendly Introduction. J. Big Data 2022, 9, 102.
70. Ramadan, M.N.A.; Ali, M.A.H.; Jaber, H.; Alkhedher, M. Blockchain-Secured IoT-Federated Learning for Industrial Air Pollution Monitoring: A Mechanistic Approach to Exposure Prediction and Environmental Safety. Ecotoxicol. Environ. Saf. 2025, 300, 118442.
71. Wong, P.-Y.; Su, H.-J.; Candice Lung, S.-C.; Liu, W.-Y.; Tseng, H.-T.; Adamkiewicz, G.; Wu, C.-D. Explainable Geospatial-Artificial Intelligence Models for the Estimation of PM_2.5 Concentration Variation during Commuting Rush Hours in Taiwan. Environ. Pollut. 2024, 349, 123974.
72. Rautela, K.S.; Goyal, M.K. Transforming Air Pollution Management in India with AI and Machine Learning Technologies. Sci. Rep. 2024, 14, 20412.
73. Shakya, D.; Deshpande, V.; Goyal, M.K.; Agarwal, M. PM_2.5 Air Pollution Prediction through Deep Learning Using Meteorological, Vehicular, and Emission Data: A Case Study of New Delhi, India. J. Clean. Prod. 2023, 427, 139278.
74. Kumar, V.; Sahu, M. Evaluation of Nine Machine Learning Regression Algorithms for Calibration of Low-Cost PM_2.5 Sensor. J. Aerosol Sci. 2021, 157, 105809.
75. Park, D.; Yoo, G.-W.; Park, S.-H.; Lee, J.-H. Assessment and Calibration of a Low-Cost PM_2.5 Sensor Using Machine Learning (HybridLSTM Neural Network): Feasibility Study to Build an Air Quality Monitoring System. Atmosphere 2021, 12, 1306.
76. Srisang, W.; Jaroensutasinee, K.; Jaroensutasinee, M.; Khongthong, C.; Piamonte, J.R.P.; Sparrow, E.B. PM_2.5 IoT Sensor Calibration and Implementation Issues Including Machine Learning. Emerg. Sci. J. 2024, 8, 2267–2277.
77. Qor-el-aine, A.; Béres, A.; Géczi, G. Calibration of CAMS PM_2.5 Data over Hungary: A Machine Learning Approach. Environ. Res. Commun. 2024, 6, 075026.
78. Ly, B.-T.; Matsumi, Y.; Vu, T.V.; Sekiguchi, K.; Nguyen, T.-T.; Pham, C.-T.; Nghiem, T.-D.; Ngo, I.-H.; Kurotsuchi, Y.; Nguyen, T.-H.; et al. The Effects of Meteorological Conditions and Long-Range Transport on PM_2.5 Levels in Hanoi Revealed from Multi-Site Measurement Using Compact Sensors and Machine Learning Approach. J. Aerosol Sci. 2021, 152, 105716.
79. Chai, J.; Song, J.; Xu, Y.; Zhang, L.; Guo, B. Enhancing the Applicability of Satellite Remote Sensing for PM_2.5 Estimation Using Machine Learning Models in China. J. Sens. 2022, 2022, 7148682.
80. Lee, J.; Barquilla, C.A.M.; Park, K.; Hong, A. Urban Form and Seasonal PM_2.5 Dynamics: Enhancing Air Quality Prediction Using Interpretable Machine Learning and IoT Sensor Data. Sustain. Cities Soc. 2024, 117, 105976.
81. Li, T.; Huang, X.; Zhang, Q.; Wang, X.; Wang, X.; Zhu, A.; Wei, Z.; Wang, X.; Wang, H.; Chen, J.; et al. Machine Learning-Guided Integration of Fixed and Mobile Sensors for High Resolution Urban PM_2.5 Mapping. npj Clim. Atmos. Sci. 2025, 8, 95.
82. Yang, J.; Yan, R.; Nong, M.; Liao, J.; Li, F.; Sun, W. PM_2.5 Concentrations Forecasting in Beijing through Deep Learning with Different Inputs, Model Structures and Forecast Time. Atmos. Pollut. Res. 2021, 12, 101168.
83. Zaini, N.; Ean, L.W.; Ahmed, A.N.; Abdul Malek, M.; Chow, M.F. PM_2.5 Forecasting for an Urban Area Based on Deep Learning and Decomposition Method. Sci. Rep. 2022, 12, 17565.
84. Wood, D.A. Trend-Attribute Forecasting of Hourly PM_2.5 Trends in Fifteen Cities of Central England Applying Optimized Machine Learning Feature Selection. J. Environ. Manag. 2024, 356, 120561.
85. Abuouelezz, W.; Ali, N.; Aung, Z.; Altunaiji, A.; Shah, S.B.; Gliddon, D. Exploring PM_2.5 and PM₁₀ ML Forecasting Models: A Comparative Study in the UAE. Sci. Rep. 2025, 15, 9797.
86. Makhdoomi, A.; Sarkhosh, M.; Ziaei, S. PM_2.5 Concentration Prediction Using Machine Learning Algorithms: An Approach to Virtual Monitoring Stations. Sci. Rep. 2025, 15, 8076.
87. Wen, Z.; Ma, X.; Xu, W.; Si, R.; Liu, L.; Ma, M.; Zhao, Y.; Tang, A.; Zhang, Y.; Wang, K.; et al. Combined Short-Term and Long-Term Emission Controls Improve Air Quality Sustainably in China. Nat. Commun. 2024, 15, 5169.
88. Patel, V.K.; Kuttippurath, J.; Kashyap, R. Increased Global Cropland Greening as a Response to the Unusual Reduction in Atmospheric PM_2.5 Concentrations during the COVID-19 Lockdown Period. Chemosphere 2024, 358, 142147.
89. Yang, K.; Lin, F.; Wang, X.; Wang, H.; Shi, Y.; Chen, L.; Weng, Y.; Chen, X.; Zeng, Y.; Wang, Y.; et al. Residential Blue Space, Cognitive Function, and the Role of Air Pollution in Middle-Aged and Older Adults: A Cross-Sectional Study Based on UK Biobank. Ecotoxicol. Environ. Saf. 2024, 288, 117355.
90. Irga, P.J.; Morgan, A.; Fleck, R.; Torpy, F.R. Phytoremediation of Indoor Air Pollutants from Construction and Transport by a Moveable Active Green Wall System. Atmos. Pollut. Res. 2023, 14, 101896.
91. Han, Y.; Lee, J.; Haiping, G.; Kim, K.-H.; Wanxi, P.; Bhardwaj, N.; Oh, J.-M.; Brown, R.J.C. Plant-Based Remediation of Air Pollution: A Review. J. Environ. Manag. 2022, 301, 113860.
92. Park, S.-J.; Kang, G.; Choi, W.; Kim, D.-Y.; Kim, J.; Kim, J.-J. Effects of Fences and Green Zones on the Air Flow and PM_2.5 Concentration around a School in a Building-Congested District. Appl. Sci. 2021, 11, 9216.
93. Ma, X.; Wang, M.; She, X.; Zhao, J. Unlocking Urban Breathability: Investigating the Synergistic Mitigation of PM_2.5 and CO2 by Community Park Green Space in the Built Environment Using Simulation. Buildings 2024, 14, 3407.
94. Plitsiri, I.; Taemthong, W. Green Wall Systems as a Solution for PM_2.5 Mitigation in Indoor Environments: Comparing Passive and Active Systems. Int. J. Environ. Sci. Dev. 2024, 15, 294–299.
95. Muenrew, J.; Rakarcha, S.; Nuammee, A.; Panyadee, P.; Tala, W.; Yabueng, N.; Chantara, S. Efficiency of Tropical Plants and Smart Green Wall on Reduction of Fine Particulate Matters (PM_2.5 and PM0.3–1.1) in a Closed-System Chamber. Environ. Technol. Innov. 2025, 39, 104268.
96. Chen, Y.; Ke, X.; Min, M.; Zhang, Y.; Dai, Y.; Tang, L. Do We Need More Urban Green Space to Alleviate PM_2.5 Pollution? A Case Study in Wuhan, China. Land 2022, 11, 776.
97. Li, K.; Li, C.; Liu, M.; Hu, Y.; Wang, H.; Wu, W. Multiscale Analysis of the Effects of Urban Green Infrastructure Landscape Patterns on PM_2.5 Concentrations in an Area of Rapid Urbanization. J. Clean. Prod. 2021, 325, 129324.
98. Luo, S.; Chen, W.; Sheng, Z.; Wang, P. The Impact of Urban Green Space Landscape on PM_2.5 in the Central Urban Area of Nanchang City, China. Atmos. Pollut. Res. 2023, 14, 101903.
99. Jiang, R.; Xie, C.; Man, Z.; Zhou, R.; Che, S. Effects of Urban Green and Blue Space on the Diffusion Range of PM_2.5 and PM₁₀ Based on LCZ. Land 2023, 12, 964.
100. Fan, Z.; Zhan, Q.; Liu, H.; Wu, Y.; Xia, Y. Investigating the Interactive and Heterogeneous Effects of Green and Blue Space on Urban PM_2.5 Concentration, a Case Study of Wuhan. J. Clean. Prod. 2022, 378, 134389.
101. Cao, W.; Wang, L.; Li, R.; Zhou, W.; Zhang, D. Unveiling the Nonlinear Relationships and Co-Mitigation Effects of Green and Blue Space Landscapes on PM_2.5 Exposure through Explainable Machine Learning. Sustain. Cities Soc. 2025, 122, 106234.
102. Jeong, C.; Heo, S.; Woo, T.; Kim, S.; Yoo, C. AI-Driven Ventilation Control Policy Proximal Optimization Coupled with Dynamic-Informed Real-Time Model Calibration for Healthy and Sustainable Indoor PM_2.5 Management. Energy Build. 2024, 323, 114786.
103. Wang, N.; Wei, C.; Zhao, X.; Wang, S.; Ren, Z.; Ni, R. Does Green Technology Innovation Reduce Anthropogenic PM_2.5 Emissions? Evidence from China’s Cities. Atmos. Pollut. Res. 2023, 14, 101699.
104. Sokolovskij, E.; Kilikevičius, A.; Chlebnikovas, A.; Matijošius, J.; Vainorius, D. Innovative Electrostatic Precipitator Solutions for Efficient Removal of Fine Particulate Matter: Enhancing Performance and Energy Efficiency. Machines 2024, 12, 761.
105. Vakharia, A.; Chavan, A. Development of a Compact IoT-Enabled Air Purification System for Indoor Air Quality Improvement. In Proceedings of the 2025 5th International Conference on Trends in Material Science and Inventive Materials (ICTMIM), Kanyakumari, India, 7–9 April 2025; pp. 1115–1123.
106. Zhu, X.; Jin, Q. Investigating the GHG Emissions, Air Pollution and Public Health Impacts from China’s Aluminium Industry: Historical Variations and Future Mitigation Potential. J. Environ. Manag. 2025, 376, 124530.
107. Cuéllar-Álvarez, Y.; Guevara-Luna, M.A.; Belalcázar-Cerón, L.C.; Clappier, A. Well-to-Wheels Emission Inventory for the Passenger Vehicles of Bogotá, Colombia. Int. J. Environ. Sci. Technol. 2023, 20, 12141–12154.
108. Vallejo, F.; Villacrés, P.; Yánez, D.; Espinoza, L.; Bodero-Poveda, E.; Díaz-Robles, L.A.; Oyaneder, M.; Campos, V.; Palmay, P.; Cordovilla-Pérez, A.; et al. Prolonged Power Outages and Air Quality: Insights from Quito’s 2023–2024 Energy Crisis. Atmosphere 2025, 16, 274.
109. Lee, D.; Barquilla, C.A.M.; Lee, J. Analyzing Dispersion Characteristics of Fine Particulate Matter in High-Density Urban Areas: A Study Using CFD Simulation and Machine Learning. Land 2025, 14, 632.
110. Xu, G.; Liu, H.; Jia, C.; Zhou, T.; Shang, J.; Zhang, X.; Wang, Y.; Wu, M. Spatiotemporal Patterns and Drivers of Public Concern about Air Pollution in China: Leveraging Online Big Data and Interpretable Machine Learning. Environ. Impact Assess. Rev. 2025, 114, 107897.
111. Razak, I.H.A.; Phuang, Z.X.; Woon, K.S.; Sudesh, K. Life Cycle Assessment on Global Warming and Fine Particulate Matter Formation for Biological Extraction Method in Polyhydroxyalkanoates (PHA) Production: A Sustainable Alternative. Chem. Eng. Trans. 2024, 113, 397–402.
112. ApSimon, H.; Oxley, T.; Woodward, H.; Mehlig, D.; Dore, A.; Holland, M. The UK Integrated Assessment Model for Source Apportionment and Air Pollution Policy Applications to PM_2.5. Environ. Int. 2021, 153, 106515.
113. Guo, X.; Jia, C.; Xiao, B. Spatial Variations of PM_2.5 Emissions and Social Welfare Induced by Clean Heating Transition: A Gridded Cost-Benefit Analysis. Sci. Total Environ. 2022, 826, 154065.

Figure 1. Machine learning in PM_2.5 management.

Figure 2. A scatter plot compares reference-grade BAM measurements with those from the low-cost PM sensor (AQ_88) in the test set, illustrating their PM_2.5 relationship both before and after calibration with the proposed random forest model [17].

Figure 3. Air-mass trajectory analysis includes (a) a spatial map of clustered trajectories and (b) the partial influence of these trajectories on PM_2.5 concentrations. Reprinted from [78].

Figure 4. (a) The average PM_2.5 prediction performance across three major cities and (b) the average improvement rate achieved by the proposed STA-ResCNN model for PM_2.5 prediction. Reprinted from [63].

Figure 5. Hourly time series of observed PM_2.5 concentrations alongside predictions generated by the LGB and ADAM models for the period 1–6 March 2019. Shaded regions around each line represent the corresponding standard deviations [50].

Figure 6. PM_2.5 deposition patterns on Nephrolepis exaltata (L.) Schott leaf surfaces in (a) passive green wall (PGW) and (b) active green wall (AGW) systems [94].

Figure 7. Significant reduction in PM_2.5 concentrations achieved by the smart green wall system relative to the empty-room control condition [95].

Table 1. Major Sources of PM_2.5 and their Atmospheric Contributions.

Source Type	Specific Source	Contribution/Characteristics	Reference
Anthropogenic	Vehicular exhaust	25–40% of urban PM_2.5 in developed regions	[22,26]
	Coal-fired power plants	Major contributor to combustion-related PM_2.5	[23]
	Industrial manufacturing	Persistent contributor in urban industrial zones	[24]
	Biomass burning (agricultural/residential)	30–50% of PM_2.5 in rural areas of developing countries	[25,26]
	Diesel combustion in urban traffic	Responsible for localized PM_2.5 hotspots	[27]
	Crop residue burning	Increases rural PM_2.5 background levels	[28]
	Household solid fuel combustion	Significant in rural areas	[29]
Combustion Byproduct	Black Carbon (BC)	5–15% of PM_2.5 mass in megacities; elevated near industrial zones	[26]
Secondary Sources	Sulphur dioxide and nitrogen oxides	Undergo atmospheric chemical transformation to sulfates and nitrates, increasing PM_2.5 load	[30]
Natural Sources	Dust storms, volcanic activity, wildfires, sea spray	Variable by geoclimatic zone; less controllable	[31]
Meteorological Factors	Wind speed, humidity, solar radiation	Influence source interaction and spatial distribution of PM_2.5	[32]
Transboundary Pollution	Regional emissions from outside areas	~50% of PM_2.5 mortality burden in China attributed to external sources	[27]

Table 2. Summary of Key Machine Learning Paradigms for PM_2.5 Monitoring.

ML Paradigm	Data Requirement	Interpretability	Computational Demand	Spatiotemporal Resolution	Scalability	Representative Applications
Supervised	High (requires labeled data)	Moderate to High (via XAI)	Moderate	High (with GNN, LSTM integration)	Good	Forecasting, sensor calibration, source attribution
Semi-Supervised	Moderate (limited labeled + large unlabeled data)	Moderate	Moderate	High	Excellent	Network augmentation, unmonitored site estimation, data-scarce regions
Unsupervised	Low (unlabeled data only)	Low to Moderate	Low	Medium	High	Emission clustering, feature extraction, anomaly detection
Reinforcement	Moderate (requires interactive environment)	Low	High	High (adaptive learning)	Moderate	Real-time deployment, sensor network optimization, strategy control
Hybrid	Variable (multi-modal and multi-source data)	Low to High (context-dependent)	High	Very High (spatiotemporal + domain fusion)	Good	Multi-station modeling, fusion of physics and ML, high-resolution forecasting
Physics-Informed	Moderate to High	High (integrated domain knowledge)	High	High	Moderate	Physically consistent modeling, interpretable forecasting, regulatory-relevant applications
Transfer Learning	Low (for new domain adaptation)	Moderate	Moderate	High	Excellent	Cross-regional generalization, rapid deployment in data-scarce environments

Table 3. Comparative Analysis of ML-Based Calibration Methods for PM_2.5 Sensing.

Sensor Type	Reference Instrument	ML Models Evaluated	Best Performing Model(s)	Performance Metrics	Ref.
Plantower PMS5003	Thermo Fisher SHARP 5030	MLR, Lasso, Ridge, SVR, MLP, Regression Tree, kNN, RF, GB	kNN, RF, GB	Train Score = 0.99; Test Score: kNN = 0.97, RF = 0.96, GB = 0.95	Kumar and Sahu [74]
Unspecified low-cost sensor	Gravimetric method	MLR, DNN, HybridLSTM	HybridLSTM	RMSE reduced by 41–60% relative to raw data, 30–51% relative to MLR, and 8–40% relative to DNN; R² = 0.93 (HybridLSTM)	Park et al. [75]
AirQo Sensors	Beta Attenuation Monitor (BAM)	kNN, SVR, MLR, Ridge, Lasso, Elastic Net, XGBoost, MLP, RF, GB	Random Forest	RMSE reduced from 18.6 to 7.2 µg/m³ (mean ref. = 37.8 µg/m³)	Adong et al. [17]
Plantower PMS3003	Davis AirLink	Linear Regression, Decision Trees, RF, GBT, kNN, NN, Gaussian Process	Decision Trees, Neural Networks	R² > 0.858 after implementing a suitable calibration model over a 320-day study	Srisang et al. [76]
CAMS Model Data (0.1° × 0.1° grid)	In situ air quality stations (Hungary)	LightGBM, RF, MLR	LightGBM	R² up to 0.93, SR ~0.95, RSR < 0.5, NSE > 0.75	Qor-el-aine et al. [77]
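
The performance metrics in Table 3 (RMSE, R², NSE, RSR) can be computed from paired reference and calibrated readings. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and it assumes that R² means the squared Pearson correlation and that RSR is RMSE divided by the standard deviation of the reference observations; the cited studies may define these differently. SR is not defined in the table and is therefore not computed here.

```python
import math

def _sse(obs, pred):
    # Sum of squared errors between reference (obs) and sensor/model (pred) values.
    return sum((o - p) ** 2 for o, p in zip(obs, pred))

def _sst(obs):
    # Total sum of squares of the reference values around their mean.
    mean_obs = sum(obs) / len(obs)
    return sum((o - mean_obs) ** 2 for o in obs)

def rmse(obs, pred):
    # Root mean square error, in the units of the data (e.g., ug/m3).
    return math.sqrt(_sse(obs, pred) / len(obs))

def nse(obs, pred):
    # Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean of obs.
    return 1.0 - _sse(obs, pred) / _sst(obs)

def rsr(obs, pred):
    # RMSE divided by the standard deviation of the observations (assumed definition).
    return math.sqrt(_sse(obs, pred) / _sst(obs))

def pearson_r2(obs, pred):
    # Squared Pearson correlation (assumed meaning of R2 in this sketch).
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (_sst(obs) * var_p)

def percent_reduction(before, after):
    # Relative improvement of an error metric, in percent.
    return 100.0 * (before - after) / before

# Worked arithmetic with the figures reported for Adong et al. [17] in Table 3:
# RMSE falls from 18.6 to 7.2 ug/m3 against a mean reference of 37.8 ug/m3.
reduction_pct = percent_reduction(18.6, 7.2)   # about 61.3 %
normalized_rmse_pct = 100.0 * 7.2 / 37.8       # about 19.0 % of the mean reference
```

Under these definitions RSR equals the square root of (1 − NSE), so RSR < 0.5 already implies NSE > 0.75; the two thresholds reported for the LightGBM row are therefore consistent with each other.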

Table 4. Comparative Analysis of ML-Driven PM_2.5 Sensing Frameworks.

Study Objective	ML Algorithm	Key Innovation	Sensor Infrastructure	Dominant Predictors	Geographic Scope	Data Fusion Strategy	Ref
Attribution of PM_2.5 to meteorology and long-range transport	Random Forest	Integration of RF with CWT for transboundary analysis	Compact air sensors at three sites	Wind direction, temperature, humidity, source trajectory	Hanoi, Thai Nguyen (Vietnam)	RF + CWT + weather data	Ly et al. [78]
Improve PM_2.5 estimation using satellite-AOD with ML	eXtreme Gradient Boosting (XGBoost)	Satellite-AOD and LUR fusion with XGBoost	Satellite MODIS AOD data, limited ground monitors	AOD, land use, anthropogenic indicators	Nationwide (China)	Satellite AOD + XGBoost + LUR	Chai et al. [79]
Develop high-resolution exposure models integrating mobility	AutoML with hybrid Land Use Regression (LUR)	Incorporation of SafeGraph mobility data into exposure modeling	PurpleAir sensors and regulatory monitors (eight US cities)	AOD, NDVI, meteorology, time, land use, human mobility	Eight major U.S. cities	AutoML + sensor + mobility + LUR	Yu et al. [13]
Explore PM_2.5–urban form–seasonal interaction using IoT and ML	Random Forest with Recursive Feature Elimination (RFE)	Urban morphology and seasonally differentiated analysis	1069 IoT sensors across Seoul	Building density, green space, road width, traffic	Seoul, South Korea	IoT + urban morphology + RF	Lee et al. [80]
High-resolution PM_2.5 mapping via fixed-mobile sensor fusion	LightGBM with SHAP (XAI-enhanced)	Real-time, fused mobile–fixed monitoring with XAI guidance	614 fixed micro-sensors + 200 mobile vehicle sensors	Secondary inorganic aerosols, meteorology, traffic, urban form	Jinan, China	Fixed and mobile sensors + meteorology + SHAP	Li et al. [81]

Table 5. Summary of ML-Based PM_2.5 Forecasting Models.

Location and Data Period	Models Used	Forecast Horizon	Key Input Features	Architecture Type	Best Performing Model(s)	Reference
Beijing, China (2015–2016)	LSTM, CNN, CNN-LSTM, BPNN	1 to 24 h	Historical PM_2.5, meteorological parameters, co-pollutants	Deep learning (LSTM, CNN, hybrid)	CNN-LSTM (1–12 h), LSTM (>12 h)	Yang et al. [82]
Cheras and Batu Muda, Malaysia	EEMD-LSTM	1 h	Decomposed PM_2.5 IMFs, atmospheric parameters	Hybrid signal processing + deep learning	EEMD-LSTM	Zaini et al. [83]
Yangtze River Delta (3 + 23 cities)	STA-ResCNN, CNN, LSTM	1 to 4 h	PM_2.5, PM₁₀, meteorological data, spatiotemporal correlations	CNN + Residual + Spatial–Temporal Attention	STA-ResCNN	Zhang et al. [63]
15 cities, Central England (2018–2022)	LASSO, KNN, SVR, XGB	t0 to t + 12 h	Trend attributes (t-1 to t-12), selected via feature optimization	Supervised ML (regression + boosting)	LASSO (efficiency), SVR (accuracy)	Wood [84]
Abu Dhabi, UAE (5 years)	DT, RF, SVR, CNN, LSTM, Prophet	1–2 h, 1 day, 1 week	Historical PM_2.5, meteorological data	ML, DL, and time-series hybrid	SVR (short-term), Prophet (long-term)	Abuouelezz et al. [85]
Dhaka, Narayanganj, Gazipur, Bangladesh (2013–2019)	ARIMA-ANN, ARIMA-SVM, DT, CatBoost, PCR	1 day	Historical PM_2.5, meteorological parameters, air quality indicators	Hybrid ML + Tree-based models	CatBoost	Shahriar et al. [48]
Hat Yai, Thailand (2016–2022)	ARIMA, ANN, ARIMA-ANN, ARIMA-ANN-REG	1 day	Historical PM_2.5 concentrations, meteorological variables	Hybrid statistical + deep learning + regression layer	ARIMA-ANN-REG	Choojam et al. [49]
Multi-regional (three datasets)	Self-weighted VMD-GRU	Multi-scale	Decomposed PM_2.5 IMFs via VMD	Adaptive ensemble deep learning	Self-weighted VMD-GRU	Huang and Qian [58]
Beijing, China	EMD-GRU	Short-term (unspecified)	Decomposed PM_2.5 (IMFs) + meteorological features	Hybrid EMD + GRU	EMD-GRU	Huang et al. [59]
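
Several entries in Table 5 forecast PM_2.5 from its own recent history, for example the trend attributes (t-1 to t-12) used for horizons t0 to t + 12 h. A minimal sketch of how such lagged inputs can be assembled from an hourly series is given below. The function names and defaults are hypothetical, the code is not taken from any cited study, and the persistence forecast is only a naive baseline added for comparison, not one of the reviewed models.

```python
def build_lagged_dataset(series, n_lags=12, horizon=0):
    """Build (X, y) pairs from a univariate series.

    Each row of X holds [y[t-1], y[t-2], ..., y[t-n_lags]];
    the matching target is y[t + horizon], so horizon=0 corresponds to t0.
    """
    if n_lags < 1 or horizon < 0:
        raise ValueError("n_lags must be >= 1 and horizon must be >= 0")
    X, y = [], []
    # Start where a full lag window exists; stop where the target is still in range.
    for t in range(n_lags, len(series) - horizon):
        X.append([series[t - k] for k in range(1, n_lags + 1)])
        y.append(series[t + horizon])
    return X, y

def persistence_forecast(X):
    # Naive baseline: predict the most recent observed value (the lag-1 feature).
    return [row[0] for row in X]

# Illustrative hourly PM2.5 values (made up for this example, in ug/m3).
hourly_pm25 = [35.0, 38.5, 41.0, 39.5, 36.0, 33.5, 31.0, 34.0, 37.5, 40.0]
X, y = build_lagged_dataset(hourly_pm25, n_lags=3, horizon=1)
baseline = persistence_forecast(X)
```

The resulting X and y can be passed to any regression model; comparing a trained model against the persistence baseline indicates whether the lag features add skill beyond simply repeating the last observation.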

Table 6. Summary of PM_2.5 prediction studies, covering data characteristics, temporal scale, and spatial resolution.

Study Location	Temporal Scale	ML Models Used	Best Model Identified	Input Features	Sensor Type	Data Period	Spatial Resolution	Comparison with Traditional Models	Ref
Denver, USA	Hourly	GNN-LSTM + FC NN	GNN-LSTM + FC	PM_2.5 from dual networks, socio-environmental data	Regulatory + low-cost sensors	2021	Unmonitored sites estimation	Outperformed baseline models	Li et al. [51]
Pukou, China	Hourly	WT + XGBoost, RF, GBRT, MLR	WT + XGBoost	PM_2.5 time series + AOD; decomposed via WT + mRMR feature selection	Ground PM_2.5, AOD	2016–2017	Single location	Outperformed MLR by large margins	Karimian et al. [60]
Seoul, South Korea	Hourly and Daily	LightGBM (LGB), RF, others	LightGBM	Meteorological forecast data (LDAPS), station location, time features	Ground monitoring + meteorological forecasts	July 2018–June 2021	City-wide prediction	Outperformed CTM by 21% (%RMSE) and 0.20 (R²)	Kim et al. [50]
Isfahan, Iran	Daily	ANN, KNN, SVM, RF	ANN	Meteorological data (Tavg, RH, Precip., WD, WSavg, WSmax)	Ground meteorological stations	9 years (unspecified)	City-wide prediction	ANN outperformed other ML models	Mohammadi et al. [66]
Mashhad, Iran	Daily	LGBM, XGBR, RF, GBR	GBR	Meteorological + air quality (e.g., visibility, RH, wind, dust freq.)	Ground meteorological and air quality stations	2016–2022	Urban-wide virtual stations	GBR outperformed other ML models	Makhdoomi et al. [86]
Kolkata, India	Daily	MLR, ANN	ANN	PM_2.5, Meteorological parameters	Regulatory monitors + weather repositories	2020 (Lockdown period)	City-wide	ANN outperformed MLR (R²: 0.91, RMSE: 3.74)	Bera et al. [52]
Northern Thailand	Daily	MLP-ANN	MLP-ANN (8-16-1)	Meteorological data, AOD, open biomass burning emissions	Satellite + ground monitors	Dry season 2024	Province level (four provinces)	Underestimated PCD slightly, but effective with AOD + OBB	Paluang et al. [55]
Tehran, Iran	Hourly and Daily	3D CNN-GRU, LSTM, GRU, ANN, SVR, ARIMA	3D CNN-GRU	PM_2.5, spatial correlations, historical AQ data	Ground AQ stations	2016–2019	Urban-wide	Outperformed LSTM/GRU (R²: 0.84 hourly, 0.78 daily)	Faraji et al. [68]
Beijing–Tianjin–Hebei, China	Hourly	MGCGRU-SAN	MGCGRU-SAN	Short-term AQ + meteorology, Long-term PM_2.5, spatial graph, SAN	Ground stations	2022–2023	Multi-city, multi-station	Outperformed baselines by 6–9% across metrics	Guyu et al. [62]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
