Article

Multi-Method Explainable AI Framework for Quantifying Traffic and Meteorological Contributions to Urban Air Pollution: A Case Study of Istanbul’s Bosphorus Bridge Corridor

1
Department of Climate Science and Meteorological Engineering, İstanbul Technical University, Maslak, 34469 Istanbul, Türkiye
2
Eurasia Institute of Earth Sciences, Climate and Marine Sciences, İstanbul Technical University, Maslak, 34469 Istanbul, Türkiye
*
Author to whom correspondence should be addressed.
Atmosphere 2026, 17(6), 591; https://doi.org/10.3390/atmos17060591 (registering DOI)
Submission received: 12 April 2026 / Revised: 3 June 2026 / Accepted: 4 June 2026 / Published: 9 June 2026
(This article belongs to the Section Air Quality)

Abstract

Urban air pollution results from complex interactions between vehicle emissions, meteorological conditions, and atmospheric chemistry. While machine learning models achieve high accuracy in air quality prediction, their limited transparency hinders policy adoption. We present an integrated multi-method explainable AI (XAI) framework, M-ETAQI, that combines multiple XAI techniques, temporal decomposition, and causal inference to quantify traffic and meteorological contributions to PM10, PM2.5, NOX, and NO2 concentrations in the Istanbul FSM Bridge corridor (2022–2023 hourly data). Five machine learning models (XGBoost, LightGBM, CatBoost, Random Forest, and CNN–LSTM–Attention) were trained with temporal cross-validation. SHAP, LIME, PDP, and ALE were applied for interpretability; STL decomposition isolated temporal components, and convergent cross-mapping (CCM) tested causal links. Tree-based models achieved R2 > 0.80 for all pollutants, with CatBoost reaching R2 = 0.876 for PM2.5. SHAP identified the 1 h lag (Lag1) as the dominant feature. Wind speed had a significant negative effect on NOX, while traffic contributed ~20% to NOX, approximately twice its contribution to the other pollutants. STL showed that the trend component dominated total variance (56.3% for NO2). CCM revealed wind speed as the strongest causal driver of NOX (ρ = 0.37) and confirmed direct traffic–NOX links. Knowledge distillation from CatBoost improved CNN–LSTM–Attention performance. The four XAI methods yielded consistent attributions, providing robust, cross-validated evidence for traffic management and air-quality policy.

1. Introduction

Urban air pollution is one of the most significant environmental and public health problems of the 21st century. The World Health Organization (WHO) estimates that ambient air pollution causes approximately 4.2 million premature deaths worldwide each year [1]. Exposure to air pollution has been proven to have harmful effects on physical health, mental health, and mortality rates [2]. Particulate matter (PM) and nitrogen oxides (NOX) in particular have been identified as primary concerns by the WHO and the European Environment Agency (EEA) [1,3].
In cities, exhaust emissions from internal combustion engines are the primary source of NOX and nitrogen dioxide (NO2), while brake and tire wear and road dust resuspension are major contributors to particulate matter (PM). For example, non-exhaust emissions such as brake and tire wear have been shown to produce particles with diameters below 2.5 µm (PM2.5) at levels that can exceed those from exhaust emissions in some periods, contributing between 1.39% and 7.13% of the total particulate matter concentration. Road traffic also contributes significantly to the formation of particulate matter, including particles with diameters below 10 µm (PM10) and PM2.5 [4,5]. The magnitude of traffic-related emissions is influenced not only by vehicle size but also by driving dynamics such as speed and acceleration patterns [6]. For instance, measurements under real traffic conditions showed that gasoline vehicles emitted higher levels of carbon monoxide (CO) and carbon dioxide (CO2), while diesel vehicles emitted higher levels of NOX and PM2.5 [7]. Therefore, a proper understanding of how vehicle emissions are distributed in urban environments is critical for developing sustainable transportation policies and identifying strategies to improve air quality [8].
Machine learning (ML) approaches have emerged as powerful tools for air quality prediction. They capture complex, nonlinear relationships among meteorological variables, traffic parameters, and pollutant concentrations that traditional regression methods often fail to model [9]. Recent studies have shown that ensemble tree-based algorithms such as eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Categorical Boosting (CatBoost) achieve the best performance in hourly air quality forecasting [10]. Machine learning models are commonly classified as white-box or black-box. White-box models are transparent and understandable but generally less accurate. In contrast, black-box models are more accurate but lack transparency and require explanation methods [11]. This lack of transparency has limited the adoption of black-box models among environmental policymakers and regulators, who require transparent, interpretable evidence to justify intervention measures [12].
Explainable Artificial Intelligence (XAI) can bridge the gap between prediction accuracy and scientific interpretability. For example, Local Interpretable Model-agnostic Explanations (LIME) provide complementary local surrogate interpretations [13], while Partial Dependence Plots (PDP) and Accumulated Local Effects (ALE) visualize marginal and unbiased feature effects, respectively [14,15]. More recently, SHapley Additive exPlanations (SHAP), based on cooperative game theory, have provided theoretically consistent feature-attribution values at both the global and local levels [16]. For instance, one study used SHAP to assess the effects of meteorological variables on the air pollutants PM10, PM2.5, and ozone (O3) [17].
Most ML-based air-quality studies have treated temporal patterns as monolithic wholes. For example, air quality index prediction has been performed using daily [18] or hourly [19] data without separating the underlying temporal components. As a result, the effects of meteorological and traffic-related factors on long-term trends, seasonal patterns, and short-term pollution episodes cannot be fully distinguished. Causal relationships between traffic flow and air pollution have also not been sufficiently investigated; existing work relies mainly on correlation-based analyses. For example, Pearson correlation was used as a statistical test in a study conducted in Melbourne [20], and emission rates have been estimated from average annual traffic parameters [21]. Furthermore, interpretable machine learning approaches remain limited in air pollution prediction studies, with SHAP and partial dependence plots being the most commonly used methods [22].
The effects of traffic volumes on air pollution have not been fully investigated in Istanbul. However, there are ML-based studies that predict air pollution and assess the impact of meteorological variables. For example, a convolutional neural network-long short-term memory (CNN-LSTM) based spatiotemporal model was developed to predict pollutant concentrations using real sensor data from Istanbul [23]. In addition, the air quality index was predicted using hybrid artificial intelligence (AI) models in Başakşehir, Istanbul [24]. Also, correlation analyses were conducted to examine the effects of meteorological variables (e.g., wind speed and relative humidity) on PM10 concentrations in Istanbul [25,26].
This study proposes a Multi-Method Explainable Artificial Intelligence framework (M-ETAQI) to quantify the contributions of traffic and meteorological factors to urban air pollution using hourly data from 2022 to 2023. Recent studies have applied ML and XAI to air pollution forecasting, examined meteorological effects, or performed temporal analyses. However, none have simultaneously integrated multi-method XAI, temporal decomposition, interaction analysis, and causal inference to comprehensively quantify these contributions [27,28,29,30]. The primary contribution of this study lies not in proposing new individual algorithms, but in the systematic integration and cross-validation of these complementary methods within a unified framework, enabling consensus-based interpretability that no single method can achieve alone. This study addresses four key gaps in the existing literature. First, multi-method XAI frameworks that simultaneously cross-validate interpretability across SHAP, LIME, PDP, and ALE have not yet been applied in traffic-related air pollution studies. Second, temporal decomposition approaches that specifically address trend, seasonal, and residual components are lacking. Third, the interaction effects of traffic and meteorological variables on pollutant concentrations have not been adequately characterized. Fourth, causal inference between traffic and air quality systems remains largely unaddressed. To address these gaps, this study aims (i) to develop and compare XGBoost, LightGBM, CatBoost, Random Forest (RF), and CNN–LSTM–Attention models for hourly PM10, PM2.5, NOX, and NO2 prediction; (ii) to implement a multi-method explainable artificial intelligence framework; (iii) to perform component-specific SHAP analysis with STL-based temporal decomposition; and (iv) to reveal causal relationships using convergent cross-mapping.

2. Study Area and Data

2.1. Study Area

The study area encompasses the Fatih Sultan Mehmet (FSM) Bridge corridor, located in the Sarıyer and Beykoz districts of Istanbul. This 1510 m bridge, connecting the European and Asian continents, carries eight lanes of motor vehicle traffic (four in each direction) and is one of Istanbul’s busiest highways, with an average daily traffic volume exceeding 250,000 vehicles. The bridge connects the Trans-European Motorway (TEM) on the European side to the O-2 motorway on the Asian side. Traffic data were collected from two Remote Traffic Microwave Sensor (RTMS) units: RTMS No. 417, located at the TEM FSM Toll Gate Pre-approach on the European side, which records inbound traffic over six lanes, and RTMS No. 268, located approximately 3 km west of the bridge at TEM Levent, which records outbound traffic over lanes 5–8 of the eight-lane section. Together, these two sensors provide bidirectional traffic characterization for the FSM corridor. Air pollutant data were obtained from the Maslak air quality monitoring station (AQMS), operated by the Istanbul Metropolitan Municipality (IMM), which measures hourly concentrations of PM10, PM2.5, NOX, and NO2. Meteorological observations were likewise obtained from the Maslak AQMS. The proximity of these monitoring systems (within a radius of approximately 1500 m) minimizes spatial representation errors in the integrated analysis. The locations of the AQMS, the meteorological observation park, and the traffic sensors are shown in Figure 1.

2.2. Traffic Data

Traffic flow on the FSM Bridge was characterized using two complementary RTMS units to capture traffic in both the approach and departure directions. Approach-direction (inbound) data were obtained from RTMS No. 417 (TEM FSM Toll Gate Area Pre-Traffic, 6 lanes), which monitors Europe-to-Asia traffic entering the bridge toll area. Because sensor 417 records only the approach direction, departure-direction (outbound) traffic parameters, representing traffic returning from the bridge, were obtained from RTMS No. 268 at Levent 1 (TEM Levent, 8 lanes: 4 approach + 4 departure) using the departure-direction recordings from lanes 5–8. Raw data from both sensors were recorded at approximately 2 min intervals. Hourly values were calculated as the arithmetic mean for the speed and occupancy variables and as the sum for the vehicle counts. Because RTMS data for Levent were available only until 09:00 UTC on 17 October 2023, the study period was limited to January 2022 to October 2023 (22 months). This resulted in 16,390 h of records for sensor 417 and 12,537 h of records for sensor 268. The approach-direction dataset includes lane-specific speeds (S1–S6, km/h), lane-specific vehicle volumes (V1–V6), long-vehicle counts (VL1–VL6), total number of vehicles, average speed, total long- and short-vehicle counts, and average lane occupancy. The departure-direction dataset includes the corresponding variables for lanes 5–8 (S5–S8, V5–V8, VL5–VL8) along with average departure speed, total number of vehicles, long/short vehicle classification, and average occupancy.
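The hourly aggregation rule described above (mean for speed and occupancy, sum for vehicle counts) can be sketched as follows. This is a minimal illustration only: the record layout, the function name `aggregate_hourly`, and the sample values are assumptions and do not come from the RTMS data files.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate_hourly(records):
    """Aggregate ~2-min sensor records into hourly values.

    Each record is a hypothetical tuple:
    (timestamp, speed_kmh, occupancy_pct, vehicle_count).
    """
    buckets = defaultdict(list)
    for ts, speed, occupancy, count in records:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append((speed, occupancy, count))

    hourly = {}
    for hour, rows in sorted(buckets.items()):
        hourly[hour] = {
            "speed_mean": mean(r[0] for r in rows),      # averaged
            "occupancy_mean": mean(r[1] for r in rows),  # averaged
            "vehicles_total": sum(r[2] for r in rows),   # summed
        }
    return hourly


# Illustrative records (made-up values).
records = [
    (datetime(2022, 1, 1, 8, 0), 80.0, 10.0, 50),
    (datetime(2022, 1, 1, 8, 2), 60.0, 20.0, 70),
    (datetime(2022, 1, 1, 9, 0), 90.0, 5.0, 30),
]
hourly = aggregate_hourly(records)
```

Averaging counts or summing speeds would distort the hourly signal, which is why the two kinds of variables are treated differently.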

2.3. Air Quality and Meteorological Data

Hourly concentrations of PM10, PM2.5, NOX, and NO2 were obtained from the Maslak Air Quality Monitoring Station (AQMS) for the same study period. Hourly meteorological data were obtained from the same station and comprised air temperature (°C), relative humidity (%), atmospheric pressure (hPa), wind speed (m/s), wind direction (°), solar radiation (W/m2), and precipitation (mm). Descriptive statistics of the data are shown in Table 1.

3. Methods

The proposed M-ETAQI framework consists of five sequential steps: (1) data preprocessing and missing data completion, (2) feature engineering, (3) model development, (4) multimethod XAI analysis, and (5) causal inference.

3.1. Data Preprocessing and Missing Data Imputation

Missing data were imputed using a hierarchical approach matched to the observed patterns of missingness within each variable category. Air pollutant variables exhibited two levels of missingness: NO2 and NOX had high missingness rates (~22%), while PM10 and PM2.5 had relatively low rates (~3%). Meteorological variables exhibited moderate missing-data rates, ranging from 5.5% to 11.4%, with the highest rate observed for wind direction (11.39%). For all variables with missing-data rates below 12%, cubic-spline temporal interpolation was first applied to fill gaps shorter than 3 h, thereby leveraging the strong temporal autocorrelation of environmental time series. The remaining gaps were addressed using the MissForest algorithm, an iterative random forest-based approach that captures nonlinear relationships between variables [31]. The effectiveness of multivariate imputation approaches has been demonstrated in previous studies [32]. A more cautious strategy was adopted for NO2 and NOX, whose missingness rates (~22%) exceeded this 12% threshold: the MissForest procedure was applied using PM10, PM2.5, traffic volume, and meteorological variables as auxiliary predictors, leveraging well-known physicochemical relationships among these pollutants. Traffic data gaps were filled using the median for the same day of the week and hour of day within a ±4-week window to preserve daily and weekly cyclical patterns. This method has also been used in air quality index studies [33,34].
Outlier detection combined the interquartile range (IQR) criterion with domain knowledge. Variables showing high positive skewness, particularly precipitation (31.72), NOX (2.22), and PM10 (1.88), were flagged for cautious evaluation, because outliers in these variables generally reflect actual meteorological or emission events rather than measurement error. To avoid discarding valid extreme values in these right-skewed variables, a modified threshold of Q3 + 3.0 × IQR was applied. For variables with approximately symmetric distributions, such as temperature (skewness = −0.14) and relative humidity (skewness = −0.40), the standard 1.5 × IQR threshold was used. Detected outliers were not removed; instead, they were replaced using temporal interpolation to preserve the continuous time-series structure.
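The skewness-dependent IQR rule can be illustrated with a short sketch. Several details are assumptions made for illustration: the same multiplier is applied to the lower bound (the text only states the upper threshold), quartiles use Python's inclusive method, and flagged values are replaced by linear interpolation between the nearest unflagged neighbours (the text does not state which interpolation was used). The function names are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, k):
    """Return (lower, upper) bounds Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def replace_outliers(series, k):
    """Replace out-of-bounds points by linear interpolation in time."""
    lower, upper = iqr_bounds(series, k)
    flagged = [not (lower <= v <= upper) for v in series]
    cleaned = list(series)
    n = len(series)
    for i, bad in enumerate(flagged):
        if not bad:
            continue
        # Nearest unflagged neighbours on each side, if any.
        left = next((j for j in range(i - 1, -1, -1) if not flagged[j]), None)
        right = next((j for j in range(i + 1, n) if not flagged[j]), None)
        if left is None and right is None:
            continue
        if left is None:
            cleaned[i] = series[right]
        elif right is None:
            cleaned[i] = series[left]
        else:
            w = (i - left) / (right - left)
            cleaned[i] = series[left] + w * (series[right] - series[left])
    return cleaned


series = [1, 2, 3, 15, 5, 6, 7, 8, 4]          # made-up values
standard = replace_outliers(series, k=1.5)     # near-symmetric variables
relaxed = replace_outliers(series, k=3.0)      # right-skewed variables
```

In this toy series the value 15 is replaced under the standard 1.5 multiplier but retained under the relaxed 3.0 multiplier, which is the intended effect for right-skewed variables such as precipitation.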

3.2. Feature Engineering

Beyond the measured raw variables, several engineered features were created to capture traffic dynamics and atmospheric conditions more comprehensively. The Traffic Congestion Index (TCI) was defined as TCI = Vtotal/(Smean × C), where Vtotal is the total number of vehicles per hour, Smean is the average speed (km/h), and C is the theoretical capacity (1800 vehicles/hour/lane × 6 lanes = 10,800 vehicles/hour). This index captures traffic density (volume divided by speed) normalized by capacity. In addition, the Heavy Vehicle Ratio (HVR = Vlong/Vtotal) was calculated, given that heavy vehicles emit disproportionately more NOX and PM per vehicle-kilometer [35]. Atmospheric conditions were characterized using a simplified Atmospheric Stability Index (ASI), following the Pasquill-Gifford stability classification adapted for urban environments [36]. The index was calculated from wind speed, solar radiation, and time of day (classes A–F, coded 1–6). Temporal features included cyclic sine/cosine transformations of time of day and day of year to preserve periodicity, and binary indicators for weekdays/weekends and Turkish public holidays. Multiplicative interaction terms (Speed × Wind Speed, Volume × ASI, Density × Humidity) were created to capture synergistic effects between traffic and meteorological variables. Lagged values (t − 1, t − 2, t − 3, t − 6, t − 12, t − 24) and rolling statistics (3 h, 6 h, 12 h, and 24 h moving averages and standard deviations) were calculated for both pollutant and meteorological variables to account for temporal autocorrelation and lagged transport effects [37,38]. The complete list of engineered features, including their mathematical definitions and descriptions, is summarized in Table 2. Variance Inflation Factor (VIF) analysis was performed to assess multicollinearity among engineered features. Specifically, lagged and rolling-statistic features with VIF > 10 were identified.
However, the predictive performance of tree-based ensemble models is largely robust to multicollinearity because they select feature subsets at each split. Therefore, correlated features were retained to preserve predictive information, and individual contributions were decomposed via SHAP interaction analysis (Section 3.4).
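A few of the engineered features defined above can be written out directly. The sketch below follows the stated formulas for TCI and HVR and the sine/cosine encoding; the function names are hypothetical, and the rolling mean is assumed to be a trailing window that includes the current hour, which the text does not specify.

```python
import math

LANE_CAPACITY = 1800                   # vehicles/hour/lane (from the text)
N_LANES = 6
CAPACITY = LANE_CAPACITY * N_LANES     # 10,800 vehicles/hour


def traffic_congestion_index(v_total, s_mean):
    """TCI = Vtotal / (Smean * C)."""
    return v_total / (s_mean * CAPACITY)


def heavy_vehicle_ratio(v_long, v_total):
    """HVR = Vlong / Vtotal (0.0 when no vehicles were counted)."""
    return v_long / v_total if v_total else 0.0


def cyclic_encode(value, period):
    """Map a periodic variable (e.g. hour of day) to (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def lag(series, k):
    """Value observed k steps earlier; None where unavailable."""
    return [None if i < k else series[i - k] for i in range(len(series))]


def rolling_mean(series, window):
    """Trailing mean over the current and previous window-1 values."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out


tci = traffic_congestion_index(v_total=5400, s_mean=50.0)
hour_sin, hour_cos = cyclic_encode(6, 24)       # 06:00 on a 24 h cycle
nox_lag1 = lag([10.0, 12.0, 15.0, 11.0], 1)     # made-up NOX values
nox_roll3 = rolling_mean([10.0, 12.0, 15.0, 11.0], 3)
```

The cyclic encoding places 23:00 and 00:00 next to each other in feature space, which a raw hour value from 0 to 23 would not.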

3.3. Machine Learning Models

3.3.1. XGBoost

XGBoost was implemented with depth-wise tree growth, L1 and L2 regularization, and column subsampling [39]. Hyperparameters were optimized using Bayesian optimization with the Tree-Structured Parzen Estimator (TPE) [40] over 200 iterations with a 5-fold temporal cross-validation scheme to prevent temporal leakage. The optimized hyperparameters for all model–pollutant combinations are provided in Supplementary Material S1. XGBoost has demonstrated superior performance in air-quality prediction studies compared with both traditional machine-learning and deep-learning approaches [41].
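Temporal cross-validation avoids leakage by always validating on data that come after the training data. The sketch below uses an expanding-window scheme; this is an assumption, since the text does not say whether the folds were expanding or sliding, and the function name is hypothetical.

```python
def expanding_window_splits(n_samples, n_folds):
    """Yield (train_indices, validation_indices) in chronological order.

    The series is cut into n_folds + 1 consecutive blocks. Fold k trains
    on blocks 0..k-1 and validates on block k, so no validation sample
    ever precedes a training sample.
    """
    block = n_samples // (n_folds + 1)
    splits = []
    for k in range(1, n_folds + 1):
        train_end = k * block
        # The last fold absorbs any remainder samples.
        val_end = (k + 1) * block if k < n_folds else n_samples
        splits.append((range(0, train_end), range(train_end, val_end)))
    return splits


splits = expanding_window_splits(n_samples=12, n_folds=5)
for train_idx, val_idx in splits:
    print(list(train_idx), "->", list(val_idx))
```

A hyperparameter search would score each candidate configuration by its average validation error over these folds.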

3.3.2. LightGBM

LightGBM uses leaf-wise tree growth with Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) to improve computational efficiency [42]. LightGBM has been shown to achieve good results for hourly PM2.5 estimation using only meteorological variables, with R2 values of 0.80 at the hourly scale and 0.89 at the daily scale [43]. Its hyperparameters were likewise tuned using Bayesian optimization.

3.3.3. CatBoost

CatBoost uses ordered boosting with symmetric tree construction, enabling native handling of categorical features and reducing overfitting through target-based encoding [44]. The algorithm’s built-in cross-validation was used in conjunction with external Bayesian optimization.

3.3.4. Random Forest

RF with 500 trees, bootstrap sampling, and feature-subset randomization was used as the baseline ensemble method [45]. Although RF generally exhibits lower predictive performance than sequential boosting methods on tabular data, it provides unbiased out-of-bag (OOB) error estimates without requiring a separate validation set and produces stable, permutation-based feature importance measures that are less sensitive to hyperparameter selection. These properties make RF a valuable benchmark for cross-validating feature-importance rankings from boosting algorithms in a multi-model XAI framework [46].

3.3.5. CNN–BiLSTM–Attention

A hybrid deep learning architecture combining three complementary components was designed: (1) a one-dimensional convolutional (Conv1D) layer with kernel sizes of 3 and 5 for local temporal feature extraction, capturing short-range pollutant fluctuation patterns and meteorological microtrends; (2) a Bidirectional Long Short-Term Memory (BiLSTM) layer with 128 hidden units [47] that processes the input sequence in both forward and backward directions, enabling the model to leverage earlier and later context within the input window; and (3) an 8-head multi-head self-attention mechanism that allows the model to dynamically assign higher importance to critical time steps, such as rush-hour traffic peaks or atmospheric inversion events, and to down-weight less informative periods. The input window was set to 24 h (24 time steps), and dropout (0.3) and layer normalization were applied between components to reduce overfitting and stabilize gradient flow. Training was performed using the Adam optimizer with a cosine-annealed learning rate schedule and early stopping (patience = 15). This hybrid CNN-LSTM architecture has been shown to improve predictions of particulate matter and nitrogen oxides in urban air-quality prediction tasks compared with standalone LSTM networks [23,48]. Knowledge distillation was implemented using a response-based approach. First, the CatBoost teacher model generated out-of-fold (OOF) predictions through 5-fold temporal cross-validation to prevent data leakage. These teacher predictions were then appended as an additional input feature to the CNN–BiLSTM–Attention student model’s feature set, enabling the student to learn from the teacher’s representation of the target variable alongside the original features. The student model was trained using Huber loss with the Adam optimizer and cosine-annealed learning rate scheduling.
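The out-of-fold step of the distillation procedure can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the teacher is a trivial stand-in that predicts the training mean (the study used CatBoost), the folds are forward-chaining blocks, and the names `oof_teacher_feature`, `fit_mean`, and `predict_mean` are hypothetical.

```python
from statistics import mean


def oof_teacher_feature(X, y, n_folds, fit, predict):
    """Return one teacher prediction per sample, made without seeing it.

    Fold k fits the teacher on all samples before block k and predicts
    block k. The first block has no earlier data, so it stays None.
    """
    n = len(y)
    block = n // (n_folds + 1)
    oof = [None] * n
    for k in range(1, n_folds + 1):
        start = k * block
        stop = (k + 1) * block if k < n_folds else n
        model = fit(X[:start], y[:start])      # past data only
        for i in range(start, stop):
            oof[i] = predict(model, X[i])
    return oof


def fit_mean(X_train, y_train):
    """Stand-in teacher: remembers the mean of the training targets."""
    return mean(y_train)


def predict_mean(model, x):
    return model


X = [[float(i)] for i in range(12)]            # made-up feature rows
y = [float(i) for i in range(12)]              # made-up targets
oof = oof_teacher_feature(X, y, n_folds=5, fit=fit_mean, predict=predict_mean)

# Append the teacher prediction as an extra student input feature,
# dropping rows for which no out-of-fold prediction exists.
X_student = [row + [p] for row, p in zip(X, oof) if p is not None]
```

Because every teacher prediction comes from a model that never saw the corresponding sample, the appended feature does not leak the target into the student's training data.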

3.4. Multi-Method Explainable AI Framework

3.4.1. SHAP Analysis

SHAP values were calculated using TreeExplainer [49], an exact polynomial-time algorithm for tree-based models, and DeepExplainer (DeepLIFT-based) for the CNN–BiLSTM–Attention model [16]. SHAP analysis was performed at three levels: (a) global feature importance using mean absolute SHAP values across the entire test set, providing a ranking of predictive contributions that is comparable across models; (b) local explanations using SHAP force plots for individual high-pollution and low-pollution events, revealing the specific feature interactions driving extreme events; and (c) SHAP interaction values to quantify pairwise feature interactions, focusing particularly on traffic–meteorology pairs such as congestion–wind speed and volume–atmospheric stability [49]. This multi-level SHAP framework links global feature rankings with event-specific mechanistic explanations, providing both scientific insight and actionable policy guidance.
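To make the attribution principle concrete, the sketch below computes exact Shapley values for a three-feature toy model by enumerating all feature coalitions. It is not TreeExplainer, which reaches the same kind of result efficiently for tree ensembles. The toy model, the single background reference point, and the function names are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for instance x by brute-force enumeration.

    Features outside a coalition are set to the background reference.
    The cost grows as 2^n, so this only suits very small n.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else background[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical model: an additive term plus one interaction."""
    lag1, speed, wind = z
    return 2.0 * lag1 + speed * wind


x = [1.0, 2.0, 3.0]
background = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, background)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the background point, and the interaction term is split equally between the two interacting features.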

3.4.2. LIME Analysis

LIME was applied to 500 randomly selected test samples for each pollutant, using 5000 perturbations per sample [13]. Local surrogate models were analyzed to obtain feature-importance rankings, if-then decision rules for domain-expert interpretation, and confidence intervals for local attributions to cross-validate with SHAP results. The complementary use of LIME with SHAP strengthens confidence in the interpretations: while SHAP provides theoretically consistent global attributions, LIME offers more accessible, intuitive local explanations for policymakers [30].

3.4.3. PDP and ALE Plots

One-dimensional PDP and ALE plots were generated for the top 10 features identified by SHAP. Two-dimensional PDP surfaces were calculated for the top 5 traffic × meteorology feature pairs (e.g., Vehicle Volume × Wind Speed, Speed × Atmospheric Stability). ALE plots were preferred over PDP for correlated features because ALE estimates remain unbiased when features are correlated. Individual Conditional Expectation (ICE) plots were generated alongside PDPs to visualize heterogeneity in predictions across samples.
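The relationship between ICE curves and the one-dimensional PDP can be shown in a few lines: each ICE curve varies one feature for a single sample, and the PDP is the pointwise average of those curves. The model and data below are made up, and the function names are hypothetical.

```python
def ice_curves(predict, X, feature, grid):
    """One curve per sample: predictions as `feature` sweeps the grid."""
    curves = []
    for row in X:
        curve = []
        for g in grid:
            z = list(row)
            z[feature] = g          # overwrite only the feature of interest
            curve.append(predict(z))
        curves.append(curve)
    return curves


def partial_dependence(predict, X, feature, grid):
    """PDP = average of the ICE curves at each grid value."""
    curves = ice_curves(predict, X, feature, grid)
    return [sum(c[k] for c in curves) / len(curves) for k in range(len(grid))]


def toy_model(z):
    """Hypothetical model of a pollutant from (wind_speed, lag1)."""
    wind_speed, lag1 = z
    return 3.0 * wind_speed + lag1


X = [[0.0, 1.0], [0.0, 3.0]]
grid = [0.0, 1.0, 2.0]
ice = ice_curves(toy_model, X, feature=0, grid=grid)
pdp = partial_dependence(toy_model, X, feature=0, grid=grid)
```

Because the PDP overwrites one feature while keeping the others at their observed values, it can evaluate the model on unrealistic combinations when features are correlated, which is the motivation for preferring ALE in that case.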

3.4.4. Multimethod Consensus Analysis

The agreement among the XAI methods was quantitatively assessed using Spearman’s rank correlation on SHAP, LIME, and permutation-based importance rankings. The consensus feature importance score was calculated as a rank-weighted average across all three methods. Cases of disagreement (Spearman ρ < 0.7 between any pair) were examined individually to identify methodological limitations and feature-specific interpretation challenges, as recent comparative analyses have shown that SHAP and LIME outputs can be significantly affected by model selection and feature multicollinearity [50].
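A minimal sketch of the agreement check follows. Spearman's ρ is computed as the Pearson correlation of average ranks. The consensus score here is a plain mean of ranks across methods, which is an assumption, since the exact weighting is not given in the text; the importance scores are made up and the function names are hypothetical.

```python
import math


def average_ranks(scores):
    """Rank 1 = largest score; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rho = Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    va = sum((p - ma) ** 2 for p in ra)
    vb = sum((q - mb) ** 2 for q in rb)
    return cov / math.sqrt(va * vb)


def consensus_rank(*score_lists):
    """Mean rank of each feature across methods (lower = more important)."""
    all_ranks = [average_ranks(s) for s in score_lists]
    return [sum(r[i] for r in all_ranks) / len(all_ranks)
            for i in range(len(score_lists[0]))]


# Made-up importance scores for four features.
shap_imp = [0.50, 0.30, 0.10, 0.05]
lime_imp = [0.40, 0.35, 0.02, 0.10]
rho = spearman(shap_imp, lime_imp)
needs_review = rho < 0.7            # disagreement threshold from the text
consensus = consensus_rank(shap_imp, lime_imp)
```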

3.5. Temporal Decomposition with Component-Specific XAI

Each pollutant time series was subjected to STL decomposition using a 24 h seasonal period (daily cycle) and a 720 h (30-day) trend smoothing window [51]. The decomposed components (trend, seasonal, and residual) were then used as separate prediction targets for the best-performing ML model. SHAP analysis was performed independently on each component model, enabling the identification of: (a) trend drivers reflecting long-term emission changes and policy impacts; (b) seasonal drivers capturing daily traffic patterns, photochemistry, and heating cycles; and (c) residual drivers associated with short-term pollution events and meteorological anomalies. This component-specific XAI approach has been found to improve both forecasting accuracy and interpretability relative to modeling the raw time series [52]. Similarly, hybrid frameworks that combine signal decomposition techniques with gradient boosting models have demonstrated improved forecasting performance for environmental time series [53].
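The idea of splitting a series into additive trend, seasonal, and residual parts can be illustrated with a classical moving-average decomposition. This is deliberately a simpler stand-in and not STL, which uses iterated LOESS smoothing. The function name, the toy series, and the period of 4 (instead of 24 h) are assumptions chosen to keep the example small.

```python
def classical_decompose(y, period):
    """Additive decomposition y = trend + seasonal + residual.

    Trend: centred moving average over one period (half weights at the
    two ends when the period is even). Seasonal: mean detrended value
    at each phase, centred to sum to zero. Assumes len(y) >= 2 * period.
    """
    n = len(y)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        if period % 2 == 0:
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period

    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += y[t] - trend[t]
            counts[t % period] += 1
    phase_means = [sums[p] / counts[p] for p in range(period)]
    centre = sum(phase_means) / period
    seasonal = [phase_means[t % period] - centre for t in range(n)]

    residual = [None if trend[t] is None else y[t] - trend[t] - seasonal[t]
                for t in range(n)]
    return trend, seasonal, residual


# Toy series: linear trend plus a repeating 4-step pattern.
pattern = [1.0, -1.0, 2.0, -2.0]
y = [0.5 * t + pattern[t % 4] for t in range(20)]
trend, seasonal, residual = classical_decompose(y, period=4)
```

Each recovered component could then serve as its own prediction target, which is the step that the component-specific SHAP analysis builds on.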

3.6. Causal Inference Through Convergent Cross-Mapping

Convergent Cross-Mapping (CCM) was used to determine the direction of causal relationships between traffic parameters and pollutant concentrations [54]. Statistical significance was assessed using surrogate time series generated by seasonal shuffling (1000 surrogates, p < 0.05). The embedding dimension (m) was determined using simplex projection by selecting the value that maximized forecast skill in the range m = 2–8; all variables yielded an optimal value of m = 2, and the time delay (τ) was set to 1 h, consistent with the hourly resolution of the dataset (Figure S29). A sensitivity analysis performed in the range m = 2–6 confirmed that absolute ρ values increased monotonically with higher m, while the relative ranking of causal factors remained constant for all pollutants (Table S3). This demonstrates that m = 2 provides the most conservative estimates of causal strength and that the reported findings are robust to the choice of embedding dimension. CCM was applied to all traffic–pollutant and meteorology–pollutant pairs to create a directed causal graph.
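The core cross-mapping step can be sketched with delay embedding and simplex-style nearest-neighbour weighting. This is a simplified illustration: it omits the surrogate significance test and library subsampling, the driven logistic-map system is a synthetic example rather than the study's data, and all function names are hypothetical.

```python
import math


def delay_embed(series, m, tau):
    """Delay vectors [s(t), s(t-tau), ..., s(t-(m-1)tau)] and start index."""
    start = (m - 1) * tau
    vectors = [[series[t - j * tau] for j in range(m)]
               for t in range(start, len(series))]
    return vectors, start


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def cross_map_skill(source, target, m=2, tau=1):
    """Skill (rho) of estimating `source` from the manifold of `target`.

    In CCM, skill that is high and grows with series length suggests
    that `source` causally influences `target`.
    """
    vectors, start = delay_embed(target, m, tau)
    truth = source[start:]
    estimates = []
    for i, v in enumerate(vectors):
        # m + 1 nearest neighbours on the target manifold, excluding self.
        nearest = sorted((math.dist(v, w), j)
                         for j, w in enumerate(vectors) if j != i)[:m + 1]
        d_min = nearest[0][0]
        if d_min == 0.0:
            weights = [1.0 if d == 0.0 else 0.0 for d, _ in nearest]
        else:
            weights = [math.exp(-d / d_min) for d, _ in nearest]
        estimate = sum(w * truth[j] for w, (_, j) in zip(weights, nearest))
        estimates.append(estimate / sum(weights))
    return pearson(truth, estimates)


# Synthetic system in which x drives y, but y does not drive x.
x, y = [0.4], [0.2]
for _ in range(299):
    x_next = 3.8 * x[-1] * (1.0 - x[-1])
    y_next = y[-1] * (3.5 - 3.5 * y[-1] - 0.1 * x[-1])
    x.append(x_next)
    y.append(y_next)

# Convergence check: skill as a function of series length.
skill_x_from_y = [cross_map_skill(x[:n], y[:n]) for n in (50, 150, 300)]
skill_y_from_x = [cross_map_skill(y[:n], x[:n]) for n in (50, 150, 300)]
```

Comparing the two skill curves as the series length grows is what distinguishes CCM from a plain correlation: convergence in one direction only points to a one-way influence.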

3.7. Model Evaluation

Model performance was evaluated using a strict temporal training–test split to prevent data leakage. The training set covered January 2022–June 2023 (11,663 samples, 80%), while the test set covered July–October 2023 (2916 samples, 20%). No test data were used during model training or hyperparameter tuning. Three-fold time series cross-validation was applied during Bayesian optimization for hyperparameter selection, and five-fold time series cross-validation was used for the Friedman statistical comparison across models. Performance metrics included the coefficient of determination (R2), root-mean-square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). Additionally, a seasonally stratified assessment (DJF, MAM, JJA, SON) was conducted to evaluate the models’ robustness under different meteorological regimes. Detailed descriptions of the performance metrics are provided in Supplementary Material S7 [55].
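The chronological split and the four metrics can be written out as a short sketch using their standard definitions. The study split by calendar date; the fraction-based split below is an assumption that happens to reproduce the reported sample counts. Function names and sample values are hypothetical.

```python
import math


def chronological_split(n_samples, train_fraction=0.8):
    """Split ordered samples so that all test samples follow training."""
    n_train = int(train_fraction * n_samples)
    return range(0, n_train), range(n_train, n_samples)


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and MAPE (%) using standard definitions.

    MAPE assumes that no true value is zero.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }


train_idx, test_idx = chronological_split(11663 + 2916)
metrics = regression_metrics([2.0, 4.0, 6.0, 8.0], [3.0, 4.0, 5.0, 8.0])
```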

4. Results

4.1. Descriptive Analysis and Temporal Patterns

As shown in Figure 2, the PM10 and PM2.5 time series exhibit significant seasonal fluctuations: peak values are observed during the winter months (December–February) due to heating-related emissions and atmospheric inversions. The NO2 and NOX time series show daily cyclical patterns consistent with traffic density. Weekday/weekend differences are clearly distinguishable in the traffic volume data. Consistent with this, one study found that weekday roadside concentrations of NO, NO2, and PM2.5 were significantly higher than weekend levels [56]. While the temperature profile follows the expected annual sinusoidal pattern, wind speed shows a more stochastic distribution.
In the diurnal profiles in Figure 3, PM10 and PM2.5 concentrations are high during nighttime (00:00–06:00) and decrease during midday (12:00–16:00). A study conducted in European cities likewise found that concentrations of NO2, PM2.5, and PM10 were higher at night and in the evening, but lower during midday [57]. This pattern can be explained by the lower mixing height and greater atmospheric stability during nighttime. The NO2 and NOX profiles exhibit a distinct bimodal structure, with peaks at 07:00–09:00 and 17:00–20:00 corresponding to rush-hour traffic. These profiles confirm that traffic emissions are a dominant source, especially for NOX.
According to the correlation heat map in Figure 4, there is a strong positive correlation (r = 0.71) between PM10 and PM2.5, indicating that these two particulate matter fractions share common sources. There is also a significant correlation (r = 0.46) between NOX and PM10. Wind speed is negatively correlated with all pollutants (PM10: r = −0.41, PM2.5: r = −0.36, NOX: r = −0.48), reflecting the dispersive effect of wind. A negative correlation between daily PM10 levels and wind speed has also been reported for Warsaw [58].

4.2. Model Performance Comparison

According to the model performance comparison presented in Table 3 and Figure 5, tree-based ensemble models consistently achieved R2 > 0.80 for all pollutants. The highest performance was achieved by the CatBoost model for NOX (R2 = 0.866, RMSE = 14.29), PM2.5 (R2 = 0.876, RMSE = 3.20), and NO2 (R2 = 0.818, RMSE = 6.17). For PM10, XGBoost achieved R2 = 0.838, while LightGBM provided the highest R2 (0.845). Random Forest remained the lowest-performing tree-based model, achieving R2 > 0.79 for all pollutants. A previous study reported that using CatBoost in conjunction with SHAP achieved high predictive accuracy (R2 = 0.96) for urban PM2.5 concentrations [59]. The CNN–BiLSTM–Attention deep learning model lagged behind the tree-based models (PM10: R2 = 0.416, PM2.5: R2 = 0.453, NO2: R2 = 0.368, NOX: R2 = 0.515). The main reasons for this difference are: (1) the superiority of tree-based models on tabular data, (2) the need for more training data for the deep learning model, and (3) the complexity of the hyperparameter optimization space. However, the performance of the CNN–BiLSTM–Attention model improved significantly when it learned from CatBoost’s outputs via knowledge distillation (V5: PM10 R2 = 0.640, PM2.5 R2 = 0.717). This finding is consistent with [60], which showed that integrating tree-based model outputs into the CNN–BiLSTM–Attention architecture significantly improved PM2.5 prediction accuracy.
According to the Friedman statistical test results, a statistically significant difference between the models was found only for NOX estimation (χ2 = 10.68, p = 0.014). For the other pollutants (PM10: p = 0.472, PM2.5: p = 0.069, NO2: p = 0.323), the performance differences between the models are not statistically significant, indicating that the tree-based models perform similarly. The Friedman test is a well-established nonparametric method for simultaneously comparing multiple machine learning models, particularly when performance metrics are not normally distributed [61]. Individual fold-level R2 scores obtained from the five-fold temporal cross-validation are reported in Supplementary Material S2. An ablation analysis was performed to evaluate the contribution of feature engineering. CatBoost models trained with only raw traffic and meteorological variables (19 features, without lag, rolling, and interaction terms) achieved substantially lower prediction accuracy (mean R2 ≈ 0.09), while models with the full engineered feature set (186 features) achieved R2 > 0.80 across all pollutants under temporal cross-validation. This confirms that lag, rolling-statistic, and interaction features provide essential predictive information, justifying the increased feature dimensionality. In the scatter plots in Figure 6, the points from the tree-based models cluster closely around the 1:1 line. The densest clustering for all models is observed in the PM2.5 predictions. A wider scatter is observed for the CNN–BiLSTM–Attention model, and its predictive performance declines especially for high-concentration values (outliers). Although the scatter of the Random Forest model is close to that of the boosting methods, it shows a slight tendency towards underestimation at outliers. Similar behavior has been reported elsewhere, with the generalization ability of RF-based models decreasing under high-concentration and temporally distant conditions [62].
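For reference, the Friedman statistic is obtained by ranking the models within each cross-validation fold and comparing the rank sums. The sketch below uses the standard formula without a tie correction and omits the p-value, which requires the chi-square distribution with k − 1 degrees of freedom. The fold scores are made up and the function name is hypothetical.

```python
def friedman_statistic(scores):
    """Friedman chi-square for scores[fold][model].

    Models are ranked within each fold (average ranks for ties), then
    chi2 = 12 / (n * k * (k + 1)) * sum(R_j^2) - 3 * n * (k + 1),
    where n is the number of folds, k the number of models and R_j the
    rank sum of model j. No tie correction is applied.
    """
    n = len(scores)
    k = len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1
            for pos in range(i, j + 1):
                rank_sums[order[pos]] += shared
            i = j + 1
    total = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * total - 3.0 * n * (k + 1)


# Made-up R2 scores: 3 folds (rows) by 3 models (columns).
fold_scores = [
    [0.80, 0.85, 0.90],
    [0.70, 0.75, 0.80],
    [0.60, 0.65, 0.70],
]
chi2 = friedman_statistic(fold_scores)
```

When every fold ranks the models in the same order, as in this toy input, the statistic reaches its maximum value of n(k − 1).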

4.3. Explainable AI: SHAP Feature Importance

Figure 7 shows the SHAP beeswarm plots of the XGBoost model. The dominant feature for all pollutants is the 1 h lag (Lag1): PM10_Lag1, PM2.5_Lag1, NO2_Lag1, NOX_Lag1. This confirms that air pollution has a strong autocorrelation structure [63]. Lag features have likewise been shown to significantly improve model performance in PM10 and PM2.5 prediction in Iran. Rolling mean and standard deviation features (RollMean6, RollStd3) are second in importance. Among meteorological variables, wind speed has a pronounced negative effect, particularly for NOX and PM10 prediction; higher wind speeds reduce pollutant concentrations. In NOX prediction, WindSpeed stands out as the second most important feature. The Speed_WindSpeed interaction variable is among the top 10 features for all pollutants, capturing the synergistic effect of traffic speed and wind speed. Figure 8 groups the SHAP values by feature category. Lagged and rolling pollutant features provide the dominant contribution for all pollutants: PM10 79.1%, NO2 79.1%, PM2.5 75.4%, and NOX 70.1%. Traffic variables are the second most important category, contributing 20.5% to NOX, 11.9% to PM2.5, 11.4% to NO2, and 10.2% to PM10; the traffic contribution to NOX is thus approximately twice that for the other pollutants. This high contribution is consistent with NOX being emitted directly by combustion and indicates that NOX concentrations are closely tied to traffic. Meteorological variables rank third, contributing 8.7% to PM2.5 and 6.7% to PM10, indicating that particulate matter is more affected by atmospheric conditions. In contrast, the contribution of traffic is lower for PM10 and PM2.5, consistent with particulate matter originating from multiple sources (construction, heating, natural dust, etc.). Traffic features have also been found to significantly improve the prediction performance for NO2 and NOX in Seoul, Republic of Korea [64].
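The feature types named above (Lag1, RollMean6, RollStd3, Speed_WindSpeed) can be sketched as follows. This is a minimal illustration under stated assumptions: rolling statistics are computed from strictly past values to avoid leaking the current hour into its own predictors, and the interaction is taken as a simple product. Neither choice is confirmed by the article, and the helper names, the `NOX_` prefixes on the rolling features, and the data are hypothetical.

```python
from statistics import mean, stdev


def lag(series, k):
    """Value observed k steps earlier; None where no history exists."""
    return [None] * k + list(series[:-k]) if k > 0 else list(series)


def rolling_past(series, window, func):
    """Apply func to the `window` values strictly before each time step."""
    out = []
    for t in range(len(series)):
        out.append(func(series[t - window:t]) if t >= window else None)
    return out


# Made-up hourly values, for illustration only
nox = [40.0, 55.0, 70.0, 65.0, 50.0, 45.0, 60.0, 80.0]
speed = [80.0, 60.0, 35.0, 40.0, 55.0, 70.0, 45.0, 30.0]  # km/h
wind = [3.0, 2.5, 1.0, 1.5, 2.0, 4.0, 2.0, 1.0]           # m/s

features = {
    "NOX_Lag1": lag(nox, 1),
    "NOX_RollMean6": rolling_past(nox, 6, mean),
    "NOX_RollStd3": rolling_past(nox, 3, stdev),
    "Speed_WindSpeed": [s * w for s, w in zip(speed, wind)],  # assumed product form
}
```

Rows whose features are `None` (the first hours of the record) would be dropped before training.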
Pollutant-specific SHAP dependence plots, LIME local explanations, PDP, ALE, and PDP-ALE comparison plots are provided in Supplementary Material S5. SHAP interaction analysis revealed that the strongest pairwise interactions occurred between wind speed and traffic volume for NOX prediction, and between temperature and lagged concentrations for PM2.5. These interaction effects indicate that meteorology-traffic synergies are important and cannot be fully captured by the contributions of individual features alone. LIME local explanations were generated for representative high-, medium-, and low-concentration samples of each pollutant to complement the global SHAP analysis (Figures S9–S12). For high-concentration PM10 events, LIME attributed the largest positive contributions to lagged PM10 features and wind speed conditions, while for high-concentration NOX events, traffic volume and speed-related features dominated the local explanations. This contrast indicates that peak pollution events are not uniformly driven by the same factors: PM10 peaks are more sensitive to meteorological dispersion conditions, while NOX peaks are primarily traffic-driven. LIME also revealed that individual high-pollution events of the same pollutant may have different drivers, some traffic-related and others meteorological, demonstrating the value of local explanations for designing targeted, event-specific mitigation strategies.
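The category-level percentages discussed for Figure 8 can be obtained by aggregating mean absolute SHAP values, as in the sketch below. The aggregation rule (sum of mean |SHAP| per category divided by the grand total) is an assumption about how such shares are typically computed, not a procedure confirmed by the article. The feature names, the `category_of` rule, and the SHAP matrix are hypothetical. In practice the matrix would come from an explainer such as `shap.TreeExplainer(model).shap_values(X)`.

```python
def category_shares(shap_values, feature_names, category_of):
    """Share (%) of total mean |SHAP| attributed to each feature category."""
    n = len(shap_values)
    mean_abs = [sum(abs(row[j]) for row in shap_values) / n
                for j in range(len(feature_names))]
    totals = {}
    for name, value in zip(feature_names, mean_abs):
        cat = category_of(name)
        totals[cat] = totals.get(cat, 0.0) + value
    grand_total = sum(totals.values())
    return {cat: 100.0 * v / grand_total for cat, v in totals.items()}


def category_of(name):
    """Hypothetical naming rule that maps a feature to its category."""
    if "Lag" in name or "Roll" in name:
        return "lagged/rolling"
    if name in {"WindSpeed", "Temperature", "Humidity"}:
        return "meteorology"
    return "traffic"


# Made-up SHAP matrix: rows = samples, columns = features
feature_names = ["NOX_Lag1", "NOX_RollMean6", "WindSpeed", "VehicleCount"]
shap_values = [
    [6.0, -2.0, -1.0, 1.0],
    [-8.0, 2.0, 1.0, -3.0],
]
shares = category_shares(shap_values, feature_names, category_of)
```

Absolute values are used so that positive and negative contributions do not cancel when averaged over samples.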

4.4. Temporal Decomposition (STL Analysis)

Figure 9 shows the Seasonal-Trend decomposition using LOESS (STL) of the PM10 time series into trend, seasonal, and residual components. STL decomposition results for the remaining three pollutants (PM2.5, NOX and NO2) are presented in Supplementary Material S4. STL decomposition has been shown to effectively separate time series into these three components, enabling more accurate capture of long-term trends, seasonal cycles, and random fluctuations in air quality data [65,66]. The stochastic nature of the residual component makes it the most challenging part to model, as it reflects irregular, instantaneous atmospheric events [67]. The trend component reflects long-term variations in PM10 concentrations, showing marked increases during the winter months (December–February); these increases are likely related to increased heating-related emissions and reduced atmospheric mixing under stable boundary-layer conditions. The seasonal component captures the recurring 24 h daily cycle, while the residual component represents stochastic fluctuations that cannot be attributed to trend or seasonality. Figure 10 shows the STL variance decomposition results for all four pollutants. The trend component accounts for the largest share of the total variance for all pollutants: NO2 (56.3%), PM10 (53.3%), PM2.5 (45.3%), and NOX (45.3%). The dominance of the trend component suggests that long-term factors, including changes in traffic density, emission control policies, and seasonal meteorological patterns, primarily drive variability in pollutant levels. The high trend variance of NO2 (56.3%) is consistent with its sensitivity to traffic-related combustion emissions and the gradual implementation of emission regulations. Seasonal-trend decomposition of NO2 has been shown to reveal the key seasonal patterns and long-term trends needed to generate accurate forecasts [68].
The seasonal component contributed minimally to the total variance across all pollutants (2.8–5.2%), suggesting that the 24 h daily cycle explains only a limited portion of the hourly concentration variability. This finding may reflect the masking effect of episodic pollution events and meteorological disturbances on regular daily patterns. The residual component was highest for NOX (50.5%), notably higher than that of PM10 (41.0%), PM2.5 (40.7%), and NO2 (42.1%). The elevated residual variance in NOX suggests that its concentrations are strongly driven by instantaneous, irregular processes, such as transient traffic fluctuations and short-term meteorological variability, rather than by systematic trends or seasonal patterns [69]. This result agrees with the SHAP feature importance analysis, in which traffic variables account for 20.5% of the total contribution to NOX prediction (the highest percentage among all pollutants studied), further supporting the conclusion that NOX dynamics are more sensitive to sudden emission events than to gradual long-term trends.
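The variance shares of trend, seasonal, and residual components can be computed as in the sketch below. To stay dependency-free it uses a classical moving-average decomposition as a simplified stand-in for STL, which instead relies on LOESS smoothing; a library implementation such as `statsmodels.tsa.seasonal.STL` would be the usual choice. The synthetic series, the helper names, and the definition of each share as component variance over observed variance are assumptions made for illustration.

```python
import math
import random
from statistics import mean, pvariance


def centered_ma(series, period):
    """2 x period centred moving average (even periods); None at the edges."""
    half = period // 2
    out = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        # half weight on both end points so the window spans exactly one period
        out[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    return out


def decompose(series, period=24):
    """Classical additive decomposition: series = trend + seasonal + residual."""
    trend = centered_ma(series, period)
    valid = [t for t in range(len(series)) if trend[t] is not None]
    detrended = {t: series[t] - trend[t] for t in valid}
    phase_mean = [mean(detrended[t] for t in valid if t % period == p)
                  for p in range(period)]
    centre = mean(phase_mean)  # force the daily cycle to average to zero
    seasonal = {t: phase_mean[t % period] - centre for t in valid}
    resid = {t: detrended[t] - seasonal[t] for t in valid}
    return ([trend[t] for t in valid], [seasonal[t] for t in valid],
            [resid[t] for t in valid], [series[t] for t in valid])


def variance_shares(trend, seasonal, resid, observed):
    """Variance of each component as a percentage of the observed variance."""
    total = pvariance(observed)
    parts = (("trend", trend), ("seasonal", seasonal), ("residual", resid))
    return {name: 100.0 * pvariance(comp) / total for name, comp in parts}


# Synthetic hourly series: slow 30-day swing + 24 h cycle + noise (made-up values)
random.seed(0)
series = [30.0 + 10.0 * math.sin(2 * math.pi * t / (24 * 30))
          + 3.0 * math.sin(2 * math.pi * t / 24)
          + random.gauss(0.0, 2.0) for t in range(24 * 60)]
shares = variance_shares(*decompose(series, period=24))
```

Shares computed this way need not sum to exactly 100%, because the components can be correlated with one another.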

4.5. Causal Inference: Convergent Cross-Mapping (CCM)

The results of the CCM analysis, applied to determine the direction and strength of causal relationships between the traffic and meteorological variables and pollutant concentrations, are presented in Table 4 and Figure 11. The strongest causal effects were observed for meteorological variables. The causal effect of wind speed on NOX (ρ = 0.370) is the highest CCM value detected in the study, followed by the effect of temperature on NOX (ρ = 0.334), the effect of relative humidity on NO2 (ρ = 0.309), and the effect of wind speed on PM10 (ρ = 0.302). These findings suggest that meteorological conditions are the primary external factors controlling pollutant concentrations. Among the traffic variables, the total number of vehicles had the strongest causal effect on NO2 (ρ = 0.149) and NOX (ρ = 0.138). The causal effect of the heavy vehicle ratio (HVR) on NOX (ρ = 0.099) is consistent with the disproportionate contribution of heavy vehicles to nitrogen oxide emissions. Notably, the causal direction of all traffic variables for NOX was determined as “Variable → Pollutant”, indicating that traffic parameters directly affect NOX concentrations. In contrast, the causal direction for some traffic variables related to PM10 and PM2.5 was identified as “Pollutant → Variable,” suggesting a possible feedback mechanism in which high pollution levels may influence driver behavior. CCM convergence diagnostics, which validate the stability of cross-map skill with increasing library length, are presented in Supplementary Material S6. Figure 12 compares the Pearson correlation coefficient (|r|) with the CCM causal strength (ρ). Wind speed and temperature lie above the 1:1 reference line for all pollutants, indicating that these meteorological variables have a strong nonlinear causal effect despite low linear correlation.
Traffic variables generally fall below or near the 1:1 line, indicating that the traffic-pollutant relationship is relatively linear. CCM analysis thus revealed nonlinear causal relationships that classical correlation approaches failed to capture. Using CCM in conjunction with correlation analysis allows for a more comprehensive examination of causal relationships, because correlation alone can be misleading when the underlying meteorological cyclicality is not removed from the time series. Consistent with this, a previous CCM analysis showed that meteorological variables such as wind speed and boundary layer height have a causal effect on NO2 concentrations, and that conventional correlation analysis can yield misleading results when applied without removing cyclical meteorological patterns [70].
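The cross-map skill ρ used in CCM can be sketched as follows. This is a minimal illustration of the standard procedure (delay embedding, nearest neighbours on the shadow manifold, exponentially weighted estimates, Pearson correlation), not the authors' implementation. It uses the full library only, so it does not reproduce the convergence test over increasing library lengths. The toy coupled logistic maps, their parameters, and all function names are assumptions chosen for illustration; `E` plays the role of the embedding dimension m and `tau` the time delay.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def embed(series, E, tau):
    """Delay vectors (s[t], s[t-tau], ..., s[t-(E-1)tau]) and their time indices."""
    start = (E - 1) * tau
    times = list(range(start, len(series)))
    vectors = [[series[t - i * tau] for i in range(E)] for t in times]
    return times, vectors


def cross_map_skill(source, target, E=2, tau=1):
    """Skill of estimating `target` from the shadow manifold of `source`.

    In CCM, high skill here is read as evidence that `target` influences `source`.
    """
    times, vectors = embed(source, E, tau)
    estimates, observed = [], []
    for i, v in enumerate(vectors):
        # distances to every other library point (leave-one-out)
        dists = sorted((math.dist(v, w), j) for j, w in enumerate(vectors) if j != i)
        nearest = dists[:E + 1]
        d_min = max(nearest[0][0], 1e-12)  # guard against division by zero
        weights = [math.exp(-d / d_min) for d, _ in nearest]
        total = sum(weights)
        estimates.append(
            sum(w * target[times[j]] for w, (_, j) in zip(weights, nearest)) / total)
        observed.append(target[times[i]])
    return pearson(estimates, observed)


# Toy system: two coupled logistic maps (illustrative parameters)
x, y = [0.4], [0.2]
for _ in range(299):
    x_next = x[-1] * (3.8 - 3.8 * x[-1] - 0.02 * y[-1])
    y_next = y[-1] * (3.5 - 3.5 * y[-1] - 0.1 * x[-1])
    x.append(x_next)
    y.append(y_next)

skill_x_from_y = cross_map_skill(source=y, target=x)  # tests whether x drives y
self_skill = cross_map_skill(source=x, target=x)      # sanity check, close to 1
```

A causal reading additionally requires that the skill increases and levels off as the library grows, which is what convergence diagnostics check.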
An R2 of 0.9987 was obtained using Random Forest with SHAP and PDP interpretation to estimate the daily AQI for Hapur, India [71]. However, that study focused on a composite index at daily resolution rather than on individual pollutants at an hourly timescale. LightGBM was used in conjunction with SHAP to estimate historical PM2.5 concentrations in Thailand [72]. However, this approach was limited to univariate modeling without causal inference and temporal decomposition. In addition, R2 values of 0.98–0.999 were reported by applying SHAP in conjunction with RF, GBDT, and XGBoost in the Yangtze River Delta [73]. However, that analysis relied solely on SHAP without any complementary XAI method. Similarly, an R2 of 96.44% was achieved using a TPE-optimized CatBoost model to analyze PM2.5 factors in 297 Chinese cities [59]. However, that study was also limited to a single pollutant and a single XAI technique. A common limitation of these studies is that they rely on only one or at most two interpretability methods, usually only SHAP. In contrast, M-ETAQI integrates four complementary XAI techniques (SHAP, LIME, PDP, ALE) to cross-validate feature attributions and also includes STL decomposition for temporal pattern isolation and CCM for causal inference, capabilities not found in the aforementioned frameworks. Furthermore, M-ETAQI improves our understanding of traffic-related air quality dynamics by addressing four pollutants (PM10, PM2.5, NOX, NO2) simultaneously at hourly resolution.

5. Conclusions

In this study, the M-ETAQI framework was developed to quantify the contributions of traffic flow and meteorological conditions to urban air pollution in the corridor of the Fatih Sultan Mehmet Bridge in Istanbul. Five machine learning models (XGBoost, LightGBM, CatBoost, Random Forest, and CNN–BiLSTM–Attention) were trained on a 14,631 h dataset covering the period from January 2022 to October 2023; model predictions were interpreted using SHAP, LIME, PDP, and ALE; temporal components were separated using STL decomposition; and causal relationships were examined using CCM analysis. In terms of model performance, tree-based ensemble methods consistently demonstrated high prediction accuracy for all pollutants. The CatBoost model exhibited the best overall performance, achieving the highest accuracy for PM2.5 (R2 = 0.876), NOX (R2 = 0.866), and NO2 (R2 = 0.818). For PM10, LightGBM (R2 = 0.845) and XGBoost (R2 = 0.838) gave the best results. According to the Friedman test, the performance difference between the tree-based models was statistically significant only for NOX prediction (p = 0.014); for the other pollutants, the models showed similar accuracy. This finding indicates that tree-based ensemble methods are reliable and consistent tools for hourly air quality prediction. The CNN–BiLSTM–Attention deep learning model lagged behind the tree-based models when trained independently (mean R2 ≈ 0.44). This can be explained by the known superiority of tree-based models on tabular data and the need for larger datasets for deep learning models. However, its performance improved substantially with a knowledge distillation approach that leverages CatBoost outputs (PM10: R2 = 0.640; PM2.5: R2 = 0.717). This result suggests that hybrid teacher-student architectures are a promising approach in air quality modeling.
SHAP-based feature importance analysis revealed that a large portion of the predicted variability in pollutant concentrations was attributed to lagged and moving-average features (PM10: 79.1%, NO2: 79.1%, PM2.5: 75.4%, NOX: 70.1%). This finding confirms that air pollution time series have a strong autocorrelation structure. Traffic variables were identified as the second most important category, contributing 20.5% to NOX prediction. This share is approximately twice that for the other pollutants (PM10: 10.2%, PM2.5: 11.9%, NO2: 11.4%). The large traffic-related contribution to NOX points to a direct association between this pollutant and combustion-related emissions. In contrast, the relatively low traffic contribution to PM10 and PM2.5 is consistent with particulate matter originating from multiple sources, such as construction, heating, and natural dust. STL temporal decomposition analysis showed that the trend component accounted for the largest share of the total variance for all four pollutants. NO2 had the highest trend variance (56.3%), while NOX exhibited the highest residual variance (50.5%). The high residual variance in NOX indicates that this pollutant is more sensitive to instantaneous traffic fluctuations and short-term meteorological changes than other pollutants. This finding aligns with the high traffic contribution to NOX obtained in the SHAP analysis and shows that the two independent analysis methods are consistent. The low seasonal component across all pollutants (2.8–5.2%) indicates that the 24 h cycle at hourly resolution contributes little to the total variance. The XAI framework made it possible to understand the reasoning behind predictions, something black-box models alone cannot provide. For example, the SHAP analysis showed that wind speed strongly reduced NOX concentrations but had a weaker effect on PM2.5, for which temperature was more influential. This distinction would have remained hidden in a purely predictive model.
LIME also showed that individual high-pollution events were not driven by the same factors; some were traffic-driven while others were meteorology-driven, allowing for targeted rather than general policy responses. The agreement of four independent XAI methods (SHAP, LIME, PDP, ALE) on the underlying factors lends more credibility to these findings than any single method could provide.
Causal inference analysis using CCM revealed nonlinear causal relationships that correlation-based approaches cannot detect. The strongest causal effects were observed for meteorological variables: the effect of wind speed on NOX (ρ = 0.370), the effect of temperature on NOX (ρ = 0.334), the effect of relative humidity on NO2 (ρ = 0.309), and the effect of wind speed on PM10 (ρ = 0.302). Regarding traffic variables, the total number of vehicles had a statistically significant causal effect on NO2 (ρ = 0.149) and NOX (ρ = 0.138). In particular, the determination of the causal direction of all traffic variables as “Variable → Pollutant” for NOX supports the conclusion that traffic emissions directly affect NOX concentrations. A comparison of Pearson correlation with CCM showed that meteorological variables exhibit a strong nonlinear causal effect despite low linear correlation, whereas the traffic-pollutant relationship was relatively more linear. The study has some limitations. The data collection period is limited to 22 months, and longer-term datasets would allow for a more robust assessment of interannual variability. Data from a single air quality monitoring station were used, which limits the spatial representativeness of the results. Also, the CNN–BiLSTM–Attention deep learning model performed poorly relative to the tree-based models, indicating the need for further architectural optimization and larger datasets. Although the CCM sensitivity analysis confirmed that causal rankings are robust across embedding dimensions m = 2–6 (Table S3), the analysis was limited to a single time delay (τ = 1 h); future work could explore multi-scale temporal delays. Moreover, the analysis focuses on traffic and meteorological contributions to air pollution; other emission sources such as industrial activities, residential heating, and maritime transport are not included due to the lack of spatially resolved emission inventory data for the study area.
Furthermore, the study covers only one corridor (FSM Bridge), which may limit the generalizability of the findings to other urban environments with different traffic and land-use characteristics. Future studies should integrate multi-source emission inventories to provide a more comprehensive assessment of urban air quality dynamics and expand spatial coverage to multiple monitoring stations across Istanbul. In addition, while hourly average vehicle speed data can be obtained from RTMSs, instantaneous acceleration data are not recorded; this limits the ability to capture the full spectrum of driving dynamics that affect emission rates.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/atmos17060591/s1, Table S1: Optimized hyperparameters for each model–pollutant combination; Table S2: Five-fold temporal cross-validation R2 scores for each model–pollutant combination; Table S3: CCM sensitivity analysis: causal strength (ρ) across embedding dimensions m = 2–6 for key variable–pollutant pairs; Figure S1: STL decomposition of hourly PM10 concentrations; Figure S2: STL decomposition of hourly PM2.5 concentrations; Figure S3: STL decomposition of hourly NOX concentrations; Figure S4: STL decomposition of hourly NO2 concentrations; Figure S5: SHAP dependence plots for top features affecting PM10 concentrations; Figure S6: SHAP dependence plots for top features affecting PM2.5 concentrations; Figure S7: SHAP dependence plots for top features affecting NO2 concentrations; Figure S8: SHAP dependence plots for top features affecting NOX concentrations; Figure S9: LIME local explanation for a representative high-concentration PM10 sample; Figure S10: LIME local explanation for a representative high-concentration PM2.5 sample; Figure S11: LIME local explanation for a representative high-concentration NO2 sample; Figure S12: LIME local explanation for a representative high-concentration NOX sample; Figure S13: Partial Dependence Plots for PM10 predictions; Figure S14: Partial Dependence Plots for PM2.5 predictions; Figure S15: Partial Dependence Plots for NO2 predictions; Figure S16: Partial Dependence Plots for NOX predictions; Figure S17: Accumulated Local Effects plots for PM10; Figure S18: Accumulated Local Effects plots for PM2.5; Figure S19: Accumulated Local Effects plots for NO2; Figure S20: Accumulated Local Effects plots for NOX; Figure S21: PDP vs. ALE comparison for PM10; Figure S22: PDP vs. ALE comparison for PM2.5; Figure S23: PDP vs. ALE comparison for NO2; Figure S24: PDP vs. ALE comparison for NOX; Figure S25: CCM convergence plots for PM10; Figure S26: CCM convergence plots for PM2.5; Figure S27: CCM convergence plots for NOX; Figure S28: CCM convergence plots for NO2; Figure S29: Simplex projection results for optimal embedding dimension selection; Equations (S1)–(S4): Performance metrics (R2, RMSE, MAE, MAPE).

Author Contributions

Conceptualization: E.B.; Methodology: E.B., H.Ö.; Formal analysis and investigation: E.B., H.Ö.; Writing—Original draft preparation: E.B.; Writing—Review and editing: E.B., H.Ö.; Supervision: A.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Air pollution data are publicly available online at https://havakalitesi.ibb.gov.tr/, accessed on 20 April 2026.

Acknowledgments

The authors thank the Istanbul Metropolitan Municipality for providing the related data. This work was supported by the Scientific Research Projects Department of Istanbul Technical University (ITU), Project Number: 47312.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. WHO. WHO Global Air Quality Guidelines: Particulate Matter (PM2.5 and PM10), Ozone, Nitrogen Dioxide, Sulfur Dioxide and Carbon Monoxide; World Health Organization: Geneva, Switzerland, 2021.
  2. Braithwaite, I.; Zhang, S.; Kirkbride, J.B.; Osborn, D.P.; Hayes, J.F. Air Pollution (Particulate Matter) Exposure and Associations with Depression, Anxiety, Bipolar, Psychosis and Suicide Risk: A Systematic Review and Meta-Analysis. Environ. Health Perspect. 2019, 127, 126002.
  3. Guerreiro, C.B.; Foltescu, V.; De Leeuw, F. Air Quality Status and Trends in Europe. Atmos. Environ. 2014, 98, 376–384.
  4. Karagulian, F.; Belis, C.A.; Dora, C.F.C.; Prüss-Ustün, A.M.; Bonjour, S.; Adair-Rohani, H.; Amann, M. Contributions to Cities’ Ambient Particulate Matter (PM): A Systematic Review of Local Source Contributions at Global Level. Atmos. Environ. 2015, 120, 475–483.
  5. Amato, F.; Cassee, F.R.; Van Der Gon, H.A.D.; Gehrig, R.; Gustafsson, M.; Hafner, W.; Harrison, R.M.; Jozwicka, M.; Kelly, F.J.; Moreno, T.; et al. Urban Air Quality: The Challenge of Traffic Non-Exhaust Emissions. J. Hazard. Mater. 2014, 275, 31–36.
  6. Zhang, K.; Batterman, S. Air Pollution and Health Risks Due to Vehicle Traffic. Sci. Total Environ. 2013, 450, 307–316.
  7. Lu, C.; Dong, S.; Huang, S.; Zheng, H.; Liu, J.; Li, J.; Yu, L. Real-Traffic Emissions of CO, NOX, CO2, and PM2.5 from Vehicles Using a Portable Emission Measurement System. Air Qual. Atmos. Health 2025, 18, 2265–2276.
  8. Liang, M.; Chao, Y.; Tu, Y.; Xu, T. Vehicle Pollutant Dispersion in the Urban Atmospheric Environment: A Review of Mechanism, Modeling, and Application. Atmosphere 2023, 14, 279.
  9. Qiu, M.; Zigler, C.; Selin, N.E. Statistical and Machine Learning Methods for Evaluating Trends in Air Quality under Changing Meteorological Conditions. Atmos. Chem. Phys. 2022, 22, 10551–10566.
  10. Singh, S.; Kumar, M.; Sengar, V.; Nagar, H.; Kumar, A.; Mishra, J. Ensemble Learning for Air Quality Index Prediction: Integrating Gradient Boosting, XGBoost, and Stacking with SHAP-Based Interpretability. Sci. Rep. 2026, 16, 8544.
  11. Molnar, C.; Casalicchio, G.; Bischl, B. Interpretable Machine Learning—A Brief History, State-of-the-Art and Challenges. In ECML PKDD 2020 Workshops; Koprinska, I., Kamp, M., Appice, A., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 417–431.
  12. Rudin, C. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nat. Mach. Intell. 2019, 1, 206–215.
  13. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
  14. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
  15. Apley, D.W.; Zhu, J. Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models. J. R. Stat. Soc. Ser. B Stat. Methodol. 2020, 82, 1059–1086.
  16. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
  17. Birinci, E.; Ekmekcioğlu, Ö.; Ozdemir, H.; Deniz, A. Interpretable Machine Learning Framework for Air Quality Prediction in Istanbul Using Shapley Additive Explanations (SHAP). Stoch. Environ. Res. Risk Assess. 2026, 40, 37.
  18. Aram, S.A.; Nketiah, E.A.; Saalidong, B.M.; Lartey, P.O.; Kansake, B.A.; Asamoah, G.A. Machine Learning-Based Prediction of Air Quality Index and Air Quality Grade: A Comparative Analysis. Int. J. Environ. Sci. Technol. 2024, 21, 1345–1360.
  19. Natarajan, S.K.; Shanmurthy, P.; Arockiam, D.; Daniel, A.; Ganesan, R. Optimized Machine Learning Model for Air Quality Index Prediction in Major Cities in India. Sci. Rep. 2024, 14, 6795.
  20. Alvarado-Molina, M.; Curto, A.; Wheeler, A.J.; Tham, R.; Cerin, E.; Nieuwenhuijsen, M.; Donaire-Gonzalez, D. Improving Traffic-Related Air Pollution Estimates by Modelling Minor Road Traffic Volumes. Environ. Pollut. 2023, 338, 122657.
  21. Marino, C.; Nucara, A.; Panzera, M.F.; Pietrafesa, M. Assessment of the Road Traffic Air Pollution in Urban Contexts: A Statistical Approach. Sustainability 2022, 14, 4127.
  22. Houdou, A.; El Badisy, I.; Khomsi, K.; Moussafir, M.; Ahmedou, S.O. Interpretable Machine Learning Approaches for Forecasting and Predicting Air Pollution: A Systematic Review. Aerosol Air Qual. Res. 2024, 24, 230151.
  23. Gilik, A.; Ogrenci, A.S.; Ozmen, A. Air Quality Prediction Using CNN+LSTM-Based Hybrid Deep Learning Architecture. Environ. Sci. Pollut. Res. 2022, 29, 11920–11938.
  24. Akiner, M.E.; Katipoğlu, O.M.; Çintaş, E. Predicting Air Quality Index in Başakşehir, Istanbul with Hybrid AI Models: Unveiling Key Drivers through CatBoost-Based SHAP and Feature Importance Analysis. Theor. Appl. Climatol. 2025, 156, 422.
  25. Birinci, E.; Deniz, A.; Özdemir, E.T. The Relationship between PM10 and Meteorological Variables in the Mega City Istanbul. Environ. Monit. Assess. 2023, 195, 304.
  26. Birinci, E.; Denizoğlu, M.; Özdemir, H.; Deniz, A. The Role of Meteorological Variables and Cloud Base Heights in Urban Air Quality. Air Qual. Atmos. Health 2025, 18, 3381–3396.
  27. Yao, T.; Lu, S.; Wang, Y.; Li, X.; Ye, H.; Duan, Y.; Li, J. Revealing the Drivers of Surface Ozone Pollution by Explainable Machine Learning and Satellite Observations in Hangzhou Bay, China. J. Clean. Prod. 2024, 440, 140938.
  28. Aman, N.; Panyametheekul, S.; Pawarmart, I.; Liu, Y.; Rattanapotanan, T.; Tia, W. Machine Learning-Based Quantification and Separation of Emissions and Meteorological Effects on PM2.5 in Greater Bangkok. Sci. Rep. 2025, 15, 14775.
  29. Wong, P.Y.; Su, H.J.; Lung, S.C.C.; Liu, W.Y.; Tseng, H.T.; Adamkiewicz, G.; Wu, C.D. Explainable Geospatial-Artificial Intelligence Models for the Estimation of PM2.5 Concentration Variation during Commuting Rush Hours in Taiwan. Environ. Pollut. 2024, 349, 123974.
  30. Mohapatra, E.; Das, M.; Rath, S. Deep Learning-Based AQI Forecasting: A CNN-LSTM Model with Visual Insights from SHAP-LIME and PDP. Discov. Appl. Sci. 2025, 7, 1326.
  31. Stekhoven, D.J.; Bühlmann, P. MissForest—Non-Parametric Missing Value Imputation for Mixed-Type Data. Bioinformatics 2012, 28, 112–118.
  32. Junger, W.L.; De Leon, A.P. Imputation of Missing Data in Time Series for Air Pollutants. Atmos. Environ. 2015, 102, 96–104.
  33. Dongre, P.K.; Patel, V.; Bhoi, U.; Maltare, N.N. An Outlier Detection Framework for Air Quality Index Prediction Using Linear and Ensemble Models. Decis. Anal. J. 2025, 14, 100546.
  34. Parra-Plazas, J.; Gaona-Garcia, P.; Plazas-Nossa, L. Time Series Outlier Removal and Imputing Methods Based on Colombian Weather Stations Data. Environ. Sci. Pollut. Res. 2023, 30, 72319–72335.
  35. Kousoulidou, M.; Ntziachristos, L.; Mellios, G.; Samaras, Z. Road-Transport Emission Projections to 2020 in European Urban Environments. Atmos. Environ. 2008, 42, 7465–7475.
  36. Seinfeld, J.H.; Pandis, S.N. Atmospheric Chemistry and Physics: From Air Pollution to Climate Change, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2016.
  37. Kumar, V.; Agrawal, A.; Kedam, N.; Gupta, S.; Singh, P. Advancing Air Quality Prediction with Hyperparameter Optimization and Innovative Feature Analysis Using Deep Learning Models in Phoenix, Arizona, USA. Theor. Appl. Climatol. 2026, 157, 60.
  38. Li, J.; An, X.; Li, Q.; Wang, C.; Yu, H.; Zhou, X.; Geng, Y.A. Application of XGBoost Algorithm in the Optimization of Pollutant Concentration. Atmos. Res. 2022, 276, 106238.
  39. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  40. Bergstra, J.; Bardenet, R.; Bengio, Y.; Kégl, B. Algorithms for Hyper-Parameter Optimization. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2011; Volume 24.
  41. Ayus, I.; Natarajan, N.; Gupta, D. Comparison of Machine Learning and Deep Learning Techniques for the Prediction of Air Pollution: A Case Study from China. Asian J. Atmos. Environ. 2023, 17, 4.
  42. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
  43. Zhong, J.; Zhang, X.; Gui, K.; Wang, Y.; Che, H.; Shen, X.; Zhang, L.; Zhang, Y.; Sun, J.; Zhang, W. Robust Prediction of Hourly PM2.5 from Meteorological Data Using LightGBM. Natl. Sci. Rev. 2021, 8, nwaa307.
  44. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased Boosting with Categorical Features. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31.
  45. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  46. Vovk, T.; Kryza, M.; Werner, M. Using Random Forest to Improve EMEP4PL Model Estimates of Daily PM2.5 in Poland. Atmos. Environ. 2024, 332, 120615.
  47. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
  48. Zhou, S.; Wang, W.; Zhu, L.; Qiao, Q.; Kang, Y. Deep-Learning Architecture for PM2.5 Concentration Prediction: A Review. Environ. Sci. Ecotechnol. 2024, 21, 100400.
  49. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.I. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat. Mach. Intell. 2020, 2, 56–67.
  50. Salih, A.M.; Raisi-Estabragh, Z.; Galazzo, I.B.; Radeva, P.; Petersen, S.E.; Lekadir, K.; Menegaz, G. A Perspective on Explainable Artificial Intelligence Methods: SHAP and LIME. Adv. Intell. Syst. 2025, 7, 2400304.
  51. Cleveland, R.B.; Cleveland, W.S.; McRae, J.E.; Terpenning, I.J. STL: A Seasonal-Trend Decomposition Procedure Based on Loess. J. Off. Stat. 1990, 6, 3–73. [Google Scholar]
  52. Yin, H.; Jin, D.; Hong, H.; Moon, J.; Gu, Y.H. IAQ-STL-ML: A Novel Indoor Air Quality Prediction Pipeline Using Meta-Learning Framework with STL Decomposition. Environ. Technol. Innov. 2025, 38, 104107. [Google Scholar] [CrossRef]
  53. Başakın, E.E.; Ekmekcioğlu, Ö.; Stoy, P.C.; Özger, M. Estimation of Daily Reference Evapotranspiration by Hybrid Singular Spectrum Analysis-Based Stochastic Gradient Boosting. MethodsX 2023, 10, 102163. [Google Scholar] [CrossRef]
  54. Sugihara, G.; May, R.; Ye, H.; Hsieh, C.H.; Deyle, E.; Fogarty, M.; Munch, S. Detecting Causality in Complex Ecosystems. Science 2012, 338, 496–500. [Google Scholar] [CrossRef]
  55. Chai, T.; Draxler, R.R. Root Mean Square Error (RMSE) or Mean Absolute Error (MAE)?–Arguments against Avoiding RMSE in the Literature. Geosci. Model Dev. 2014, 7, 1247–1250. [Google Scholar] [CrossRef]
  56. Kendrick, C.M.; Koonce, P.; George, L.A. Diurnal and Seasonal Variations of NO, NO2 and PM2.5 Mass as a Function of Traffic Volumes alongside an Urban Arterial. Atmos. Environ. 2015, 122, 133–141. [Google Scholar] [CrossRef]
  57. Rowland, O.E. Comparative Analysis of Meteorological Parameters and Their Relationship with NO2, PM10, PM2.5 and O3 Concentrations at Selected Urban Air Quality Monitoring Stations in Krakow, Paris, and Milan. Discov. Environ. 2024, 2, 75. [Google Scholar] [CrossRef]
  58. Wójcik-Gront, E.; Gozdowski, D. Air Pollution Monitoring and Modeling: A Comparative Study of PM, NO2, and SO2 with Meteorological Correlations. Atmosphere 2025, 16, 1199. [Google Scholar] [CrossRef]
  59. Hou, Y.; Wang, Q.; Tan, T. Evaluating Drivers of PM2.5 Air Pollution at Urban Scales Using Interpretable Machine Learning. Waste Manag. 2025, 192, 114–124. [Google Scholar] [CrossRef] [PubMed]
  60. Yin, C.; Li, W.; Li, T.; Zhang, S.; Zhu, Y.; Liu, J. Air Quality Prediction Model Based on Deep Learning Hybrid Framework. Sci. Rep. 2026, 16, 7084. [Google Scholar] [CrossRef]
  61. Martinović, M.; Dokic, K.; Pudić, D. Comparative Analysis of Machine Learning Models for Predicting Innovation Outcomes: An Applied AI Approach. Appl. Sci. 2025, 15, 3636. [Google Scholar] [CrossRef]
  62. Petrić, V.; Hussain, H.; Časni, K.; Vuckovic, M.; Schopper, A.; Andrijić, Ž.U.; Lovrić, M. Ensemble Machine Learning, Deep Learning, and Time Series Forecasting: Improving Prediction Accuracy for Hourly Concentrations of Ambient Air Pollutants. Aerosol Air Qual. Res. 2024, 24, 230317. [Google Scholar] [CrossRef]
  63. Zhalehdoost, A.; Taleai, M. Unravelling the Importance of Spatial and Temporal Resolutions in Modeling Urban Air Pollution Using a Machine Learning Approach. Sci. Rep. 2025, 15, 27708. [Google Scholar] [CrossRef] [PubMed]
  64. Yang, J.; Shi, L.; Lee, J.; Ryu, I. Spatiotemporal Prediction of Particulate Matter Concentration Based on Traffic and Meteorological Data. Transp. Res. Part D Transp. Environ. 2024, 127, 104070. [Google Scholar] [CrossRef]
  65. Ma, G.; Xu, K.; Zhang, Y.; Zhang, L.; Chen, Z. A Novel Deep Learning Model for Air Quality Index Prediction Integrating Time Series Decomposition and Intelligent Optimization. Results Eng. 2025, 27, 106078. [Google Scholar] [CrossRef]
  66. Li, W.; Jiang, X. Prediction of Air Pollutant Concentrations Based on TCN-BiLSTM-DMAttention with STL Decomposition. Sci. Rep. 2023, 13, 4665. [Google Scholar] [CrossRef]
  67. Zhang, Q.; Fang, T.; Yin, J.; Men, Z.; Peng, J.; Wu, L.; Mao, H. Vehicle Non-Exhaust Emissions Significantly Contribute to Urban PM Pollution in New Energy Vehicles Era. J. Geophys. Res. Atmos. 2025, 130, e2024JD042126. [Google Scholar] [CrossRef]
  68. Boddu, Y.; Manimaran, A.; Arunkumar, B.; Sucharitha, M.; Babu, J.S. Advanced Air Quality Forecasting Using an Enhanced Temporal Attention-Driven Graph Convolutional Long Short-Term Memory Model with Seasonal-Trend Decomposition. IEEE Access 2024, 12, 189233–189252. [Google Scholar] [CrossRef]
  69. Sun, M.; Rao, C.; Hu, Z. Air Quality Prediction Using a Novel Three-Stage Model Based on Time Series Decomposition. Environ. Dev. Sustain. 2026, 28, 307–332. [Google Scholar] [CrossRef]
  70. Levi, Y.; Broday, D.M. Revealing Causality in the Associations between Meteorological Variables and Air Pollutant Concentrations. Environ. Pollut. 2024, 345, 123526. [Google Scholar] [CrossRef] [PubMed]
  71. Ansari, A.; Quaff, A.R. Comparative benchmarking of eleven machine learning regressors for daily air quality index (AQI) forecasting: Sensitivity analysis and SHapley Additive exPlanations (SHAP)-Based interpretability framework. Theor. Appl. Climatol. 2025, 156, 620. [Google Scholar] [CrossRef]
  72. Aman, N.; Panyametheekul, S.; Pawarmart, I.; Sudhibrabha, S.; Manomaiphiboon, K. A Visibility-Based Historical PM2.5 Estimation for Four Decades (1981–2022) Using Machine Learning in Thailand: Trends, Meteorological Normalization, and Influencing Factors Using SHAP Analysis. Aerosol Air Qual. Res. 2025, 25, 4. [Google Scholar] [CrossRef]
  73. Zaib, S.; Shahid, M.Z.; Shahid, I. Decadal air quality dynamics in the Yangtze River Delta: Evidence from machine learning–driven analysis. Air Qual. Atmos. Health 2026, 19, 64. [Google Scholar] [CrossRef]
Figure 1. Location of the traffic detectors, AQMS and FSM Bridge in İstanbul.
Figure 2. Hourly time series of air pollutants, traffic volume and meteorological variables (January 2022–October 2023).
Figure 3. Average daily (diurnal) profiles and confidence intervals of PM10, PM2.5, NO2, and NOX pollutants.
Figure 4. Pearson correlation matrix between air quality, meteorological and traffic variables.
Figure 5. R2 and RMSE Performance Heat Map of Five Models within the M-ETAQI Framework.
Figure 6. Actual Value—Prediction Scatter Plots for All Models and Pollutants.
Figure 7. XGBoost Model SHAP Beeswarm Analysis: Feature Contributions and Impact Directions for Four Pollutants.
Figure 8. Percentage Contribution of Feature Categories to Pollutant Predictions According to SHAP Values.
Figure 9. Decomposition of PM10 Time Series into Trend, Seasonal and Residual Components by STL Method.
Figure 10. STL Variance Decomposition for Four Pollutants: Trend, Seasonal and Residual Component Contribution Rates (%).
Figure 11. Two-Way Causal Effect Heat Map Determined by Convergent Cross-Mapping (CCM).
Figure 12. Comparing Pearson Correlation Coefficient (|r|) with CCM Causal Power (ρ): Separating Linear and Nonlinear Relationships.
Table 1. Descriptive statistics for air quality and meteorological variables (mean, median, standard deviation, quartiles, skewness, and kurtosis).

| Variable | Mean | Median | Std | Min | Max | Q1 | Q3 | Missing (%) | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|---|---|
| NO2 (µg/m³) | 18.13 | 14.90 | 12.85 | 0.50 | 116.60 | 9.10 | 23.80 | 22.11 | 1.68 | 4.14 |
| NOX (µg/m³) | 52.20 | 41.90 | 41.36 | 1.20 | 606.90 | 21.80 | 70.50 | 22.51 | 2.22 | 10.15 |
| PM10 (µg/m³) | 31.06 | 25.70 | 23.91 | 0.10 | 246.30 | 14.00 | 41.00 | 3.27 | 1.88 | 6.56 |
| PM2.5 (µg/m³) | 15.44 | 12.60 | 11.41 | 0.10 | 112.70 | 7.20 | 20.80 | 3.21 | 1.58 | 3.78 |
| Solar Radiation (W/m²) | 160.83 | 10.90 | 240.46 | 0.00 | 966.10 | 0.00 | 258.15 | 5.50 | 1.46 | 0.90 |
| Pressure (hPa) | 1015.92 | 1015.00 | 6.32 | 998.30 | 1037.00 | 1011.50 | 1019.20 | 5.50 | 0.59 | 0.41 |
| Temperature (°C) | 15.75 | 16.00 | 7.68 | −4.10 | 38.80 | 9.50 | 22.40 | 5.68 | −0.14 | −0.92 |
| Relative Humidity (%) | 58.52 | 59.00 | 11.95 | 10.20 | 86.00 | 51.50 | 66.60 | 5.50 | −0.40 | 0.40 |
| Wind Speed (m/s) | 2.19 | 2.10 | 1.48 | 0.00 | 8.70 | 1.00 | 3.20 | 5.50 | 0.38 | −0.47 |
| Wind Direction (degree) | 129.07 | 67.50 | 104.08 | 0.00 | 337.50 | 45.00 | 225.00 | 11.39 | 0.66 | −1.10 |
| Precipitation (mm) | 0.92 | 0.00 | 8.11 | 0.00 | 496.40 | 0.00 | 0.00 | 5.50 | 31.72 | 1487.40 |
Table 2. Summary of engineered features derived from raw traffic, meteorological, and temporal variables.

| Feature | Formula/Definition | Description |
|---|---|---|
| Congestion Index (CI) | CI = V_total/(S_mean × C) | Traffic density relative to free-flow capacity; C = 1800 veh/h/lane × 6 lanes |
| Heavy Vehicle Ratio (HVR) | HVR = V_long/V_total | Proportion of heavy-duty vehicles in total traffic volume |
| Atmospheric Stability Index (ASI) | Pasquill–Gifford classes (1–6) | Based on wind speed, solar radiation, and time of day (1 = very unstable, 6 = very stable) |
| Hour of day (cyclical) | sin(2πh/24), cos(2πh/24) | Sinusoidal transformation to preserve circular periodicity of hours |
| Day of year (cyclical) | sin(2πd/365), cos(2πd/365) | Sinusoidal transformation to capture seasonal patterns |
| Season | DJF/MAM/JJA/SON | Winter (Dec–Feb), Spring (Mar–May), Summer (Jun–Aug), Autumn (Sep–Nov) |
| Weekend indicator | 1 if Saturday/Sunday, 0 otherwise | Binary indicator for weekday/weekend traffic patterns |
| Public holiday indicator | 1 if Turkish public holiday, 0 otherwise | Binary indicator for national holidays |
| Speed × Wind Speed | S_mean × WS | Interaction between traffic speed and wind speed |
| Volume × ASI | V_total × ASI | Interaction between traffic volume and atmospheric stability |
| Congestion × Humidity | CI × RH | Interaction between congestion index and relative humidity |
| Lag features (t − k) | x(t − k), k = 1, 2, 3, 6, 12, 24 | Lagged values of pollutant and meteorological variables |
| Rolling mean | (1/w) Σ x(t − i), w = 3, 6, 12, 24 h | Moving average over 3 h, 6 h, 12 h, and 24 h windows |
| Rolling std | σ(x(t − w + 1), …, x(t)), w = 3, 6, 12, 24 h | Moving standard deviation over corresponding windows |
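The definitions in Table 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of None for incomplete lag or rolling windows, and the choice of the sample standard deviation are all hypothetical. Only the formulas and the capacity constant C = 1800 veh/h/lane × 6 lanes are taken from the table.

```python
import math
from statistics import mean, stdev

# Capacity constant from Table 2: 1800 veh/h/lane x 6 lanes.
CAPACITY = 1800 * 6


def congestion_index(v_total, s_mean):
    """CI = V_total / (S_mean * C). Undefined when s_mean is zero."""
    return v_total / (s_mean * CAPACITY)


def heavy_vehicle_ratio(v_long, v_total):
    """HVR = V_long / V_total. Returning 0.0 for empty hours is an assumption."""
    return v_long / v_total if v_total else 0.0


def cyclical(value, period):
    """Sine/cosine pair, e.g. cyclical(hour, 24) or cyclical(day_of_year, 365)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def lag(series, k):
    """x(t - k); None where no history is available yet."""
    return [series[t - k] if t >= k else None for t in range(len(series))]


def rolling(series, w, func):
    """Apply func to the trailing window x(t - w + 1) .. x(t); None until it is full."""
    return [
        func(series[t - w + 1:t + 1]) if t >= w - 1 else None
        for t in range(len(series))
    ]


if __name__ == "__main__":
    hourly = [1, 2, 3, 4]          # toy series, not data from the study
    print(lag(hourly, 1))          # lag-1 feature
    print(rolling(hourly, 3, mean))   # 3 h rolling mean
    print(rolling(hourly, 3, stdev))  # 3 h rolling (sample) std
```

The interaction terms in the table (S_mean × WS, V_total × ASI, CI × RH) are plain element-wise products of these columns and are omitted here.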
Table 3. Performance Comparison of Machine Learning Models for Four Pollutants (R2, RMSE, MAE, MAPE).

| Model | Pollutant | R2 | RMSE | MAE | MAPE (%) |
|---|---|---|---|---|---|
| XGBoost | PM10 | 0.84 | 7.13 | 4.92 | 32.32 |
| | PM2.5 | 0.87 | 3.27 | 2.25 | 26.1 |
| | NO2 | 0.81 | 6.28 | 4.17 | 29.06 |
| | NOX | 0.85 | 15.17 | 8.81 | 20.64 |
| LightGBM | PM10 | 0.84 | 6.99 | 4.97 | 33.39 |
| | PM2.5 | 0.87 | 3.34 | 2.28 | 27.12 |
| | NO2 | 0.8 | 6.43 | 4.18 | 29.22 |
| | NOX | 0.84 | 15.35 | 9.05 | 21.35 |
| CatBoost | PM10 | 0.84 | 7.18 | 5.19 | 34.74 |
| | PM2.5 | 0.88 | 3.2 | 2.22 | 27.8 |
| | NO2 | 0.82 | 6.17 | 4.03 | 28.26 |
| | NOX | 0.87 | 14.29 | 8.43 | 20.4 |
| Random Forest | PM10 | 0.81 | 7.81 | 5.6 | 35.22 |
| | PM2.5 | 0.86 | 3.42 | 2.37 | 27.69 |
| | NO2 | 0.79 | 6.6 | 4.26 | 31.13 |
| | NOX | 0.83 | 15.92 | 9.22 | 22.82 |
| CNN-LSTM-Att | PM10 | 0.42 | 13.59 | 10.6 | 85.71 |
| | PM2.5 | 0.45 | 6.76 | 5.09 | 63.99 |
| | NO2 | 0.37 | 11.46 | 8.41 | 65.75 |
| | NOX | 0.52 | 27.19 | 18.05 | 49.01 |
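For readers who want to reproduce the four scores reported in Table 3, the sketch below computes R2, RMSE, MAE, and MAPE from paired observations and predictions using their standard definitions. The function name and the toy numbers are hypothetical; the article does not state which library or exact formula variants the authors used.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and MAPE (%) using the standard definitions.

    MAPE divides by the observed value, so it is undefined for zero
    observations and is inflated by observations close to zero.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    y_bar = sum(y_true) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)    # total sum of squares
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


if __name__ == "__main__":
    observed = [10, 20, 30, 40]     # toy values, not data from the study
    predicted = [12, 18, 33, 37]
    for name, value in regression_metrics(observed, predicted).items():
        print(f"{name}: {value:.3f}")
```

Because MAPE scales each error by the observed concentration, the minima of 0.10 µg/m³ for PM10 and PM2.5 in Table 1 suggest one possible reason why MAPE stays in the 20–35% range for the tree-based models even where R2 exceeds 0.80; this reading is an inference, not a statement from the article.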
Table 4. Causal Strength (ρ) and Causal Aspects of Traffic and Meteorological Variables on Pollutants in CCM.

| Variable | PM10 | PM2.5 | NO2 | NOX |
|---|---|---|---|---|
| Traffic Variables | | | | |
| Total Vehicles | 0.044 (→) | 0.092 (→) | 0.149 (→) | 0.138 (→) |
| Speed | 0.079 (←) | 0.065 (←) | 0.027 (←) | 0.082 (→) |
| Heavy Vehicles | 0.065 (←) | 0.066 (←) | 0.064 (←) | 0.066 (→) |
| Congestion Index | 0.079 (←) | 0.065 (←) | 0.027 (←) | 0.082 (→) |
| Heavy Vehicle Ratio | 0.092 (←) | 0.061 (→) | 0.039 (→) | 0.099 (→) |
| Meteorological Variables | | | | |
| Temperature | 0.099 (←) | 0.090 (←) | 0.147 (←) | **0.334 (→)** |
| Relative Humidity | 0.068 (←) | 0.063 (←) | **0.309 (→)** | 0.112 (→) |
| Wind Speed | **0.302 (→)** | **0.205 (←)** | **0.286 (→)** | **0.370 (→)** |
| Solar Radiation | 0.190 (→) | 0.102 (←) | 0.096 (→) | 0.143 (→) |
| Pressure | 0.053 (←) | 0.067 (→) | 0.104 (→) | 0.149 (→) |

ρ = CCM causal strength (Var → Pollutant); → = variable causes pollutant; ← = feedback (pollutant affects variable); bold = strong causality (ρ > 0.20).
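As a rough illustration of what the ρ values in Table 4 measure, the sketch below computes a cross-map skill in the spirit of convergent cross-mapping: the candidate effect (the pollutant) is time-delay embedded, and its nearest neighbours are used to estimate the candidate cause (the variable). The embedding dimension E, delay tau, exponential weighting, and all function names are assumptions chosen for illustration; the article's actual CCM settings are not given in this section. A full CCM analysis would also check that the skill increases with library length, which this sketch omits.

```python
import math


def embed(series, E, tau):
    """Time-delay embedding: the point at time t is (x(t), x(t - tau), ...)."""
    start = (E - 1) * tau
    return [
        tuple(series[t - j * tau] for j in range(E))
        for t in range(start, len(series))
    ]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cross_map_skill(cause, effect, E=3, tau=1):
    """Estimate `cause` from the shadow manifold of `effect`; return rho.

    A high rho suggests that `effect` carries information about `cause`,
    which CCM interprets as evidence that `cause` drives `effect`.
    """
    manifold = embed(effect, E, tau)
    targets = cause[(E - 1) * tau:]          # cause values aligned with manifold points
    estimates = []
    for i, point in enumerate(manifold):
        # E + 1 nearest neighbours, excluding the point itself.
        neighbours = sorted(
            (math.dist(point, other), j)
            for j, other in enumerate(manifold) if j != i
        )[:E + 1]
        nearest = max(neighbours[0][0], 1e-12)   # guard against zero distance
        weights = [math.exp(-d / nearest) for d, _ in neighbours]
        estimate = sum(w * targets[j] for w, (_, j) in zip(weights, neighbours))
        estimates.append(estimate / sum(weights))
    return pearson(targets, estimates)
```

Under this convention, a "→" entry in Table 4 would correspond to calling `cross_map_skill(variable, pollutant)`, and the "←" direction to swapping the two arguments.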
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Birinci, E.; Özdemir, H.; Deniz, A. Multi-Method Explainable AI Framework for Quantifying Traffic and Meteorological Contributions to Urban Air Pollution: A Case Study of Istanbul’s Bosphorus Bridge Corridor. Atmosphere 2026, 17, 591. https://doi.org/10.3390/atmos17060591
