Explainable Machine Learning Reveals Distinct Air Pollution Profiles in Two Geographically Adjacent Cities

Aktürk, Cemal

doi:10.3390/app16083784


by

Cemal Aktürk

Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Gaziantep Islam Science and Technology University, Gaziantep 27000, Türkiye

Appl. Sci. 2026, 16(8), 3784; https://doi.org/10.3390/app16083784

Submission received: 17 March 2026 / Revised: 9 April 2026 / Accepted: 10 April 2026 / Published: 13 April 2026

(This article belongs to the Section Environmental Sciences)


Abstract

Air pollution is one of the fundamental environmental problems that directly threaten public health, ecosystems, and sustainable urban life in regions with high industrialization and urbanization density. This study aims to investigate whether the air pollution dynamics in Gaziantep and Kilis, two neighboring cities in Turkey, exhibit distinctive city-specific characteristics in their time series. In this context, Dynamic Time Warping (DTW) distance matrix and hierarchical clustering approaches were applied to compare the temporal behavior of pollutants from daily time series of PM10, SO₂, CO, and O₃ measurements across provinces between 2021 and 2025. Random Forest (RF), XGBoost, and Support Vector Machines (SVM) models were then developed to examine the separability of cities based solely on pollutant concentrations. The results revealed that the RF and XGBoost models successfully classified the two cities with over 93% accuracy. Additionally, SHAP analysis was used to interpret the contribution of each pollutant within the classification models, indicating that PM10 and SO₂ have relatively higher importance in distinguishing between the two cities. It should be noted that SHAP provides model-based interpretability rather than a direct representation of physical or atmospheric mechanisms. The findings suggest that pollutant time series may exhibit statistically distinguishable structures even between neighboring cities.

Keywords:

air pollution; machine learning; explainable artificial intelligence; adjacent cities

1. Introduction

Air pollution is considered one of the most important environmental risk factors for public health and sustainable urban development [1]. According to World Health Organization (WHO) estimates, outdoor air pollution causes approximately 7 million premature deaths each year [2]. Particulate matter fractions (PM10 and PM2.5), in particular, are directly linked to respiratory diseases, cancer, and cardiovascular diseases [3,4]. Alongside the loss of human life, this public health problem imposes an additional burden of trillions of dollars on national health budgets worldwide. For example, the total economic losses caused by air pollution in Turkey in 2025 alone are estimated at US$138 billion, equivalent to approximately 10% of Turkey’s GDP in the previous year [5]. These losses call for precise identification of location-specific emission sources and accurate determination of the variables that affect pollution dynamics.

Turkey faces air quality challenges due to internal and external migration, rapid urbanization, increasing energy demand, and its geopolitical location. The study area (Gaziantep and Kilis) is a critical region where PM10 concentrations are particularly high due to industrial activities, traffic, heating requirements, and dust transport. This heterogeneity indicates that air pollution is shaped not only by seasonal factors but also by regional factors. In this context, a comparative analysis of the air pollution regimes of two provinces located in close proximity but with different urban and industrial structures provides a scientific basis for targeted pollution reduction policies.

A significant methodological gap exists in the air pollution modeling literature: while air pollution data are generally used with a focus on prediction (what will happen?), approaches focused on classification and interpretability that aim to identify statistically distinguishable patterns between regions are still limited. Robust tools are required to test whether the air pollution profiles of two regions are statistically different. Traditional correlation analyses are insufficient to capture the complex, non-linear temporal dependence of pollutant profiles. High-performance machine learning (ML) classification models offer high predictive accuracy, but they generally act as black boxes. This opacity hinders the scientific explanation of which pollutants or pollutant interactions create regional differences. This research aims to address these methodological gaps by conducting a transparent and interpretable multidimensional analysis of the air pollution profiles of the neighboring provinces of Gaziantep and Kilis, using ML techniques and the time-series characteristics of PM10, sulfur dioxide (SO₂), carbon monoxide (CO), and ozone (O₃) concentrations. The study aims to make the following three contributions to the literature.

In addition to traditional correlation analysis, a Dynamic Time Warping (DTW) distance matrix is calculated to measure the rhythmic and shape similarity of pollutant profiles over time and is visualized with hierarchical clustering (a dendrogram). This provides an objective measure of whether the pollution regimes of the two cities differ in terms of non-linear temporal variation.
Random Forest (RF), XGBoost, and Support Vector Machine (SVM) models are used to classify the city from pollutant data alone, in order to assess the usability of air quality data for spatial identification. This shows whether the air pollution profiles of the two cities are statistically distinguishable and predictably different.
SHAP (SHapley Additive Explanations) analysis is used to provide model-based interpretability by quantifying the contribution of individual pollutants to classification outcomes. The analysis enables the identification of relative feature importance and captures interaction effects within the model, thereby enhancing transparency without implying direct physical or mechanistic relationships. This contributes to the development of interpretable data-driven frameworks for air pollution analysis.

In conclusion, by jointly examining the dynamic similarities, levels of dissociation, and structural characteristics of pollutant time series in two neighboring cities, a study rarely addressed in the literature, a new perspective is offered for assessing regional pollution sources. This study provides local governments and environmental policymakers with a data-driven, cost-effective, and reusable analytical approach for identifying the specific dynamics of urban air quality. These comprehensive and interpretable approaches will contribute to the consideration of regional differences and pollutant synergies in the formulation of future air pollution control policies, enabling more effective and targeted policy implementation.

The novelty of this study does not lie solely in the ability to distinguish between two cities, but in proposing an explainable and reproducible framework for identifying city-specific pollution signatures from routine air-quality measurements. Such a framework may support comparative environmental assessment in settings where detailed emission inventories or meteorological integrations are not readily available.

Related Works

The time-varying nature of air pollution and its complex relationship with meteorological variables have led researchers to widely use ML and deep learning (DL)-based approaches in this field. Studies in this area generally use tree-based ML algorithms such as RF, Gradient Boosting (GB), and XGBoost, as well as advanced DL architectures such as Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and CNN-LSTM, which are designed to capture nonlinear relationships in pollution time series. Many studies examine how the performance of ML and DL models depends on the dataset, regional variables, and pollutant type. Among prominent and recent studies, Ayus et al. [6] compared various ML models on a regional dataset from China and reported that XGBoost provided higher accuracy in certain cases. Artificial Neural Networks (ANNs) were found to perform best in PM10 prediction for Ankara, Turkey [7]. Cerezuela-Escudero and colleagues [8] compared different ML models for spatial pollution prediction and emphasized that the ranking of the models varies depending on regional topography and emission sources. In studies on data from Poland and India, ANNs (90%) [9] and XGBoost [10] gave the best results, showing that tree-based models provide stable results, especially for limited datasets.

Bekkar et al. [11] and Gilik et al. [12] showed that hybrid architectures such as CNN + LSTM provide significant improvements in PM2.5 and PM10 predictions by capturing both spatial and temporal patterns. Spatio-temporal studies on regional data from Sri Lanka, South Africa, and India show that DL models have higher generalization capacity in regions with pronounced daily and seasonal cycles [13,14,15]. Kalantari et al. [16] showed that deep learning does not always provide an advantage in air quality index (AQI) prediction, and that shallow learning models with appropriate hyperparameter settings can outperform DL models in some cases. This supports the view that no single model is consistently superior. Among local studies, Recurrent Neural Network (RNN)-based DL models were successful in predicting SO₂ and PM10 on a dataset from the Sakarya region of Turkey [17]. Studies for South Asian cities (Delhi, Hyderabad) have shown that adding meteorological variables to the model significantly improves performance [10,17]. A study on PM2.5 prediction for South African cities emphasized that ML models demonstrate varying levels of generalization success in different regions [14]. Zaini et al. [18] also reported that the performance of DL models in time-series air quality prediction depends largely on the data cleaning and model training protocol.

Iqbal and Mukherjee [19] compared ML methods extensively and stated that the importance of XAI methods such as SHAP has increased, especially in environmental decision-making processes. Air quality prediction has also attracted growing attention in recent years. Gupta et al. [20] compared various ML techniques and showed that AQI prediction performance differs significantly depending on regional variability. Kurnaz and Demir [21] predicted SO₂ and PM10 concentrations in Sakarya province, which has a high industrial density, with an RNN-based model and emphasized that deep learning structures can successfully capture time-dependent pollutant behavior. In addition, Liang et al. [22] highlighted the role of feature engineering in increasing accuracy in air quality prediction. Focusing on PM10, Mampitiya et al. [23] systematically compared different machine learning models and reported that sampling density and data quality directly affect model performance. Similarly, Mishra and Gupta [24] showed that deep learning algorithms provide higher accuracy in AQI prediction compared to traditional methods, demonstrating the strength of LSTM-based architectures in modeling long-term dependencies.

2. Materials and Methods

In this study, a comprehensive ML-based methodology was applied to classify the air pollution profiles of two neighboring provinces based on air pollution parameters. Data preprocessing consisted of cleaning, deletion, and type-conversion steps to remove erroneous, missing, and heterogeneous records from the dataset. The dataset was then divided into training and test subsets, and models were developed using the RF, XGBoost, and SVM algorithms. Model development and hyperparameter configuration were carried out using the training data only, whereas final performance metrics were computed on the independent hold-out test set; the test data were therefore not used during model training. The classification performance of the models was examined and visualized using accuracy, ROC-AUC, precision, recall, F1-score, the confusion matrix, and the ROC curve. Furthermore, SHAP analyses were performed for the models, and the contributions of key parameters to each model were examined in detail. This methodological approach provides a reliable and explainable framework for classification problems, supporting both the interpretability of the models and the reliability of their performance. Additionally, DTW analysis was performed to assess how pollutants in the two neighboring provinces changed over time. The methodological process of the research is illustrated in the flow diagram in Figure 1.

2.1. Dataset and Data Preprocessing

The dataset used in this study was obtained online from the Turkish National Air Quality Monitoring Network platform for the provinces of Gaziantep and Kilis. These two neighboring cities are approximately 43 km apart as the crow flies and 60 km by road, as shown in Figure 2. Daily data were extracted from the network’s registration date for these provinces, 24 April 2021, to the data recording date, 9 August 2025, and saved in CSV format. The dataset contains a total of 3138 observations, 1569 for each city. The parameters in the dataset are shown in Table 1 and served as the model inputs. The city label was used as the predicted output, i.e., the class variable. The dataset was constructed to ensure temporal alignment between the two cities: daily observations were matched by date, and only dates available for both cities were included in the analysis. Each record therefore represents synchronized measurements for Gaziantep and Kilis, ensuring consistency in the comparative analysis.
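The date-matching step described above can be sketched as an inner join on the date key. This is a minimal illustration only: the function name `align_by_date`, the dictionary layout, and the PM10 readings are hypothetical and are not taken from the study's data or code.

```python
from datetime import date


def align_by_date(city_a, city_b):
    """Keep only dates present in both cities; return rows sorted by date."""
    shared_dates = sorted(set(city_a) & set(city_b))
    return [(d, city_a[d], city_b[d]) for d in shared_dates]


# Hypothetical daily PM10 readings keyed by date (illustrative values only).
gaziantep = {
    date(2021, 4, 24): 61.0,
    date(2021, 4, 25): 58.5,
    date(2021, 4, 27): 70.2,
}
kilis = {
    date(2021, 4, 24): 40.1,
    date(2021, 4, 26): 39.0,
    date(2021, 4, 27): 44.8,
}

# Only 24 April and 27 April appear in both series, so two rows survive.
aligned = align_by_date(gaziantep, kilis)
```

Dropping unmatched dates rather than interpolating them is consistent with the statement that no interpolation was applied.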

Prior to model development, the dataset was examined for missing values and outliers. Missing observations were limited to a small proportion of the total dataset. Records containing invalid or non-numeric entries were removed during preprocessing. No interpolation was applied in order to preserve the original data distribution. Outliers were not explicitly removed, as they may represent real pollution peaks. Instead, the analysis relies on the fact that the tree-based models used in this study are relatively robust to extreme values.

2.2. Dynamic Time Warping (DTW) Analysis

Dynamic Time Warping (DTW) is a method that measures the similarity of two time series while allowing shifts along the time axis [25]. DTW analysis was applied to evaluate the temporal variation in pollutants among provinces. The aim of DTW is to find the least-cost alignment between two time series X = (X_1, X_2, …, X_n) and Y = (Y_1, Y_2, …, Y_m). The local cost of matching the i-th point of X with the j-th point of Y is

D(i, j) = |X_{i} - Y_{j}|

and the DTW distance is the minimum cumulative cost over all admissible warping paths w = (w_1, …, w_K), where each w_k is an index pair (i, j):

DTW(X, Y) = \min_{w} \sum_{k = 1}^{K} D(w_{k})
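The minimization over warping paths is usually solved by dynamic programming. The sketch below shows the textbook recurrence with the absolute-difference local cost defined above. It assumes no window constraint and no normalization, since the text does not state which settings or library were used, and the function name `dtw_distance` is illustrative.

```python
def dtw_distance(x, y):
    """Classic O(n*m) dynamic-programming DTW with |x_i - y_j| as local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] holds the minimal cumulative cost of aligning x[:i] with y[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


series_a = [1.0, 2.0, 3.0, 2.0, 1.0]
series_b = [1.0, 1.0, 2.0, 3.0, 2.0, 1.0]  # same shape, delayed by one step

shifted_distance = dtw_distance(series_a, series_b)
offset_distance = dtw_distance([0.0, 0.0], [1.0, 1.0])
```

The first pair differs only by a one-step delay, so the warped distance is zero even though a point-by-point comparison would not be. The second pair differs in level, and no warping can remove that difference.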

2.3. Random Forest (RF)

RF, developed by Breiman (2001), is a powerful ensemble learning method that trains multiple decision trees on bootstrap samples and reaches a final decision through “voting” [26]. RF is commonly used for classification and regression. The model reduces the correlation between trees by using random subsets of features, which helps to prevent overfitting. Each tree is grown according to the depth setting reported in Section 2.7, and predictions are made by majority voting.

The final class prediction is obtained as

\hat{y} = \mathrm{mode}\{h_{1}(x), h_{2}(x), \dots, h_{T}(x)\}

where \hat{y} is the final class prediction for the input x, h_{t}(x) is the prediction of the t-th decision tree, T is the total number of trees, and mode denotes the majority voting mechanism. In the final implementation, the Random Forest model was configured with 300 decision trees (T = 300), consistent with the hyperparameter settings reported in Section 2.7.
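The voting rule can be illustrated in a few lines. This is a sketch of hard majority voting as written in the formula, with a hypothetical list of per-tree votes; it is not the study's implementation.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most frequent class label among the per-tree predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical votes from five trees for a single observation.
votes = ["Gaziantep", "Kilis", "Gaziantep", "Gaziantep", "Kilis"]
predicted = majority_vote(votes)
```

Note that some libraries combine trees differently. For example, scikit-learn's `RandomForestClassifier` averages the class probabilities of the trees instead of counting hard votes, which usually, but not always, gives the same label.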

2.4. XGBoost

XGBoost is a gradient boosting algorithm developed by Chen and Guestrin (2016) and optimized for speed and performance [27]. A robust classification algorithm, XGBoost operates on the principle that new trees learn from the errors of previous trees [27]. It supports three boosting techniques: gradient, stochastic, and regularized boosting. These techniques increase the algorithm’s resistance to overfitting. The model’s key advantages include continued training of pre-trained models, the ability to handle missing data, and support for parallel processing [28].

The prediction is updated additively at each boosting iteration:

\hat{y}_{i}^{(t)} = \hat{y}_{i}^{(t - 1)} + \eta \cdot f_{t}(x_{i})

where \hat{y}_{i}^{(t)} is the prediction for the i-th sample at the t-th iteration, \hat{y}_{i}^{(t - 1)} is the prediction for the i-th sample at the previous, (t - 1)-th, iteration, \eta is the learning rate, f_{t} is the decision tree added at the t-th iteration, and x_{i} is the feature vector of the i-th sample.
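The additive update can be made concrete with a deliberately simplified sketch: one-feature decision stumps fitted to squared-error residuals, each scaled by the learning rate before being added to the running prediction. This is plain gradient boosting for illustration only. It omits the logistic loss, second-order terms, and regularization that XGBoost uses, and the helper names `fit_stump` and `boost` and the toy data are assumptions.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=50, eta=0.05):
    """Apply y_hat(t) = y_hat(t-1) + eta * f_t(x) for n_rounds iterations."""
    preds = [0.0] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)  # f_t learns the current errors
        preds = [p + eta * tree(x) for p, x in zip(preds, xs)]
    return preds


# Toy one-dimensional data with a step between x = 3 and x = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
preds = boost(xs, ys)
```

With a small learning rate each tree removes only a fraction of the remaining error, so the prediction for the upper group approaches 1 geometrically (as 1 - 0.95^t here). This is why a low learning rate is typically paired with a larger number of trees.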

2.5. Support Vector Machine (SVM)

SVM, developed by Cortes and Vapnik (1995) and used for purposes such as classification and regression [29], maximizes the margin between classes using a hyperplane. The SVM in this study used the Radial Basis Function (RBF) kernel, allowing nonlinear separation. For data that cannot be linearly separated, linear separation was achieved in high-dimensional spaces using kernel functions such as RBF. Optimization followed a standard objective function, ensuring accurate classification even for nonlinearly separable data:

\max_{\alpha} \sum_{i = 1}^{n} \alpha_{i} - \frac{1}{2} \sum_{i, j = 1}^{n} \alpha_{i} \alpha_{j} y_{i} y_{j} K(x_{i}, x_{j})

RBF kernel function:

K(x_{i}, x_{j}) = \exp(-\gamma \|x_{i} - x_{j}\|^{2})
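The kernel itself is simple to evaluate directly. The sketch below computes the RBF kernel for two vectors and, as an assumption about tooling, also reproduces the rule that scikit-learn documents for `gamma="scale"`, namely 1 / (n_features × variance of all feature values). The text does not name the library used, so `gamma_scale` should be read as illustrative.

```python
import math


def rbf_kernel(a, b, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2) for two equal-length vectors."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


def gamma_scale(rows):
    """1 / (n_features * population variance of all entries in the matrix)."""
    values = [v for row in rows for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return 1.0 / (len(rows[0]) * variance)


# Tiny illustrative feature matrix: two samples, two features.
features = [[0.0, 2.0], [2.0, 0.0]]
gamma = gamma_scale(features)

identical = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma)
distant = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma)
```

Identical points have a kernel value of 1, and the value decays toward 0 as points move apart. Larger values of gamma make that decay faster and the decision boundary more local. Because the kernel depends on Euclidean distance, features on very different scales are usually standardized first.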

2.6. SHAP (SHapley Additive Explanations)

SHAP is an interpretability method developed by Lundberg and Lee (2017) that calculates the contribution of each feature to the model output based on game theory [30]. It quantifies how each feature contributes to the prediction for each observation, whether positively or negatively. SHAP thus allows AI techniques to be evaluated in terms of explainability by clarifying how the algorithms make decisions and how the features contribute to the classification prediction. This increases the transparency, verifiability, and reliability of scientific studies compared with “black box” approaches. The SHAP method offers the advantages of being model-agnostic and of presenting feature contributions in a quantitative and decomposable manner. The variation in feature importance across models reflects differences in model structure and learning mechanisms. Tree-based models capture hierarchical interactions, while SVM captures boundary-based relationships, which may lead to different importance rankings.
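The game-theoretic quantity behind SHAP is the Shapley value: each feature's marginal contribution averaged over all coalitions of the other features. For four features it can be computed exactly by enumeration, as sketched below. The payoff function `toy_value` and its numbers are hypothetical, chosen only to include one PM10 and SO₂ interaction term. Practical SHAP implementations rely on approximations or model-specific algorithms and do not enumerate coalitions this way.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(features)
    result = {}
    for feature in features:
        others = [g for g in features if g != feature]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                subset = frozenset(coalition)
                total += weight * (value(subset | {feature}) - value(subset))
        result[feature] = total
    return result


def toy_value(coalition):
    """Hypothetical payoff: additive effects plus a PM10 x SO2 interaction."""
    payoff = 0.0
    if "PM10" in coalition:
        payoff += 0.30
    if "SO2" in coalition:
        payoff += 0.20
    if "CO" in coalition:
        payoff += 0.05
    if {"PM10", "SO2"} <= coalition:
        payoff += 0.10  # extra payoff only when both are present
    return payoff


pollutants = ["PM10", "SO2", "CO", "O3"]
phi = shapley_values(pollutants, toy_value)
```

In this toy setting the interaction payoff is split equally between PM10 and SO₂, a feature that never changes the payoff (O₃) receives exactly zero, and the values sum to the payoff of the full feature set. That additivity property is what makes SHAP contributions decomposable.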

2.7. Hyperparameter Optimization

The machine learning models used in this study were configured with hyperparameters chosen according to configurations recommended in the literature. The dataset was divided into training and test sets after preprocessing. The train–test split was performed using random sampling, without preserving temporal order, as the study is designed within a pattern recognition framework rather than a time-series forecasting setting. Five-fold cross-validation was applied within the training set to improve robustness during model development and to reduce sensitivity to sampling variation. Following model fitting, final model performance was evaluated on the independent test set. Accordingly, the confusion matrices, ROC curves, and classification metrics reported in the Results Section correspond to the test-set evaluation, whereas cross-validation was used as an internal model development step.

The number of trees was set to 300 in the RF model. To preserve the model’s learning flexibility, the max_depth parameter was left unrestricted, and the default settings of min_samples_split = 2 and min_samples_leaf = 1 were used. In the XGBoost algorithm, the hyperparameters n_estimators = 300, max_depth = 4, learning_rate = 0.05, subsample = 0.9, and colsample_bytree = 0.9 were chosen to balance performance and generalization capacity. The SVM model was configured with an RBF kernel to capture nonlinear relationships. The regularization coefficient was set to C = 1.0, and the automatic scaling option “scale” was used for the kernel width parameter gamma. These parameter values were selected to balance classification performance, stability, and computational cost.

2.8. Evaluation Metrics

To comprehensively and reliably evaluate the model’s classification performance, a set of standard metrics was used that measure the model’s discrimination and stability rather than its accuracy alone. The relationship between model outputs and actual label values was evaluated using True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) values.

TP (True Positive): Examples that the model predicted as positive and that are actually positive.
FP (False Positive): Examples that the model predicted as positive but are actually negative.
TN (True Negative): Examples that the model predicted as negative and that are actually negative.
FN (False Negative): Examples that the model predicted as negative but are actually positive.
Precision: The ratio of true positive predictions to all positive predictions; it shows how accurate the model’s positive predictions are.
Recall: The proportion of actual positive examples that were predicted correctly; it shows how well the model captured all positive examples.
F-Measure: The harmonic mean of precision and recall, which measures the balance between them; it is calculated as 2 × (Precision × Recall)/(Precision + Recall).
ROC Area (AUC): The area under the ROC curve, which plots the true positive rate against the false positive rate; it summarizes the model’s ability to discriminate between classes. AUC ≥ 0.9 is considered high performance.
Accuracy: It shows the ratio of correct predictions to total predictions and evaluates the overall success of the model.
Confusion Matrix: It shows the model’s correct and incorrect classifications for each class in detail. This allows for detailed analysis of classification performance by class, including the number of correct and incorrect classifications.
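The definitions above translate directly into code. The sketch below computes the metrics from the four confusion-matrix counts. The counts in the usage example are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts for a binary classifier evaluated on 200 test samples.
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
```

In a two-class problem such as this one, either city can be treated as the positive class. Precision and recall generally change when the roles are swapped, whereas accuracy does not.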

3. Results

3.1. Density and Correlation Analysis

Figure 3 shows the concentration distribution of each pollutant by city, illustrating how well individual pollutants separate the two cities. O₃ levels are similar in both provinces. The CO distribution exhibits wider variation in Gaziantep, while it is more symmetric and concentrated in Kilis. SO₂ concentrations are higher in Gaziantep and lower in Kilis, with a sharp difference between the two provinces. Similarly, for PM10, Gaziantep has higher concentrations, while Kilis exhibits a narrower distribution. These marked differences, particularly for PM10 and SO₂, provide informative signals that the algorithms can exploit during classification. To statistically support the observed distribution differences, a Mann–Whitney U test was conducted. The results indicated that all pollutants (PM10, SO₂, CO, and O₃) showed statistically significant differences between the two cities (p < 0.05), indicating that the observed differences are unlikely to be due to random fluctuations.

The findings of the correlation analysis conducted to examine the relationship between PM10 and other pollutants in Gaziantep and Kilis are presented in the correlation matrix in Figure 4. As shown in Figure 4, the correlation coefficients between PM10 and other pollutants are generally low in both cities, indicating weak linear relationships among variables. Correlation matrices were computed separately for each city to better capture city-specific pollutant relationships. The comparison of these matrices highlights differences in correlation structures between Gaziantep and Kilis. In Gaziantep, the highest correlation between PM10 and another pollutant is observed with CO (0.15), while in Kilis, the PM10–CO correlation is slightly higher (0.21). Similarly, the PM10–O₃ correlation in Kilis is calculated as 0.18. When considering other pollutant pairs, the CO–O₃ correlation in Gaziantep (0.31) is relatively more pronounced compared to other relationships, although it still reflects a moderate association. Overall, the correlation coefficients suggest that linear dependencies among pollutants are limited, and therefore the results should be interpreted as descriptive rather than explanatory. The observed differences in correlation structures between Gaziantep and Kilis may indicate variations in pollutant interaction patterns; however, these differences cannot be directly attributed to specific emission sources or atmospheric processes based on correlation analysis alone. In this context, the correlation analysis serves as an exploratory step that provides an initial overview of pollutant relationships. More advanced analyses, such as Dynamic Time Warping (DTW) and machine learning-based classification, are required to capture non-linear patterns and temporal structures that are not reflected in simple correlation measures.

3.2. Classification Results

The performance metrics for the RF, XGBoost, and SVM algorithms used to classify air pollution profiles for Gaziantep and Kilis are shown in Table 2. The classification accuracies were 0.93 for RF, 0.92 for XGBoost, and 0.87 for SVM. The high classification performance indicates that the pollutant profiles of the two provinces exhibit statistically distinguishable patterns. However, these findings should not be interpreted as direct evidence of different physical pollution mechanisms without additional meteorological and source-specific data. All three machine learning algorithms distinguished the pollutant behaviors of the two provinces, with RF and XGBoost performing best. The lower performance of SVM relative to the tree-based models may indicate that the class boundary is far from linearly separable, suggesting that the differences between the pollutant profiles of the two provinces involve non-linear, high-dimensional patterns.

The algorithms’ performance in predicting each class is illustrated by the confusion matrices in Figure 5. RF correctly classified 445 data points from Gaziantep and incorrectly classified 32 as Kilis. XGBoost correctly predicted 439 Gaziantep samples, while SVM correctly predicted 406 samples in this class. For the Kilis class, XGBoost made 433 correct predictions, slightly more than RF (431), and SVM correctly predicted 412 samples. Figure 5 also shows that the algorithms achieved high success for both classes and that the test data were approximately evenly distributed between the two classes.
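As a quick check of the arithmetic behind these counts, the recall of the RF model for the Gaziantep class follows directly from the two figures reported above (445 correct, 32 misclassified as Kilis). The number of Kilis samples misclassified as Gaziantep is not stated in the text, so precision is not derived here.

```latex
\mathrm{Recall}^{\mathrm{RF}}_{\mathrm{Gaziantep}}
  = \frac{TP}{TP + FN}
  = \frac{445}{445 + 32}
  = \frac{445}{477}
  \approx 0.933
```

This per-class value is in line with the overall RF accuracy of 0.93 reported for the test set.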

3.3. DTW Distance Analysis

The DTW distance matrix is shown in Table 3, and the DTW-based hierarchical clustering results are shown in the dendrogram in Figure 6. The dendrogram is based on DTW distance values, and the vertical axis represents the dissimilarity between pollutant time series; higher linkage distances indicate lower similarity between clusters. Table 3 quantifies the similarities and differences between the time series of the four main pollutants in Gaziantep and Kilis. Pollutant parameters prefixed with “G” refer to Gaziantep, while those prefixed with “K” refer to Kilis. The advantage of DTW over classical correlation approaches is that it considers not only synchronous relationships but also time-shifted or irregular behaviors. It is therefore a useful tool for robustly assessing the structural similarity of air pollution profiles from two different stations. Table 3 shows that the DTW distance between the O₃ time series of the two provinces, at 554.497, is quite low compared to the other pollutants. This can be seen in Figure 6, where the first and tightest cluster consists of the O₃ series from both provinces, suggesting that O₃ exhibits relatively similar temporal behavior across the two cities. DTW distances are higher for PM10, CO, and SO₂, indicating that these pollutant time series differ in their temporal structure between the two cities. These differences may reflect different statistical patterns and urban emission contexts; however, the present analysis does not directly identify the physical drivers underlying these temporal divergences. The very high distance of 1088.776 for CO reveals a strong divergence between the two provinces, possibly influenced by different dynamics. Figure 6 also shows that the CO series of the two provinces exhibit the most extreme divergence. This result indicates that the CO time series follows a distinct statistical pattern in each city, which contributes to the separability observed in the classification results. By systematically separating the pollution patterns of the provinces by pollutant, the dendrogram in Figure 6 illustrates the spatial heterogeneity of air quality in these provinces.

The annual changes in air pollutants are shown in Figure 7, where Gaziantep is represented in blue and Kilis in orange. The annual trends are presented as descriptive summaries; confidence intervals were not included, so the results should be interpreted as indicative rather than inferential. PM10 concentrations in Gaziantep follow a more fluctuating, high-variance trend. This may be due to Gaziantep’s rapid urbanization, the growth of organized industrial zones, and different meteorological influences. In Kilis, PM10 exhibits a more stable, low-variance trend over the years. Sudden increases in some years may be due to specific meteorological events or regional transport effects along the Syrian border. Overall, the annual changes in PM10 suggest that pollutant dynamics differ between Gaziantep and Kilis and that Gaziantep has a more complex pollutant profile than Kilis, possibly reflecting multifactorial influences.

3.4. Explainability Analysis

The graph obtained from the SHAP analysis is shown in Figure 8. The X-axis represents the SHAP values, showing the direction and degree of influence of each feature on the model’s prediction. Color gradients indicate the magnitude of the value of the relevant variable (red: high, blue: low). In the SVM model, PM10 and CO are more strongly involved in model decisions. In XGBoost, PM10 and SO₂ are dominant, and both exhibit a strong discriminatory effect in both negative and positive samples. In RF, the contributions of the variables are more evenly distributed, but PM10 remains a dominant predictor. CO and O₃ also contribute to the models, albeit to a limited extent. Overall, this indicates that PM10 and SO₂ show higher importance within the models in distinguishing between the two cities.

A summary SHAP plot of the interaction analysis, which explains how the classification algorithms separate the provinces based on combinations of pollutants, is shown in Figure 9. Figure 8 illustrates the importance of individual pollutants in the classification models, while Figure 9 offers a more comprehensive view by showing their combined (synergistic) effect on classification. Figure 9 indicates a strong interaction between PM10 and SO₂. This is evident from the fact that the points spread out to approximately ±0.2% on the X-axis, rather than clustering at zero. This spread suggests that the model takes into account not only whether PM10 or SO₂ is individually high or low, but also whether both pollutants are jointly present at a certain level. The observed spread of SHAP values indicates that PM10 and SO₂ contribute more prominently to the classification model compared to other pollutants. In addition, the interaction patterns observed between pollutants reflect how the model captures combined feature effects during classification. It is important to note that these interaction patterns represent model-based relationships rather than statistically validated interactions or direct temporal or spatial correlations between pollutants. Similarly, the observed feature contributions indicate the relative importance of pollutants within the model, rather than implying causal or physical relationships. Overall, SHAP analysis enhances the transparency of the machine learning models by providing a quantitative interpretation of feature contributions and their relationships within the model structure, without implying underlying atmospheric or emission mechanisms.
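The idea that an interaction value moves away from zero only when the model depends on two features jointly can be illustrated with the pairwise Shapley interaction index. The sketch below is an assumption-laden toy: it is not the study's pipeline, the function `shapley_interaction` and the scoring function `toy_score` are hypothetical, and it is assumed (not confirmed by the text) that Figure 9 is based on a quantity of this kind.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_interaction(predict, x, background, i, j):
    """Pairwise Shapley interaction index between features i and j (i != j)."""
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    n = x.shape[0]

    def value(coalition):
        # Coalition features take the sample's values; others stay at background.
        data = background.copy()
        for k in coalition:
            data[:, k] = x[k]
        return float(np.mean(predict(data)))

    others = [k for k in range(n) if k not in (i, j)]
    total = 0.0
    for size in range(len(others) + 1):
        weight = factorial(size) * factorial(n - size - 2) / (2 * factorial(n - 1))
        for s in combinations(others, size):
            # Joint effect of i and j minus their separate effects, given s.
            delta = value(s + (i, j)) - value(s + (i,)) - value(s + (j,)) + value(s)
            total += weight * delta
    return total


def toy_score(X):
    # Hypothetical model: columns 0 and 1 (say PM10 and SO2) act jointly,
    # column 2 (say CO) acts additively on its own.
    return X[:, 0] * X[:, 1] + X[:, 2]


background = np.zeros((1, 3))
x = np.array([1.0, 1.0, 1.0])
pm10_so2 = shapley_interaction(toy_score, x, background, 0, 1)  # 0.5
pm10_co = shapley_interaction(toy_score, x, background, 0, 2)   # 0.0
```

In this toy case the product term yields a non-zero interaction for the first pair, while the purely additive feature yields exactly zero. This mirrors the reading of the plot: points spread away from zero signal a joint dependence inside the model, not a physical coupling between pollutants.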

The ROC curves of the ML algorithms are presented in Figure 10. All algorithms show discriminatory power in separating the pollution profiles of Gaziantep and Kilis, with quite high AUC values. RF and XGBoost, in particular, exhibit higher classification performance, as their curves lie closer to the upper left corner and farther from the reference line. SVM also achieves reasonable classification performance, but its curve in Figure 10 lies below those of the other models.
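For readers less familiar with ROC analysis, the following sketch shows how the points of a ROC curve and the AUC can be obtained from class labels and classifier scores. The labels and scores are toy values, not the study's predictions, and the helper names `roc_points` and `auc_trapezoid` are illustrative.

```python
import numpy as np


def roc_points(y_true, scores):
    """False- and true-positive rates at every distinct score threshold."""
    y = np.asarray(y_true, dtype=int)
    s = np.asarray(scores, dtype=float)
    # Start above the highest score so that the curve begins at (0, 0).
    thresholds = np.r_[np.inf, np.sort(np.unique(s))[::-1]]
    n_pos, n_neg = (y == 1).sum(), (y == 0).sum()
    tpr = np.array([((s >= t) & (y == 1)).sum() / n_pos for t in thresholds])
    fpr = np.array([((s >= t) & (y == 0)).sum() / n_neg for t in thresholds])
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


# Toy example: 1 could stand for one city and 0 for the other (assumed coding).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
auc = auc_trapezoid(fpr, tpr)  # 0.75
```

An AUC of 0.5 corresponds to the diagonal reference line, and values approaching 1 correspond to curves that hug the upper left corner.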

4. Conclusions

The findings of this study demonstrate that daily air pollutant concentrations from two geographically adjacent cities can exhibit statistically distinguishable temporal and compositional patterns. Dynamic Time Warping (DTW) analysis revealed differences in the temporal structures of pollutant time series, while machine learning models, particularly Random Forest and XGBoost, achieved high classification performance in distinguishing between city-level pollution profiles. These results suggest that air pollutant observations may contain reproducible statistical signatures that enable comparative characterization of urban environments.

From a methodological perspective, this study highlights the complementary roles of DTW and machine learning approaches. DTW captures temporal similarity and structural differences in pollutant sequences, whereas classification models identify non-linear patterns that support city-level differentiation. In addition, SHAP analysis enhances model transparency by quantifying the contribution of each pollutant to classification outcomes. However, SHAP provides model-based interpretability rather than a direct representation of physical or atmospheric processes. Accordingly, the findings should be interpreted as reflecting the internal decision structure of the models rather than causal relationships between pollutants and emission sources.

The correlation analysis further supports these findings by providing a preliminary overview of pollutant relationships. However, the generally weak correlation coefficients indicate that linear dependencies among pollutants are limited. Therefore, correlation results should be interpreted as descriptive rather than explanatory, reinforcing the need for more advanced techniques such as DTW and machine learning to capture complex pollutant dynamics. Importantly, this study is not designed as a time-series forecasting or sequential prediction task. Instead, it adopts a pattern recognition framework to evaluate whether pollutant observations contain statistically distinguishable features that allow classification of cities. Accordingly, the results should be interpreted as evidence of structural differences in pollutant distributions rather than predictive or causal relationships.

Despite the promising results, several limitations should be acknowledged. PM2.5 data were not included due to the lack of consistently available and comparable measurements across both cities during the study period. Since the study adopts a comparative design, only pollutants jointly available for both locations (PM10, SO₂, CO, and O₃) were considered. In addition, meteorological variables such as wind speed, temperature, and humidity were not incorporated, although they are known to influence pollutant dynamics. Other unobserved factors, including industrial activity, traffic patterns, and residential heating, may also contribute to the observed differences between cities but were not explicitly modeled. Furthermore, the temporal dependence inherent in daily time-series data was not explicitly modeled, which may influence the independence assumptions of the classification framework. Therefore, the findings should be interpreted as reflecting statistically distinguishable patterns rather than direct physical or causal mechanisms. Finally, the primary objective of this study is to demonstrate the distinguishability of pollution profiles rather than to compare model families. For this reason, baseline models were not included; however, future research may incorporate simpler models and time-series-aware validation strategies to further strengthen methodological robustness. Future studies should also integrate meteorological parameters, additional pollutants such as PM2.5, and extended modeling approaches to improve the physical interpretability and generalizability of the findings.

From an environmental management perspective, the present study demonstrates that routine air quality monitoring data can be used not only for descriptive reporting but also for comparative analysis of urban pollution structures. The proposed framework may support preliminary screening of cities, inter-city benchmarking, and the identification of regions requiring more detailed mechanistic investigation.
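Regarding the limitation on temporal dependence and the time-series-aware validation mentioned as future work, one possible scheme is a forward-chaining split, in which each fold trains on earlier days and tests on later ones. The sketch below is an assumption about what such a strategy could look like, not the validation used in the study; the function name `forward_chaining_splits`, the number of folds, and the 7-day gap are hypothetical, while 1569 is the number of daily records per city reported in Table 1.

```python
import numpy as np


def forward_chaining_splits(n_samples, n_splits=5, gap=0):
    """Yield (train_idx, test_idx) pairs in which training always precedes testing."""
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = fold * k
        test_start = train_end + gap  # optional buffer against leakage
        test_end = min(test_start + fold, n_samples)
        yield np.arange(train_end), np.arange(test_start, test_end)


# Indices refer to chronologically sorted daily records of one city.
splits = list(forward_chaining_splits(1569, n_splits=5, gap=7))
for train_idx, test_idx in splits:
    # No test day is earlier than, or adjacent to, a training day.
    assert train_idx.max() + 7 < test_idx.min()
```

Compared with a random split, this keeps neighboring, highly similar days from landing on both sides of the train/test boundary, which would otherwise tend to inflate accuracy estimates.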

In conclusion, this study presents an explainable machine learning-based framework for identifying city-specific pollution signatures using time-series data. The findings indicate that even neighboring cities may exhibit distinct statistical patterns in pollutant behavior, emphasizing the importance of localized environmental analysis. While the results do not establish causal relationships, they highlight the potential of combining temporal similarity analysis and interpretable machine learning for comparative urban air quality research.

Funding

The author declares that this study was not funded by any institution.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study are openly available on a provincial basis on the National Air Quality Monitoring Network Platform by the Ministry of Environment, Urbanization and Climate Change of the Republic of Turkey (Link: https://www.turkiye.gov.tr/cevre-ve-sehircilik-ulusal-hava-kalite-izleme-agi (accessed on 11 August 2025)). All Turkish citizens can access the platform using their national ID numbers and e-government passwords. Researchers who are not Turkish citizens can obtain the data from the author if they require it.

Conflicts of Interest

The author declares no conflicts of interest.

References

Landrigan, P.J.; Fuller, R.; Acosta, N.J.R.; Adeyi, O.; Arnold, R.; Basu, N.; Baldé, A.B.; Bertollini, R.; Bose-O’Reilly, S.; Boufford, J.I.; et al. The Lancet Commission on pollution and health. Lancet 2018, 391, 462–512. [Google Scholar] [CrossRef]
Carvalho, H. New WHO global air quality guidelines: More pressure on nations to reduce air pollution levels. Lancet Planet. Health 2021, 5, e760–e761. [Google Scholar] [CrossRef]
Cohen, A.J.; Brauer, M.; Burnett, R.; Anderson, H.R.; Frostad, J.; Estep, K.; Balakrishnan, K.; Brunekreef, B.; Dandona, L.; Dandona, R.; et al. Estimates and 25-year trends of the global burden of disease attributable to ambient air pollution. Lancet 2017, 389, 1907–1918. [Google Scholar] [CrossRef]
Burnett, R.; Chen, H.; Szyszkowicz, M.; Fann, N.; Hubbell, B.; Pope, C.A., 3rd; Apte, J.S.; Brauer, M.; Cohen, A.; Weichenthal, S.; et al. Global estimates of mortality associated with long-term exposure to outdoor fine particulate matter. Proc. Natl. Acad. Sci. USA 2018, 115, 9592–9597. [Google Scholar] [CrossRef]
Forbes Türkiye. Türkiye’de Hava Kirliliğinin Ekonomiye Maliyeti Yılda 138 Milyar Dolar. Available online: https://www.forbes.com.tr/surdurulebilirlik/turkiye-de-hava-kirliliginin-ekonomiye-maliyeti-yilda-138-milyar-dolar (accessed on 12 December 2025).
Ayus, I.; Natarajan, N.; Gupta, D. Comparison of machine learning and deep learning techniques for the prediction of air pollution: A case study from China. Asian J. Atmos. Environ. 2023, 17, 4. [Google Scholar] [CrossRef]
Bozdağ, A.; Dokuz, Y.; Gökçek, Ö.B. Spatial prediction of PM10 concentration using machine learning algorithms in Ankara, Turkey. Environ. Pollut. 2020, 263, 114599. [Google Scholar] [CrossRef] [PubMed]
Cerezuela-Escudero, E.; Montes-Sanchez, J.M.; Dominguez-Morales, J.P.; Duran-Lopez, L.; Jimenez-Moreno, G. A systematic comparison of different machine learning models for the spatial estimation of air pollution. Appl. Intell. 2023, 53, 29604–29619. [Google Scholar] [CrossRef]
Kujawska, J.; Kulisz, M.; Oleszczuk, P.; Cel, W. Machine Learning Methods to Forecast the Concentration of PM10 in Lublin, Poland. Energies 2022, 15, 6428. [Google Scholar] [CrossRef]
Gokul, P.; Mathew, A.; Bhosale, A.; Nair, A.T. Spatio-temporal air quality analysis and PM2.5 prediction over Hyderabad City, India using artificial intelligence techniques. Ecol. Inform. 2023, 76, 102045. [Google Scholar] [CrossRef]
Bekkar, A.; Hssina, B.; Douzi, S.; Douzi, K. Air-pollution prediction in smart city: Deep learning approach. J. Big Data 2021, 8, 35. [Google Scholar] [CrossRef]
Gilik, A.; Ogrenci, A.S.; Ozmen, A. Air quality prediction using CNN+LSTM-based hybrid deep learning architecture. Environ. Sci. Pollut. Res. 2022, 29, 11920–11938. [Google Scholar] [CrossRef] [PubMed]
Mampitiya, L.; Rathnayake, N.; Hoshino, Y.; Rathnayake, U. Forecasting PM10 levels in Sri Lanka: A comparative analysis of machine learning models. J. Hazard. Mater. Adv. 2024, 13, 100597. [Google Scholar] [CrossRef]
Morapedi, T.D.; Obagbuwa, I.C. Air pollution particulate matter (PM2.5) prediction in South African cities using machine learning techniques. Front. Artif. Intell. 2023, 6, 1197004. [Google Scholar] [CrossRef]
Pande, C.B.; Radhadevi, L.; Satyanarayana, M.B. Evaluation of machine learning and deep learning models for daily air quality index prediction in Delhi city, India. Environ. Monit. Assess. 2024, 196, 847. [Google Scholar] [CrossRef] [PubMed]
Kalantari, E.; Gholami, H.; Malakooti, H.; Nafarzadegan, A.R.; Moosavi, V. Machine learning for air quality index (AQI) forecasting: Shallow learning or deep learning? Environ. Sci. Pollut. Res. 2024, 31, 62962–62982. [Google Scholar] [CrossRef]
Patel, P.; Patel, S.; Shah, K.; Desai, K.; Patel, S.; Shah, M.; Patel, S. A systematic study on PM2.5 and PM10 concentration prediction using machine learning and deep learning model. Environ. Chem. Ecotoxicol. 2025, 7, 1401–1415. [Google Scholar] [CrossRef]
Zaini, N.; Ean, L.W.; Ahmed, A.N.; Malek, M.A. A systematic literature review of deep learning neural network for time series air quality forecasting. Environ. Sci. Pollut. Res. 2022, 29, 4958–4990. [Google Scholar] [CrossRef]
Iqbal, A.; Mukherjee, N. A Systematic Review and Comparative Study of Machine Learning Techniques for Air Quality Prediction. Water Air Soil Pollut. 2025, 236, 119. [Google Scholar] [CrossRef]
Gupta, N.S.; Mohta, Y.; Heda, K.; Armaan, R.; Valarmathi, B.; Arulkumaran, G. Prediction of Air Quality Index Using Machine Learning Techniques: A Comparative Analysis. J. Environ. Public Health 2023, 2023, 6691301. [Google Scholar] [CrossRef]
Kurnaz, G.; Demir, A.S. Prediction of SO₂ and PM10 air pollutants using a deep learning-based recurrent neural network: Case of industrial city Sakarya. Urban Clim. 2022, 41, 101036. [Google Scholar] [CrossRef]
Liang, Y.C.; Maimury, Y.; Chen, A.H.L.; Juarez, J.R.C. Machine learning-based prediction of air quality. Appl. Sci. 2020, 10, 9151. [Google Scholar] [CrossRef]
Mampitiya, L.; Rathnayake, N.; Hoshino, Y.; Rathnayake, U. Performance of machine learning models to forecast PM10 levels. MethodsX 2024, 12, 102345. [Google Scholar] [CrossRef] [PubMed]
Mishra, A.; Gupta, Y. Comparative analysis of Air Quality Index prediction using deep learning algorithms. Spat. Inf. Res. 2024, 32, 63–72. [Google Scholar] [CrossRef]
Sakoe, H.; Chiba, S. Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process. 1978, 26, 43–49. [Google Scholar] [CrossRef]
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2016; pp. 785–794. [Google Scholar] [CrossRef]
Dhaliwal, S.S.; Nahid, A.A.; Abbas, R. Effective intrusion detection system using XGBoost. Information 2018, 9, 149. [Google Scholar] [CrossRef]
Cortes, C.; Vapnik, V. Support–vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
Lundberg, S.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. (NeurIPS) 2017, 30, 4765–4774. [Google Scholar]

Figure 1. Flowchart of proposed methodology.

Figure 2. Distance map of Gaziantep and Kilis cities.

Figure 3. Distribution of air pollutant concentrations by city.

Figure 4. Correlation heatmap for cities and air pollutants.

Figure 5. Confusion matrix for classification performance of algorithms.

Figure 6. DTW distance dendrogram.

Figure 7. Annual PM10 change by province.

Figure 8. SHAP plot for ML algorithms.

Figure 9. SHAP plot for pollutant interaction analysis.

Figure 10. ROC curve of ML algorithms.

Table 1. Dataset attribute information.

Attribute	Unit	Range	Mean	Description
Date	—	24 April 2021–9 August 2025	—	It represents the daily dates of the measurements. Data were recorded simultaneously for both cities.
PM10	µg/m³	2.53–658.6	56.11	It represents particulate matter smaller than 10 micrometers.
SO₂	µg/m³	0.79–96.32	7.21	It is sulfur dioxide, which is produced by fossil fuel burning, industrial emissions, and heating activities.
CO	µg/m³	18.08–12,034.59	560.9	Carbon monoxide is a colorless and odorless gas resulting from incomplete combustion.
O₃	µg/m³	6.23–167.93	38.13	It is tropospheric ozone, formed at ground level. It is formed by the photochemical reactions of nitrogen oxides and volatile organic compounds. It varies depending on sunlight.
City	—	Gaziantep: 1569 records; Kilis: 1569 records	—	It is a categorical (class) variable indicating the city from which the observations originate. The study is based on a comparison of the air quality profiles of these two neighboring provinces.

Table 2. Performance analysis of classification algorithms.

Model	Class	Precision	Recall	F1-Score	Accuracy	F1-Score (Weighted)
RF	Gaziantep	0.93	0.93	0.93	0.9299	0.9299
RF	Kilis	0.93	0.93	0.93	0.9299	0.9299
XGBoost	Gaziantep	0.93	0.92	0.93	0.9257	0.9257
XGBoost	Kilis	0.92	0.93	0.93	0.9257	0.9257
SVM	Gaziantep	0.88	0.85	0.87	0.8684	0.8683
SVM	Kilis	0.85	0.89	0.87	0.8684	0.8683
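The columns of Table 2 can be derived from a confusion matrix such as the one shown in Figure 5. The sketch below uses hypothetical counts, since the actual confusion matrix values are not listed in the text; rows are assumed to be true classes and columns predicted classes, and the function name `per_class_report` is illustrative.

```python
import numpy as np


def per_class_report(confusion):
    """Precision, recall and F1 per class, plus accuracy and support-weighted F1."""
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)  # correct / everything predicted as the class
    recall = tp / cm.sum(axis=1)     # correct / everything truly in the class
    f1 = 2 * precision * recall / (precision + recall)
    support = cm.sum(axis=1)
    accuracy = float(tp.sum() / cm.sum())
    weighted_f1 = float(np.sum(f1 * support) / support.sum())
    return precision, recall, f1, accuracy, weighted_f1


# Hypothetical counts (not from the study): rows = true class, columns = predicted.
hypothetical_cm = [[90, 10],
                   [5, 95]]
precision, recall, f1, accuracy, weighted_f1 = per_class_report(hypothetical_cm)
```

With balanced classes, as in this dataset (1569 records per city), accuracy and weighted F1 tend to be close whenever the per-class errors are of similar size, which is consistent with the near-identical last two columns of Table 2.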

Table 3. DTW distance matrix.

	G_PM10	G_SO₂	G_CO	G_O₃	K_PM10	K_SO₂	K_CO	K_O₃
G_PM10	0.0	706,201.0	745,879.0	828,986.0	822,127.0	854,691.0	800,104.0	785,270.0
G_SO₂	706,201.0	0.0	573,040.0	821,306.0	912,979.0	530,334.0	629,503.0	507,623.0
G_CO	745,879.0	573,040.0	0.0	849,330.0	821,695.0	948,980.0	1,088,776.0	755,274.0
G_O₃	828,986.0	821,306.0	849,330.0	0.0	660,658.0	670,698.0	671,914.0	554,497.0
K_PM10	822,127.0	912,979.0	821,695.0	660,658.0	0.0	994,119.0	616,372.0	580,580.0
K_SO₂	854,691.0	530,334.0	948,980.0	670,698.0	994,119.0	0.0	717,057.0	621,264.0
K_CO	800,104.0	629,503.0	1,088,776.0	671,914.0	616,372.0	717,057.0	0.0	514,100.0
K_O₃	785,270.0	507,623.0	755,274.0	554,497.0	580,580.0	621,264.0	514,100.0	0.0
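The DTW distance matrix in Table 3 can be fed directly into a hierarchical clustering routine to obtain a dendrogram. The sketch below does this with SciPy, using the values of Table 3; the average-linkage method is an assumption, because the linkage criterion is not restated here, and ASCII labels replace the subscripted pollutant names.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

labels = ["G_PM10", "G_SO2", "G_CO", "G_O3", "K_PM10", "K_SO2", "K_CO", "K_O3"]

# DTW distances copied from Table 3 (G = Gaziantep, K = Kilis).
D = np.array([
    [0, 706201, 745879, 828986, 822127, 854691, 800104, 785270],
    [706201, 0, 573040, 821306, 912979, 530334, 629503, 507623],
    [745879, 573040, 0, 849330, 821695, 948980, 1088776, 755274],
    [828986, 821306, 849330, 0, 660658, 670698, 671914, 554497],
    [822127, 912979, 821695, 660658, 0, 994119, 616372, 580580],
    [854691, 530334, 948980, 670698, 994119, 0, 717057, 621264],
    [800104, 629503, 1088776, 671914, 616372, 717057, 0, 514100],
    [785270, 507623, 755274, 554497, 580580, 621264, 514100, 0],
], dtype=float)

# linkage expects the condensed (upper-triangular) form of the matrix.
condensed = squareform(D)
Z = linkage(condensed, method="average")  # linkage method is an assumption

# The first merge joins the two closest series in the table.
first_pair = sorted(labels[int(k)] for k in Z[0, :2])

# Same-pollutant distances between the two cities.
cross_city = {labels[p].split("_")[1]: D[p, p + 4] for p in range(4)}
most_different = max(cross_city, key=cross_city.get)
```

Reading the table this way, the largest same-pollutant distance between the two cities belongs to CO (1,088,776), and the smallest entry overall is the G_SO₂ and K_O₃ pair (507,623). Passing `Z` and `labels` to `scipy.cluster.hierarchy.dendrogram` would draw the tree.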
