Article

Forecasting Daily Ambient PM2.5 Concentrations in Qingdao City Using Deep Learning and Hybrid Interpretable Models and Analysis of Driving Factors Using SHAP

1 School of Geography and Environment, Liaocheng University, Liaocheng 252000, China
2 Institute of Huanghe Studies, Liaocheng University, Liaocheng 252000, China
3 State Key Laboratory of Loess Science, Institute of Earth Environment, Chinese Academy of Sciences, Xi’an 710061, China
4 National Ecosystem Science Data Center, Key Laboratory of Ecosystem Network Observation and Modeling, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
* Author to whom correspondence should be addressed.
Toxics 2026, 14(1), 44; https://doi.org/10.3390/toxics14010044
Submission received: 28 November 2025 / Revised: 25 December 2025 / Accepted: 29 December 2025 / Published: 30 December 2025

Abstract

With the acceleration of urbanization in China, air pollution is becoming increasingly serious, especially PM2.5 pollution, which poses a significant threat to public health. This study employed different deep learning models, including a recurrent neural network (RNN), an artificial neural network (ANN), a convolutional neural network (CNN), bidirectional long short-term memory (BiLSTM), a Transformer, and a novel hybrid interpretable CNN–BiLSTM–Transformer architecture, to forecast daily PM2.5 concentrations on an integrated dataset. A dataset of meteorological factors and atmospheric pollutants in Qingdao City was used as the input features for the models. Among the models tested, the hybrid CNN–BiLSTM–Transformer model achieved the highest prediction accuracy by extracting local features, capturing temporal dependencies in both directions, and enhancing global patterns and key information, with a low root mean square error (RMSE) (5.4236 μg/m3), low mean absolute error (MAE) (4.0220 μg/m3), low mean absolute percentage error (MAPE) (22.7791%), and high correlation coefficient (R) (0.9743). Shapley additive explanations (SHAP) analysis further revealed that PM10, CO, mean atmospheric temperature, O3, and SO2 are the key influencing factors of PM2.5. This study provides a more comprehensive and multidimensional approach for predicting air pollution, as well as valuable insights for public health and policy makers.

1. Introduction

Air pollution affects ecosystems, climate change, food security, transportation, tourism, residents’ health, and sustainable economic development [1,2,3,4]. Air pollution has become a global environmental problem, with fine particulate matter (PM2.5) receiving widespread attention due to its serious harm to human health [5,6,7]. PM2.5 can rapidly enter the respiratory system, lungs, and bloodstream, significantly increasing respiratory and cardiovascular diseases, such as chronic bronchitis, emphysema, and lung cancer [8,9]. In 2019, the number of deaths worldwide attributable to exposure to ambient PM2.5 pollution exceeded 4 million, more than twice the total number of COVID-19 deaths reported in 2020 (about 2 million) [10]. Sustainable Development Goal (SDG) 3.9 calls for a substantial reduction in global PM2.5-related deaths. Therefore, accurately predicting PM2.5 concentration is of great significance for developing effective air pollution control strategies and issuing health warnings.
Traditional air pollution prediction methods are mainly based on statistical models and numerical simulations, but these methods have limitations in dealing with complex nonlinear relationships [11]. Air pollution is influenced by temperature, humidity, wind speed, wind direction, air pressure, air pollutants (NO2, O3, and SO2), and spatiotemporal patterns [12,13,14]. Artificial intelligence (AI) has a powerful ability to handle nonlinear problems [15]. Deep learning (DL) technology has made significant progress in the field of time series forecasting [16,17,18,19]. An artificial neural network (ANN) is used to forecast daily PM2.5 concentrations in Ahvaz, Iran, with R (0.90) and RMSE (48.73) for the testing set [20]. An improved particle swarm optimized backpropagation neural network (IPSO-BP ANN) model is proposed to forecast daily PM2.5 concentrations in Nanchang City, with a coefficient of determination R2 of 0.9573 and an RMSE of 5.2407 [21]. However, the BP ANN suffers from slow learning, a tendency to fall into local optima, and poor generalization ability.
The accuracy of the recurrent neural network (RNN) method is about 81% for PM2.5 forecasting, higher than the CMAQ (Community Multiscale Air Quality) forecast [22]. A two-state gated recurrent unit (GRU), a variant of the RNN, is proposed for estimating the PM2.5 concentration in Florida with MAE (1.45), RMSE (3.48), and MAPE (16.80%) [23]. Similarly, as another variant, long short-term memory (LSTM) is developed for forecasting PM2.5 with R2 (0.973) in Delhi [24]. LSTM is also applied for PM2.5 prediction in India with a coefficient of determination (R2) (0.9) [25]. LSTM and GRU are applied to predict indoor PM2.5 levels, achieving RMSEs of 3.491 and 3.327, respectively [26]. A bi-directional LSTM (BiLSTM) network is proposed to predict long-term PM2.5 concentrations spatiotemporally with R2 (0.62) and RMSE (12.887) [27]. The BiLSTM model outperforms other models (LSTM, GRU, TCN, ARIMA, and SARIMA) with R2 (0.947) and MAE (11.68) [28]. The accuracy of the convolutional neural network (CNN) for PM2.5 concentrations in Hefei exceeds that of the ANN by about 14.2% [29]. To better capture the relationships between PM2.5 concentrations and multiple variables, nine models are compared: RF (random forest), SVM (support vector machine), XGBoost (extreme gradient boosting), general regression neural network (GRNN), light gradient boosting machine (LGBM), DNN, adaptive boosting (Adaboost), DBN (deep belief networks), and Transformer. The optimal model, the Transformer, is selected to estimate daily PM2.5 concentrations in Tianjin, with R2 (0.88), RMSE (15.30 μg/m3), MAE (9.55 μg/m3), and MAPE (21.07%) [30]. The Transformer is constructed based on a self-attention mechanism and has a powerful capability of modeling short-term and long-term dependencies of complex data [31]. The Transformer has shown better performance than RNN and LSTM in capturing long-term dependencies when forecasting PM2.5 concentrations [32]. In summary, a single deep learning model can forecast PM2.5 concentrations.
Recently, hybrid models have achieved promising results in PM2.5 prediction tasks [33,34]. The bifold-attention LSTM (BA-LSTM) model is introduced to enhance PM2.5 forecasting accuracy in Beijing, achieving RMSE (0.013) and MAPE (3.891). BA-LSTM outperforms multilayer perceptron (MLP), LSTM, attention–LSTM, and attention BiLSTM [35]. A 1D-CNN–BiLSTM is constructed for hourly PM2.5 forecasting, with RMSE (3.88), MAE (2.52), and R2 (0.94) [36]. The PSO–CNN–BiLSTM model demonstrates strong short-term PM2.5 predictive capabilities with RMSE (16.73 μg/m3) and R2 (0.84) in Jiaozuo City [37]. CNN–BiLSTM outperforms LSTM, CNN, and XGBoost. The multi-view stacked (MvS) CNN–BiLSTM outperforms RNN, GRU, and BiLSTM [38]. A hybrid CLSTM–GPR (Gaussian process regression) model is used to forecast PM2.5 concentrations in Jiangmen and Huizhou, with R increasing by about 4.4% and RMSE decreasing by about 4.7% compared to LSTM–GPR, GPR, and CNN–GPR models [39]. STL–CNN–BILSTM–AM is introduced to forecast future PM2.5 concentrations in Delhi, India, with RMSE (3.51), MAE (2.52), and R2 (0.998) [40]. The R2 value of the Transformer is about 30% higher than that of the CNN–LSTM–Attention [41]. The hybrid Transformer–BiGRU model is adopted to predict PM2.5 concentrations in Seoul subway stations with RMSE (2.03), MAE (0.56), and MAPE (1.6%) [42]. CNN–Transformer is established to predict high-resolution PM2.5 concentrations in Cangzhou with R2 (0.887) [43]. These studies have advanced PM2.5 prediction by adopting hybrid models. Combining different deep learning architectures is a promising direction for future research in this field.
However, the “black box” nature of DL makes it difficult to interpret. The Shapley additive explanations (SHAP) method has become an important tool for understanding such models. SHAP-based interpretability analysis reveals that PM10, temperature, and relative humidity are the key drivers of PM2.5 in Jiaozuo City [37]. SHAP reveals that the 3-day rolling average, daily variation, and 1-day lag dominate the predictive power of PM2.5 in the UK [44]. SHAP analysis identified CO, SO2, and O3 as the key contributors to PM2.5 levels in Shanghai [45].
Although DL methods have made progress in PM2.5 forecasting, problems remain, such as a lack of interpretability, improper handling of network structure, incomplete information transmission in sparse structures, and limitations in modeling dynamic dependencies. This study proposes an innovative hybrid CNN–BiLSTM–Transformer model that aims to comprehensively utilize the advantages of three architectures: the CNN extracts local features, the BiLSTM captures bidirectional long-term dependencies, and the Transformer extracts long-term dependencies as well as global features of the input data. This multi-level feature extraction mechanism enables the model to better understand the complex patterns of PM2.5 changes. To make the model interpretable and to extend the analysis, the SHAP framework is used to quantify the contribution of each input variable, revealing the most influential features. The purpose of the research is to improve the prediction accuracy of PM2.5 and identify key influencing factors.
The main contributions of this study include the following: (1) a novel SHAP explainable CNN–BiLSTM–Transformer hybrid model is constructed for PM2.5 prediction; (2) the performance of the hybrid model and the single models is evaluated systematically; (3) the key factors affecting PM2.5 prediction are revealed through SHAP analysis; (4) practical predictive tools and scientific insights are provided for air quality management in Qingdao City.

2. Materials and Methods

2.1. Data

Qingdao is an important coastal city in China with a population of 10.4425 million. In 2024, the GDP of Qingdao was 1671.946 billion yuan. Meteorological data are sourced from the platform (https://data.cma.cn, accessed on 17 July 2024), while air quality data are obtained from the platform (https://www.cnemc.cn, accessed on 21 August 2024). To ensure a realistic assessment of model generalization under operational forecasting conditions, a strict chronological split was adopted. The data are divided into three parts: a training set (from 1 January 2014 to 6 August 2019), a validation set (from 7 August 2019 to 18 April 2020), and a testing set (from 19 April 2020 to 31 December 2020), avoiding any information leakage across periods. This temporal partitioning reflects the practical scenario in which future PM2.5 concentrations are predicted using only historical observations. To improve model performance, all variables are normalized using the min–max scaling method, which maps the data into the range [0, 1]. This standardization helps the deep learning and hybrid models handle input variables of different scales more effectively.
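As a concrete illustration, the following minimal Python sketch (not the authors’ code; the file name and date column are assumptions) shows the chronological split and min–max scaling described above:

```python
# Minimal sketch of the chronological split and min-max normalization;
# "qingdao_daily.csv" and the "date" column name are assumptions.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("qingdao_daily.csv", parse_dates=["date"]).set_index("date")

# Strict chronological split: training / validation / testing periods as in the text.
train = df.loc["2014-01-01":"2019-08-06"]
val   = df.loc["2019-08-07":"2020-04-18"]
test  = df.loc["2020-04-19":"2020-12-31"]

# Fit the scaler on the training period only and reuse it for later periods,
# so no information leaks backwards from the validation/test years.
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)
val_scaled   = scaler.transform(val)
test_scaled  = scaler.transform(test)
```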

2.2. Artificial Neural Network (ANN)

ANN approximates complex nonlinear relationships between input features and the target through a series of interconnected neurons arranged in layers. The network includes fully connected hidden layers with activations and a learning rate, trained using mean squared error loss [46].

2.3. Recurrent Neural Network (RNN)

Unlike the ANN, RNN processes sequential inputs through recurrent units, enabling the model to retain information across consecutive time steps [47]. The network comprises two RNN layers, followed by a fully connected feedforward head with activation.

2.4. Convolutional Neural Network (CNN)

CNN applies convolutional filters across feature sequences to extract spatial correlations, followed by a fully connected feedforward head with ReLU activation and a pooling layer to produce a single output [48].

2.5. Bidirectional Long Short-Term Memory (BiLSTM)

BiLSTM captures both past and future temporal dependencies in daily PM2.5 concentrations. The BiLSTM processes sequential inputs through forward and backward LSTM layers, enabling the model to integrate information from both preceding and subsequent time steps [49].

2.6. Transformer

Transformer represents a major advancement in sequence modeling by replacing recurrence with a multi-head self-attention mechanism [50]. This design enables efficient representation of long-term temporal relationships, making the Transformer a powerful tool for time-series prediction tasks. Unlike recurrent architectures that process the input sequentially, the Transformer employs a multi-head self-attention mechanism that enables each time step to directly attend to all other steps within the input sequence. This property allows the model to more effectively learn long-range interactions among variables even when the sequence length is short, which fits the characteristics of daily air-pollution data. This module includes positional encoding, multi-head self-attention layers, and a feed-forward neural network. In standard Transformer architectures, the encoder–decoder structure is designed for autoregressive sequence generation. However, deterministic regression tasks, such as forecasting a single-step PM2.5, do not require sequential output generation. Accordingly, this study employs an encoder-only Transformer, which focuses on extracting stable and informative representations from the input sequence through multi-head self-attention and feed-forward transformations. Removing the decoder reduces computational complexity and avoids error accumulation associated with autoregressive decoding, resulting in a more efficient and robust modeling framework for environmental time-series prediction. The input meteorological and pollutant variables are projected into a latent representation, enhanced with positional encoding, and processed by multiple self-attention layers to capture intrinsic temporal dependencies and nonlinear interactions. The aggregated encoder output is then directly mapped to a scalar PM2.5 prediction through a regression head [41].
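The following PyTorch sketch illustrates one way such an encoder-only Transformer regressor can be assembled; it is an assumption-laden illustration, not the authors’ released implementation, with dimensions following the hyperparameters reported in Section 3.2:

```python
# Encoder-only Transformer regression sketch: project the lagged predictors to a
# latent size, add positional encoding, apply self-attention encoder layers,
# pool over time, and map to a single PM2.5 value.
import math
import torch
import torch.nn as nn

class EncoderOnlyPM25(nn.Module):
    def __init__(self, n_features=19, d_model=128, n_heads=8, n_layers=3, seq_len=1):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)          # input projection
        # Sinusoidal positional encoding (fixed, not learned).
        pe = torch.zeros(seq_len, d_model)
        pos = torch.arange(seq_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe.unsqueeze(0))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           dropout=0.1, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)                    # regression head

    def forward(self, x):                                    # x: (batch, seq_len, n_features)
        h = self.encoder(self.proj(x) + self.pe)
        return self.head(h.mean(dim=1)).squeeze(-1)          # pooled encoder output -> scalar

# Quick shape check with random inputs (19 assumed predictors, sequence length 1).
model = EncoderOnlyPM25()
y_hat = model(torch.randn(8, 1, 19))                         # -> tensor of shape (8,)
```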

2.7. CNN–BiLSTM–Transformer

The CNN–BiLSTM–Transformer combines the CNN’s strength in extracting local features, the BiLSTM’s ability to capture both past and future context, and the Transformer encoder’s capacity for capturing global, long-range dependencies. The hybrid CNN–BiLSTM–Transformer model can capture both local feature interactions and global temporal dependencies in daily PM2.5 prediction. The model first applies a stack of 1D convolutional layers to extract hierarchical local patterns across meteorological and pollutant variables. The convolutional outputs are then fed into a BiLSTM, enabling the integration of past and future temporal information, followed by a Transformer encoder that models long-range dependencies and contextual interactions. Positional encoding is applied to the Transformer inputs to retain sequence order, and the final representation is aggregated via adaptive average pooling and mapped to a single regression output through a fully connected head (Figure 1).
In this study, we adopt a modified Transformer design tailored for single-step atmospheric pollutant prediction. Specifically, we employ an encoder-only Transformer, omitting the decoder component since the task does not require autoregressive sequence generation. Unlike the original encoder–decoder Transformer, this regression-oriented configuration relies solely on the encoder block, which provides a computationally efficient representation-learning process, and the aggregated encoder output is passed through a regression head to generate the final PM2.5 prediction. This design allows the model to effectively exploit multi-scale temporal relationships without the overhead of a decoding module, and the streamlined structure improves robustness and enhances predictive performance while preserving the essential strengths of self-attention-based modeling.
We employ a one-day lag structure. The model uses daily meteorological and pollutant data from the previous day (t − 1) as input features to predict the daily average PM2.5 concentration for the current day (t). Therefore, all input variables are explicitly the one-day-lagged versions of the observed predictors. Within this explicit lag framework, the role of our CNN–BiLSTM–Transformer architecture is to implicitly learn the complex, nonlinear mapping from yesterday’s atmospheric state to today’s PM2.5 level. The model learns the interactions and relative importance among these lagged variables.
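A minimal PyTorch sketch of the hybrid pipeline described above follows (CNN for local features, BiLSTM for bidirectional context, Transformer encoder for global dependencies, adaptive average pooling, and a fully connected regression head). Layer sizes follow Section 3.2, but the exact wiring is an assumption rather than the authors’ code; the sinusoidal positional encoding shown in the previous sketch is omitted here for brevity.

```python
# Hybrid CNN -> BiLSTM -> Transformer-encoder regression sketch for one-step PM2.5.
import torch
import torch.nn as nn

class CNNBiLSTMTransformer(nn.Module):
    def __init__(self, n_features=19, d_model=128, n_heads=8, n_layers=3):
        super().__init__()
        # 1D convolutions over the (short) time axis extract local feature patterns.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 96, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(kernel_size=2, ceil_mode=True),
        )
        # BiLSTM integrates forward and backward temporal context (64 units per direction).
        self.bilstm = nn.LSTM(input_size=96, hidden_size=64,
                              batch_first=True, bidirectional=True)
        self.to_model = nn.Linear(128, d_model)
        enc = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                         dropout=0.1, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=n_layers)
        self.pool = nn.AdaptiveAvgPool1d(1)      # aggregate over remaining time steps
        self.head = nn.Linear(d_model, 1)        # scalar PM2.5 output

    def forward(self, x):                         # x: (batch, time, features)
        h = self.cnn(x.transpose(1, 2)).transpose(1, 2)   # convolve along the time axis
        h, _ = self.bilstm(h)
        h = self.encoder(self.to_model(h))
        h = self.pool(h.transpose(1, 2)).squeeze(-1)
        return self.head(h).squeeze(-1)

# Usage: inputs are the previous day's (t-1) predictors, the target is day-t PM2.5.
model = CNNBiLSTMTransformer()
y_hat = model(torch.randn(32, 2, 19))             # batch of 32 short sequences -> (32,)
```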

2.8. Shapley Additive Explanations (SHAP)

SHAP calculates the contribution of each feature to the model output, providing a fair interpretation for each data point. It not only reveals the impact of individual features on prediction, but also allows us to better understand the interactions between features.
To interpret the relative influence of meteorological and pollutant predictors on the hybrid CNN–BiLSTM–Transformer model, we use the SHAP framework. SHAP provides an additive and locally accurate decomposition of the model output:
f(x) = \phi_0 + \sum_{i=1}^{d} \phi_i
where \phi_0 represents the expected model prediction (the base value) and \phi_i denotes the marginal contribution of predictor i across all possible feature combinations. This game-theoretic formulation ensures consistency and enables transparent interpretation of nonlinear and high-capacity models commonly applied in PM2.5 predictions. Overall importance is summarized using the mean absolute SHAP value:
\mathrm{Mean}|\phi_j| = \frac{1}{m} \sum_{i=1}^{m} |\phi_{i,j}|
This interpretability approach offers a robust, model-agnostic assessment of how meteorological and emission-related variables shape the model’s PM2.5 predictions, strengthening the transparency and policy relevance of the analysis [51].
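As a hedged illustration, the model-agnostic KernelExplainer from the shap library can be applied to the trained hybrid model roughly as follows; model, X_train, X_test, and feature_names are assumed to exist from the preceding steps:

```python
# Sketch of SHAP value computation for the trained hybrid model using the
# model-agnostic KernelExplainer (exact choices of background size and sample
# counts are illustrative assumptions).
import numpy as np
import shap
import torch

def predict_fn(x_2d):
    # Wrap the PyTorch model: 2-D numpy (samples, features) -> 1-D predictions.
    x = torch.tensor(x_2d, dtype=torch.float32).unsqueeze(1)  # add a time axis
    with torch.no_grad():
        return model(x).numpy()

background = shap.sample(X_train, 100)             # background set defining the base value phi_0
explainer = shap.KernelExplainer(predict_fn, background)
shap_values = explainer.shap_values(X_test[:200])  # (samples, features) contributions

mean_abs = np.abs(shap_values).mean(axis=0)        # global importance per predictor
for name, value in sorted(zip(feature_names, mean_abs), key=lambda t: -t[1]):
    print(f"{name}: {value:.4f}")
```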

2.9. Evaluation Indices

To evaluate the accuracy of the deep learning and hybrid models, several performance indices are employed, including R, RMSE, MAPE, and MAE [52]. The calculation formulas are as follows:
R = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
MAE = \frac{1}{n} \sum_{i=1}^{n} |x_i - y_i|
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2}
MAPE = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{x_i - y_i}{x_i} \right|
where x_i represents the actual PM2.5 concentrations, y_i represents the predicted PM2.5 concentrations, \bar{x} and \bar{y} are, respectively, the means of the actual and forecasted values, and n is the total number of observations. The technical roadmap is shown in Figure 2.
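The four indices can be computed directly from the formulas above, for example with the following small helper (a sketch; x are observed and y predicted PM2.5 values):

```python
# Evaluation indices following Section 2.9 (assumes no zero observations for MAPE).
import numpy as np

def evaluate(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    r = np.corrcoef(x, y)[0, 1]                   # correlation coefficient R
    mae = np.mean(np.abs(x - y))                  # mean absolute error
    rmse = np.sqrt(np.mean((x - y) ** 2))         # root mean square error
    mape = 100.0 * np.mean(np.abs((x - y) / x))   # mean absolute percentage error
    return {"R": r, "MAE": mae, "RMSE": rmse, "MAPE": mape}
```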

3. Results

3.1. Correlation Between Input Variables and PM2.5

To establish the deep learning prediction models, we analyze the relationship between the input predictors and PM2.5 (Table 1). Ranked by absolute correlation |R|, the leading predictors are PM10, CO, NO2, SO2, MINAT, MAT, MAXAT, and MWP. The correlation between PM2.5 and PM10 is the highest, followed by CO, NO2, and SO2. Among meteorological elements, PM2.5 has the strongest correlation with MINAT, followed by MAT, MAXAT, and MWP. PM2.5 is positively correlated with MAP, MINAP, and MAXAP, and negatively correlated with the other meteorological predictors. These results confirm the substantial impact of the predictors on PM2.5. Therefore, 19 predictors are selected as input variables for the deep learning and hybrid models.
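The correlations in Table 1 can be reproduced in a few lines of pandas, as in the sketch below (the DataFrame df from the data-preparation step and the column name "PM2.5" are assumptions):

```python
# Pearson correlation of every predictor with PM2.5, ranked by absolute value.
corr_with_pm25 = (
    df.corr(numeric_only=True)["PM2.5"]
      .drop("PM2.5")
      .sort_values(key=abs, ascending=False)   # order predictors by |R|
)
print(corr_with_pm25)
```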

3.2. Hyperparameter Optimization

The hyperparameters of the deep learning models, mainly including the learning rate, optimizer, batch size, and number of training epochs, are optimized using grid search (Table 2). The number of units in the hidden layers of the deep learning models is 64. The number of iterations is 100, the learning rate is 0.0001, the CNN kernel size is 3 × 1, the batch size is 32, the convolutional filters are 64 and 96, the max pooling is 2 × 1, and the Adam optimizer is used for the deep learning models. The hyperparameters of the hybrid model are combinations of those of the individual models. The hyperparameters of the CNN–BiLSTM–Transformer model are as follows: the kernel size of the convolutional layer is 3 × 1, the convolutional filters are 64 and 96, the max pooling is 2 × 1, the number of hidden units in the BiLSTM is 64, the learning rate is 0.0001, and the batch size is 32. The hidden layer dimension (embedding dimension) of the Transformer is 128, the number of attention heads is 8, the number of encoder layers is 3, and the dropout rate is 0.1.
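A grid search of this kind can be organized as a simple loop over candidate settings, as in the sketch below; the grid shown follows the Transformer sensitivity analysis in Table 4, and train_and_validate is a placeholder (not an existing function) for one full training run that returns the validation RMSE:

```python
# Illustrative grid search over Transformer hyperparameters.
from itertools import product

def train_and_validate(embed_dim, n_heads, n_layers, lr, batch_size, epochs):
    """Placeholder: build and train the CNN-BiLSTM-Transformer, return validation RMSE."""
    raise NotImplementedError

grid = {"embed_dim": [64, 128, 256], "n_heads": [4, 8, 16], "n_layers": [2, 3, 4]}
best_cfg, best_rmse = None, float("inf")
for dim, heads, layers in product(grid["embed_dim"], grid["n_heads"], grid["n_layers"]):
    rmse = train_and_validate(embed_dim=dim, n_heads=heads, n_layers=layers,
                              lr=1e-4, batch_size=32, epochs=100)
    if rmse < best_rmse:                       # keep the configuration with the lowest RMSE
        best_cfg, best_rmse = (dim, heads, layers), rmse
print("best (embed_dim, heads, layers):", best_cfg, "validation RMSE:", best_rmse)
```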

3.3. Performance Comparison of the Deep Learning and Hybrid Models

Table 3 shows the performance of the deep learning and hybrid models using the training, validation, and test datasets along with key statistical performance metrics, including R, MAE, MAPE, and RMSE. The hybrid CNN–BiLSTM–Transformer model for forecasting PM2.5 performs the best in all evaluation metrics. The hybrid CNN–BiLSTM–Transformer model is superior to single CNN, BiLSTM, and Transformer. Compared to other models, the CNN–BiLSTM–Transformer model had the lowest RMSE and MAE values.
For the training dataset, the hybrid CNN–BiLSTM–Transformer model showed high prediction accuracy with R (0.9832), indicating an almost perfect linear relationship between predicted PM2.5 values and actual PM2.5 values. The MAE is 4.6801 μg/m3, while the MAPE is 20.2955%, highlighting the robustness of the model. In addition, the RMSE of 6.5734 μg/m3 demonstrates the high accuracy of the hybrid CNN–BiLSTM–Transformer model during the training phase. However, the corresponding R values are, respectively, 0.9638 for CNN, 0.9627 for BiLSTM, and 0.9655 for Transformer. The corresponding RMSE values are, respectively, 10.8414 μg/m3 for CNN, 11.3422 μg/m3 for BiLSTM, and 10.4600 μg/m3 for Transformer.
For the validation dataset, the hybrid CNN–BiLSTM–Transformer model maintained relatively strong predictive performance, despite slightly higher error values compared to the training data. The R value is 0.9687, MAE is 5.6748 μg/m3, MAPE is 18.1750%, and RMSE is 8.1896 μg/m3. These values are broadly consistent with those observed during training, providing evidence that the hybrid CNN–BiLSTM–Transformer model generalizes effectively to unseen data during the validation process.
On the independent testing dataset, the hybrid CNN–BiLSTM–Transformer model maintained strong predictive performance, consistent with the validation results. The R value of the hybrid CNN–BiLSTM–Transformer model is 0.9743, while the MAE and MAPE are 4.0220 μg/m3 and 22.7791%, respectively, indicating a slightly lower prediction error compared to the Transformer. Similarly, the RMSE of 5.4236 μg/m3 indicates a narrower spread of the error distribution compared to the Transformer. The performance across the training, validation, and testing subsets is very close, indicating that the hybrid CNN–BiLSTM–Transformer model generalizes effectively and that the high prediction accuracy is not the result of overfitting. The corresponding R values are, respectively, 0.9711 for CNN, 0.9709 for BiLSTM, and 0.9712 for Transformer, and the corresponding RMSE values are, respectively, 5.9106 μg/m3 for CNN, 6.1696 μg/m3 for BiLSTM, and 5.6769 μg/m3 for Transformer during the predicting period.
Both CNN–BiLSTM–Transformer and Transformer models have achieved powerful forecast performance, with an R value exceeding 0.9. Compared with the Transformer, the CNN–BiLSTM–Transformer model achieves slightly higher R and shows slightly higher overall accuracy in MAPE. Its outstanding performance may be attributed to its robustness to noise and ability to capture nonlinear mappings. Although the Transformer model has slightly lower accuracy, it still has competitiveness. It effectively simulates the time structure in the data, which may be more advantageous in more complex environments. This comparison highlights the potential of data-driven time series models for PM2.5 predictions.
The simulated PM2.5 concentrations using the deep learning and hybrid models in the training, validation, and test datasets are shown in Figure 3, Figure 4 and Figure 5. Although the performance metrics in Table 3 indicate only slight differences between the prediction models, Figure 3 depicts a more nuanced picture. All forecast models successfully simulate the long-term temporal trend of PM2.5 concentrations. The scatter plot and line graph of the RNN model show high bias, with loosely distributed points underestimating PM2.5 values, indicating a weak ability to capture nonlinear input–output relationships (Figure 3a, Figure 4a and Figure 5a). Similarly, the ANN, BiLSTM, CNN, and Transformer models also underestimate PM2.5 values (Figure 3b–e, Figure 4b–e and Figure 5b–e). In contrast, the scatter points of the CNN–BiLSTM–Transformer model are closest to the diagonal, and the observed and simulated points almost coincide (Figure 3f, Figure 4f and Figure 5f). During the training, validation, and prediction stages, the CNN–BiLSTM–Transformer method outperforms the other deep learning models in terms of simulation performance.
The hybrid CNN–BiLSTM–Transformer model shows a clear improvement in reproducing PM2.5 peaks, indicating that the hybrid architecture helps alleviate the imbalance between common and extreme cases in the training data. This improvement suggests that the hybrid design enables the model to better analyze and respond to complex patterns related to extreme PM2.5 pollution events. In summary, the hybrid CNN–BiLSTM–Transformer model produces better predictions, especially for extreme PM2.5 pollution events, while also slightly improving the accuracy for normal events.
Specifically, the performance of the hybrid CNN–BiLSTM–Transformer is evaluated during heavy pollution in winter. Despite the significant fluctuations in PM2.5 during this period, the CNN–BiLSTM–Transformer is still able to capture the changing trend well, with only a slight underestimation at the peak concentrations. This indicates that the CNN–BiLSTM–Transformer has a certain predictive ability for extreme pollution events, although there is still room for improvement. The hybrid CNN–BiLSTM–Transformer model consistently provides accurate predictions, whereas the RNN model is somewhat less consistent. Overall, the results indicate that the hybrid CNN–BiLSTM–Transformer has the potential for practical deployment in air pollution prediction.

3.4. Interpretability Analysis

SHAP is utilized for feature importance analysis of the hybrid CNN–BiLSTM–Transformer model (Figure 6). PM10, CO, MAT, O3, SO2, and MRH are identified as the top six factors affecting PM2.5 concentrations. This result highlights their crucial role as the main factors in PM2.5 changes. After these key predictive factors, the effects of NO2, MINAP, and MINRH are moderate. EWV and precipitation are the least influential features in this model. The difference in the importance ranking of other features is very small. These results indicate that specific air pollutants (PM10, CO, O3, and SO2) are the main drivers of PM2.5 concentrations, highlighting the need for targeted emission reduction measures.

3.5. Visualizing Feature Contributions

Figure 7 shows the SHAP heatmap, with the y-axis representing various influencing features and the x-axis representing PM2.5 data samples during the prediction period. The features are ranked in descending order of global importance, with the top feature having the greatest overall impact on all predicted results. PM10, CO, and MAT are at the top, indicating that they are the most stable and important driving factors affecting PM2.5 predictions on all days.
The red color indicates that a feature increases the predicted PM2.5 (positive contribution, SHAP value > 0), while blue indicates that a feature decreases it (SHAP value < 0). Color intensity indicates the magnitude of the influence. In spring, MINRH, MRH, MAXAP, and MWP are the main positive contributors to PM2.5 concentrations. In summer, O3 and MAT have a positive impact on the prediction. In autumn, PM10, CO, and NO2 positively influence the prediction. In winter, MINRH and MRH are key factors that contribute significantly and positively to PM2.5 concentrations. These findings emphasize the intermittent impact of meteorological parameters on the PM2.5 forecast.

4. Discussion

The hybrid interpretable CNN–BiLSTM–Transformer model represents an important advancement in PM2.5 air pollution prediction. Compared with the single model (CNN, BiLSTM, Transformer), the hybrid CNN–BiLSTM–Transformer model can more comprehensively capture the complex characteristics of PM2.5 changes. CNN–BiLSTM–Transformer can also better understand the time relationships of PM2.5 data. This multi-level understanding enables the model to adapt to different PM2.5 pollution scenarios, ranging from steady changes to drastic fluctuations.
The sensitivity of the model to key Transformer hyperparameters (number of heads, encoder layers, and embedding dimension) would help us better understand the robustness of the proposed architecture. We performed controlled experiments by varying the following hyperparameters while keeping other settings and datasets constant: number of heads (4, 8, and 16), encoder layers (2, 3, and 4), and embedding dimensions (64, 128, and 256) (Table 4). The optimal combination of 128 embedding dimensions, 8 attention heads, and 3 encoder layers achieved the best overall performance, including the highest R, reflecting stronger agreement between predictions and observations; the lowest RMSE and MAE, indicating smaller prediction deviations; and a relatively low MAPE, demonstrating good control over percentage error. The model is sensitive to hyperparameter selection, with intermediate values (e.g., 8 heads, 128 embedding dimensions) providing the best trade-off between expressiveness and generalization. Larger or deeper configurations do not necessarily improve performance and may even degrade it, suggesting that a moderate-sized Transformer is adequate for the complexity of this task. The architecture is robust around the optimal configuration (dim = 128, heads = 8, layers = 3), as slight variations did not lead to significant performance drops, implying a stable optimal region. Through systematic sensitivity experiments, we have demonstrated that the proposed Transformer architecture exhibits a clear dependence on key hyperparameters for PM2.5 prediction. The identified optimal configuration provides an effective balance between model capacity and generalization. This analysis not only validates the design choices but also offers practical guidance for hyperparameter selection in similar temporal prediction tasks.
To specifically evaluate model performance during extreme PM2.5 pollution events, we identified the top 10% of observed PM2.5 concentrations (exceeding the 90th percentile) within the independent test set as a “High PM2.5 Events” subset. The CNN–BiLSTM–Transformer model’s performance on this subset was compared against its overall performance on the entire test set (Table 5). The model retains a strong correlation (R = 0.9621) even for extreme PM2.5 events, demonstrating its fundamental capability to capture the primary drivers and temporal patterns of severe pollution episodes. As expected for higher magnitude predictions, the absolute errors (RMSE, MAE) increase for the high-concentration subset. This is a common characteristic in regression tasks. However, the more informative metric for management, the MAPE, shows a significant improvement, decreasing from 22.7791% to 11.5314%. The substantially lower MAPE for extreme PM2.5 events indicates that the model’s relative prediction accuracy is actually higher during critical high-pollution periods. This is a crucial strength for practical applications, as it provides reliable proportional estimates when PM2.5 concentrations are most hazardous, directly supporting the issuance of accurate health alerts and management interventions. This focused assessment confirms that the proposed model is not merely optimized for general performance but is particularly robust in quantifying high-concentration PM2.5 events. Its superior relative accuracy (MAPE) during extremes enhances its practical utility for forecasting systems aimed at mitigating public health risks during severe pollution episodes.
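The high-concentration subset evaluation can be reproduced with a short sketch like the one below (y_obs_test and y_pred_test are assumed arrays of observed and predicted test-set PM2.5, and evaluate() is the helper sketched in Section 2.9):

```python
# "High PM2.5 Events" evaluation: days whose observed PM2.5 exceeds the 90th
# percentile of the test set are re-scored separately.
import numpy as np

threshold = np.percentile(y_obs_test, 90)   # 90th-percentile cut-off of observations
mask = y_obs_test > threshold               # top 10% of observed concentrations
print("overall test set:  ", evaluate(y_obs_test, y_pred_test))
print("high PM2.5 events: ", evaluate(y_obs_test[mask], y_pred_test[mask]))
```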
Meteorological factors and pollutants affect PM2.5 variations. The critical driving factors are PM10, CO, mean atmospheric temperature, O3, SO2, and mean relative humidity based on the SHAP analysis in this study. During high temperatures, air temperature affects the diffusion of pollutants and their chemical reactivity, leading to the formation of secondary PM [53]. Rainfall removes PM from the air through wet deposition. However, the accumulation of PM2.5 may intensify in the absence of rainfall [54]. Through photochemical reactions, solar radiation has a crucial effect on atmospheric chemistry [55]. Longer sunshine hours increase the production of secondary pollutants, leading to the formation of PM2.5. Solar radiation accelerates the decomposition of precursor gases such as NO2 and VOCs, forming O3. Ozone can interact with PM, leading to an increase in PM2.5 concentrations [55]. Relative humidity plays a dual role in the formation and persistence of PM2.5. Although lower humidity promotes the diffusion of pollutants and reduces the formation of particulate matter, extremely high humidity promotes the growth of aerosols, especially when combined with high levels of precursor gases such as SO2 and CO [54,56]. Air pressure plays a crucial role in atmospheric pollution by affecting the stability of the atmosphere [57]. High-pressure systems can enhance the diffusion ability of particulate matter, thereby reducing its concentration. Wind speed affects the transport and diffusion of pollutants. During low wind speeds, pollutants accumulate in the air, leading to an increase in PM2.5 concentration, whereas higher wind speeds enhance diffusion, leading to a decrease in PM2.5 concentrations [56,58].
PM2.5 is a part of PM10, so changes in PM10 concentration usually directly affect PM2.5. PM10 and PM2.5 share common sources, such as coal-fired emissions and industrial processes, and coarse particulate matter can be converted into PM2.5 through physical and chemical processes [59]. CO can react with OH, increasing the formation of secondary pollutants of fine particulate matter [60]. SO2 can oxidize to form secondary particulate matter sulfates [61]. High levels of O3 promote the formation of SOA, leading to high levels of PM2.5. O3 is also a key oxidant in the air, affecting the formation of PM2.5 from precursor pollutants [54,62]. CO, SO2, and NO2 are mainly produced by the combustion of fossil fuels. Therefore, it is urgent to reduce the emissions of major pollutants (CO, SO2, and NO2). Pollution generated by transportation and factories can be reduced by implementing higher emission standards.
From a practical application perspective, the predictive model developed in this study can provide a powerful tool for air quality management in Qingdao City. The environmental protection department can use CNN–BiLSTM–Transformer to predict PM2.5 concentration one day in advance, providing decision support for formulating pollution control measures. In addition, accurate PM2.5 prediction can also be used for public health protection, especially for providing timely health advice to sensitive populations.
This study has some limitations. Firstly, only meteorological factors and historical air pollutant data were considered, and other important factors such as emission source data and regional transmission effects were not included. Secondly, there is still room for improvement in the predictive accuracy of the model in extreme pollution events.

5. Conclusions

This study developed a novel hybrid SHAP explainable CNN–BiLSTM–Transformer model for predicting daily PM2.5 concentrations in Qingdao City. With the help of SHAP values, the contribution of the input features is quantified, making the “black box” of the hybrid model interpretable. The hybrid CNN–BiLSTM–Transformer model shows superior performance in the PM2.5 prediction task, significantly outperforming the single models (RNN, ANN, CNN, BiLSTM, Transformer) with lower error values. The model shows stable predictive ability across different seasons, especially in autumn and winter, demonstrating its adaptability to different meteorological conditions. SHAP-based interpretability analysis identified the main contributors to PM2.5 levels, including PM10, CO, MAT, O3, SO2, and MRH, providing new insights into the mechanisms of PM2.5 changes. The SHAP method improves the interpretability of deep learning, providing a powerful framework for data-driven air pollution prediction. This hybrid CNN–BiLSTM–Transformer model has potential applications in practical air quality management, providing a scientific basis for pollution control and public health protection.
In the future, we can apply this hybrid interpretable model to other places to improve its practicality and applicability in different urban environments. We will integrate more data sources, such as emission inventories and satellite remote sensing data. Considering the complexity of chemical and physical processes of pollutants, we hope to achieve a nonlinear representation of data through deep learning methods and explore the complex spatiotemporal relationships between pollutant emissions. We will also explore more advanced self-attention mechanisms and dynamic feature selection methods, optimize model structures, and further improve PM2.5 prediction accuracy, especially in extreme pollution events.

Author Contributions

Z.H.: formal analysis, methodology, investigation, writing—original draft preparation, writing—review and editing. Q.G.: data curation, conceptualization, project administration, formal analysis, writing—original draft preparation. Z.Z.: conceptualization, data curation, formal analysis. G.F.: conceptualization, formal analysis. S.Q.: conceptualization, formal analysis. Z.W.: conceptualization, formal analysis. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Shandong Provincial Natural Science Foundation (Grant No. ZR2023MD075), State Key Laboratory of Loess Science Foundation (Grant No. SKLLQG2419), Liaocheng University Undergraduate Innovation and Entrepreneurship Training Program (cxcy2025035), and the National Natural Science Foundation of China (Grant No. 42575053).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhu, Z.; Zhang, X.; Dong, H.; Wang, S.; Reis, S.; Li, Y.; Gu, B. Integrated livestock sector nitrogen pollution abatement measures could generate net benefits for human and ecosystem health in China. Nat. Food 2022, 3, 161–168. [Google Scholar] [CrossRef]
  2. Jia, H.; Quaas, J. Nonlinearity of the cloud response postpones climate penalty of mitigating air pollution in polluted regions. Nat. Clim. Change 2023, 13, 943–950. [Google Scholar] [CrossRef]
  3. Ji, J.S.; Dominici, F.; Gouveia, N.; Kelly, F.J.; Neira, M. Air pollution interventions for health. Nat. Med. 2025, 31, 2888–2900. [Google Scholar] [CrossRef] [PubMed]
  4. Cai, Y.; Wang, J.; An, Y.; Dang, Y.; Ye, L. Research on accumulative time-delay effects between economic development and air pollution based on a novel grey relational analysis model. J. Clean. Prod. 2025, 497, 145128. [Google Scholar] [CrossRef]
  5. Liu, H.; Lei, J.; Liu, Y.; Zhu, T.; Chan, K.; Chen, X.; Wei, J.; Deng, F.; Li, G.; Jiang, Y.; et al. Hospital admissions attributable to reduced air pollution due to clean-air policies in China. Nat. Med. 2025, 31, 1688–1697. [Google Scholar] [CrossRef] [PubMed]
  6. Weber, E.; Daioglou, V.; Vreedenburgh, L.; Doelman, J.; Downward, G.; Matias de Pinho, M.G.; van Vuuren, D. Modelling PM2.5 reduction scenarios for future cardiopulmonary disease reduction. Nat. Sustain. 2025. [Google Scholar] [CrossRef]
  7. Zheng, H.; Wu, D.; Wang, S.; Li, X.; Jin, L.N.; Zhao, B.; Li, S.; Sun, Y.; Dong, Z.; Wu, Q.; et al. Control of toxicity of fine particulate matter emissions in China. Nature 2025, 643, 404–411. [Google Scholar] [CrossRef]
  8. Wang, Y.; Koutrakis, P.; Michanikou, A.; Kouis, P.; Panayiotou, A.G.; Kinni, P.; Tymvios, F.; Chrysanthou, A.; Neophytou, M.; Mouzourides, P.; et al. Indoor residential and outdoor sources of PM2.5 and PM10 in Nicosia, Cyprus. Air Qual. Atmos. Health 2024, 17, 485–499. [Google Scholar] [CrossRef]
  9. Li, Y.; Lin, B.; Hao, D.; Du, Z.; Wang, Q.; Song, Z.; Li, X.; Li, K.; Wang, J.; Zhang, Q.; et al. Short-term PM2.5 exposure induces transient lung injury and repair. J. Hazard. Mater. 2023, 459, 132227. [Google Scholar] [CrossRef]
  10. Yue, H.; He, C.; Huang, Q.; Zhang, D.; Shi, P.; Moallemi, E.A.; Xu, F.; Yang, Y.; Qi, X.; Ma, Q.; et al. Substantially reducing global PM2.5-related deaths under SDG3.9 requires better air pollution control and healthcare. Nat. Commun. 2024, 15, 2729. [Google Scholar] [CrossRef]
  11. Zaini, N.a.; Ean, L.W.; Ahmed, A.N.; Abdul Malek, M.; Chow, M.F. PM2.5 forecasting for an urban area based on deep learning and decomposition method. Sci. Rep. 2022, 12, 17565. [Google Scholar] [CrossRef]
  12. Zhao, N.; Wang, G.; Li, G.; Lang, J. Trends in Air Pollutant Concentrations and the Impact of Meteorology in Shandong Province, Coastal China, during 2013–2019. Aerosol Air Qual. Res. 2021, 21, 200545. [Google Scholar] [CrossRef]
  13. Chen, Z.; Chen, D.; Zhao, C.; Kwan, M.-p.; Cai, J.; Zhuang, Y.; Zhao, B.; Wang, X.; Chen, B.; Yang, J.; et al. Influence of meteorological conditions on PM2.5 concentrations across China: A review of methodology and mechanism. Environ. Int. 2020, 139, 105558. [Google Scholar] [CrossRef] [PubMed]
  14. Qiao, S.; Guo, Q.; He, Z.; Feng, G.; Wang, Z.; Li, X. Spatiotemporal Trends and Drivers of PM2.5 Concentrations in Shandong Province from 2014 to 2023 Under Socioeconomic Transition. Toxics 2025, 13, 978. [Google Scholar] [CrossRef] [PubMed]
  15. Cordova, C.H.; Portocarrero, M.N.L.; Salas, R.; Torres, R.; Rodrigues, P.C.; López-Gonzales, J.L. Air quality assessment and pollution forecasting using artificial neural networks in Metropolitan Lima—Peru. Sci. Rep. 2021, 11, 24232. [Google Scholar] [CrossRef]
  16. Zhou, J.; Zhou, L.; Cai, C.; Zhao, Y. Multi-step ozone concentration prediction model based on improved secondary decomposition and adaptive kernel density estimation. Process Saf. Environ. Prot. 2024, 190, 386–404. [Google Scholar] [CrossRef]
  17. Zheng, C.; Tao, Y.; Zhang, J.; Xun, L.; Li, T.; Yan, Q. TISE-LSTM: A LSTM model for precipitation nowcasting with temporal interactions and spatial extract blocks. Neurocomputing 2024, 590, 127700. [Google Scholar] [CrossRef]
  18. Zhang, J.; Yin, M.; Wang, P.; Gao, Z. A Method Based on Deep Learning for Severe Convective Weather Forecast: CNN-BiLSTM-AM (Version 1.0). Atmosphere 2024, 15, 1229. [Google Scholar] [CrossRef]
  19. Zhang, B.; Chen, W.; Li, M.-Z.; Guo, X.; Zheng, Z.; Yang, R. MGAtt-LSTM: A multi-scale spatial correlation prediction model of PM2.5 concentration based on multi-graph attention. Environ. Model. Softw. 2024, 179, 106095. [Google Scholar] [CrossRef]
  20. Goudarzi, G.; Hopke, P.K.; Yazdani, M. Forecasting PM2.5 concentration using artificial neural network and its health effects in Ahvaz, Iran. Chemosphere 2021, 283, 131285. [Google Scholar] [CrossRef]
  21. Liu, Z.; Hu, Y.; Fang, Z.; Xiong, S.; Wang, L.; Bao, C. Improved prediction model for daily PM2.5 concentrations with particle swarm optimization and BP neural network. Sci. Rep. 2025, 15, 32050. [Google Scholar] [CrossRef]
  22. Chang-Hoi, H.; Park, I.; Oh, H.-R.; Gim, H.-J.; Hur, S.-K.; Kim, J.; Choi, D.-R. Development of a PM2.5 prediction model using a recurrent neural network algorithm for the Seoul metropolitan area, Republic of Korea. Atmos. Environ. 2021, 245, 118021. [Google Scholar] [CrossRef]
  23. Zulqarnain, M.; Ghazali, R.; Shah, H.; Ismail, L.H.; Alsheddy, A.; Mahmud, M. A Deep Two-State Gated Recurrent Unit for Particulate Matter (PM2.5) Concentration Forecasting. Comput. Mater. Contin. 2021, 71, 3051–3068. [Google Scholar] [CrossRef]
  24. Alawi, O.A.; Kamar, H.M.; Alsuwaiyan, A.; Yaseen, Z.M. Temporal trends and predictive modeling of air pollutants in Delhi: A comparative study of artificial intelligence models. Sci. Rep. 2024, 14, 30957. [Google Scholar] [CrossRef] [PubMed]
  25. Balaraman, S.; Pachaivannan, P.; Elamparithi, P.N.; Manimozhi, S. Application of LSTM models in predicting particulate matter (PM2.5) levels for urban area. J. Eng. Res. 2022, 10, 71–90. [Google Scholar] [CrossRef]
  26. He, J.; Zhang, S.; Yu, M.; Liang, Q.; Cao, M.; Xu, H.; Liu, Z.; Liu, J. Predicting indoor PM2.5 levels in shared office using LSTM method. J. Build. Eng. 2025, 104, 112407. [Google Scholar] [CrossRef]
  27. Miao, L.; Tang, S.; Ren, Y.; Kwan, M.-P.; Zhang, K. Estimation of daily ground-level PM2.5 concentrations over the Pearl River Delta using 1 km resolution MODIS AOD based on multi-feature BiLSTM. Atmos. Environ. 2022, 290, 119362. [Google Scholar] [CrossRef]
  28. Karnati, H.; Soma, A.; Alam, A.; Kalaavathi, B. Comprehensive analysis of various imputation and forecasting models for predicting PM2.5 pollutant in Delhi. Neural Comput. Appl. 2025, 37, 11441–11458. [Google Scholar] [CrossRef]
  29. Xia, S.; Zhang, R.; Zhang, L.; Wang, T.; Wang, W. Multi-dimensional distribution prediction of PM2.5 concentration in urban residential areas based on CNN. Build. Environ. 2025, 267, 112167. [Google Scholar] [CrossRef]
  30. Liu, Z.; Zheng, K.; Bao, S.; Cui, Y.; Yuan, Y.; Ge, C.; Zhang, Y. Estimating the spatiotemporal distribution of PM2.5 concentrations in Tianjin during the Chinese Spring Festival: Impact of fireworks ban. Environ. Pollut. 2024, 361, 124899. [Google Scholar] [CrossRef]
  31. Su, L.; Zuo, X.; Li, R.; Wang, X.; Zhao, H.; Huang, B. A systematic review for transformer-based long-term series forecasting. Artif. Intell. Rev. 2025, 58, 80. [Google Scholar] [CrossRef]
  32. Zhang, Z.; Zhang, S. Modeling air quality PM2.5 forecasting using deep sparse attention-based transformer networks. Int. J. Environ. Sci. Technol. 2023, 20, 13535–13550. [Google Scholar] [CrossRef]
  33. Ding, W.; Sun, H. Prediction of PM2.5 concentration based on the weighted RF-LSTM model. Earth Sci. Inform. 2023, 16, 3023–3037. [Google Scholar] [CrossRef]
  34. Xie, X.; Wang, Z.; Xu, M.; Xu, N. Daily PM2.5 concentration prediction based on variational modal decomposition and deep learning for multi-site temporal and spatial fusion of meteorological factors. Environ. Monit. Assess. 2024, 196, 859. [Google Scholar] [CrossRef]
  35. Pranolo, A.; Zhou, X.; Mao, Y. A novel bifold-attention-LSTM for analyzing PM2.5 concentration-based multi-station data time series. Int. J. Data Sci. Anal. 2025, 20, 3337–3354. [Google Scholar] [CrossRef]
  36. Zhu, M.; Xie, J. Investigation of nearby monitoring station for hourly PM2.5 forecasting using parallel multi-input 1D-CNN-biLSTM. Expert Syst. Appl. 2023, 211, 118707. [Google Scholar] [CrossRef]
  37. Lei, K.; Wang, M.; Wang, M.; Liu, Q.; Zhang, F.; Xing, M.; Wu, W.; Jiang, F.; Guo, X.; Han, Q.; et al. SHAP explainable PSO-CNN-BiLSTM for 6-hour prediction analysis of urban PM2.5 and O3 concentrations. Atmos. Pollut. Res. 2025, 16, 102705. [Google Scholar] [CrossRef]
  38. Kumar, S.; Kumar, V. Multi-view Stacked CNN-BiLSTM (MvS CNN-BiLSTM) for urban PM2.5 concentration prediction of India’s polluted cities. J. Clean. Prod. 2024, 444, 141259. [Google Scholar] [CrossRef]
  39. He, J.; Li, X.; Chen, Z.; Mai, W.; Zhang, C.; Wan, X.; Wang, X.; Huang, M. A hybrid CLSTM-GPR model for forecasting particulate matter (PM2.5). Atmos. Pollut. Res. 2023, 14, 101832. [Google Scholar] [CrossRef]
  40. Sreenivasulu, T.; Rayalu, G.M. Enhanced PM2.5 prediction in Delhi using a novel optimized STL-CNN-BILSTM-AM hybrid model. Asian J. Atmos. Environ. 2024, 18, 25. [Google Scholar] [CrossRef]
  41. Cui, B.; Liu, M.; Li, S.; Jin, Z.; Zeng, Y.; Lin, X. Deep learning methods for atmospheric PM2.5 prediction: A comparative study of transformer and CNN-LSTM-attention. Atmos. Pollut. Res. 2023, 14, 101833. [Google Scholar] [CrossRef]
  42. Chen, D.; Liu, H. A new method for predicting PM2.5 concentrations in subway stations based on a multiscale adaptive noise reduction transformer -BiGRU model and an error correction method. J. Infrastruct. Intell. Resil. 2025, 4, 100128. [Google Scholar] [CrossRef]
  43. Wang, Y.-Z.; He, H.-D.; Huang, H.-C.; Yang, J.-M.; Peng, Z.-R. High-resolution spatiotemporal prediction of PM2.5 concentration based on mobile monitoring and deep learning. Environ. Pollut. 2025, 364, 125342. [Google Scholar] [CrossRef] [PubMed]
  44. Malakouti, S.M. From accurate to actionable: Interpretable PM2.5 forecasting with feature engineering and SHAP for the Liverpool–Wirral region. Environ. Chall. 2025, 21, 101290. [Google Scholar] [CrossRef]
  45. Wei, Q.; Chen, Y.; Zhang, H.; Jia, Z.; Yang, J.; Niu, B. Simulation and prediction of PM2.5 concentrations and analysis of driving factors using interpretable tree-based models in Shanghai, China. Environ. Res. 2025, 270, 121003. [Google Scholar] [CrossRef]
  46. Khoshraftar, Z. Modeling of CO2 solubility and partial pressure in blended diisopropanolamine and 2-amino-2-methylpropanol solutions via response surface methodology and artificial neural network. Sci. Rep. 2025, 15, 1800. [Google Scholar] [CrossRef] [PubMed]
  47. Malin, M.; Okkonen, J.; Suutala, J. Snow water equivalent forecasting in sub-arctic and arctic regions: Efficient recurrent neural networks approach. Environ. Model. Softw. 2025, 194, 106695. [Google Scholar] [CrossRef]
  48. Zhou, F.; Liu, X.; Jia, C.; Li, S.; Tian, J.; Zhou, W.; Wu, C. Unified CNN-LSTM for keyhole status prediction in PAW based on spatial-temporal features. Expert Syst. Appl. 2024, 237, 121425. [Google Scholar] [CrossRef]
  49. Lin, G.; Zhao, H.; Chi, Y. A comprehensive evaluation of deep learning approaches for ground-level ozone prediction across different regions. Ecol. Inform. 2025, 86, 103024. [Google Scholar] [CrossRef]
  50. Hussan, U.; Wang, H.; Peng, J.; Jiang, H.; Rasheed, H. Transformer-based renewable energy forecasting: A comprehensive review. Renew. Sustain. Energy Rev. 2026, 226, 116356. [Google Scholar] [CrossRef]
  51. Mvita, M.J.; Zulu, N.G.; Thethwayo, B. Artificial neural network integrated SHapley Additive exPlanations modeling for sodium dichromate formation. Eng. Appl. Artif. Intell. 2025, 158, 111457. [Google Scholar] [CrossRef]
  52. Bose, A.; Chowdhury, I.R. Towards cleaner air in Siliguri: A comprehensive study of PM2.5 and PM10 through advance computational forecasting models for effective environmental interventions. Atmos. Pollut. Res. 2024, 15, 101976. [Google Scholar] [CrossRef]
  53. Zhang, Y.; Sun, Q.; Liu, J.; Petrosian, O. Long-Term Forecasting of Air Pollution Particulate Matter (PM2.5) and Analysis of Influencing Factors. Sustainability 2024, 16, 19. [Google Scholar] [CrossRef]
  54. Masood, A.; Ahmad, K. Data-driven predictive modeling of PM2.5 concentrations using machine learning and deep learning techniques: A case study of Delhi, India. Environ. Monit. Assess. 2022, 195, 60. [Google Scholar] [CrossRef]
  55. Deng, C.; Qin, C.; Li, Z.; Li, K. Spatiotemporal variations of PM2.5 pollution and its dynamic relationships with meteorological conditions in Beijing-Tianjin-Hebei region. Chemosphere 2022, 301, 134640. [Google Scholar] [CrossRef] [PubMed]
  56. Meng, C.; Cheng, T.; Gu, X.; Shi, S.; Wang, W.; Wu, Y.; Bao, F. Contribution of meteorological factors to particulate pollution during winters in Beijing. Sci. Total Environ. 2019, 656, 977–985. [Google Scholar] [CrossRef] [PubMed]
  57. Ning, G.; Wang, S.; Yim, S.H.L.; Li, J.; Hu, Y.; Shang, Z.; Wang, J.; Wang, J. Impact of low-pressure systems on winter heavy air pollution in the northwest Sichuan Basin, China. Atmos. Chem. Phys. 2018, 18, 13601–13615. [Google Scholar] [CrossRef]
  58. Li, Z.; Di, Z.; Chang, M.; Zheng, J.; Tanaka, T.; Kuroi, K. Study on the influencing factors on indoor PM2.5 of office buildings in beijing based on statistical and machine learning methods. J. Build. Eng. 2023, 66, 105240. [Google Scholar] [CrossRef]
  59. Colangeli, C.; Palermi, S.; Bianco, S.; Aruffo, E.; Chiacchiaretta, P.; Di Carlo, P. The Relationship between PM2.5 and PM10 in Central Italy: Application of Machine Learning Model to Segregate Anthropogenic from Natural Sources. Atmosphere 2022, 13, 484. [Google Scholar] [CrossRef]
  60. Metya, A.; Dagupta, P.; Halder, S.; Chakraborty, S.; Tiwari, Y.K. COVID-19 Lockdowns Improve Air Quality in the South-East Asian Regions, as Seen by the Remote Sensing Satellites. Aerosol Air Qual. Res. 2020, 20, 1772–1782. [Google Scholar] [CrossRef]
  61. Wang, Y.; Ge, Q. The positive impact of the Omicron pandemic lockdown on air quality and human health in cities around Shanghai. Environ. Dev. Sustain. 2024, 26, 8791–8816. [Google Scholar] [CrossRef] [PubMed]
  62. Zhang, N.; Guan, Y.; Jiang, Y.; Zhang, X.; Ding, D.; Wang, S. Regional demarcation of synergistic control for PM2.5 and ozone pollution in China based on long-term and massive data mining. Sci. Total Environ. 2022, 838, 155975. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The structural diagram of CNN–BiLSTM–Transformer.
Figure 2. The technical roadmap.
Figure 3. Scatter plot of the simulation results for different models during the training period. (a) the simulated results of RNN, (b) the simulated results of ANN, (c) the simulated results of BiLSTM, (d) the simulated results of CNN, (e) the simulated results of Transformer, (f) the simulated results of CNN-BiLSTM-Transformer.
Figure 4. The simulation results for different models during the verification period. (a) the simulated results of RNN, (b) the simulated results of ANN, (c) the simulated results of BiLSTM, (d) the simulated results of CNN, (e) the simulated results of Transformer, (f) the simulated results of CNN-BiLSTM-Transformer.
Figure 5. The forecasted results for different models during the testing period. (a) the forecasting results of RNN, (b) the forecasting results of ANN, (c) the forecasting results of BiLSTM, (d) the forecasting results of CNN, (e) the forecasting results of Transformer, (f) the forecasting results of CNN-BiLSTM-Transformer.
Figure 6. Feature importance analysis based on the hybrid CNN–BiLSTM–Transformer model. (a) SHAP summary plots, (b) SHAP mean importance plots.
Figure 7. SHAP heatmap of the hybrid CNN–BiLSTM–Transformer model.
Table 1. Correlation between PM2.5 and influencing factors.

Influence Factor | Abbreviation | R
PM10 | PM10 | 0.8995
SO2 | SO2 | 0.6110
CO | CO | 0.7074
NO2 | NO2 | 0.6692
O3 | O3 | −0.1478
precipitation | P | −0.1423
mean atmospheric pressure | MAP | 0.2893
extreme wind velocity | EWV | −0.1459
mean atmospheric temperature | MAT | −0.3860
mean wind velocity | MWV | −0.0963
mean relative humidity | MRH | −0.0597
mean water pressure | MWP | −0.3591
minimum AP | MINAP | 0.2909
sunshine hours | SH | −0.1180
maximum AP | MAXAP | 0.2927
minimum AT | MINAT | −0.3934
maximum WV | MAXWV | −0.0953
maximum AT | MAXAT | −0.3672
minimum RH | MINRH | −0.1139
Minimum AP represents minimum atmospheric pressure, maximum AP represents maximum atmospheric pressure, minimum AT represents minimum atmospheric temperature, maximum WV represents maximum wind velocity, maximum AT represents maximum atmospheric temperature, and minimum RH represents minimum relative humidity.
Table 2. The hyperparameters of the models.

Hyperparameters | ANN | RNN | BiLSTM | CNN | Transformer
Units in HL | 64 | 64 | 64 | – | 64
Activation function | Logsig-purelin | Tanh-sigmoid | Tanh-sigmoid | Relu | Gelu
Learning rate | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001
Batch size | 32 | 32 | 32 | 32 | 32
Epochs | 100 | 100 | 100 | 100 | 100
Optimizer | Trainbr | Adam | Adam | Adam | Adam
Kernel size | – | – | – | 3 | –
Max-pooling | – | – | – | 2 | –
Convolution filters | – | – | – | 64-96 | –
Table 3. The four evaluation indicators for the models.

Models | Phase | R | RMSE (μg/m3) | MAE (μg/m3) | MAPE (%)
RNN | Training | 0.9612 | 11.9428 | 8.1027 | 25.4569
RNN | Verification | 0.9544 | 10.8593 | 7.5617 | 21.8824
RNN | Predicting | 0.9674 | 6.3036 | 4.6586 | 34.2459
ANN | Training | 0.9617 | 11.8627 | 8.0425 | 24.7668
ANN | Verification | 0.9629 | 10.2998 | 7.3455 | 21.5320
ANN | Predicting | 0.9680 | 6.2723 | 4.6529 | 33.4613
BiLSTM | Training | 0.9627 | 11.3422 | 7.6274 | 23.6251
BiLSTM | Verification | 0.9661 | 9.8537 | 7.0621 | 20.9814
BiLSTM | Predicting | 0.9709 | 6.1696 | 4.6414 | 32.3479
CNN | Training | 0.9638 | 10.8414 | 7.4928 | 22.8975
CNN | Verification | 0.9677 | 9.0371 | 6.4172 | 20.9579
CNN | Predicting | 0.9711 | 5.9106 | 4.5856 | 31.5879
Transformer | Training | 0.9655 | 10.4600 | 7.0681 | 21.1830
Transformer | Verification | 0.9684 | 8.1992 | 5.6803 | 19.1756
Transformer | Predicting | 0.9712 | 5.6769 | 4.4915 | 31.3887
CNN–BiLSTM–Transformer | Training | 0.9832 | 6.5734 | 4.6801 | 20.2955
CNN–BiLSTM–Transformer | Verification | 0.9687 | 8.1896 | 5.6748 | 18.1750
CNN–BiLSTM–Transformer | Predicting | 0.9743 | 5.4236 | 4.0220 | 22.7791
Table 4. The sensitivity of the CNN–BiLSTM–Transformer model to key Transformer hyperparameters.

Embedding Dimension | Number of Heads | Encoder Layers | R | RMSE (μg/m3) | MAE (μg/m3) | MAPE (%)
64 | 4 | 2 | 0.9684 | 8.7654 | 6.2082 | 24.3989
64 | 4 | 3 | 0.9633 | 9.3897 | 6.4147 | 24.1623
64 | 4 | 4 | 0.9629 | 9.3456 | 6.3296 | 24.3151
64 | 8 | 2 | 0.9680 | 8.6745 | 6.1392 | 24.4983
64 | 8 | 3 | 0.9670 | 9.5390 | 6.7693 | 24.2755
64 | 8 | 4 | 0.9640 | 9.1835 | 6.6503 | 24.9770
64 | 16 | 2 | 0.9662 | 8.8925 | 6.2809 | 24.7377
64 | 16 | 3 | 0.9620 | 9.1232 | 6.2082 | 24.8126
64 | 16 | 4 | 0.9682 | 8.8714 | 6.3670 | 24.9837
128 | 4 | 2 | 0.9704 | 8.4533 | 5.7435 | 23.7833
128 | 4 | 3 | 0.9730 | 7.8222 | 5.4710 | 23.7285
128 | 4 | 4 | 0.9656 | 8.8593 | 6.1632 | 23.8092
128 | 8 | 2 | 0.9681 | 8.4701 | 6.0639 | 23.3543
128 | 8 | 3 | 0.9743 | 5.4236 | 4.0220 | 22.7791
128 | 8 | 4 | 0.9665 | 8.4009 | 5.4240 | 23.3543
128 | 16 | 2 | 0.9689 | 8.4177 | 6.0283 | 23.6519
128 | 16 | 3 | 0.9725 | 7.9030 | 5.5623 | 23.9408
128 | 16 | 4 | 0.9705 | 8.1568 | 5.6960 | 23.0055
256 | 4 | 2 | 0.9677 | 8.2928 | 5.9024 | 23.6324
256 | 4 | 3 | 0.9740 | 7.5075 | 5.2639 | 23.1160
256 | 4 | 4 | 0.9742 | 7.8369 | 5.7447 | 23.8093
256 | 8 | 2 | 0.9700 | 8.4563 | 6.1401 | 23.8292
256 | 8 | 3 | 0.9710 | 8.4114 | 5.8592 | 23.8874
256 | 8 | 4 | 0.9703 | 8.1029 | 5.7008 | 23.7381
256 | 16 | 2 | 0.9730 | 8.3425 | 5.9998 | 23.6920
256 | 16 | 3 | 0.9677 | 8.2103 | 5.4124 | 23.0151
256 | 16 | 4 | 0.9709 | 8.1570 | 5.7105 | 23.7410
Table 5. The model performance for predicting extreme PM2.5 events during the prediction period.

Subset | R | RMSE (μg/m3) | MAE (μg/m3) | MAPE (%)
Overall test set | 0.9743 | 5.4236 | 4.0220 | 22.7791
High PM2.5 events | 0.9621 | 9.8215 | 8.5923 | 11.5314

