Air Quality Index Prediction Based on Transformer Encoder–CNN–BiLSTM Model
Abstract
1. Introduction
2. Theory of Transformer Encoder–CNN–BiLSTM Hybrid Model
2.1. Transformer Encoder
2.2. Convolutional Neural Network
2.3. Long Short-Term Memory and Bidirectional Long Short-Term Memory
2.4. Proposed Model
- 1.
- Data Preprocessing Layer: Data preprocessing comprises outlier detection, missing-value imputation, dataset splitting, and data normalization, which are described in Section 3.
- 2.
- Transformer Encoder Layer:
- Multi-head Attention: To let the model learn information from different representation subspaces simultaneously, the Transformer encoder uses a multi-head attention mechanism. Several self-attention operations (heads) run in parallel, each focusing on a distinct representation subspace of the input data, and their outputs are then combined (concatenated and linearly projected, as in the standard Transformer).
- Layer Normalization and Feed-forward Networks: After the self-attention layer, layer normalization is applied to stabilize training, and a feed-forward network then further processes the representation at each time step. Together, these operations allow the Transformer encoder to capture complex temporal dependencies and, through nonlinear transformations, to extract deeper features that increase the model’s expressiveness.
- 3.
- CNN Layer:
- Convolutional Layers: A convolutional layer slides a kernel along the input data and computes the dot product between the kernel and each local window, generating feature maps. This helps the model capture local features within the data.
- Pooling Layers: Pooling layers down-sample the feature maps by taking the maximum or the average over local windows, reducing their dimension (for a time series, the length along the time axis) while preserving key information.
- 4.
- BiLSTM Layer: A BiLSTM processes the time series with two separate LSTMs: one reads the sequence forward (from past to future) and the other reads it in reverse (from future to past). Their outputs at each time step are combined, typically by concatenation, so that the representation of every time point reflects both earlier and later observations within the input window. This comprehensive view of each time point helps the model exploit cyclical and seasonal patterns when predicting AQI.
- 5.
- Output Layer: Finally, a fully connected layer computes a weighted sum of the features produced by the BiLSTM layer to obtain the final predicted AQI value.
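
The Transformer encoder layer described in item 2 can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the residual additions before each layer normalization (standard in the original Transformer), the ReLU activation in the feed-forward network, the layer sizes, and the random weights are all assumptions chosen for the example.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each time step's feature vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over x of shape (seq_len, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(m):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head): one subspace per head
        return m.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (num_heads, seq_len, seq_len)
    weights = softmax(scores)                             # each row sums to 1 over time steps
    heads = weights @ v                                   # (num_heads, seq_len, d_head)
    # Concatenate the heads and combine them with the output projection
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights


def encoder_layer(x, p, num_heads):
    attn, weights = multi_head_self_attention(x, p["w_q"], p["w_k"], p["w_v"], p["w_o"], num_heads)
    x = layer_norm(x + attn)                           # residual add, then layer norm (assumed)
    hidden = np.maximum(0.0, x @ p["w_1"] + p["b_1"])  # position-wise feed-forward, ReLU assumed
    return layer_norm(x + hidden @ p["w_2"] + p["b_2"]), weights


rng = np.random.default_rng(0)
seq_len, d_model, d_ff, num_heads = 24, 16, 32, 4  # hypothetical sizes
shapes = {
    "w_q": (d_model, d_model), "w_k": (d_model, d_model),
    "w_v": (d_model, d_model), "w_o": (d_model, d_model),
    "w_1": (d_model, d_ff), "b_1": (d_ff,),
    "w_2": (d_ff, d_model), "b_2": (d_model,),
}
params = {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}
x = rng.normal(size=(seq_len, d_model))  # e.g. 24 embedded time steps
out, weights = encoder_layer(x, params, num_heads)
print(out.shape, weights.shape)  # (24, 16) (4, 24, 24)
```

Each row of `weights` is a probability distribution over the 24 time steps, which is how every head assigns different weights to different positions of the sequence.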
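
The convolution and pooling steps in item 3 reduce to a few lines of NumPy. In this sketch the kernel size, the number of channels, the non-overlapping pooling window, and the toy AQI-like input series are hypothetical; deep-learning libraries implement the same sliding dot product (technically a cross-correlation) far more efficiently.

```python
import numpy as np


def conv1d(x, kernels, bias):
    """Valid 1-D convolution over time.

    x: (seq_len, in_channels); kernels: (out_channels, kernel_size, in_channels);
    bias: (out_channels,). Returns (seq_len - kernel_size + 1, out_channels).
    """
    out_channels, kernel_size, _ = kernels.shape
    steps = x.shape[0] - kernel_size + 1
    out = np.empty((steps, out_channels))
    for t in range(steps):
        window = x[t:t + kernel_size]  # local patch under the kernel
        # Dot product of every kernel with the same local window
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return out


def max_pool1d(x, pool_size):
    """Non-overlapping max pooling along time; leftover steps at the end are dropped."""
    steps = x.shape[0] // pool_size
    return x[: steps * pool_size].reshape(steps, pool_size, x.shape[1]).max(axis=1)


# Hypothetical single-feature series (e.g. hourly AQI), shape (6, 1)
series = np.array([[50.0], [52.0], [61.0], [58.0], [70.0], [75.0]])
kernels = np.array([[[0.5], [0.5]]])  # one kernel of size 2: a two-step average
feature_maps = conv1d(series, kernels, bias=np.zeros(1))  # moving averages: 51, 56.5, 59.5, 64, 72.5
pooled = max_pool1d(feature_maps, pool_size=2)            # larger value of each pair: 56.5, 64
print(feature_maps.ravel(), pooled.ravel())
```

With a two-step averaging kernel the feature map is a moving average of the series, and max pooling with a window of 2 keeps the larger value of each pair, roughly halving the sequence length.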
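
A bare-bones NumPy version of the BiLSTM layer in item 4, followed by the fully connected output layer of item 5, is sketched below. The gate layout, hidden size, random weights, concatenation of the two directions, and the choice to read the prediction out of the last time step are assumptions for illustration; they are not specified in this section.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_pass(x, w, u, b, hidden):
    """Run a single LSTM over x of shape (seq_len, in_dim).

    w: (in_dim, 4*hidden), u: (hidden, 4*hidden), b: (4*hidden,);
    gate blocks are ordered input, forget, candidate, output.
    Returns the hidden state at every step, shape (seq_len, hidden).
    """
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    states = []
    for x_t in x:
        z = x_t @ w + h @ u + b
        i = sigmoid(z[:hidden])                # input gate
        f = sigmoid(z[hidden:2 * hidden])      # forget gate
        g = np.tanh(z[2 * hidden:3 * hidden])  # candidate cell update
        o = sigmoid(z[3 * hidden:])            # output gate
        c = f * c + i * g                      # update the cell state
        h = o * np.tanh(c)                     # new hidden state
        states.append(h)
    return np.stack(states)


def bilstm(x, fwd_params, bwd_params, hidden):
    """Concatenate forward and backward LSTM states at each time step."""
    forward = lstm_pass(x, *fwd_params, hidden)
    # The backward LSTM reads the window in reverse; flip its states back into time order
    backward = lstm_pass(x[::-1], *bwd_params, hidden)[::-1]
    return np.concatenate([forward, backward], axis=-1)


rng = np.random.default_rng(1)
seq_len, in_dim, hidden = 12, 8, 5  # hypothetical sizes


def init_lstm():
    return (rng.normal(scale=0.1, size=(in_dim, 4 * hidden)),
            rng.normal(scale=0.1, size=(hidden, 4 * hidden)),
            np.zeros(4 * hidden))


fwd, bwd = init_lstm(), init_lstm()
x = rng.normal(size=(seq_len, in_dim))  # stand-in for the CNN feature maps
features = bilstm(x, fwd, bwd, hidden)  # shape (12, 10)

# Output layer: weighted sum of the last step's BiLSTM features plus a bias
w_out = rng.normal(scale=0.1, size=2 * hidden)
b_out = 0.0
aqi_pred = float(features[-1] @ w_out + b_out)
print(features.shape, aqi_pred)
```

At every time step the first half of the feature vector summarizes the window up to that step and the second half summarizes the window from that step onward, which is the "past and future" view described in item 4.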
3. Experiment Preparation
3.1. Experiment Environment
3.2. Data Preparation and Dataset Division
3.3. Data Processing
3.3.1. Outlier Detection
3.3.2. Data Normalization
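
Item 1 of Section 2.4 lists data normalization among the preprocessing steps, but the normalization scheme itself is not stated here. The sketch below assumes min-max scaling, a common choice for neural-network inputs, with the scaling statistics fitted on the training portion only so that no information from the test period leaks into training; the inverse transform returns predictions to AQI units before metrics are computed. The column layout, the 80/20 chronological split, and the sample values are hypothetical.

```python
import numpy as np


def fit_min_max(train):
    """Per-column minimum and range, computed on the training split only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range = np.where(col_range == 0, 1.0, col_range)  # avoid dividing by zero for constant columns
    return col_min, col_range


def transform(x, col_min, col_range):
    return (x - col_min) / col_range  # training data map to [0, 1]


def inverse_transform(x_scaled, col_min, col_range):
    return x_scaled * col_range + col_min  # back to original units (e.g. AQI) before scoring


# Hypothetical records in time order; columns could be AQI and one pollutant concentration
data = np.array([[40., 12.], [55., 20.], [70., 31.], [90., 44.], [65., 28.], [80., 35.]])
split = int(len(data) * 0.8)  # chronological split; the 80/20 ratio is an assumption
train, test = data[:split], data[split:]
col_min, col_range = fit_min_max(train)
train_s = transform(train, col_min, col_range)
test_s = transform(test, col_min, col_range)  # test values may fall outside [0, 1]
restored = inverse_transform(test_s, col_min, col_range)
print(train_s[:, 0], test_s[:, 0])
```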
3.4. Evaluation Metrics
- 1.
- Mean Absolute Error (MAE): MAE directly gives the mean magnitude of the deviation between predicted and actual values. It is defined as
  $$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
  where $n$ denotes the number of observations, $y_i$ is the true value of the $i$-th observation, and $\hat{y}_i$ is the corresponding predicted value.
- 2.
- Root Mean Square Error (RMSE): RMSE is the square root of the mean squared prediction error, and its units match those of the observed variable. It is given as follows:
  $$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
- 3.
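- Coefficient of Determination ($R^2$): $R^2$ evaluates how well the predicted values fit the observed values in the regression model. It typically lies between 0 and 1, with values closer to 1 indicating a better fit (it can become negative when the model fits worse than simply predicting the mean). Its formula is given as follows:
  $$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
  where $\bar{y}$ is the mean of the true values.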
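
The three evaluation formulas translate directly into code. The functions below are a plain NumPy sketch rather than the authors' evaluation script; the four AQI values are made up purely to show that the outputs follow the definitions (errors of -2, 2, -1 and -4 give MAE = 9/4 = 2.25, RMSE = √(25/4) = 2.5, and, with a total sum of squares of 500 around the mean of 65, R² = 1 - 25/500 = 0.95).

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as y."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


y_true = [50, 60, 70, 80]  # hypothetical AQI observations
y_pred = [52, 58, 71, 84]  # hypothetical model predictions
print(mae(y_true, y_pred), rmse(y_true, y_pred), round(r2(y_true, y_pred), 4))  # 2.25 2.5 0.95
```

Because errors are squared before averaging, RMSE is never smaller than MAE, which is a quick sanity check when reading the result tables.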
4. Experiment Result and Analysis
4.1. Single Model Prediction
4.2. Hybrid Model Prediction
- 1.
- Hierarchical feature abstraction: The model integrates the strengths of three architectures (the Transformer encoder, the CNN, and the BiLSTM), abstracting and fusing features at multiple levels in successive stages. The Transformer encoder captures long-range dependencies in the input and, through self-attention, assigns different weights to different positions of the input sequence to extract global contextual information. The CNN strengthens the model’s ability to identify local features of the time series. The BiLSTM takes both past and future information in the sequence into account: because environmental data are temporally continuous, it can capture how both preceding and subsequent observations relate to AQI. This stepwise, progressively refined feature extraction helps the model predict AQI more accurately.
- 2.
- Model ensemble advantage: A hybrid model can offset the limitations of single models, each of which tends to be biased toward one aspect of the data during prediction (for example, local patterns or long-range dependencies). Single models are also prone to overfitting particular datasets; by combining complementary components in the manner of ensemble learning, the hybrid model reduces this risk and improves robustness and generalization to unseen data.
4.3. Generalization Experiment
4.4. Comparison of Activation Functions
4.5. Model Robustness Verification
5. Conclusions
- 1.
- To further account for the impact of natural disasters on AQI, future work may incorporate natural disaster prediction into the model to enable more accurate assessment of uncontrollable factors.
- 2.
- Future studies should include data from cities around Shanghai to enhance air quality prediction by considering spatiotemporal factors.
- 3.
- Future studies will integrate multi-source data, such as meteorological factors (wind speed, temperature, and humidity), into the model to build a more comprehensive feature input and further improve AQI prediction accuracy.
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AQI | Air Quality Index |
| ADD | Addition |
| BiLSTM | Bidirectional Long Short-Term Memory |
| CNN | Convolutional Neural Network |
| LSTM | Long Short-Term Memory |
| MAE | Mean Absolute Error |
| RMSE | Root Mean Square Error |
| RNN | Recurrent Neural Network |
| TCN | Temporal Convolutional Network |
| ReLU | Rectified Linear Unit |
| LeakyReLU | Leaky Rectified Linear Unit |
| GELU | Gaussian Error Linear Unit |
| Swish | Swish activation function |

| Model | R² | MAE | RMSE |
|---|---|---|---|
| CNN | 0.8952 | 5.7717 | 8.8245 |
| TCN | 0.9311 | 4.5229 | 7.1559 |
| BiLSTM | 0.9356 | 4.7670 | 6.9163 |
| Transformer | 0.9585 | 3.6765 | 5.5496 |

| Model | R² | MAE | RMSE |
|---|---|---|---|
| CNN-BiLSTM | 0.9496 | 3.4577 | 6.1213 |
| Transformer Encoder–CNN | 0.9524 | 3.4617 | 5.9463 |
| Transformer Encoder–BiLSTM | 0.9658 | 3.3223 | 5.0417 |
| Proposed model | 0.9781 | 2.4266 | 4.0321 |

| | Predicted Good | Predicted Moderate | Predicted Lightly Polluted | Predicted Moderately Polluted |
|---|---|---|---|---|
| Actual Good | 4254 | 134 | 0 | 0 |
| Actual Moderate | 90 | 2029 | 85 | 0 |
| Actual Lightly Polluted | 0 | 13 | 295 | 7 |
| Actual Moderately Polluted | 0 | 0 | 3 | 63 |
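
The overall grade accuracy can be recovered from the confusion matrix above as the share of samples on its diagonal. The result, 95.24%, matches the grade accuracy reported for run 1 (random seed 42) in the robustness table; that this matrix belongs to that run is an inference from the matching value, not something stated in this section.

```python
import numpy as np

# Rows: actual grade; columns: predicted grade
# (Good, Moderate, Lightly Polluted, Moderately Polluted), copied from the table above
confusion = np.array([
    [4254, 134, 0, 0],
    [90, 2029, 85, 0],
    [0, 13, 295, 7],
    [0, 0, 3, 63],
])

correct = np.trace(confusion)  # diagonal entries were assigned the right grade
total = confusion.sum()
accuracy = correct / total
recall = np.diag(confusion) / confusion.sum(axis=1)  # per-grade share of actual samples graded correctly
print(correct, total, f"{accuracy:.2%}")  # 6641 6973 95.24%
```

With this matrix, per-grade recall is lowest for the Moderate grade (2029 of 2204 samples, about 92.1%), mostly because of confusion with the neighboring Good and Lightly Polluted grades.
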

| Activation Function | Rounds | RMSE | MAE | R² |
|---|---|---|---|---|
| ReLU | 100 | 5.2765 | 3.923 | 0.9625 |
| LeakyReLU | 100 | 4.4434 | 2.6562 | 0.9734 |
| GELU | 87 | 5.1046 | 3.4257 | 0.9649 |
| Swish | 100 | 4.9263 | 2.9497 | 0.9673 |
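
The four activation functions compared in the table above have standard definitions, written out below in NumPy for reference: ReLU is max(0, x); LeakyReLU keeps a small slope for negative inputs; GELU is x·Φ(x), with Φ the standard normal CDF; and Swish is x·σ(βx). Neither the LeakyReLU slope nor the Swish β used in the experiments is given in this section, so the values below (0.01 and 1) are assumptions, and this sketch says nothing about where in the network the activation is applied.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)  # element-wise error function (NumPy has no built-in erf)


def relu(x):
    return np.maximum(0.0, x)


def leaky_relu(x, negative_slope=0.01):
    # Negative slope is an assumption (0.01 is a common library default)
    return np.where(x >= 0, x, negative_slope * x)


def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF
    return 0.5 * x * (1.0 + _erf(x / math.sqrt(2.0)))


def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); beta = 1 is also known as SiLU
    return x / (1.0 + np.exp(-beta * x))


x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for name, fn in [("ReLU", relu), ("LeakyReLU", leaky_relu), ("GELU", gelu), ("Swish", swish)]:
    print(f"{name:>9}: {np.round(fn(x), 4)}")
```

Unlike ReLU, the other three pass a small nonzero signal (and gradient) for negative inputs, which is the main property that distinguishes the rows of the table.
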

| Run | Random Seed | RMSE | MAE | R² | Grade Accuracy (%) |
|---|---|---|---|---|---|
| 1 | 42 | 4.4434 | 2.6562 | 0.9734 | 95.24 |
| 2 | 100 | 4.5355 | 2.7081 | 0.9723 | 94.59 |
| 3 | 200 | 4.6076 | 2.6616 | 0.9714 | 95.10 |
| 4 | 300 | 4.1682 | 2.4857 | 0.9766 | 95.58 |
| 5 | 400 | 4.0890 | 2.4502 | 0.9775 | 95.40 |
| Mean ± Std | – | 4.37 ± 0.21 | 2.59 ± 0.11 | 0.974 ± 0.003 | 95.18 ± 0.36 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Sun, Z.; Zhang, Q.; Chen, G. Air Quality Index Prediction Based on Transformer Encoder–CNN–BiLSTM Model. Atmosphere 2026, 17, 249. https://doi.org/10.3390/atmos17030249
