# A Hybrid Model for Air Quality Prediction Based on Data Decomposition


## Abstract


… R^{2}) by 18% compared with the single prediction model. Compared with the hybrid model, it reduced the RMSE by 3%, reduced the MAE by 3%, and increased the R^{2} by 0.5%. Experimental verification showed that the proposed prediction model solves the problem of lagging prediction results in single prediction models and is a feasible air quality prediction method.

## 1. Introduction

… O_{3}), sulfur dioxide (SO_{2}), and nitrogen dioxide (NO_{2}). The AQI measures the overall quality of the air and classifies it into six levels (good, moderate, lightly polluted, moderately polluted, heavily polluted, and severely polluted), providing a useful reference for people’s outdoor activities. The AQI ranges from 0 to 500 and expresses the impact on human health numerically: low values represent good air quality and high values represent poor air quality.

… O_{3} concentrations. To address the problem of low horizontal and directional prediction accuracy of nonlinear AQI sequences, Jiang et al. [14] proposed a hybrid model based on WD, multidimensional scaling and K-means (MSK) clustering, and an improved extreme learning machine (ELM). To better monitor air quality in developing and highly urbanized countries, Cabaneros et al. [15] proposed a spatio-temporal interpolation modeling approach based on LSTM and wavelet preprocessing techniques for the spatial prediction of hourly NO_{2} levels in central London, UK. Because atmospheric pollutants are strongly correlated, Liu et al. [16] decomposed the AQI series into eight sub-series with different frequencies using the maximal overlap discrete wavelet packet transform (MODWPT), thus reducing the non-stationarity of the time series for spatial prediction of AQI. All of the above algorithms achieved good prediction results but did not consider using different prediction algorithms for the high-frequency and low-frequency subseries obtained from wavelet decomposition. Since the non-stationarity of AQI series increases the difficulty of AQI prediction, Wang et al. [17] proposed a hybrid prediction model that improves the prediction accuracy of AQI series by integrating a two-stage decomposition technique with an ELM optimized by differential evolution. To address the nonlinearity and instability of air quality, Zhang et al. [18] proposed a hybrid deep learning model, VMD-BiLSTM, which fuses variational mode decomposition (VMD) and a bi-directional LSTM (BiLSTM) network to predict the variation of PM2.5 concentration. These models confirmed the positive effect of hybrid prediction on accuracy, but only for a single air pollutant; their performance on other air pollutants was not verified.

… R^{2}, are selected to evaluate the prediction performance of the model and to verify its validity.

## 2. Theoretical Foundations

#### 2.1. Sliding Window

… T_{n} use the values within [T_{n−p}, T_{n}) as features and the value at T_{n} as the label or target; p is called the sliding window size. A sample plot of the sliding window construction of time series samples is shown in Figure 1. T_{1} to T_{6} are the original time series inputs, and the size of the sliding window is set to 5, counting four feature values and one label. Sample 1 takes the four values from T_{1} to T_{4} as features, and T_{5} is its label. Sample 2 takes the four values from T_{2} to T_{5} as features, and T_{6} is its label, and so on; original data of length 9 can thus build five time series samples. The window size therefore affects both the number of time series samples and the number of features in each sample. For a given data set, a smaller window size means more samples and fewer features; a larger window size means fewer samples and more features.
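The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `make_windows` and the integer stand-in series are assumptions, and `p` counts only the feature values (four here), so each sample covers five consecutive points.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], p: int) -> List[Tuple[List[float], float]]:
    """Build (features, label) pairs: p consecutive values predict the next one."""
    if p < 1 or p >= len(series):
        raise ValueError("p must satisfy 1 <= p < len(series)")
    samples = []
    for n in range(p, len(series)):
        features = list(series[n - p:n])  # values in [T_{n-p}, T_n)
        label = series[n]                 # value at T_n
        samples.append((features, label))
    return samples


# Hypothetical stand-in for a series T_1 ... T_9 (length 9).
series = [1, 2, 3, 4, 5, 6, 7, 8, 9]
samples = make_windows(series, p=4)

print(len(samples))  # 5 samples
print(samples[0])    # ([1, 2, 3, 4], 5)
print(samples[1])    # ([2, 3, 4, 5], 6)
```

With `p=2` the same series yields seven samples of two features each, which matches the trade-off between sample count and feature count noted above.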

#### 2.2. Wavelet Decomposition

… level-4 low-frequency series has an obvious trend as well as a certain periodicity, while D_{1–4} reflect the random fluctuations around the trend of the original time series.
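To make the split into one low-frequency series and four high-frequency series concrete, the following sketch performs a four-level discrete wavelet decomposition in plain Python. It assumes the Haar wavelet purely for simplicity; the wavelet basis used for the experiments is not stated in this section, and the names `haar_step`, `haar_decompose`, `a4`, and `details` are illustrative. The test signal is synthetic.

```python
import math
from typing import List, Sequence, Tuple


def haar_step(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One Haar level: scaled pairwise sums (approximation) and differences (detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_decompose(x: Sequence[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Return the final approximation and the details, finest level first."""
    if levels < 1 or len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be a multiple of 2**levels")
    approx = list(x)
    details = []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)  # details[0] is D1, details[-1] is D_levels
    return approx, details


# Synthetic signal: a linear trend plus an alternating fluctuation.
signal = [0.5 * t + (1.0 if t % 2 == 0 else -1.0) for t in range(32)]
a4, details = haar_decompose(signal, levels=4)

print(len(a4), [len(d) for d in details])  # 2 [16, 8, 4, 2]
```

In this toy signal the alternating fluctuation ends up in the finest detail series, while the trend survives in the approximation. The Haar transform is orthonormal, so the total energy of the coefficients equals that of the input.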

#### 2.3. Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN)

#### 2.4. Autoregressive Moving Average Model

… _{0} is a constant, $\theta$ is the moving average model coefficient, $\epsilon_{J}$ is the white noise process, and p and q are the orders of the ARMA model. p is the number of lagged observations included in the model, and q is the size of the moving average window, i.e., how many past white-noise terms enter the model.
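As an illustration of how p and q enter a prediction, the sketch below computes a one-step ARMA(p, q) forecast from given coefficients. It assumes the standard textbook form, in which the forecast is a constant plus p weighted past observations plus q weighted past residuals. The function name `arma_one_step`, the argument names, and all numeric values are hypothetical, and coefficient estimation is out of scope here.

```python
from typing import Sequence


def arma_one_step(
    history: Sequence[float],    # past observations, most recent last
    residuals: Sequence[float],  # past one-step errors, most recent last
    const: float,                # constant term
    phi: Sequence[float],        # AR coefficients, lag 1 first (length p)
    theta: Sequence[float],      # MA coefficients, lag 1 first (length q)
) -> float:
    """One-step forecast under the standard ARMA(p, q) form (an assumption)."""
    p, q = len(phi), len(theta)
    if len(history) < p or len(residuals) < q:
        raise ValueError("need at least p observations and q residuals")
    ar_part = sum(phi[i] * history[-1 - i] for i in range(p))
    ma_part = sum(theta[j] * residuals[-1 - j] for j in range(q))
    return const + ar_part + ma_part


# Made-up ARMA(2, 1) example: 1.0 + 0.5*12 + 0.2*10 + 0.4*0.5
forecast = arma_one_step(
    history=[10.0, 12.0], residuals=[0.5], const=1.0, phi=[0.5, 0.2], theta=[0.4]
)
print(round(forecast, 4))  # 9.2
```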

#### 2.5. Long Short-Term Memory

#### 2.6. Predictive Effect Evaluation Index

… R^{2} are used to evaluate the prediction accuracy of the model. The MAE reflects the actual magnitude of the prediction error; the RMSE measures the deviation between the predicted and actual values and is more sensitive to outliers; R^{2} is a statistical measure of goodness of fit, and the closer its value is to 1, the better the model fits. The expressions are shown below:
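A minimal sketch of the three indices follows, assuming their standard definitions: MAE as the mean absolute error, RMSE as the square root of the mean squared error, and R^{2} as one minus the ratio of the residual sum of squares to the total sum of squares. The function names and the sample values are illustrative only.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error; squaring makes it more sensitive to outliers."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print(mae(y_true, y_pred))             # 0.5
print(round(rmse(y_true, y_pred), 4))  # 0.7071
print(round(r2(y_true, y_pred), 4))    # 0.9
```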

## 3. Model Construction

#### 3.1. Experimental Environment

#### 3.2. Experimental Data

#### 3.3. Wavelet Decomposition-Long Short Term Memory-Autoregressive Moving Average Prediction Model

#### 3.4. Predicted Results

… SO_{2}, NO_{2}, CO, O_{3}, PM10, PM2.5) and the AQI at six EPB monitoring stations in Tangshan City.

… R^{2} shows that the R^{2} of the prediction results for the air quality data from the EPB environmental monitoring stations exceeds 0.94, except for CO at Shierzhong (XII) and Xiaoshan and the AQI at Wuziju (Bureau of Materials). Therefore, the air quality prediction model WD-LSTM-ARMA proposed in this paper has a good fitting effect.
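The 0.94 threshold can be checked directly against the reported numbers. The sketch below copies the R^{2} rows of the results table in this paper into a dictionary and lists every station and pollutant pair that falls below the threshold. The variable names are illustrative, and the plain-text column labels stand in for the table headers.

```python
# R^2 values copied from the results table (one row per station).
columns = ["AQI", "SO2", "NO2", "CO", "O3", "PM10", "PM2.5"]
r2_table = {
    "Gongxiaoshe": [0.9456, 0.9728, 0.9501, 0.9467, 0.9802, 0.9525, 0.9441],
    "Shierzhong":  [0.9478, 0.9687, 0.961, 0.9322, 0.9872, 0.9528, 0.9495],
    "Xiaoshan":    [0.9506, 0.9619, 0.9558, 0.9285, 0.9885, 0.9447, 0.9523],
    "Wuziju":      [0.9332, 0.9574, 0.9508, 0.9513, 0.9886, 0.9578, 0.9516],
    "Taocigongsi": [0.9615, 0.9735, 0.9576, 0.9463, 0.9886, 0.9646, 0.9508],
    "Leidazhan":   [0.9438, 0.9666, 0.956, 0.9712, 0.9885, 0.9616, 0.9498],
}

THRESHOLD = 0.94

# Collect (station, pollutant, R^2) for every entry under the threshold.
below = [
    (station, name, value)
    for station, values in r2_table.items()
    for name, value in zip(columns, values)
    if value < THRESHOLD
]

for station, name, value in below:
    print(f"{station:12s} {name:6s} {value}")
```

Run against the table, this lists three entries: CO at Shierzhong (0.9322), CO at Xiaoshan (0.9285), and the AQI at Wuziju (0.9332).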

## 4. Comparison and Analysis

#### 4.1. Model Comparison

#### 4.2. Case Analysis

… R^{2} by 18% relative to ARMA, the single model with the higher prediction accuracy; and reduced the RMSE by 3%, reduced the MAE by 3%, and improved the R^{2} by 0.5% relative to WD-LSTM, the other hybrid model with the higher prediction accuracy.
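Percentages of this kind are relative changes with respect to a baseline model. A small sketch of that arithmetic follows, assuming the usual definition (candidate minus baseline, divided by baseline); the function name and the numbers are hypothetical and are not taken from the experiments.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Signed percentage change of `candidate` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


# Made-up RMSE values: a drop from 10.0 to 9.7 is a 3% reduction.
print(round(relative_change(10.0, 9.7), 2))  # -3.0
```

A negative result is an improvement for error metrics such as RMSE and MAE, while a positive result is an improvement for R^{2}.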

## 5. Conclusions

… R^{2} compared to the single model. WD-LSTM-ARMA is 3% lower on RMSE, 3% lower on MAE, and 0.5% higher on R^{2} compared to the hybrid model. Therefore, the model is more suitable for air quality prediction.

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

1. Zhao, G.; Huang, G.; He, H.; He, H.; Ren, J. Regional Spatiotemporal Collaborative Prediction Model for Air Quality. IEEE Access **2019**, 7, 134903–134919.
2. Zheng, H.; Cheng, Y.; Li, H. Investigation of Model Ensemble for Fine-Grained Air Quality Prediction. China Commun. **2020**, 17, 207–223.
3. Li, W.; Lu, C.; Ding, Y. A Systematic Simulating Assessment Within Reach Greenhouse Gas Target by Reducing PM2.5 Concentrations in China. Pol. J. Environ. Stud. **2017**, 26, 683–698.
4. Filipiak-Florkiewicz, A.; Topolska, K.; Florkiewicz, A.; Cieślik, E. Are Environmental Contaminants Responsible for ‘Globesity’? Pol. J. Environ. Stud. **2017**, 26, 467–478.
5. Mahmood, S.; Ali, S.; Qamar, M.A.; Ashraf, M.R.; Atif, M.; Iqbal, M.; Hussain, T. Hard Water and Dyeing Properties: Effect of Pre- and Post-Mordanting on Dyeing Using Eucalyptus Globulus and Curcuma Longa Extracts. Pol. J. Environ. Stud. **2017**, 26, 747–753.
6. Liu, H.; Li, Q.; Yu, D.; Gu, Y. Air Quality Index and Air Pollutant Concentration Prediction Based on Machine Learning Algorithms. Appl. Sci. **2019**, 9, 4069.
7. Appel, K.W.; Pouliot, G.A.; Simon, H.; Sarwar, G.; Pye, H.O.T.; Napelenok, S.L.; Akhtar, F.; Roselle, S.J. Evaluation of Dust and Trace Metal Estimates from the Community Multiscale Air Quality (CMAQ) Model Version 5.0; Atmospheric Sciences: Leeds, UK, 2013.
8. Woody, M.C.; Wong, H.-W.; West, J.J.; Arunachalam, S. Multiscale Predictions of Aviation-Attributable PM2.5 for U.S. Airports Modeled Using CMAQ with Plume-in-Grid and an Aircraft-Specific 1-D Emission Model. Atmos. Environ. **2016**, 147, 384–394.
9. Donnelly, A.; Misstear, B.; Broderick, B. Real Time Air Quality Forecasting Using Integrated Parametric and Non-Parametric Regression Techniques. Atmos. Environ. **2015**, 103, 53–65.
10. Jin, X.-B.; Yang, N.-X.; Wang, X.-Y.; Bai, Y.-T.; Su, T.-L.; Kong, J.-L. Deep Hybrid Model Based on EMD with Classification by Frequency Characteristics for Long-Term Air Quality Prediction. Mathematics **2020**, 8, 214.
11. Wu, Q.; Lin, H. A Novel Optimal-Hybrid Model for Daily Air Quality Index Prediction Considering Air Pollutant Factors. Sci. Total Environ. **2019**, 683, 808–821.
12. Salazar, L.; Nicolis, O.; Ruggeri, F.; Kisel’ák, J.; Stehlík, M. Predicting Hourly Ozone Concentrations Using Wavelets and ARIMA Models. Neural Comput. Appl. **2019**, 31, 4331–4340.
13. Mallat, S.G. Multifrequency Channel Decompositions of Images and Wavelet Models. IEEE Trans. Acoust. Speech Signal Process. **1989**, 37, 2091–2110.
14. Jiang, F.; He, J.; Tian, T. A Clustering-Based Ensemble Approach with Improved Pigeon-Inspired Optimization and Extreme Learning Machine for Air Quality Prediction. Appl. Soft Comput. **2019**, 85, 105827.
15. Cabaneros, S.M.; Calautit, J.K.; Hughes, B. Spatial Estimation of Outdoor NO2 Levels in Central London Using Deep Neural Networks and a Wavelet Decomposition Technique. Ecol. Modell. **2020**, 424, 109017.
16. Liu, H.; Chen, C. Spatial Air Quality Index Prediction Model Based on Decomposition, Adaptive Boosting, and Three-Stage Feature Selection: A Case Study in China. J. Clean. Prod. **2020**, 265, 121777.
17. Wang, D.; Wei, S.; Luo, H.; Yue, C.; Grunder, O. A Novel Hybrid Model for Air Quality Index Forecasting Based on Two-Phase Decomposition Technique and Modified Extreme Learning Machine. Sci. Total Environ. **2017**, 580, 719–733.
18. Zhang, Z.; Zeng, Y.; Yan, K. A Hybrid Deep Learning Technology for PM2.5 Air Quality Forecasting. Environ. Sci. Pollut. Res. **2021**.
19. Wu, C.-H.; Lu, C.-C.; Ma, Y.-F.; Lu, R.-S. A New Forecasting Framework for Bitcoin Price with LSTM. In Proceedings of the 2018 IEEE International Conference on Data Mining Workshops (ICDMW), Singapore, 17–20 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 168–175.
20. Ma, J.; Ding, Y.; Gan, V.J.L.; Lin, C.; Wan, Z. Spatiotemporal Prediction of PM2.5 Concentrations at Different Time Granularities Using IDW-BLSTM. IEEE Access **2019**, 7, 107897–107907.
21. Liu, Y.; Wang, L. Drought Prediction Method Based on an Improved CEEMDAN-QR-BL Model. IEEE Access **2021**, 9, 6050–6062.
22. Velasco, C.; Lobato, I.N. Frequency Domain Minimum Distance Inference for Possibly Noninvertible and Noncausal ARMA Models. Ann. Statist. **2018**, 46.
23. Lennon, H.; Yuan, J. Estimation of a Digitised Gaussian ARMA Model by Monte Carlo Expectation Maximisation. Comput. Stat. Data Anal. **2019**, 133, 277–284.
24. Graves, A. Long Short-Term Memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Studies in Computational Intelligence; Springer: Berlin/Heidelberg, Germany, 2012; Volume 385, pp. 37–45. ISBN 978-3-642-24796-5.

Stations | Longitude (°E) | Latitude (°N) |
---|---|---|
Gongxiaoshe | 118.1662 | 39.6308 |
Shierzhong | 118.1838 | 39.65782 |
Xiaoshan | 118.1997 | 39.6295 |
Wuziju | 118.1853 | 39.6407 |
Taocigongsi | 118.2185 | 39.6679 |
Leidazhan | 118.144 | 39.643 |

Stations | Metric | AQI | SO_{2} | NO_{2} | CO | O_{3} | PM10 | PM2.5 |
---|---|---|---|---|---|---|---|---|
Gongxiaoshe | RMSE | 8.9325 | 3.4657 | 4.7523 | 0.2258 | 8.5855 | 11.5951 | 5.9676 |
Gongxiaoshe | MAE | 6.0555 | 1.9987 | 3.5679 | 0.1591 | 6.3389 | 7.7954 | 4.1122 |
Gongxiaoshe | R^{2} | 0.9456 | 0.9728 | 0.9501 | 0.9467 | 0.9802 | 0.9525 | 0.9441 |
Shierzhong | RMSE | 9.28 | 4.4069 | 5.1125 | 0.2962 | 6.5316 | 12.4035 | 6.069 |
Shierzhong | MAE | 6.5033 | 2.5241 | 3.7711 | 0.185 | 4.8603 | 8.3986 | 4.3578 |
Shierzhong | R^{2} | 0.9478 | 0.9687 | 0.961 | 0.9322 | 0.9872 | 0.9528 | 0.9495 |
Xiaoshan | RMSE | 8.7925 | 4.3061 | 4.2181 | 0.2541 | 6.554 | 12.9483 | 5.6387 |
Xiaoshan | MAE | 5.911 | 2.4288 | 3.1313 | 0.1516 | 4.9478 | 9.1704 | 4.1039 |
Xiaoshan | R^{2} | 0.9506 | 0.9619 | 0.9558 | 0.9285 | 0.9885 | 0.9447 | 0.9523 |
Wuziju | RMSE | 10.4078 | 4.2532 | 5.1094 | 0.2312 | 5.901 | 12.7252 | 5.9236 |
Wuziju | MAE | 6.8036 | 2.4893 | 3.8179 | 0.1381 | 4.2853 | 8.209 | 4.305 |
Wuziju | R^{2} | 0.9332 | 0.9574 | 0.9508 | 0.9513 | 0.9886 | 0.9578 | 0.9516 |
Taocigongsi | RMSE | 8.6066 | 3.1685 | 4.6597 | 0.2251 | 5.9224 | 12.9119 | 5.9849 |
Taocigongsi | MAE | 5.6633 | 1.9634 | 3.4786 | 0.146 | 4.4979 | 8.602 | 4.2579 |
Taocigongsi | R^{2} | 0.9615 | 0.9735 | 0.9576 | 0.9463 | 0.9886 | 0.9646 | 0.9508 |
Leidazhan | RMSE | 8.4232 | 3.5518 | 3.4449 | 0.1732 | 6.3126 | 9.7748 | 5.3129 |
Leidazhan | MAE | 6.0255 | 2.0302 | 2.4858 | 0.1128 | 4.7903 | 6.2446 | 3.8336 |
Leidazhan | R^{2} | 0.9438 | 0.9666 | 0.956 | 0.9712 | 0.9885 | 0.9616 | 0.9498 |


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Fan, S.; Hao, D.; Feng, Y.; Xia, K.; Yang, W.
A Hybrid Model for Air Quality Prediction Based on Data Decomposition. *Information* **2021**, *12*, 210.
https://doi.org/10.3390/info12050210
