A Hybrid Model for Air Quality Prediction Based on Data Decomposition

Abstract: Accurate and reliable air quality predictions are critical to the ecological environment and public health. Traditional models fail to make full use of the high- and low-frequency information obtained after wavelet decomposition, which easily leads to poor prediction performance. This paper proposes a hybrid prediction model based on data decomposition: wavelet decomposition (WD) generates high-frequency detail sequences WD(D) and low-frequency approximation sequences WD(A); the high-frequency detail sequences WD(D) are reconstructed with a sliding window; and a long short-term memory (LSTM) neural network and an autoregressive moving average (ARMA) model predict the WD(D) and WD(A) sequences, respectively. The final air quality prediction is obtained by summing the predicted values of each sub-sequence. Compared with the single prediction model, the proposed model reduces the root mean square error (RMSE) by 52% and the mean absolute error (MAE) by 47%, and increases the goodness of fit (R²) by 18%. Compared with the other hybrid model, it reduces the RMSE by 3% and the MAE by 3%, and increases R² by 0.5%. Experimental verification shows that the proposed model resolves the lag in the prediction results of single prediction models and is a feasible air quality prediction method.


Introduction
Over the past few decades, air pollution has become one of the major problems afflicting most developing countries [1]. China is the largest developing country in the world, and many of its cities, such as Tianjin, Beijing, Shijiazhuang, Tangshan, and Cangzhou, have suffered from serious air pollution in the past few years. Short-term exposure to polluted air can cause serious health damage, such as eye irritation, breathing difficulties, and lung and cardiovascular effects [2]. Long-term exposure to high levels of air pollutants may lead to chronic diseases such as chronic bronchitis, chronic heart failure, and chronic respiratory diseases. People with heart and lung disease, diabetics, the elderly, and children are especially vulnerable to the health effects of air pollution. In addition, these pollutants and their derivatives can cause many adverse effects on the environment [3][4][5], including visibility impairment, acid deposition, global climate change, and water quality degradation. Efficient and accurate air quality prediction models are therefore important for the early warning of susceptible populations and for preventing diseases induced by exposure to air pollutants.
Air quality evaluation indicators include criteria air pollutants and the air quality index (AQI) [6]. The AQI is a comprehensive, unitless quantitative description of air quality based on six major pollutants: fine particulate matter (PM2.5), respirable particulate matter (PM10), carbon monoxide (CO), ozone (O3), sulfur dioxide (SO2), and nitrogen dioxide (NO2). The AQI is used to measure the overall quality of the air and classifies it into six levels (good, moderate, lightly polluted, moderately polluted, heavily polluted, and severely polluted), providing a good reference for people's outdoor activities. The AQI ranges from 0 to 500 and expresses the impact on human health numerically, with low values representing good air quality and high values representing poor air quality.
Information 2021, 12, 210
The commonly used air quality prediction methods can be broadly classified into three types: deterministic models [7,8], statistical models [9], and hybrid models [10]. Deterministic models, also called chemical transport models, use atmospheric physics, chemical reaction, and emission data to simulate the emission, accumulation, dispersion, and transfer of pollutants in the air. This method is difficult to apply in practice because modeling the variation of pollutants requires a priori information, such as source information and the evolution of pollutants in the atmosphere, which is often limited and insufficient. Statistical models [11] are divided into parametric and non-parametric approaches and rely on the statistical correlation between weather and air quality variables. Both of these are single-model prediction methods, whose prediction performance is limited by factors such as feature space, model size, and parameter selection. Hybrid prediction models were created to compensate for the large error of a single model. Hybrid models [12] combine signal decomposition techniques with other prediction models: the nonlinear original time series is decomposed into more stable and regular subseries, and the final prediction is obtained by aggregating the predicted values of all subseries.
Wavelet decomposition is a widely used signal processing method that is commonly applied in time series prediction models [13]. Its basic principle is to decompose a non-stationary discrete time series into several high-frequency detail components and one low-frequency approximation component, where the number of high-frequency detail components depends on the number of decomposition levels. Ledys Salazar et al. [12] fused wavelet decomposition with the autoregressive integrated moving average (ARIMA) model to predict hourly O3 concentrations. To address the low level and directional prediction accuracy of nonlinear AQI sequences, Jiang et al. [14] proposed a hybrid model based on WD, multidimensional scaling and K-means (MSK) clustering, and an improved extreme learning machine (ELM). To better monitor air quality in developing and highly urbanized countries, Sheen Mclean Cabaneros et al. [15] proposed a spatio-temporal interpolation modeling approach based on LSTM and wavelet preprocessing for the spatial prediction of hourly NO2 levels in central London, UK. Because atmospheric pollutants are strongly correlated, Liu et al. [16] decomposed the AQI series into eight sub-series with different frequencies using the maximum overlap discrete wavelet packet transform (MODWPT), thereby reducing the non-stationarity of the time series for the spatial prediction of AQI. All of the above algorithms achieved good prediction results, but none of them considered using different prediction algorithms for the high-frequency and low-frequency subseries obtained from wavelet decomposition.
Since the non-stationarity of AQI series makes AQI prediction more difficult, Wang et al. [17] proposed a hybrid prediction model that improves the prediction accuracy of AQI series by integrating a two-stage decomposition technique with an ELM optimized by differential evolution. To address the nonlinearity and instability of air quality, Zhang et al. [18] proposed a hybrid deep learning model, VMD-BiLSTM, which fuses variational mode decomposition (VMD) with a bi-directional LSTM (BiLSTM) network to predict the variation of PM2.5 concentration. These models confirmed the positive effect of hybrid prediction models on prediction performance, but each addressed only a single air pollutant, and their performance on other air pollutants was not verified.
Therefore, a hybrid prediction model based on data decomposition is proposed to address the problems of existing studies. First, the collected air quality data (air pollutants as well as AQI) are analyzed statistically, missing values are filled, and the data are pre-processed. Second, the partial autocorrelation function (PACF) [19] is used to analyze the correlation duration of the air quality data. Third, wavelet decomposition is applied to the data to extract further hidden information, and the training set is obtained after rolling window processing. Finally, each subsequence is input separately into its prediction algorithm for training to obtain the prediction model. In this experiment, three performance indicators, MAE, RMSE, and R², are selected to evaluate the prediction performance of the model and to verify its validity. The rest of the paper is organized as follows: Section 2 briefly reviews the data set, decomposition methods, related prediction models, and evaluation indicators; Section 3 presents the model construction and its prediction results; Section 4 compares and analyzes the proposed hybrid model against existing prediction models; and Section 5 concludes the paper.

Sliding Window
The sliding window [20] constructs one sample for each time recording unit T. The sample for Tn uses the values within [Tn-p, Tn) as features and the value at Tn as the label or target, where p is called the sliding window size. An example of building time series samples with a sliding window is shown in Figure 1. T1 to T6 are original time series inputs, and each window spans five consecutive values: four features and one label (p = 4). Sample 1 takes the four values from T1 to T4 as features and T5 as its label. Sample 2 takes the four values from T2 to T5 as features and T6 as its label, and so on, so that original data of length 9 yield five time series samples. The window size therefore affects both the number of time series samples and the number of features in each sample. For a given data set, a smaller window size means more samples and fewer features; a larger window size means fewer samples and more features.
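The window construction described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `make_windows` and the toy series are assumptions, and the series simply stands in for T1 to T9.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], p: int) -> Tuple[List[List[float]], List[float]]:
    """Build one sample per time step T_n: the p values in [T_{n-p}, T_n)
    are the features and the value at T_n is the label."""
    if p < 1 or p >= len(series):
        raise ValueError("p must satisfy 1 <= p < len(series)")
    features = [list(series[n - p:n]) for n in range(p, len(series))]
    labels = [series[n] for n in range(p, len(series))]
    return features, labels


# Toy series of length 9 standing in for T1..T9 (hypothetical values).
series = [1, 2, 3, 4, 5, 6, 7, 8, 9]
X, y = make_windows(series, p=4)

print(len(X))       # number of samples built from 9 values with p = 4
print(X[0], y[0])   # features T1..T4 and label T5
print(X[1], y[1])   # features T2..T5 and label T6
```

With p = 4 a series of length 9 gives 9 - 4 = 5 samples, which matches the count stated in the text; increasing p reduces the number of samples and widens each feature vector.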

Wavelet Decomposition
In this paper, the wavelet decomposition technique is used to process the raw air quality time series data. It separates the high-frequency signal, which carries the detail characteristics, from the trending low-frequency signal, so that more data features are obtained. The decomposition process is as follows:

$$A_{j+1} = H A_j, \qquad D_{j+1} = G A_j \tag{1}$$

where $A_j$ and $D_j$ refer to the low-frequency approximation component and the high-frequency detail component at level $j$, respectively, with $A_0$ the original series. $H$ is the low-pass filter and $G$ is the high-pass filter. Each decomposition level halves the length of the signal, so the components are reconstructed with dyadic interpolation to recover the original signal length. The reconstruction formula is as follows:

$$A_j = (H^{*})^{j} A_j, \qquad D_j = (H^{*})^{j-1} G^{*} D_j \tag{2}$$

where $H^{*}$ and $G^{*}$ are the dual operators of $H$ and $G$, respectively, and the left-hand sides denote the reconstructed, full-length components. In this paper, a four-level decomposition followed by reconstruction is performed with the db4 wavelet basis to obtain four high-frequency detail components, $D_1$, $D_2$, $D_3$, and $D_4$, and one low-frequency approximation component, $A_4$. Figure 2 shows the results of the wavelet decomposition of PM2.5 at the supply and marketing agency station. The low-frequency series $A_4$ shows an obvious trend as well as a certain periodicity, while $D_1$ to $D_4$ reflect the random fluctuations around the trend of the original time series.
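The decompose-then-reconstruct procedure can be illustrated with a small pure-Python sketch. Note the substitution: it uses the Haar wavelet rather than the db4 wavelet used in the paper, because Haar filters fit in a few lines; the function names and the toy signal are assumptions. The point it shows is that each component is rebuilt to full length on its own branch and that the components add up to the original series.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def analyze(a: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level: Haar low-pass (H) and high-pass (G) filtering with downsampling by two."""
    half = len(a) // 2
    approx = [(a[2 * k] + a[2 * k + 1]) / SQRT2 for k in range(half)]
    detail = [(a[2 * k] - a[2 * k + 1]) / SQRT2 for k in range(half)]
    return approx, detail


def synthesize(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Inverse of analyze: upsample by two and apply the dual filters."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def decompose(x: Sequence[float], levels: int = 4) -> List[List[float]]:
    """Return full-length components [A_J, D_J, ..., D_1] whose sum equals x."""
    if levels < 1 or len(x) == 0 or len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be a positive multiple of 2**levels")
    details: List[List[float]] = []      # coefficients of D_1, D_2, ..., D_J
    approx = list(x)
    for _ in range(levels):
        approx, d = analyze(approx)
        details.append(d)

    def to_full_length(coeffs: List[float], level: int, is_detail: bool) -> List[float]:
        # Single-branch reconstruction: all other coefficients are set to zero.
        zeros = [0.0] * len(coeffs)
        cur = synthesize(zeros, coeffs) if is_detail else synthesize(coeffs, zeros)
        for _ in range(level - 1):
            cur = synthesize(cur, [0.0] * len(cur))
        return cur

    components = [to_full_length(approx, levels, False)]
    for j in range(levels, 0, -1):
        components.append(to_full_length(details[j - 1], j, True))
    return components


# Hypothetical 16-point signal (length must be a multiple of 2**4).
signal = [3.0, 5.0, 4.0, 8.0, 7.0, 6.0, 9.0, 12.0,
          11.0, 10.0, 14.0, 13.0, 15.0, 18.0, 17.0, 20.0]
components = decompose(signal, levels=4)
rebuilt = [sum(vals) for vals in zip(*components)]
print(len(components))
print(max(abs(r - s) for r, s in zip(rebuilt, signal)))
```

In practice a wavelet library would normally be used for db4; the sketch only makes the structure of the decomposition and the additive reconstruction concrete.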

Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN)
Compared with empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD), CEEMDAN [21] decomposes signals more effectively by overcoming mode mixing, and the intrinsic mode functions (IMFs) of CEEMDAN are more favorable for feature extraction. As an optimization of the EMD and EEMD algorithms, CEEMDAN can decompose complex raw signals into sequences of IMFs.

Autoregressive Moving Average Model
The ARMA model is a forecasting model based on stochastic theory. It analyzes the perturbation terms so that the model integrates past values, present values, and errors, and it achieves high accuracy in forecasting stationary stochastic series [22]. Therefore, in this paper, the ARMA model is used to predict the low-frequency approximation component obtained from wavelet decomposition. The mathematical form of the ARMA model [23] is shown below:

$$A_J = \phi_0 + \sum_{i=1}^{p} \phi_i A_{J-i} + \varepsilon_J + \sum_{i=1}^{q} \theta_i \varepsilon_{J-i} \tag{3}$$

where $A_J$ is the input data, $\phi_i$ are the coefficients of the autoregressive model, $\phi_0$ is a constant, $\theta_i$ are the moving average model coefficients, $\varepsilon_J$ is the white noise process, and $p$ and $q$ are the orders of the ARMA model. $p$ is the number of lagged observations included in the model, and $q$ is the size of the moving average window, that is, the number of lagged error terms included.
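As a minimal sketch of how an ARMA(p, q) forecast is formed, the snippet below evaluates a one-step-ahead prediction assuming the common form A_J = ϕ0 + Σ ϕi·A(J-i) + ε_J + Σ θi·ε(J-i), with the unknown future shock ε_J replaced by its mean of zero. The coefficients and data are made-up values for illustration; in the paper they would come from fitting the model to WD(A), which is not shown here.

```python
from typing import Sequence


def arma_one_step(history: Sequence[float], residuals: Sequence[float],
                  phi0: float, phi: Sequence[float], theta: Sequence[float]) -> float:
    """One-step-ahead ARMA(p, q) forecast.

    history   : past observations, most recent last (needs at least p values)
    residuals : past one-step errors, most recent last (needs at least q values)
    phi, theta: AR and MA coefficients, ordered from lag 1 upward
    """
    p, q = len(phi), len(theta)
    if len(history) < p or len(residuals) < q:
        raise ValueError("not enough past values for the requested orders")
    ar_part = sum(phi[i] * history[-1 - i] for i in range(p))
    ma_part = sum(theta[i] * residuals[-1 - i] for i in range(q))
    # The future shock has zero mean, so it does not appear in the forecast.
    return phi0 + ar_part + ma_part


# Hypothetical ARMA(2, 1) coefficients and data.
forecast = arma_one_step(history=[10.0, 12.0], residuals=[0.5],
                         phi0=1.0, phi=[0.6, 0.2], theta=[0.4])
print(round(forecast, 6))  # 1.0 + 0.6*12 + 0.2*10 + 0.4*0.5
```

Here p = 2 lagged observations and q = 1 lagged error enter the forecast, which mirrors the roles of p and q described above.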

Long Short-Term Memory
The LSTM artificial neural network [24] is a special type of recurrent neural network that is capable of learning long-term dependencies. Even for long sequential data, LSTM alleviates the vanishing and exploding gradient problems, and it is widely used to solve sequential data problems such as automatic speech recognition and natural language processing. As shown in Figure 3, LSTM has a complex recurrent structure connected in time order. It has two important properties: the hidden layer state V(T), which changes with time, and the cell state C(T), which maintains long-term memory.
The cell state C(T) is determined by the forget gate F(T) and the input gate I(T) at the current moment together with the hidden layer state V(T-1) and cell state C(T-1) at the previous moment; the hidden layer state V(T) is determined by the output gate O(T) and the cell state C(T) at the current moment. The update Equations (4)-(9) for the cell state and hidden layer state in LSTM are as follows:

$$F(T) = \sigma\left(W_F\,[V(T-1), X(T)] + B_F\right) \tag{4}$$
$$I(T) = \sigma\left(W_I\,[V(T-1), X(T)] + B_I\right) \tag{5}$$
$$\tilde{C}(T) = \tanh\left(W_C\,[V(T-1), X(T)] + B_C\right) \tag{6}$$
$$C(T) = F(T) \cdot C(T-1) + I(T) \cdot \tilde{C}(T) \tag{7}$$
$$O(T) = \sigma\left(W_O\,[V(T-1), X(T)] + B_O\right) \tag{8}$$
$$V(T) = O(T) \cdot \tanh\left(C(T)\right) \tag{9}$$

where $X(T)$ is the input data at the current moment, $[V(T-1), X(T)]$ denotes the concatenation of the previous hidden layer state and the current input, and $\tilde{C}(T)$ is the candidate cell state. $W$ and $B$ denote the weight matrices and bias vectors, respectively, which are obtained by model training; $\sigma$ is the sigmoid activation function and $\tanh$ is the hyperbolic tangent activation function. "·" denotes the element-by-element product.
[Figure 3: structure of the LSTM cell, showing the forget gate, input gate, and output gate.]
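To make the gate updates concrete, the following sketch runs a single LSTM time step in plain Python using the standard gate formulation (forget gate, input gate, candidate state, cell state, output gate, hidden state). The dictionary keys, helper names, and the all-zero weights are illustrative assumptions; trained weights would come from model fitting, which is not shown.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, z: Vector, b: Vector) -> Vector:
    """Compute W z + b for a small dense matrix stored as nested lists."""
    return [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]


def lstm_step(x: Vector, v_prev: Vector, c_prev: Vector,
              W: Dict[str, Matrix], B: Dict[str, Vector]) -> Tuple[Vector, Vector]:
    """One LSTM time step; returns the new hidden state V(T) and cell state C(T)."""
    z = v_prev + x                                              # concatenation [V(T-1), X(T)]
    f = [sigmoid(a) for a in affine(W["F"], z, B["F"])]         # forget gate
    i = [sigmoid(a) for a in affine(W["I"], z, B["I"])]         # input gate
    g = [math.tanh(a) for a in affine(W["C"], z, B["C"])]       # candidate cell state
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    o = [sigmoid(a) for a in affine(W["O"], z, B["O"])]         # output gate
    v = [ok * math.tanh(ck) for ok, ck in zip(o, c)]            # hidden state
    return v, c


# One hidden unit and one input feature, with all-zero (untrained) weights:
# every gate then equals sigmoid(0) = 0.5 and the candidate state is tanh(0) = 0.
zero_W = {k: [[0.0, 0.0]] for k in "FICO"}
zero_B = {k: [0.0] for k in "FICO"}
v_new, c_new = lstm_step(x=[0.3], v_prev=[0.1], c_prev=[2.0], W=zero_W, B=zero_B)
print(c_new, v_new)
```

With zero weights the forget gate keeps half of the previous cell state, which makes the roles of the gates easy to check by hand before moving to trained weights.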

Predictive Effect Evaluation Index
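In this paper, the MAE, RMSE, and R² are used to evaluate the prediction accuracy of the model. The MAE reflects the actual magnitude of the prediction error; the RMSE measures the deviation between the predicted value and the actual value and is more sensitive to outliers; R² is a statistical measure of the goodness of fit, and the closer its value is to 1, the better the model fits. The expressions are shown below:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - y_i^{\mathrm{predict}} \right| \tag{10}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - y_i^{\mathrm{predict}} \right)^2} \tag{11}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - y_i^{\mathrm{predict}} \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \tag{12}$$

where $y$ is the actual value, $y^{\mathrm{predict}}$ is the predicted value of the model, $\bar{y}$ is the mean of the actual values, and $n$ is the overall length of the data.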
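The three indicators can be computed directly from their standard definitions. The sketch below is a plain-Python illustration; the function names and the small example arrays are assumptions, not values from the paper.

```python
import math
from typing import Sequence


def mae(y: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y, y_pred)) / len(y)


def rmse(y: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error; squaring makes it more sensitive to outliers."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_pred)) / len(y))


def r2(y: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant series")
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted values.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(mae(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```

For this toy example two of the four predictions are off by 0.5, so the MAE is 0.25, the RMSE is the square root of 0.125, and R² is 0.9.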

Experimental Environment
The experimental environment is a PC with the following configuration: Windows 10 64-bit, Intel Core i7-7500U CPU @ 2.70 GHz, and 4 GB RAM, using Anaconda Navigator 3 (Jupyter Notebook) with Python 3.7 as the experimental platform for simulation.

Experimental Data
The air quality data used in this paper are 69,400 air quality records from six Environmental Protection Bureau (EPB) environmental monitoring stations in Tangshan City, spanning the period from 1 May 2018 to 1 August 2019. The EPB monitoring stations sampled the air quality-related attributes once every hour. Table 1 shows the latitude and longitude coordinates of the six EPB environmental monitoring sites in Lubei District, Tangshan.

Wavelet Decomposition-Long Short Term Memory-Autoregressive Moving Average Prediction Model
The prediction model proposed in this paper is shown in Figure 4, and the specific steps are as follows:
Step 1: The missing values in all the original data sets are filled by linear interpolation and the data are min-max normalized according to Equation (13). The entire data set is then decomposed with a four-level wavelet transform and reconstructed to obtain four high-frequency signals WD(D) and one low-frequency signal WD(A), and the decomposed results are divided into a training set and a test set, accounting for 67% and 33% of the total, respectively. The min-max normalization is:

$$y_i = \frac{x_i - \min_{1 \le j \le n} x_j}{\max_{1 \le j \le n} x_j - \min_{1 \le j \le n} x_j} \tag{13}$$

where $x_i$ represents the original sequence, $n$ represents the length of the sequence, and $y_i$ represents the normalized result.
Step 2: Normalize the training and test sets to obtain the corresponding normalization models.
Step 3: The training set is subjected to ACF and PACF analysis, and the length of the sliding window is determined according to the value of p obtained from the ACF. The four high-frequency signals WD(D) of the training set after wavelet decomposition are then each processed with a sliding window of length p to obtain the two-dimensional input features of the LSTM training model.
Step 4: Input the two-dimensional input features obtained from Step 3 to the built LSTM training model; input the normalized low-frequency signal WD(A) obtained from Step 2 to the ARMA model for training.
Step 5: The data from the test set are input into the trained LSTM and ARMA models to obtain the prediction results, respectively.
Step 6: The final predicted values are obtained by summing the predicted results of each component.
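The data-handling parts of the steps above (gap filling, min-max normalization, the 67%/33% chronological split, and the final summation of component predictions) can be sketched as follows. This is a simplified illustration under stated assumptions: missing values are represented as `None`, leading or trailing gaps are filled with the nearest observed value, and all function names and numbers are hypothetical. The wavelet, LSTM, and ARMA stages are not included.

```python
from typing import List, Optional, Sequence, Tuple


def fill_linear(values: Sequence[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between the nearest observed neighbours."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    for i, v in enumerate(out):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:            # leading gap: copy the first observed value
            out[i] = out[right]
        elif right is None:         # trailing gap: copy the last observed value
            out[i] = out[left]
        else:
            w = (i - left) / (right - left)
            out[i] = out[left] + w * (out[right] - out[left])
    return out


def min_max(x: Sequence[float]) -> List[float]:
    """Scale values to [0, 1] using the series minimum and maximum."""
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in x]


def chronological_split(x: Sequence[float], train_fraction: float = 0.67
                        ) -> Tuple[List[float], List[float]]:
    """Split without shuffling so that the test set follows the training set in time."""
    cut = int(len(x) * train_fraction)
    return list(x[:cut]), list(x[cut:])


def combine(component_predictions: Sequence[Sequence[float]]) -> List[float]:
    """Sum the per-component predictions (e.g. D1..D4 and A4) into the final forecast."""
    return [sum(vals) for vals in zip(*component_predictions)]


# Hypothetical hourly readings with two missing values.
raw = [10.0, None, 30.0, 20.0, None, 40.0, 50.0, 40.0, 60.0]
filled = fill_linear(raw)
scaled = min_max(filled)
train, test = chronological_split(scaled)
final = combine([[1.0, 2.0], [0.5, -0.5]])
print(filled, len(train), len(test), final)
```

The split is chronological rather than random because the test period must come after the training period for a forecasting task.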

Predicted Results
Table 2 shows the prediction indicators of the air quality prediction model proposed in this paper for six air pollutants (SO2, NO2, CO, O3, PM10, PM2.5) and the AQI at six EPB monitoring stations in Tangshan City. The prediction accuracy of the proposed model differs little between stations, and good prediction results were achieved for both the criteria air pollutants and the AQI. As can be seen from Table 2, the average R² of the prediction results exceeds 0.94 at the EPB environmental monitoring stations, except for the prediction of CO at XII and of the AQI at the Bureau of Materials. Therefore, the WD-LSTM-ARMA air quality prediction model proposed in this paper has a good fitting effect.

Model Comparison
To verify the validity of the prediction model proposed in this paper, it was compared with other prediction models. In addition to the proposed model, four other prediction models were applied: two single prediction models, LSTM and ARMA, and two hybrid models, CEEMDAN-LSTM and WD-LSTM. The architecture of these five prediction models is shown in Figure 5. The WD-LSTM prediction model feeds the four high-frequency sequences and the one low-frequency sequence obtained from wavelet decomposition into the LSTM prediction model, and the prediction results of each sequence are then combined to obtain the final prediction. The CEEMDAN-LSTM prediction model decomposes the original data with the CEEMDAN method [21], feeds the decomposed subsequences into the LSTM separately for prediction, and then combines the predicted results to obtain the final prediction.

Case Analysis
Taking the PM2.5 at the Tangshan City supply and marketing agency station as a case study, Figure 6 shows that the single prediction models LSTM and ARMA predict the trend of PM2.5 quite well. However, because of the autocorrelation of the series, a lag between the predicted and actual values occurs easily; that is, the predicted value at moment t appears to be obtained by shifting the actual value at moment t-1. The prediction results of the hybrid models show that this lag, which existed in the single models, has been eliminated. The prediction accuracy of CEEMDAN-LSTM is improved compared with the single models, but its predicted values are generally smaller than the actual values. WD-LSTM and WD-LSTM-ARMA therefore perform relatively well.
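One simple way to check a forecast for the lag effect described above is to find the time shift at which the predictions best match the actual series. The sketch below is an illustrative diagnostic, not a procedure from the paper; the function name `best_shift` and the toy PM2.5 values are assumptions. A result of 1 means the predictions look like the actual values delayed by one step.

```python
from typing import Sequence


def best_shift(actual: Sequence[float], predicted: Sequence[float], max_shift: int = 3) -> int:
    """Return the shift s >= 0 that minimises the MAE between predicted[t] and actual[t - s]."""
    if max_shift >= len(actual) or len(actual) != len(predicted):
        raise ValueError("series must have equal length greater than max_shift")
    best_s, best_err = 0, float("inf")
    for s in range(max_shift + 1):
        # Pair predicted[t] with actual[t - s] for every t where both exist.
        pairs = list(zip(predicted[s:], actual[:len(actual) - s]))
        err = sum(abs(p - a) for p, a in pairs) / len(pairs)
        if err < best_err:
            best_s, best_err = s, err
    return best_s


# Hypothetical hourly PM2.5 values and a "persistence" forecast that simply
# repeats the previous observation, which is the lagging behaviour in question.
actual = [30.0, 35.0, 50.0, 42.0, 38.0, 60.0, 55.0, 41.0]
lagging = [actual[0]] + actual[:-1]

print(best_shift(actual, lagging))   # shift at which the lagging forecast matches best
print(best_shift(actual, actual))    # a perfect forecast matches best with no shift
```

A forecast whose best shift is greater than zero is largely echoing recent observations, which is the weakness the hybrid models are reported to remove.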
Combined with the error box plot in Figure 7, it can be seen that LSTM and ARMA produce large error points and that their overall error range is also larger than that of the hybrid prediction models. CEEMDAN-LSTM suffers from most predicted values being smaller than the actual values, and it also has large error points relative to WD-LSTM and WD-LSTM-ARMA. WD-LSTM and WD-LSTM-ARMA perform better, and WD-LSTM-ARMA has the more concentrated overall error and the smallest mean absolute error.
The prediction performance of the five prediction algorithms was evaluated by taking the mean values of the evaluation indexes for the six air pollutants and the AQI over the six monitoring stations in Tangshan City. As can be seen from Figures 8-10, the single prediction models LSTM and ARMA are not satisfactory in predicting either the six air pollutants or the AQI. Overall, the hybrid models predicted all six air pollutants and the AQI better than the single prediction models, although CEEMDAN-LSTM performed worse than the other hybrid models, indicating that the CEEMDAN decomposition method alone was not able to fully extract the effective information of the time series. Although WD-LSTM and WD-LSTM-ARMA have similar accuracy in predicting PM2.5 concentrations, WD-LSTM-ARMA is significantly better than WD-LSTM for the other pollutants as well as the AQI, and WD-LSTM-ARMA also requires less computing time.
An overall statistical analysis of the evaluation indexes shows that the WD-LSTM-ARMA air quality prediction model proposed in this paper reduced the RMSE by 52%, reduced the MAE by 47%, and improved R² by 18% relative to ARMA, the more accurate of the two single models. Relative to WD-LSTM, the more accurate of the other hybrid models, it reduced the RMSE by 3%, reduced the MAE by 3%, and improved R² by 0.5%.
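The percentages above are relative changes with respect to a baseline model. As a short worked example, the helper below computes such a change; the metric values passed to it are made-up numbers chosen only to reproduce a 52% reduction and an 18% increase, not results from the paper.

```python
def relative_change(baseline: float, proposed: float) -> float:
    """Signed percentage change of `proposed` relative to `baseline`.

    Negative values are reductions (desirable for RMSE and MAE);
    positive values are increases (desirable for R^2).
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (proposed - baseline) / baseline


# Hypothetical metric values for a baseline model and a proposed model.
rmse_change = relative_change(baseline=10.0, proposed=4.8)
r2_change = relative_change(baseline=0.80, proposed=0.944)
print(round(rmse_change, 2), round(r2_change, 2))
```

Because the change is relative, an 18% improvement in R² means the new value is 1.18 times the baseline value, not 0.18 higher in absolute terms.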

Conclusions
Since air quality is deteriorating worldwide, accurate air quality prediction has important theoretical and practical significance for production and daily life. This study proposes a hybrid model for air quality prediction based on data decomposition. It is experimentally verified that the proposed WD-LSTM-ARMA air quality time series prediction model can extract both the periodic features and the random features of the original time series through wavelet decomposition. The model has good prediction accuracy and generalizability and is applicable to the prediction of six pollutants as well as the AQI. By introducing a sliding window to handle the high-frequency subsequence data, the information required for model training can be supplied while reducing the need to collect other feature data. Comparative experiments were conducted with the single prediction models ARMA and LSTM and the hybrid models CEEMDAN-LSTM and WD-LSTM. The experimental results show that, compared with the single model, the proposed WD-LSTM-ARMA model is 52% lower in RMSE, 47% lower in MAE, and 18% higher in R². Compared with the other hybrid model, WD-LSTM-ARMA is 3% lower in RMSE, 3% lower in MAE, and 0.5% higher in R². Therefore, the model is more suitable for the prediction of air quality.
To meet the future demand for air quality prediction in air pollution control and urban construction, the model proposed in this paper could subsequently be combined with other decomposition methods or time series models in pursuit of better prediction results. It could also be applied to gas load prediction, short-term prediction of network traffic flow, and short-term power load prediction.