Air-Quality Prediction Based on the EMD–IPSO–LSTM Combination Model

Abstract: Owing to climate change, industrial pollution, and population concentration, the air quality in many places in China is not optimal. The continuous deterioration of air-quality conditions has considerably affected China's economic development and the health of its people. However, the diversity and complexity of the factors that affect air pollution make air-quality monitoring data complex and nonlinear. This study aims to improve the accuracy of air quality index (AQI) prediction for such nonlinear and nonstationary data. It introduces an air-quality prediction model that combines empirical mode decomposition (EMD) with a long short-term memory (LSTM) network, and it uses improved particle swarm optimization (IPSO) to identify the optimal LSTM parameters. First, the air-quality data were decomposed by EMD into uncoupled intrinsic mode function (IMF) components after noisy data were removed. Second, an EMD–IPSO–LSTM prediction model was built for each IMF component, and the component predictions were extracted. Third, validation analyses showed that the improved model achieved higher prediction accuracy and a better fit than LSTM and EMD–LSTM. The model therefore provides theoretical and technical support for the prediction and management of air pollution.


Introduction
Airborne substances that are harmful to human health are collectively known as "air pollution" [1]. The rapid expansion of the Chinese economy and the growth in the number of vehicles and factories have increased air pollution, which has become a serious problem [2]. Most Chinese cities have therefore established air-quality monitoring networks with government support [3]. However, real-time monitoring alone does not solve the problem of air pollution. Accurate prediction of air quality is what helps cities develop and protects people's health [4]. Figure 1 shows the air quality of Beijing on a specific day in 2020. The continually deteriorating air-quality conditions have seriously affected China's economic development and human health. The air quality index (AQI) is an evaluation standard for the concentration of airborne pollutants. It is calculated from the concentrations of the individual pollutants SO2, NO2, PM10, PM2.5, CO, and O3, and it gives people an intuitive understanding of air pollution. Table 1 shows the classification criteria for the AQI. Research shows a close relationship between air pollution and respiratory diseases [5]. Polluted air enters the human body mainly through the respiratory system and seriously affects human health. Accurate early warnings of predicted air-pollution levels are therefore crucial to preventing and controlling air pollution as cities develop, and it is important to monitor air quality and warn people about it.
In the 1980s, mathematical and statistical prediction methods and numerical analyses were used to quantify pollutants [6]. Classic time series analysis is a standard statistical technique. The autoregressive, moving average, autoregressive moving average, and autoregressive integrated moving average (ARIMA) models are classical statistical models used in this field [7]. TRIPTI et al. [8] used the seasonal ARIMA (SARIMA) model and forecast future trends after making the data stationary. However, the factors affecting air pollution are diverse, so air-quality monitoring data have complex characteristics that make accurate prediction difficult. As air quality has become increasingly important to people, a growing number of studies have addressed it [9]. Several researchers have successfully applied machine learning (ML) algorithms to air-quality prediction [10][11][12][13][14]. Support vector regression is an ML method based on statistical learning theory that minimizes structural risk [15]. Leong et al. [16] proposed a support vector machine (SVM) model with radial basis functions to predict the air pollution index and showed that it predicted the index effectively and accurately. Wang et al. [17] proposed a new hybrid GARCH method that combined the individual ARIMA and SVM prediction models and generated reliable and accurate predictions. Traditional analysis methods are poorly suited to processing large amounts of time series data. Du et al. [18] studied the periodic solution of a discrete-time neutral neural network, proved its stability, and extended the result to other neural networks. In recent years, neural networks, both biological and artificial, have developed rapidly [19]. They have been used extensively in image identification [20][21][22], stock price forecasting [23][24][25], intelligent robots [26][27][28], and elsewhere.
Sustainability 2022, 14, 4889
Recurrent neural networks (RNNs) have been used extensively to learn from time series data, and long short-term memory (LSTM) networks enable RNNs to learn long-term temporal dependencies. Seng et al. [29] proposed a comprehensive LSTM-based prediction method using many environmental datasets. Their results showed that LSTM overcame the vanishing and exploding gradient problems of RNNs and achieved higher prediction accuracy. This confirmed that LSTM has good prospects for time series prediction. Qadeer et al. [30] used different ML methods to predict hourly PM2.5 concentrations in two major cities in Korea and found that an optimized LSTM network outperformed the other models. Liu et al. [31] used an LSTM model with a factory-aware attention mechanism for PM2.5 prediction and showed that it outperformed traditional ML methods for forecasting PM2.5 pollution. Arsov et al. [32] used RNNs with memory units to forecast PM10 concentrations. They showed that (a) the model predicted better than the baseline model and (b) it could be applied successfully to the prediction of atmospheric pollution. Wang et al. [33] proposed a chi-square test (CT)-LSTM method that combined the CT with an LSTM network to build a predictive model. They suggested that future work could improve prediction accuracy further through better data preprocessing. In recent years, wavelet decomposition has been used for data enhancement in deep learning. Sheen Mclean et al. [34] proposed a new spatiotemporal interpolation model that combined deep learning with wavelet preprocessing. Their results showed that the model has great potential for assessing the spatiotemporal characteristics of outdoor air pollution. Huang et al. [35] combined empirical mode decomposition (EMD) with a gated recurrent unit to predict PM2.5 concentrations and found that the predictions were greatly improved compared with the single model. This work showed that EMD can use decomposition and reconstruction to improve a model's prediction accuracy on complex air-quality data.
Based on the above, this paper proposes an air-quality prediction method that combines EMD, IPSO, and LSTM (EMD-IPSO-LSTM) to improve the accuracy of AQI prediction. Figure 2 gives an overview of the research methodology. First, EMD was used to extract components at all time scales of the original signal. Second, the extracted components of different frequencies were input into the LSTM model for training, and the number of neurons in each LSTM layer was determined by the improved particle swarm optimization (IPSO) algorithm. Third, the proposed model was evaluated on AQI data from Beijing collected from 1 January 2020 to 31 December 2020. The contributions of this paper are as follows:
(1) EMD decomposes time series data into multiple signals of different frequencies, and training on each signal separately makes complex time series easier to predict. We therefore used EMD to decompose the AQI data. The resulting smoother subsequences were input into the constructed LSTM model, and the final result was obtained by summing the predictions of all the subsequences.
(2) Because LSTM parameters are mostly set empirically, the IPSO algorithm was used to find the optimal parameters.
(3) A nonlinearly decreasing inertia weight and a learning factor that changes with the inertia weight are proposed. They overcome two weaknesses of standard PSO: its tendency to fall into local optima and its slow convergence in later iterations.
(4) The model was trained at representative locations in Beijing to demonstrate its generality.
(5) The model was compared with the single LSTM and EMD-LSTM models. Every evaluation index improved significantly, which demonstrates the model's effectiveness.
This paper is organized as follows. Section 1 introduces the air quality index and reviews research on air quality. In view of the shortcomings of previous work, it analyzes the progress of relevant research and states the main contributions of this paper. Section 2 describes the main techniques used in this paper and their underlying theory in detail. Section 3 introduces the data source and the data preprocessing method, describes the experimental setup and research process, and reports the results. Section 4 summarizes the paper and discusses the results obtained. It also discusses the shortcomings of this work and suggests directions for future research. Section 5 summarizes the results of this study and draws conclusions on its contribution to air-quality prediction.

Principle of EMD
As an adaptive signal decomposition method, EMD has been used extensively to decompose time series into multiple intrinsic mode functions (IMFs) and a residual component [36][37][38]. Each IMF must satisfy two conditions: (1) the number of extrema and the number of zero crossings must be equal or differ by at most one; (2) at every point, the mean of the upper envelope formed by the local maxima and the lower envelope formed by the local minima is zero. Figure 3 shows the flow of the EMD decomposition. The specific steps are:
(1) Identify all local maxima and local minima of the sequence $X(t)$ to be decomposed. Connect the local maxima and the local minima (usually by cubic spline interpolation) to form the upper envelope $u_0(t)$ and the lower envelope $d_0(t)$, respectively.
(2) Compute the mean of the upper and lower envelopes, $a_0(t) = (u_0(t) + d_0(t))/2$, and subtract it from $X(t)$ to obtain the component $h_1(t) = X(t) - a_0(t)$.
(3) Determine whether $h_1(t)$ satisfies the IMF conditions. If it does, $h_1(t)$ is the first IMF component. If it does not, apply to $h_1(t)$ the same processing that was applied to $X(t)$. Repeat this judgment and processing until the IMF conditions are met. The result is the first IMF component, $imf_1(t)$.
(4) Repeat the above steps with the residual $r_1(t) = X(t) - imf_1(t)$ as the new sequence to be decomposed. Stop when the component $imf_n(t)$ or the residual becomes smaller than a predetermined value, or when the residual becomes a monotonic function. The final result is
$$X(t) = \sum_{i=1}^{n} imf_i(t) + r_n(t),$$
which completes the decomposition of the original sequence $X(t)$.
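The sifting procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study. The sketch makes four simplifications. It builds the envelopes by linear interpolation between extrema, whereas standard EMD uses cubic splines. It pins the envelope endpoints to the signal's end values. It gives each IMF a fixed number of sifting passes instead of a formal IMF test. It stops extraction when the residual has fewer than two maxima or minima. The names `local_extrema`, `envelope`, `sift`, and `emd` are hypothetical.

```python
import math


def local_extrema(x):
    """Interior indices of the local maxima and local minima of x."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima


def envelope(x, idx):
    """Piecewise-linear envelope through x at indices idx (endpoints pinned to x).

    Assumption: linear interpolation stands in for the usual cubic spline.
    """
    knots = [0] + idx + [len(x) - 1]
    env = [0.0] * len(x)
    for a, b in zip(knots, knots[1:]):
        for t in range(a, b + 1):
            env[t] = x[a] + (t - a) / (b - a) * (x[b] - x[a])
    return env


def sift(x, n_sift=10):
    """Steps (1)-(3): repeatedly subtract the envelope mean a(t) to get one IMF."""
    h = list(x)
    for _ in range(n_sift):  # assumption: fixed pass count instead of an IMF test
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper, lower = envelope(h, maxima), envelope(h, minima)
        h = [v - (u + d) / 2 for v, u, d in zip(h, upper, lower)]
    return h


def emd(x, max_imfs=8):
    """Step (4): peel off IMFs until the residual is (near-)monotonic."""
    residual, imfs = list(x), []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residual)
        if len(maxima) < 2 or len(minima) < 2:
            break
        imf = sift(residual)
        imfs.append(imf)
        residual = [r - m for r, m in zip(residual, imf)]
    return imfs, residual


# Toy series: fast and slow oscillations plus a linear trend (a stand-in for AQI data).
signal = [math.sin(2 * math.pi * t / 8) + 0.5 * math.sin(2 * math.pi * t / 50) + 0.01 * t
          for t in range(200)]
imfs, residual = emd(signal)
```

Each IMF is subtracted from the running residual, so the components always add back up to the original series. This is the property that the later decompose-predict-sum step relies on.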

LSTM
LSTM can learn long-term correlations, improves on the RNN, and is suitable for time series problems. LSTM adds cell states, or memory cells, to the RNN to solve the problems of the traditional RNN [39]. It mainly comprises the forget, input, and output gates [40]. The forget gate discards useless memories from the state vector. The input gate adds the necessary information based on the new input and the previous output. Finally, the output gate determines the new output of the unit. Figure 4 shows a single LSTM memory block.
The LSTM neurons are updated as follows:
(1) The previous output $h_{t-1}$ and the current input $x_t$ are used as inputs of the forget gate, whose output is given by Equation (1):
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \qquad (1)$$
where $W_f$ and $b_f$ are the parameters of the forget gate, and $\sigma$ is the activation function, typically the sigmoid function, so that $f_t$ lies between 0 and 1. After the forget gate, the state vector of the LSTM is $f_t \cdot c_{t-1}$.
(2) The previous output $h_{t-1}$ and the current input $x_t$ are transformed nonlinearly to obtain a new candidate state vector $\tilde{c}_t$. The input gate $i_t$ controls how much of $\tilde{c}_t$ is admitted. The specific equations are Equations (2) and (3), where $W_i$ and $b_i$ are the parameters of the input gate and tanh is the activation function of the candidate state. The value of $i_t$ lies between 0 and 1. After the input gate, the contribution to the state vector is $i_t \cdot \tilde{c}_t$.
(3) Update the state vector $c_t$ based on Equation (4). The new state vector $c_t$ is the sum of the retained memory $f_t \cdot c_{t-1}$ and the admitted candidate $i_t \cdot \tilde{c}_t$. Unlike the gate outputs, $c_t$ is not confined to the range between 0 and 1.
(4) The previous output $h_{t-1}$ and the current input $x_t$ are used as inputs of the output gate to obtain its output $o_t$ according to Equation (5). Here $W_o$ and $b_o$ are the parameters of the output gate, and $\sigma$ is the activation function, typically the sigmoid function, so that $o_t$ lies between 0 and 1.
(5) Calculate the final output of the LSTM neuron based on Equation (6). The state vector $c_t$ is passed through tanh and multiplied by the output gate $o_t$ to obtain the final output $h_t$ of the LSTM. The values of $h_t$ lie between −1 and 1.
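A single LSTM time step, written out in plain Python, makes the gate arithmetic of steps (1)-(5) concrete. This is a sketch for illustration. The weight names `W_c` and `b_c` for the candidate state, the random initialization, and the helper names are assumptions. The equations in the comments are the standard LSTM forms that match the description above.

```python
import math
import random

# Standard LSTM step, assumed to match Equations (1)-(6) described above:
#   f_t  = sigmoid(W_f . [h_{t-1}, x_t] + b_f)   (1) forget gate
#   i_t  = sigmoid(W_i . [h_{t-1}, x_t] + b_i)   (2) input gate
#   c~_t = tanh(W_c . [h_{t-1}, x_t] + b_c)      (3) candidate state (W_c, b_c: assumed names)
#   c_t  = f_t * c_{t-1} + i_t * c~_t            (4) state update
#   o_t  = sigmoid(W_o . [h_{t-1}, x_t] + b_o)   (5) output gate
#   h_t  = o_t * tanh(c_t)                       (6) output


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W.v + b for a weight matrix W given as a list of rows."""
    return [sum(w * vi for w, vi in zip(row, v)) + bj for row, bj in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    v = h_prev + x_t  # list concatenation gives [h_{t-1}, x_t]
    f = [sigmoid(z) for z in affine(p["W_f"], p["b_f"], v)]
    i = [sigmoid(z) for z in affine(p["W_i"], p["b_i"], v)]
    c_cand = [math.tanh(z) for z in affine(p["W_c"], p["b_c"], v)]
    c = [fj * cp + ij * cc for fj, cp, ij, cc in zip(f, c_prev, i, c_cand)]
    o = [sigmoid(z) for z in affine(p["W_o"], p["b_o"], v)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def init_params(n_in, n_hidden, seed=0):
    """Small random weights for the four gate blocks (illustrative only)."""
    rng = random.Random(seed)
    p = {}
    for g in "fico":
        p["W_" + g] = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + n_in)]
                       for _ in range(n_hidden)]
        p["b_" + g] = [0.0] * n_hidden
    return p


# Run one hidden layer of 3 units over a 4-step window of standardized AQI values.
params = init_params(n_in=1, n_hidden=3)
h, c = [0.0] * 3, [0.0] * 3
for x in [0.2, -0.1, 0.4, 0.3]:
    h, c = lstm_step([x], h, c, params)
```

The gates bound $h_t$ to (−1, 1), but nothing bounds $c_t$. This is why the cell state can carry information across many steps.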

IPSO
The PSO algorithm is frequently used to optimize complex numerical functions [41]. It originated from the study of the foraging behavior of bird flocks. In PSO, a number of particles move through the search space while the algorithm attempts to optimize a fitness function. Each particle computes its fitness value from its position in the search space. It then chooses its direction of movement by combining its current position, its own previous best position, and the best position found by the swarm. These steps are repeated until a termination condition is met, which yields the final solution [42][43][44]. Figure 5 shows the search process of standard PSO.
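A minimal standard PSO loop that minimizes a toy sphere function illustrates the velocity and position updates. The fixed inertia weight and learning factors (w = 0.729, c1 = c2 = 1.49445) are common textbook settings assumed here, not the values used in this paper. The names `pso` and `sphere` are hypothetical.

```python
import random


def pso(fitness, dim, bounds, n_particles=20, n_iter=100,
        w=0.729, c1=1.49445, c2=1.49445, seed=0):
    """Standard PSO with a constant inertia weight; returns (best position, best fitness)."""
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]                      # each particle's best position so far
    pbest_f = [fitness(p) for p in x]
    k = min(range(n_particles), key=lambda j: pbest_f[j])
    gbest, gbest_f = pbest[k][:], pbest_f[k]       # best position found by the swarm
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[k][d] = (w * v[k][d]
                           + c1 * r1 * (pbest[k][d] - x[k][d])   # cognitive term
                           + c2 * r2 * (gbest[d] - x[k][d]))     # social term
                x[k][d] = min(max(x[k][d] + v[k][d], lo), hi)    # stay inside bounds
            f = fitness(x[k])
            if f < pbest_f[k]:
                pbest[k], pbest_f[k] = x[k][:], f
                if f < gbest_f:
                    gbest, gbest_f = x[k][:], f
    return gbest, gbest_f


def sphere(p):
    return sum(t * t for t in p)


best, best_f = pso(sphere, dim=2, bounds=(-5.0, 5.0))
```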
However, standard PSO easily falls into local optima and fails to identify the global optimum [45]. Drawing on previous improvements to PSO, we modified both the learning factors and the inertia weight.
(1) Improvement of the inertia weight
A large inertia weight strengthens the global search capability of the particles and weakens their local search capability. Conversely, a small inertia weight strengthens local search and weakens global search. Proper adjustment of the inertia weight therefore lets the particles search quickly and globally at first and refine locally later, which yields a better global optimum in a shorter time. Selecting appropriate parameters is thus the key to improving the capability of the PSO algorithm. The improved inertia weight is given by Equation (7).
where $t$ is the current iteration number, $t_{max}$ the maximum number of iterations, and $w_{max}$ and $w_{min}$ the maximum and minimum inertia weights. An appropriate inertia weight plays an important role in the search ability of IPSO. Based on previous research and our experimental results, we found that $w_{max} = 0.9$ and $w_{min} = 0.3$ were the most suitable values for this model because they balance local and global search.
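As an illustration only, the sketch below assumes one simple nonlinearly decreasing form for the inertia weight, $w(t) = w_{max} - (w_{max} - w_{min})(t/t_{max})^2$, with this paper's values $w_{max} = 0.9$ and $w_{min} = 0.3$. It keeps the weight high early in the run, which favors global search, and lowers it faster near the end, which favors local refinement. The function name `inertia_weight` is hypothetical.

```python
def inertia_weight(t, t_max, w_max=0.9, w_min=0.3):
    """Assumed nonlinear decreasing inertia weight: w_max at t = 0, w_min at t = t_max."""
    return w_max - (w_max - w_min) * (t / t_max) ** 2


# Weights at iterations 0, 25, 50, 75, and 100 of a 100-iteration run.
schedule = [inertia_weight(t, 100) for t in range(0, 101, 25)]
```

At the midpoint of the run, this form gives 0.75. A linear decrease between the same bounds would give 0.6, so the assumed schedule spends more of the run in the exploratory regime.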
(2) Improvement of the learning factors
The symbols $c_1$ and $c_2$ denote the cognitive and social learning factors of the particles. The cognitive factor affects local search performance, and the social factor affects global search performance. Appropriate learning factors increase the convergence speed and help the search avoid local extrema. Equations (8) and (9) give the improved formulas.
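As a hypothetical example of a learning factor that changes with the inertia weight, the sketch below couples $c_1$ and $c_2$ to $w$ in the spirit of time-varying acceleration coefficients. As $w$ falls from $w_{max}$ to $w_{min}$, $c_1$ decreases linearly from 2.5 to 0.5 and $c_2$ increases from 0.5 to 2.5. The bounds 0.5 and 2.5 and the name `learning_factors` are assumptions, not this paper's Equations (8) and (9).

```python
def learning_factors(w, w_max=0.9, w_min=0.3, c_low=0.5, c_high=2.5):
    """Hypothetical coupling: c1 falls and c2 rises as the inertia weight w falls."""
    s = (w - w_min) / (w_max - w_min)      # 1.0 at the start of the search, 0.0 at the end
    c1 = c_low + (c_high - c_low) * s      # cognitive factor
    c2 = c_high - (c_high - c_low) * s     # social factor
    return c1, c2


early = learning_factors(0.9)   # start of the search
late = learning_factors(0.3)    # end of the search
```

This schedule weights each particle's own experience more heavily early in the search and the swarm's shared best position more heavily later. The sum $c_1 + c_2$ stays constant at 3.0.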

Model Evaluation Metrics
Prediction experiments on the AQI were conducted to evaluate the capability of the model and verify the effectiveness of the method. The division of the data matters. Too little training data can lead to underfitting, and too little test data makes the evaluation unreliable, so an appropriate division is vital for assessing the model's accuracy. Here, the 8784 data points were normalized. Of these, 95% were selected as the training set, and the remaining 5% formed the test set. Five models were compared: BP, linear regression (LR), LSTM, EMD-LSTM, and EMD-IPSO-LSTM. Their prediction performance was evaluated with the mean absolute error (MAE), root-mean-square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R²). Equations (10)-(13) express the relevant formulas:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |\hat{y}_i - y_i| \qquad (10)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2} \qquad (11)$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n} \left|\frac{\hat{y}_i - y_i}{y_i}\right| \qquad (12)$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (13)$$
where $\hat{y}_i$ is the predicted value, $y_i$ the true value, $\bar{y}$ the mean of the true values, and $n$ the number of samples. Smaller MAE, RMSE, and MAPE values indicate a better fit, as does an $R^2$ value closer to 1.
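The four metrics translate directly into code. The sketch below is plain Python with hypothetical function names. MAPE is expressed in percent and assumes that every true value is nonzero.

```python
import math


def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mape(y, y_hat):
    # Percentage form; undefined if any true value is zero.
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


def r2(y, y_hat):
    y_mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Illustrative AQI values, not data from this study.
y_true = [50, 80, 120, 100]
y_pred = [55, 75, 110, 100]
```

If the models are trained on standardized data, the predictions would normally be mapped back to AQI units before these metrics are computed. MAPE in particular is unstable for values near zero.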

Data Sources and Preprocessing
This study used 8784 historical monitoring records from three representative monitoring stations in Beijing, covering 1 January 2020 to 31 December 2020, as the experimental dataset. The dataset was obtained from the https://quotsoft.net/air/ website (accessed on 12 June 2021). Table 2 shows a sample of the data. The collected data had to be preprocessed before being input into the model for two main reasons. First, the collected data contained missing values that would reduce the predictive accuracy of the model, so the missing values had to be filled before training. The air-pollutant concentration is influenced by its value at the previous moment, so each missing value was replaced by the average of the values at the previous and next moments. Second, variables of different magnitudes enlarge the prediction error. To reduce this error, the original data were standardized using the transformation in Equation (14):
$$x' = \frac{x - \mu}{\sigma} \qquad (14)$$
where $\mu$ is the mean and $\sigma$ the standard deviation of all data. This z-score transformation is one of the most common data standardization methods.
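The two preprocessing steps can be sketched as follows. The gap-filling rule follows one reading of the text: the mean of the previous value and the next observed value, with runs of consecutive gaps filled from left to right. The sketch also assumes at least one observed value. Population statistics are used for $\mu$ and $\sigma$. The names `fill_missing` and `zscore` are hypothetical.

```python
import statistics


def fill_missing(series):
    """Fill None gaps with the mean of the previous value and the next observed value.

    Gaps are filled left to right, so the previous value may itself be a filled one.
    Assumes the series contains at least one observed value.
    """
    out = list(series)
    for i, val in enumerate(out):
        if val is None:
            prev = out[i - 1] if i > 0 else None
            nxt = next((v for v in series[i + 1:] if v is not None), None)
            neighbours = [v for v in (prev, nxt) if v is not None]
            out[i] = sum(neighbours) / len(neighbours)
    return out


def zscore(series):
    """Equation (14): x' = (x - mu) / sigma, using population statistics."""
    mu = statistics.fmean(series)
    sigma = statistics.pstdev(series)
    return [(v - mu) / sigma for v in series], mu, sigma


raw = [52, None, 60, 71, None, None, 90]   # illustrative hourly AQI values with gaps
filled = fill_missing(raw)
z, mu, sigma = zscore(filled)
restored = [v * sigma + mu for v in z]      # invert Equation (14) to recover AQI units
```

Returning $\mu$ and $\sigma$ along with the scaled data makes it easy to map model outputs back to AQI units with $x = x'\sigma + \mu$.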

Predictive Modeling
First, EMD was used to decompose the AQI sequence into multiple IMF components and a residual component, each of which was used as an input variable of the EMD-LSTM model. The AQI values of the preceding 4 h were used to predict the AQI value of the following 1 h. At the Dongsi station, for example, the AQI time series was decomposed into 11 IMF components and a residual (RES) component. Figure 6 shows the EMD decomposition results for this station.
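A sliding window turns the series into supervised samples, with 4 h of history as input and the next 1 h as the target. The sketch assumes that the 95%/5% split is chronological, so that no future values leak into training. The names `make_windows` and `chronological_split` are hypothetical.

```python
def make_windows(series, n_in=4, n_out=1):
    """Pair each run of n_in past values with the n_out values that follow it."""
    X, y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        y.append(series[start + n_in:start + n_in + n_out])
    return X, y


def chronological_split(X, y, train_frac=0.95):
    """First train_frac of the samples for training, the rest for testing (no shuffling)."""
    cut = int(len(X) * train_frac)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])


hours = list(range(10))  # stand-in for 10 consecutive hourly AQI values
X, y = make_windows(hours)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
```

In the EMD-based pipeline, the same windowing would be applied separately to each IMF and to the residual.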
As Figure 6 shows, IMF1 has the highest frequency, the frequencies of IMF2, IMF3, and IMF4 decrease progressively, and RES is the residual component. Noise is generally believed to be concentrated mainly in the high-frequency IMF components, and the low-frequency components are less affected by it. However, deleting the high-frequency components would greatly reduce the prediction accuracy, so we retained all the IMF components.
Second, the EMD-LSTM prediction model was used to obtain the predicted value of each component. Third, the predicted values of all components were summed to obtain the final prediction.
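The decompose-predict-sum step can be sketched as follows. Here `persistence` (predict the last value) is a hypothetical stand-in for the trained per-component LSTM models, and `ensemble_forecast` is an illustrative name.

```python
def persistence(window):
    """Hypothetical stand-in for a trained per-component LSTM: repeat the last value."""
    return window[-1]


def ensemble_forecast(components, predictors, n_in=4):
    """One-step forecast: sum the forecasts for each IMF and for the residual."""
    total = 0.0
    for component, predict in zip(components, predictors):
        total += predict(component[-n_in:])  # each model sees only its own recent values
    return total


imf1 = [1, -1, 1, -1, 1, -1]
imf2 = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5]
res = [10.0, 10.1, 10.2, 10.3, 10.4, 10.5]
components = [imf1, imf2, res]
forecast = ensemble_forecast(components, [persistence] * len(components))
```

With a linear stand-in such as persistence, the summed forecast equals the forecast made directly on the original series. Any gain from decomposition therefore has to come from the per-component models themselves.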
Networks with too few layers struggle to fit complex data, whereas additional layers increase model complexity. After repeated comparative experiments, we found that a two-layer network was sufficient to fit the training data. This reduces model complexity while maintaining prediction accuracy. As Figure 7 shows, we chose an LSTM neural network with two hidden layers. We used the PSO algorithm to find the optimal number of neurons, L1 in the first layer and L2 in the second layer. To reach the global optimum faster, an IPSO algorithm was designed here, and Figure 8 compares the two algorithms. When the results converge, the fitness value of IPSO is smaller than that of PSO. This indicates that the parameters found by IPSO are better than those found by PSO and yield higher prediction accuracy.
The output of each LSTM hidden layer was used as the input of the next layer, and a fully connected layer produced the final output. Figure 9 shows the architecture of the model. The specific steps were as follows:
(1) Normalize the AQI sequence and perform EMD decomposition to obtain multiple IMF components and the RES component. Then, 95% of the samples were used as the training set and 5% as the test set. The raw data were transformed into a supervised learning problem that predicts the AQI for the next 1 h from the data of the past 4 h.
(2) The normalized data were transformed into the input format required by the LSTM, and the LSTM neural network was built. LSTM networks take a long time to train, and deeper networks are less efficient, so a two-layer LSTM was used because it gave good experimental results in the shortest time. Table 3 shows the main parameters of the LSTM. The IMF components and the RES component were then input into the LSTM neural network.

(6) To initialize the IPSO parameters, we set the population size to 50 and the maximum number of iterations to 100. The numbers of neurons in the two hidden layers of the LSTM were taken as the optimization variables, with the search range L1, L2 ∈ [1, 64]. The MAE of the EMD-LSTM neural network was used as the objective function, that is, as the fitness function of the IPSO algorithm. The IPSO search yielded the optimal neuron counts L1 = 24 and L2 = 16. Substituting these hidden-layer sizes into EMD-LSTM gave higher prediction accuracy.
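The data preparation in step (1) can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function names `min_max_normalize`, `make_supervised`, and `chronological_split` are hypothetical, the data are synthetic, and we assume the 95%/5% split is chronological (no shuffling), which the text does not state. In the full pipeline the same windowing would be applied to each IMF and RES component.

```python
import numpy as np

def min_max_normalize(series):
    """Scale each column of a (time, features) array to [0, 1]."""
    lo, hi = series.min(axis=0), series.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (series - lo) / scale, lo, scale

def make_supervised(features, target, lookback=4, horizon=1):
    """Turn a series into (window, next value) pairs: past `lookback` hours -> AQI `horizon` hours ahead."""
    X, y = [], []
    for end in range(lookback, len(features) - horizon + 1):
        X.append(features[end - lookback:end])   # hours end-lookback .. end-1
        y.append(target[end + horizon - 1])       # with horizon=1: the hour right after the window
    return np.array(X), np.array(y)

def chronological_split(X, y, train_frac=0.95):
    """ASSUMPTION: first 95% of samples for training, last 5% for testing, order preserved."""
    cut = int(len(X) * train_frac)
    return X[:cut], y[:cut], X[cut:], y[cut:]

# Demo on synthetic hourly data with 7 input features (column 0 plays the role of AQI).
rng = np.random.default_rng(0)
raw = rng.random((100, 7)) * 300
norm, lo, scale = min_max_normalize(raw)
X, y = make_supervised(norm, norm[:, 0], lookback=4, horizon=1)
X_tr, y_tr, X_te, y_te = chronological_split(X, y)
print(X.shape, y.shape, len(X_tr), len(X_te))
```

The scaler here is fitted on the whole series, matching the description of normalizing the AQI sequence before splitting; fitting it on the training portion only would avoid leaking the test-set value range into training.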
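The stacked architecture described above, in which each LSTM layer's hidden-state sequence feeds the next layer and a fully connected layer produces the output, can be illustrated with a forward pass in plain NumPy. The layer sizes 24 and 16 are the IPSO-selected values from step (6), and the 7 input features follow the seven-dimensional input used in the comparative experiments; the weights are random and untrained, the gate layout is the standard LSTM formulation, and all function names are hypothetical. In practice the network would be built and trained with a deep learning framework.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm(n_in, n_hidden, rng):
    """Random parameters; gates stacked as [input, forget, candidate, output]."""
    return {
        "W": rng.normal(0, 0.1, (4 * n_hidden, n_in)),      # input weights
        "U": rng.normal(0, 0.1, (4 * n_hidden, n_hidden)),  # recurrent weights
        "b": np.zeros(4 * n_hidden),                        # biases (forget bias left at 0 for simplicity)
    }

def lstm_layer(x_seq, p):
    """Run one LSTM layer over x_seq of shape (T, n_in); return all hidden states (T, n_hidden)."""
    n = p["U"].shape[1]
    h, c = np.zeros(n), np.zeros(n)
    outputs = []
    for x_t in x_seq:
        z = p["W"] @ x_t + p["U"] @ h + p["b"]
        i = sigmoid(z[:n])            # input gate
        f = sigmoid(z[n:2 * n])       # forget gate
        g = np.tanh(z[2 * n:3 * n])   # candidate cell state
        o = sigmoid(z[3 * n:])        # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.array(outputs)

def stacked_lstm_forward(window, layers, dense_w, dense_b):
    """Each layer's full hidden sequence is the next layer's input; the last state goes to a dense output."""
    seq = window
    for p in layers:
        seq = lstm_layer(seq, p)
    return float(dense_w @ seq[-1] + dense_b)

rng = np.random.default_rng(0)
layers = [init_lstm(7, 24, rng), init_lstm(24, 16, rng)]  # L1 = 24, L2 = 16
dense_w = rng.normal(0, 0.1, 16)
window = rng.random((4, 7))  # 4 past hours x 7 features
print(stacked_lstm_forward(window, layers, dense_w, 0.0))
```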
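The IPSO search in step (6) can be sketched as below. The text states only that the inertia weight decreases nonlinearly and that the learning factor changes with it; the quadratic decay of w from 0.9 to 0.4, the coupling c1 = 1 + w and c2 = 3 - c1, the velocity clamp, and all names are assumptions chosen for illustration. The toy objective is a hypothetical stand-in for the validation MAE of EMD-LSTM, which would require training a network for every candidate (L1, L2).

```python
import numpy as np

def ipso(objective, bounds, n_particles=50, n_iter=100, w_max=0.9, w_min=0.4, seed=0):
    """PSO over integer hidden-layer sizes with an ASSUMED nonlinear inertia schedule."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    dim = len(lo)
    pos = rng.uniform(lo, hi, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    v_max = 0.2 * (hi - lo)

    def evaluate(p):
        # Neuron counts must be integers, so each position is rounded before scoring.
        return objective(tuple(int(v) for v in np.rint(p)))

    fit = np.array([evaluate(p) for p in pos])
    pbest, pbest_fit = pos.copy(), fit.copy()
    g = int(np.argmin(fit))
    gbest, gbest_fit = pos[g].copy(), fit[g]
    history = []
    for t in range(n_iter):
        # ASSUMED nonlinear (quadratic) decrease of the inertia weight.
        w = w_max - (w_max - w_min) * (t / max(n_iter - 1, 1)) ** 2
        # ASSUMED learning factors tied to w: cognitive term shrinks, social term grows.
        c1 = 1.0 + w
        c2 = 3.0 - c1
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -v_max, v_max)
        pos = np.clip(pos + vel, lo, hi)
        fit = np.array([evaluate(p) for p in pos])
        improved = fit < pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        g = int(np.argmin(pbest_fit))
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
        history.append(gbest_fit)  # best fitness so far, never increases
    return tuple(int(v) for v in np.rint(gbest)), gbest_fit, history

# Hypothetical stand-in objective; the real one would train EMD-LSTM and return its MAE.
toy_objective = lambda units: (units[0] - 30) ** 2 + (units[1] - 10) ** 2 + 1.0
best, best_fit, history = ipso(toy_objective, [(1, 64), (1, 64)])
print(best, best_fit)
```

With the real objective, each fitness evaluation trains an EMD-LSTM, so the roughly 5,000 evaluations (50 particles over 100 iterations plus initialization) dominate the cost; caching results for repeated integer pairs is a cheap saving.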

Results and Discussion
LSTM neural networks are well suited to time series forecasting. However, although LSTM performs well on many time series problems, it did not achieve ideal results on complex air quality data. We therefore combined it with EMD-based decomposition and reconstruction: the series is decomposed into components, each component is predicted separately, and the component forecasts are recombined. Predicting the decomposed components improved the accuracy. Below, we summarize the main advantages and limitations of the proposed model on the real datasets. The experimental results support our hypothesis.
This paper chose the LSTM neural network as the core because it addresses the long-term dependence problem of RNNs. EMD decomposes a sequence according to its time-scale characteristics and has clear advantages in processing nonlinear and nonstationary data. Finally, an improved particle swarm optimization (IPSO) algorithm was proposed to speed up the parameter search. For these reasons, we combined EMD with LSTM and used IPSO to find the optimal number of neural units in the LSTM.
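A minimal EMD sifting loop, assuming SciPy's `CubicSpline` for the upper and lower envelopes through the local extrema, shows how a series is split into IMFs and a residue. The endpoint handling (pinning both envelopes to the first and last samples), the stopping rule (a common aggregated variant of Huang's standard-deviation criterion with threshold 0.2), and the function names are simplifying assumptions; dedicated EMD libraries treat the boundaries more carefully. Because the residue is updated by subtraction, the IMFs plus the residue reconstruct the input exactly, which is what allows the component forecasts to be recombined.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def _extrema(h):
    """Indices of interior strict local maxima and minima."""
    mid = h[1:-1]
    maxima = np.where((mid > h[:-2]) & (mid > h[2:]))[0] + 1
    minima = np.where((mid < h[:-2]) & (mid < h[2:]))[0] + 1
    return maxima, minima

def _envelope(t, h, idx):
    """Cubic-spline envelope through the extrema at idx, pinned at both endpoints (simplification)."""
    knots = np.concatenate(([0], idx, [len(h) - 1]))
    return CubicSpline(t[knots], h[knots])(t)

def emd(x, max_imfs=8, max_sift=50, sd_threshold=0.2):
    """Decompose x into a list of IMFs and a residue such that sum(imfs) + residue == x."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x), dtype=float)
    residue = x.copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = _extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # residue is (nearly) monotonic: stop decomposing
        h = residue.copy()
        for _ in range(max_sift):
            maxima, minima = _extrema(h)
            if len(maxima) < 2 or len(minima) < 2:
                break
            mean_env = 0.5 * (_envelope(t, h, maxima) + _envelope(t, h, minima))
            h_new = h - mean_env
            # Aggregated SD criterion between consecutive sifts.
            sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h_new
            if sd < sd_threshold:
                break
        imfs.append(h)
        residue = residue - h
    return imfs, residue

t = np.linspace(0, 1, 500)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t) + t  # two tones plus a trend
imfs, residue = emd(x)
print(len(imfs), np.allclose(np.sum(imfs, axis=0) + residue, x))
```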
To demonstrate the effectiveness of the proposed model, we performed comparative experiments. The input of each model is a seven-dimensional sequence, and the output is one-dimensional. Using data from the Dongsi monitoring station, we obtained predicted values from five models (BP, LR, LSTM, EMD-LSTM, and EMD-IPSO-LSTM), compared them with the true values, and calculated the errors. Figure 10 shows the results for the Dongsi site. The results show that EMD-IPSO-LSTM better extracts the latent characteristics of the air quality data and has an advantage in AQI prediction.

Next, to make the results more convincing, we tested the stability of the model by conducting the same comparative experiment at the Guanyuan and Tiantan sites (see Figures 11 and 12). The improved model again had the highest prediction accuracy, showing that it is also suitable for Guanyuan and Tiantan and further verifying its effectiveness. Notably, LR had the worst prediction accuracy in the comparative experiments, indicating that LR cannot fit complex air quality data. There are several reasons why the proposed method outperforms the other models. First, EMD decomposes the complex air quality data into multiple components, and predicting these components with LSTM is more accurate than applying LSTM alone to the raw series. Second, the improved PSO finds better model parameters and thus improves the prediction performance of the LSTM, which further increases the accuracy of the proposed model.

To compare the models more directly, Table 4 shows the evaluation indices of the BP, LR, LSTM, EMD-LSTM, and EMD-IPSO-LSTM models at the Dongsi, Guanyuan, and Tiantan sites. At all three stations, the EMD-IPSO-LSTM model achieved markedly better MAE, RMSE, MAPE, and R² values with more stable errors, providing more accurate air quality predictions than the other models.
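For reference, the four indices in Table 4 can be computed as follows, using their standard definitions with MAPE expressed in percent. The function name `evaluate` is hypothetical, and MAPE assumes that no true AQI value is zero.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return MAE, RMSE, MAPE (%) and R^2 for a forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = 100.0 * np.mean(np.abs(err / y_true))  # undefined if any y_true == 0
    ss_res = np.sum(err ** 2)                       # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}

scores = evaluate([1, 2, 3, 4], [1, 2, 3, 5])
# → MAE 0.25, RMSE 0.5, MAPE 6.25, R2 0.8
print(scores)
```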
The fluctuation trend of the predicted values was largely consistent with that of the actual values, so the model can serve as a reference method for AQI prediction. BP and LR are traditional machine learning models applied here to time series prediction, and the experimental results show that LSTM fits the data better than these traditional models. Figures 10-12 show that the fitted curve of the EMD-IPSO-LSTM model was smoother than those of the LSTM and EMD-LSTM models, and Table 4 shows that its MAE, RMSE, and MAPE were all improved and its R² was closer to one. These findings indicate that combining the long-term memory capability of LSTM with EMD decomposition and IPSO parameter optimization gives a better fit to air quality data. Overall, the combined EMD-IPSO-LSTM model performed best on every index and achieved the best R² fit.
The results showed that LSTM has long-term memory ability and high prediction accuracy. However, it was difficult for a single LSTM model to achieve the best performance on complex AQI data. Adding EMD improved the prediction accuracy at all three stations, which shows that decomposing complex time series into components of different frequencies improves prediction accuracy. The comparative experiments at the three stations also showed that LSTM is highly sensitive to its parameters: without appropriately chosen parameters, it could perform worse than the BP neural network. It was therefore necessary to use a particle swarm optimization algorithm to find the optimal number of neural units in the LSTM, which further improved the prediction accuracy of the model. Overall, the EMD-IPSO-LSTM model outperformed the other models in short-term air quality prediction and has practical application value.
Although this method predicted AQI accurately, the experimental data were limited by the experimental conditions, and the results of this study can be further improved. Owing to the limitations of the monitoring station data, we had no information on the meteorological factors near the monitoring stations. Such information would likely further improve the performance of our model and should be considered in future work; for example, temperature and wind affect the diffusion of air pollutants. Future research should consider meteorological factors, vehicle emissions, and the interactions between different monitoring stations in the city, which may allow more accurate air quality prediction. In addition, more advanced interpolation techniques could replace the cubic spline interpolation that EMD uses to fit the envelopes through the signal's extreme points, reducing the envelope-fitting error and improving the quality of the decomposition.

Conclusions
Recently, air quality problems have seriously affected people's health and daily life. Consequently, the prevention and control of air pollution has attracted public attention.
Owing to the complex factors affecting air quality, AQI series are complex and nonstationary, which makes accurate pollution prediction challenging. Traditional LSTM is a widely used time series prediction method and an improvement on RNN. It can process data with long-term dependencies and converges quickly. However, as data complexity increases, it becomes difficult for a single LSTM to predict the AQI accurately.
Here, a combined prediction model based on EMD-IPSO-LSTM was proposed. Based on an analysis of AQI data from three stations in Beijing in 2020, the following conclusions were drawn:
(1) Decomposing the data into multiple components of different frequencies with EMD and feeding them into the LSTM model effectively improved the accuracy of AQI prediction.
(2) The number of neural units in the LSTM hidden layers is often set empirically. Here, the PSO algorithm was used to optimize it, and the optimal number of neurons in each layer was obtained.
(3) Because PSO converges slowly and easily falls into local optima, a nonlinearly decreasing inertia weight and a learning factor that changes with the inertia weight were proposed. These changes reduced the optimization time and led to faster convergence toward the global optimum.
(4) The comparative experiments showed that the proposed EMD-IPSO-LSTM hybrid model had the best prediction performance, and the predicted values agreed closely with the true values. These findings indicate that the proposed hybrid method is effective for AQI prediction and has practical application value.