Air Pollutant Concentration Prediction Based on a CEEMDAN-FE-BiLSTM Model

Abstract: The concentration series of PM 2.5 (particulate matter ≤ 2.5 µm) is nonlinear, nonstationary, and noisy, making it difficult to predict accurately. This paper presents a new PM 2.5 concentration prediction method based on a hybrid model of complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and bi-directional long short-term memory (BiLSTM). The new method was also applied to predict PM 10, a particulate pollutant of the same kind, and O 3, a heterogeneous gas pollutant, demonstrating that the prediction method has strong generalization ability. First, CEEMDAN was used to decompose the PM 2.5 concentrations into components at different frequencies. Then, the fuzzy entropy (FE) value of each decomposed wave was calculated, and waves with similar FE values were combined by K-means clustering to generate the input sequence. Finally, the combined sequences were put into the BiLSTM model with multiple hidden layers for training. We predicted the hourly PM 2.5 concentrations of Seoul Station 116, with values of the root mean square error (RMSE), the mean absolute error (MAE), and the symmetric mean absolute percentage error (SMAPE) as low as 2.74, 1.90, and 13.59%, respectively, and an R 2 value as high as 96.34%. The "CEEMDAN-FE" decomposition-merging technology proposed in this paper can effectively reduce the instability and high volatility of the original data, overcome data noise, and significantly improve the model's performance in predicting the real-time concentrations of PM 2.5.
The results show the following:
(1) The models that used the decomposition-merging technology have significant effects. Comparison with the models that did not use decomposition or merging shows that the decomposition-merging technology effectively overcomes the non-linearity, large fluctuations, and noise of the PM 2.5 concentration series and significantly improves the predictive accuracy of the model.
(2) In the hybrid model, the CEEMDAN decomposition method was more suitable for the decomposition of the PM 2.5 concentration sequence than EMD and EEMD. In terms of the decomposition effect, CEEMDAN > EEMD > EMD.
(3) FE presented the best merging effect after decomposing the sequence. Compared with SE and AE, the SMAPE of the hybrid model using FE was reduced by 12.88% and 17.79%, respectively, and the SE and AE methods had similar effects. When the sequence was decomposed by EMD or CEEMDAN, the AE merging effect was better. In the EEMD decomposition-merging, SE and AE had the same entropy clustering effect: the 14 decomposition sequences were merged identically, so the final result was the same. In terms of the merging effect, FE > AE ≥ SE.
(4) The prediction effect of CEEMDAN-FE-BiLSTM was better than the other models in terms of the horizontal accuracy and goodness of fit. The RMSE, MAE, and SMAPE values were as low as 2.74, 1.90, and 13.59%, respectively, and the R 2 value was as high as 96.34%.
(5) Compared with the single-model BiLSTM, the horizontal prediction error RMSE, MAE, and SMAPE values of the CEEMDAN-FE-BiLSTM model were reduced by 33.01%, 30.66%, and 22.30%, respectively, and the goodness of fit was improved by 4.90%. Compared with the decomposed but unmerged CEEMDAN-BiLSTM, the CEEMDAN-FE-BiLSTM model's horizontal prediction errors were reduced by 30.28%, 34.93%, and 30.16%,


Introduction
With the acceleration of industrialization and urbanization, the discharge of pollutants, such as exhaust gas, wastewater, and waste, has greatly increased, and air pollution has become an issue of close concern to all countries. The main causes of serious atmospheric pollution are aerosols (PM 2.5, PM 10, etc.) and gases (O 3, etc.). When PM particles exceed the standard, the atmosphere is in a turbid state. When the O 3 concentration exceeds the standard, it causes pollution, such as ash and photochemical smog [1]. In addition, PM 2.5 blocks the transmission of solar radiation, causing air convection to stagnate, which is not conducive to the diffusion of air pollutants. Rising PM 2.5 concentrations can greatly reduce visibility, affect people's normal travel and traffic order, and easily cause large-scale car accidents [2]. Therefore, the accurate real-time prediction of PM 2.5, PM 10, and O 3 concentrations is of great practical significance to the governments of various countries for implementing air pollution improvement policies, protecting human health, and ensuring normal production and life activities.
At present, the methods of predicting PM 2.5 mainly include the numerical model method [3], the statistical modeling method [4], the machine learning method [5], and the deep learning method [6]. The numerical model method is mainly based on the aerodynamic theory and the physical and chemical change process, and it uses mathematical methods to establish the dilution and diffusion model of the air pollution concentration to dynamically predict the air quality and the concentration changes of the main pollutants.

CEEMDAN
CEEMDAN [22] was developed based on the EMD algorithm. EMD [23] is an adaptive orthogonal basis time-frequency signal processing method for unknown nonlinear signals. It decomposes the signal into intrinsic mode functions (IMFs) with different frequencies.
Due to the existence of data noise, traditional EMD decomposition suffers from modal aliasing (mode mixing). Wu Z. H. [24] proposed EEMD, which adds zero-mean white noise with fixed variance to the original signal every time the signal is decomposed. This effectively alleviates the modal aliasing phenomenon, but the added Gaussian white noise is difficult to eliminate, which causes a reconstruction error. Based on EEMD, Torres et al. proposed CEEMDAN, which adaptively adds and eliminates white noise. This method not only effectively overcomes modal aliasing, but also reduces reconstruction errors, iterations, and calculation costs.
Assuming that the original time series is X(t), the steps of CEEMDAN's decomposition are as follows:
(1) Gaussian white noise n i (t) of the same length is added to the original time series X(t), where i = 1, 2, ···, m, and ξ 0 is an adaptive coefficient. The first modal component of each noisy copy is obtained through EMD decomposition; m-many first components are obtained from m-many experiments, and their average value is taken as IMF 1. The process is shown in Equations (1) and (2).
(2) IMF 1 is removed from the original sequence X(t). The remaining time series is marked as r 1 (t). The adaptive signal E 1 (n i (t)) is calculated by EMD and added to the remaining time series r 1 (t). Then, a new round of EMD decomposition is performed on each of the m noisy residuals, and the average value of the results is taken as IMF 2. The process is shown in Equations (3)-(5).
(3) For the kth component (k = 2, 3, ···, n), similar to Step (2), we obtain the kth component IMF k. The process is shown in Equations (3)-(5).
(4) We repeat the above steps until the residual component is no longer suitable for decomposition, and then we stop decomposing. At this time, all IMFs that meet the conditions have been extracted, and the trend term is r n (t), so that X(t) equals the sum of all extracted IMFs plus r n (t).
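For reference, the standard CEEMDAN formulation of Torres et al. that these steps follow can be sketched as below. This is a hedged reconstruction rather than a copy of the paper's Equations (1)-(5): it assumes m noise realizations n_i(t), adaptive coefficients ξ_k, n extracted components, and an operator E_k(·) that returns the kth EMD mode of its argument; the paper's own numbering and notation may differ.

```latex
\begin{aligned}
X_i(t) &= X(t) + \xi_0\, n_i(t), \qquad i = 1, \dots, m \\
\mathrm{IMF}_1(t) &= \frac{1}{m} \sum_{i=1}^{m} E_1\bigl(X_i(t)\bigr) \\
r_1(t) &= X(t) - \mathrm{IMF}_1(t) \\
\mathrm{IMF}_2(t) &= \frac{1}{m} \sum_{i=1}^{m} E_1\bigl(r_1(t) + \xi_1 E_1(n_i(t))\bigr) \\
r_k(t) &= r_{k-1}(t) - \mathrm{IMF}_k(t) \\
\mathrm{IMF}_{k+1}(t) &= \frac{1}{m} \sum_{i=1}^{m} E_1\bigl(r_k(t) + \xi_k E_k(n_i(t))\bigr) \\
X(t) &= \sum_{k=1}^{n} \mathrm{IMF}_k(t) + r_n(t)
\end{aligned}
```

The last line expresses the completeness property: summing all IMFs and the trend term recovers the original series, which is why the merged components described later can simply be added together.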

FE
FE [25] is an improvement of sample entropy (SE) [26] and approximate entropy (AE) [27] and is used to measure the complexity of a time series. FE introduces the concept of fuzzy sets, using exponential functions as fuzzy functions to calculate the similarity of vectors. FE not only combines the advantages of sample entropy (independence from data length and consistency) with those of approximate entropy (strong resistance to noise and outliers), but the introduced fuzzy function also enables FE to solve the breakpoint problem of sample entropy in the calculation and makes the entropy value change more smoothly.
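The FE computation described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fuzzy_entropy`, the scaling of the boundary width r by the standard deviation of the series, and the fuzzy power n = 2 are assumptions; the defaults m = 2 and r = 0.15 follow the settings reported in the experimental setup of this paper.

```python
import math
import random


def fuzzy_entropy(series, m=2, r=0.15, n=2):
    """Fuzzy entropy of a 1-D series with an exponential membership function."""
    N = len(series)
    mean = sum(series) / N
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / N)
    tol = r * std  # assumption: r is expressed relative to the standard deviation

    def phi(dim):
        # Build N - m template vectors of length `dim`, each with its own mean removed.
        count = N - m
        vecs = []
        for i in range(count):
            seg = series[i:i + dim]
            base = sum(seg) / dim
            vecs.append([x - base for x in seg])
        total = 0.0
        for i in range(count):
            for j in range(count):
                if i == j:
                    continue
                # Chebyshev distance between the two templates.
                d = max(abs(a - b) for a, b in zip(vecs[i], vecs[j]))
                # Exponential fuzzy similarity instead of a hard threshold.
                total += math.exp(-(d ** n) / tol)
        return total / (count * (count - 1))

    return math.log(phi(m)) - math.log(phi(m + 1))


# Illustrative comparison on synthetic data (not data from this paper).
random.seed(0)
sine = [math.sin(2 * math.pi * t / 50) for t in range(200)]
noise = [random.gauss(0.0, 1.0) for _ in range(200)]
fe_sine = fuzzy_entropy(sine)
fe_noise = fuzzy_entropy(noise)
```

Because the exponential membership is continuous in the distance d, small changes in the data change the similarity smoothly, which is the property that removes the breakpoints of sample entropy. A constant series has zero standard deviation and would need special handling in this sketch.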

BiLSTM
A recurrent neural network (RNN) can process time series data using neurons with self-feedback. However, as the time series grows, the error signal that the RNN needs to propagate back decreases exponentially, resulting in a slow update of the network weights and the problem of gradient disappearance or gradient explosion. Hochreiter S et al. [28] proposed the original LSTM, and Gers et al. [29] proposed adding a forget gate. With this gate, the information passed between time steps can be filtered without all steps going through the fully connected layer, which forms the basic framework of LSTM that is commonly used today. The LSTM network replaces the traditional hidden layer with the LSTM layer, which obtains two kinds of information from the previous moment: the cell state and the hidden layer state. It adopts a control gate mechanism and is composed of memory cells, input gates, output gates, and forget gates. Drawing on the construction ideas of LSTM and the bidirectional recurrent neural network (BRNN), Schuster M et al. [30] constructed BiLSTM. The internal structure of the unit is the same as that of LSTM. The overall network structure of BiLSTM is shown in Figure 1. On the basis of the forward layer, where the original information propagates forward from the initial time to time t, a backward layer is added, where the information is propagated back from time t to the initial time. Both layers jointly determine the output at each time step.
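The gate mechanism and the forward/backward layers can be illustrated with a deliberately small sketch in plain Python. It uses a single hidden unit per direction and hand-picked weights so that the arithmetic stays visible; the function names and all weight values are hypothetical, and a real model would use a deep learning library with trained, vector-valued weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_pass(inputs, w):
    """Run a single-unit LSTM over `inputs` and return the hidden state at each step.

    `w` maps each gate name to (input weight, recurrent weight, bias).
    """
    h, c = 0.0, 0.0
    hidden = []
    for x in inputs:
        f = sigmoid(w["f"][0] * x + w["f"][1] * h + w["f"][2])    # forget gate
        i = sigmoid(w["i"][0] * x + w["i"][1] * h + w["i"][2])    # input gate
        o = sigmoid(w["o"][0] * x + w["o"][1] * h + w["o"][2])    # output gate
        g = math.tanh(w["g"][0] * x + w["g"][1] * h + w["g"][2])  # candidate cell value
        c = f * c + i * g      # cell state: keep part of the old memory, add new
        h = o * math.tanh(c)   # hidden state exposed to the next step
        hidden.append(h)
    return hidden


def bilstm(inputs, w_fwd, w_bwd):
    """Return (forward, backward) hidden-state pairs aligned to the original time order."""
    fwd = lstm_pass(inputs, w_fwd)
    bwd = lstm_pass(inputs[::-1], w_bwd)[::-1]
    return list(zip(fwd, bwd))


# Hypothetical weights, chosen only for illustration.
W_FWD = {"f": (0.5, 0.4, 0.1), "i": (0.6, -0.3, 0.0), "o": (0.7, 0.2, 0.0), "g": (1.0, 0.5, 0.0)}
W_BWD = {"f": (0.4, 0.5, 0.1), "i": (0.5, -0.2, 0.0), "o": (0.6, 0.3, 0.0), "g": (0.9, 0.4, 0.0)}

states = bilstm([0.1, 0.5, -0.3, 0.8], W_FWD, W_BWD)
states_changed_tail = bilstm([0.1, 0.5, -0.3, -0.9], W_FWD, W_BWD)
```

Changing only the last input leaves the forward state at the first step untouched but alters the backward state there, which shows how the backward layer carries later information to earlier time steps.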

CEEMDAN-FE-BiLSTM
The CEEMDAN-FE-BiLSTM prediction model was divided into three parts. The first part is the decomposition part, which uses the CEEMDAN model to decompose the hourly PM 2.5 concentration to form K-many IMF components. The second part is the IMF component merging part, where (1) the concept of FE was introduced to measure the similarity between IMFs, (2) K-many FE values were obtained, and (3) K-means clustering was used to add and merge the similar IMF sequences to obtain m-many components, which were recorded as Feat i (i = 1, 2, ···, m). The third part is the prediction part of the BiLSTM model. BiLSTM is composed of a forward LSTM and a backward LSTM. The former is responsible for forward feature extraction, and the latter is responsible for reverse feature extraction. The feature information propagated in these two directions was fused, and the final feature was output to obtain the predicted value of the PM 2.5 concentration. The modeling process is shown in Figure 2.
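The merging part can be sketched as follows: cluster the IMFs by their FE values with K-means, then add the IMFs of each cluster point by point. This is a minimal plain-Python sketch, not the authors' code; the function names, the deterministic initialization of the cluster centres, and the toy FE values and IMFs are all hypothetical.

```python
def kmeans_1d(values, k, max_iter=100):
    """Plain 1-D k-means; initial centres are taken at evenly spaced ranks."""
    ordered = sorted(values)
    n = len(ordered)
    centres = [ordered[(c * (n - 1)) // max(k - 1, 1)] for c in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assign every value to its nearest centre.
        new_labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Move each centre to the mean of its members (empty clusters keep their centre).
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels


def merge_imfs(imfs, labels):
    """Add together, point by point, all IMFs that share a cluster label."""
    merged = {}
    for imf, lab in zip(imfs, labels):
        if lab not in merged:
            merged[lab] = list(imf)
        else:
            merged[lab] = [a + b for a, b in zip(merged[lab], imf)]
    return [merged[lab] for lab in sorted(merged)]


# Toy example: five IMFs of length two with made-up FE values.
fe_values = [0.90, 0.85, 0.30, 0.28, 0.05]
imfs = [[1.0, 2.0], [0.5, 0.5], [3.0, 1.0], [1.0, 1.0], [4.0, 4.0]]
labels = kmeans_1d(fe_values, k=3)
feats = merge_imfs(imfs, labels)
```

Because merging is a plain sum, the merged Feat components still add up to the same total as the original IMFs, so no information from the decomposition is discarded at this stage.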

Data Sources
The data in this paper came from the air pollution values recorded in the Seoul dataset from the Kaggle website. The hourly PM 2.5 concentrations of Station 116 were selected as the research objects. The time range was from 0:00 on 1 January 2017 to 23:00 on 31 December 2019, providing a total of 25,906 data points.

Evaluation Criteria
To quantitatively evaluate the prediction performance of the model, we selected the root mean square error (RMSE), the mean absolute error (MAE), the symmetric mean absolute percentage error (SMAPE), and R 2 to measure the prediction accuracy and generalization ability of different models. Let y i be the real value and ŷ i be the model prediction value, where i = 1, 2, ···, n (n is the number of samples). The expressions of the above evaluation indexes are shown in Equations (10)-(13).
Atmosphere 2021, 12, 1452
It can be seen from the definitions of these evaluation indicators that the smaller the RMSE and MAE values are, the closer the SMAPE value is to 0, and the closer the R 2 value is to 1, the smaller the prediction error of the model is and the stronger its generalization ability is.
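The four evaluation indexes can be written out as a short plain-Python sketch. The RMSE, MAE, and R 2 forms below are the usual textbook definitions; SMAPE has several variants in the literature, and the one used here (absolute error divided by the mean of |y i| and |ŷ i|, expressed in percent) is an assumption that may differ from the paper's equation. The function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent.

    Assumed variant: |error| / ((|y| + |y_hat|) / 2). Undefined when both are zero.
    """
    n = len(y_true)
    return 100.0 / n * sum(
        abs(p - t) / ((abs(t) + abs(p)) / 2.0) for t, p in zip(y_true, y_pred)
    )


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Small made-up example, not data from this paper.
y = [2.0, 4.0, 6.0, 8.0]
y_hat = [1.0, 5.0, 5.0, 9.0]
scores = (rmse(y, y_hat), mae(y, y_hat), smape(y, y_hat), r2(y, y_hat))
```

RMSE and MAE are in the units of the concentration, so they are not comparable across pollutants with different scales, whereas SMAPE and R 2 are scale-free.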

Experimental Setup
In this experiment, we used the hourly PM 2.5 concentrations over 3 years, with 374 h of data missing in the middle. Because these missing values were few and had little impact on the model effect, no filling was performed. Because the PM 2.5 concentration changed greatly in different time periods, outlying (discrete) values were not processed, so as to retain the original information of the data. The first 80% of the data were used as the training set, and the last 20% were used as the test set. The training set was used for the key parameter selection and model establishment, and the test set was used for the model prediction effect evaluation.
(1) CEEMDAN parameter setting: We used the CEEMDAN algorithm in the PyEMD package to set different modal numbers, and to decompose and test the training set data. When the modal number was set to 14, the score of each decomposed wave was the most stable.
(2) FE parameter setting: CEEMDAN produces many IMF components after decomposing the original sequence. We considered combining the components to reduce the amount of calculation for subsequent predictions. The FE value of each IMF component was calculated, and subsequences with similar FE values were reconstructed into a new sequence. According to previous experience [31], we set the embedding dimension m to 2 and the function boundary width r to 0.15, and we calculated the FE value of each IMF i (i = 1, 2, ···, 14).
(3) BiLSTM prediction: To merge similar sequences more objectively, K-means clustering was used to merge the IMF sequences according to each FE value to form the input sequence of BiLSTM. The parameter settings of BiLSTM, chosen after investigation and case analysis, are shown in Table 1. We input 12 samples to BiLSTM each time, that is, a time window of 12 h, and predicted the PM 2.5 concentration value of the next hour based on the historical data of the previous 12 h. The learning rate was 5 × 10 −3, and the network used two hidden layers. The number of iterations was set to 30. For the optimization algorithm in the model, compared to stochastic gradient descent, the Adam algorithm used in this study had a faster and more stable convergence rate, and the common MSE was used for the loss function. We selected the optimal number of clusters by searching: we set the number of clusters of K-means clustering to each value in {1, 2, ···, 14} and examined the performance of the new input sequence on the training set under the different clustering conditions. The input sequence corresponding to the minimum MSE was used as the input of the final BiLSTM prediction, and then the test set was predicted.
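The data preparation described in this setup, a chronological 80/20 split followed by 12 h sliding windows that each predict the next hour, can be sketched in plain Python as follows. The function names are illustrative, and the sketch uses a single series for brevity, whereas the model in this paper feeds the merged Feat components as inputs.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series without shuffling: first part train, rest test."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def make_windows(series, window_size=12):
    """Turn a series into (previous `window_size` values, next value) pairs."""
    inputs, targets = [], []
    for start in range(len(series) - window_size):
        inputs.append(series[start:start + window_size])
        targets.append(series[start + window_size])
    return inputs, targets


# Stand-in data: 100 consecutive "hours" (not data from this paper).
hours = list(range(100))
train, test = chronological_split(hours)
X_train, y_train = make_windows(train)
```

Splitting before windowing keeps every training window strictly earlier than the test period, so no future values leak into training.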

CEEMDAN Modal Decomposition
According to the CEEMDAN modal decomposition method, the original sequence of the PM 2.5 concentration was decomposed into 14 groups of decomposed waves, as shown in Figure 3. As can be seen from the IMF i in Figure 3a-n, the subsequences decomposed by CEEMDAN showed a trend of decreasing frequency, decreasing amplitude, and increasing wavelength from IMF 1 to IMF 14. The subsequences showed a certain regularity and periodicity, indicating that the complex sequence of PM 2.5 concentration was decomposed into subsequences that contain information at different scales and progressively less noise.

FE Calculation Results
According to the set embedding dimension and function boundary width, the FE of each decomposition wave IMF i was calculated. It was used to evaluate the degree of disorder of each wave, which reflects the frequency of the wave, to provide a basis for the next step of merging and reorganizing the IMF components. The FE of each decomposition wave IMF i is shown in Table 2. The smaller the value of FE is, the more structured the signal's pattern is, and the larger the value is, the more random or unpredictable the signal is. It can be seen from Table 2 that from IMF 1 to IMF 14, the FE gradually decreased, which once again shows that the noise in the subsequences obtained by CEEMDAN decomposition gradually decreased.

BiLSTM Experiment Results
According to the key parameter settings of BiLSTM, the performance of BiLSTM on the training set as the number of clusters of K-means clustering increased is shown in Figure 4. When the number of K clusters increased, the training error of BiLSTM first decreased and then increased. When there were five K clusters, the training set RMSE value was the smallest (4.70). Therefore, the IMF components were combined and reorganized into five reconstruction sequences Feat i (i = 1, 2, ···, 5). To reflect the changes in the reconstruction sequence more intuitively, the results of the partial component reconstruction are shown in Figure 5.
As seen in Figure 5, the new sequences after merging and recombination all show a certain periodicity, which indicates that the noise in the sequence after merging and recombining was less than that before being decomposed. Next, we put these five Feat components into the BiLSTM for training. As shown in Table 1, we set the following parameters: window_size = 12, batch_size = 12, and max_epoch = 30. The flow of the decomposed-combined Feat sequence in BiLSTM is shown in Figure 6.

Model Comparison Analysis
We selected CEEMD-BiLSTM, CEEMD-SE-BiLSTM, and CEEMD-AE-BiLSTM as the comparison models, and used the evaluation indicators RMSE, MAE, SMAPE, and R 2 to evaluate the performance of all prediction models. Table 3 shows the performance of each model's hourly PM 2.5 concentration predictions on the test set. Figures 7a-c and 8a-c intuitively compare the prediction effects of each model from the horizontal direction (RMSE, MAE, SMAPE) and the vertical direction (R 2), respectively.
The results show the following:
(1) The models that used the decomposition-merging technology have significant effects. Comparison with the models that did not use decomposition or merging shows that the decomposition-merging technology effectively overcomes the non-linearity, large fluctuations, and noise of the PM 2.5 concentration series and significantly improves the predictive accuracy of the model.
(2) In the hybrid model, the CEEMDAN decomposition method was more suitable for the decomposition of the PM 2.5 concentration sequence than EMD and EEMD. In terms of the decomposition effect, CEEMDAN > EEMD > EMD.
(3) FE presented the best merging effect after decomposing the sequence. Compared with SE and AE, the SMAPE of the hybrid model using FE was reduced by 12.88% and 17.79%, respectively, and the SE and AE methods had similar effects. When the sequence was decomposed by EMD or CEEMDAN, the AE merging effect was better. In the EEMD decomposition-merging, SE and AE had the same entropy clustering effect: the 14 decomposition sequences were merged identically, so the final result was the same. In terms of the merging effect, FE > AE ≥ SE.
(4) The prediction effect of CEEMDAN-FE-BiLSTM was better than the other models in terms of the horizontal accuracy and goodness of fit. The RMSE, MAE, and SMAPE values were as low as 2.74, 1.90, and 13.59%, respectively, and the R 2 value was as high as 96.34%.

Extension Analysis
This section presents our exploration into the general applicability of the CEEMDAN-FE-BiLSTM model. We tested the stability of the hybrid model based on PM 10 and O 3 concentration sets and compared them with the other models. The data used in this section are the same as those in Section 3.1.

Predictive Analysis of PM 10
The PM 10 concentrations used in this section were monitored together with PM 2.5, and the evaluation indicators remained unchanged from those described above. Table 4 shows the performance of each model's hourly PM 10 concentration predictions on the test set, and Figure 9 presents the corresponding histograms. As seen in Figure 9, although the prediction accuracy for PM 10 was lower than that for PM 2.5, the difference between the MAE and RMSE was less than 4. In terms of both accuracy and goodness of fit, the CEEMDAN-FE-BiLSTM model remained the model with the best predictive effect, with RMSE, MAE, and SMAPE values as low as 5.64, 3.57, and 14.05%, and an R 2 value as high as 94.98%. The hybrid model that did not use the entropy value to merge the decomposition sequence had the worst effect, and among the models that merged the decomposition sequence by entropy value, the ranking FE > AE > SE once again demonstrated the effectiveness of FE merging. In summary, the same model had a similar effect on the predictions of PM 10 and PM 2.5, which demonstrates the effectiveness and accuracy of CEEMDAN-FE-BiLSTM in predicting similar particles.

Predictive Analysis of O 3
Section 4.1 demonstrates the applicability of the CEEMDAN-FE-BiLSTM hybrid model in PM particulate matter prediction. In this section, we selected a dataset of the gas O 3, which differs in kind from PM 2.5, used the same model as in Section 4.1 to predict the hourly concentration of O 3, and used the same evaluation indexes for evaluation. In addition, the stability of the model prediction was further explored. Table 5 shows the performance of each model's hourly O 3 concentration predictions on the test set, and Figures 10 and 11 present the corresponding histograms. As with the predictions of PM 2.5 and PM 10 concentrations, CEEMDAN-FE-BiLSTM was significantly better than the other models on the four evaluation indicators used and remained the best model for predicting the hourly concentration of O 3. Although the SMAPE value increased noticeably, the RMSE and MAE values were still very low, as low as 0.0044 and 0.0036, respectively, and the R 2 value was as high as 95.61%. The model that did not use entropy-based merging had the worst effect. Unlike for predicting PM particles, the effect of using the SE value was significantly better than that of using the AE value. Compared to CEEMDAN-AE-BiLSTM, the horizontal prediction error RMSE, MAE, and SMAPE values of the CEEMDAN-SE-BiLSTM model decreased by 40.71%, 42.62%, and 25.79%, respectively, and the goodness of fit increased by 37.47%. Compared to the CEEMDAN-SE-BiLSTM model with the second-best prediction effect, the horizontal prediction error RMSE, MAE, and SMAPE values of the CEEMDAN-FE-BiLSTM model were reduced by 46.99%, 37.69%, and 12.10%, respectively. Therefore, in the gas prediction, the FE-based merging also plays a decisive role in the accuracy of the model.
In summary, for O 3 prediction, the prediction accuracy of CEEMDAN-FE-BiLSTM remained the highest, and the goodness of fit was much better than the other three models. The other three models presented large fluctuations, which proves that the CEEMDAN-FE-BiLSTM hybrid demonstrated the best stability in its model predictions.

Conclusions
The PM 2.5 concentration sequence is nonlinear, nonstationary, and noisy. We propose a CEEMDAN-FE-BiLSTM hybrid model based on the decomposition-merge technology to predict PM 2.5 concentrations hour by hour. First, the CEEMDAN algorithm is used to decompose the original PM 2.5 concentration sequence to obtain 14 IMF components. Then, the FE values of the 14 IMF components are calculated according to the FE definition, the number of clusters is set, and the IMF components are merged by K-means clustering based on their FE values to obtain the new components Feat; according to the training effect, the optimal number of clusters is five. Finally, the five new components are input into the BiLSTM model to predict PM 2.5 concentrations hour by hour. To test the validity and stability of the CEEMDAN-FE-BiLSTM model, it was also used to predict PM 10 and O 3. The experimental results show the following: (1) Decomposing the original sequence using the CEEMDAN algorithm can effectively remove noise and extract timing information. (2) Using entropy values to recombine the IMF sequences can significantly improve the prediction performance of the BiLSTM model. In different cases, SE and AE had different effects on the combination of sequences. Whether for PM particles or gas pollutants, the prediction effect of BiLSTM after using the FE value to recombine the IMF sequences was significantly better than with SE or AE; that is, the FE combination plays a decisive role in improving the goodness of fit of the model. (3) Regardless of whether it is predicting similar or heterogeneous pollutants, the CEEMDAN-FE-BiLSTM model is significantly better than the other models in terms of the horizontal accuracy and goodness of fit, with little fluctuation and a stable prediction effect.
The "CEEMDAN-FE" decomposition-merging technology proposed in this paper can effectively reduce the instability and high volatility of the original data, overcome data noise, and significantly improve the model's performance in predicting the real-time concentrations of PM 2.5 .
Although the proposed CEEMDAN-FE-BiLSTM hybrid model can solve the irregular and unstable characteristics of PM 2.5 concentration sequences and improve the prediction accuracy of a PM 2.5 concentration sequence, there are still many problems to be solved. First of all, in this study, we only considered hourly forecasting. Next, we can explore the prediction accuracy of the model for the next 12 h and 24 h to further enhance the broadness of the model's applicability. Secondly, we only considered the single series prediction without considering the factors affecting its change, such as wind speed, temperature, precipitation, and so on. If influencing factors are added, the prediction accuracy of the model may be improved again.