1. Introduction
With the advancement of industrialization, air pollution has become increasingly serious, and air quality problems have attracted growing public attention. The Beijing-Tianjin-Hebei area, the Yangtze River Delta, and the Pearl River Delta have been designated key air pollution monitoring areas. Air quality reflects the degree of air pollution and is judged by the pollutant concentration in the air [1]. Air pollutants include soot, total suspended particulate matter, inhalable particulate matter (PM10), fine particulate matter (PM2.5), CO, NO2, O3, SO2, volatile organic compounds, etc. PM2.5 refers to particles in the atmosphere with a diameter less than or equal to 2.5 microns, which can enter the lungs [2]. Although PM2.5 is only a small component of the earth’s atmosphere and has a small particle size, it carries a large amount of toxic and harmful substances, resides in the atmosphere for a long time, and is transported over long distances. It therefore has an important impact on air quality and visibility, and direct or indirect effects on human health and plant growth [3]. Forecasting the PM2.5 concentration is of great significance for building sustainable cities. PM2.5 forecasting contributes to environmental improvement and to air quality citizen science, i.e., studies in which citizens are more directly involved in participatory environmental monitoring and sustainable cities. Therefore, it is necessary to monitor and forecast the PM2.5 concentration in real time. In order to strengthen the management of air quality, the Ministry of Environmental Protection in China has released a daily report of the Air Quality Index (AQI) since 2012 [4].
Air pollution in North China is serious, and environmental issues are receiving more and more attention. In particular, the air pollution problem in the Beijing area has attracted the most attention. Given Beijing’s unique political, economic, and cultural status, monitoring and forecasting the PM2.5 concentration in the Beijing area is important for China’s sustainable development. China’s Environmental Status Bulletin issued by the Ministry of Environmental Protection shows that although PM2.5 levels in the Beijing-Tianjin-Hebei metropolitan region have improved compared with previous years, it is still the most polluted area in China, and each year cities in the Beijing-Tianjin-Hebei region account for more than five of the 10 cities with relatively poor air quality among 74 cities. Therefore, the Beijing-Tianjin-Hebei urban agglomeration is chosen as the representative area, and the PM2.5 concentration data of the Beijing area are forecast in order to provide an effective method for regional monitoring and control.
At present, researchers in China and abroad have conducted a lot of research on the generation, dissemination, and forecasting of PM2.5. Among this work, air quality forecasting models have developed rapidly and can be roughly divided into two categories: numerical forecasting models and statistical forecasting models [5,6]. Numerical forecasting models are chemical transport models built on chemical mechanisms and chemical kinetic equations. However, because of the large number of parameters in such models, they are difficult to apply, their forecasting accuracy is not very high, and their computational cost is large [5,7]. Statistical forecasting models do not rely on chemical mechanisms and chemical kinetic equations; instead, they forecast by analyzing regularities in the data [6,8]. The accuracy of statistical forecasting is higher than that of numerical forecasting, and statistical forecasting models are simple, fast, and low cost.
Air quality forecasting based on machine learning belongs to statistical forecasting and has become the main research direction of PM2.5 forecasting [9]. Machine learning models such as backpropagation neural networks (BPNN) [10], support vector regression (SVR) [11], radial basis function neural networks [12], and echo state networks (ESNs) [5] have been widely used in air quality forecasting. Although machine learning models have achieved significant results in air quality forecasting, some problems remain, such as manual adjustment of parameters and redundancy of input variables [13]. Therefore, more effective AQI series modeling methods are needed. With the development of artificial intelligence technology, intelligent optimization algorithms are increasingly adopted in air quality forecasting [14].
In recent years, some researchers have applied intelligent optimization algorithms to the modeling process of time series forecasting [15,16,17]. They use various intelligent optimization algorithms to select input variables, optimize the model structure, and adjust parameters, an approach that has attracted growing attention from researchers of different backgrounds. In 2016, Niu et al. [15] proposed a new short-term forecasting method for the PM2.5 concentration based on the complete ensemble empirical mode decomposition method and the grey wolf optimization (GWO) algorithm. In 2017, Sun et al. [16] proposed a hybrid model based on principal component analysis and a least squares support vector machine optimized by cuckoo search (CS) for air quality forecasting and monitoring. In 2018, Li et al. [17] established an SVR-based forecasting model of atmospheric PM2.5 and nitrogen dioxide (NO2) concentrations, in which the quantum particle swarm optimization (PSO) algorithm is used to select the optimal parameters that affect the performance of the SVR model. In 2019, Zhu et al. [11] focused on modeling and forecasting atmospheric pollution data, and proposed a two-step hybrid model for NO2 and SO2 forecasting based on data preprocessing and intelligent optimization algorithms (CS and GWO). In these studies, combining intelligent optimization algorithms with machine learning models achieved good results in forecasting AQI time series and significantly improved the forecasting accuracy and efficiency of the model and system.
Aiming at the problems in PM2.5 forecasting, this paper proposes a hybrid method based on the ESN and the classical PSO algorithm [18] to improve the forecasting accuracy of the PM2.5 concentration. First, the classical PSO algorithm is improved to strengthen its search ability and to maintain a good balance between exploration and exploitation. Then, the convergent cross mapping (CCM) [19] method is used to analyze the correlation of the original data, and the relevant variables are retained. After that, the improved PSO (IPSO) algorithm is used to optimize the embedding dimension and delay time of the phase space reconstruction (PSR) [20] process, and the hyper-parameters of the ESN model. Finally, the optimized PSR method is used to map the optimal variable subset to a high-dimensional space, and the optimized ESN model is used to predict the PM2.5 concentration. The hybrid forecasting model might provide a basis for the government to formulate environmental protection policies, support the sustainable development of society, and offer guidance to citizens on outdoor activities and health care.
4. IPSO-PSR-ESN Hybrid Method
This section introduces in detail the PM2.5 concentration prediction model based on the hybrid of IPSO, PSR, and ESN. The basic structure of the model is shown in Figure 3. Aiming at the problems existing in the prediction of the chaotic PM2.5 sequence, this paper proposes a hybrid model named IPSO-PSR-ESN. Firstly, the CCM method is used to select variables from the original time series: the original variables are sorted according to their correlation with the PM2.5 data, and the variable subset (PM2.5, PM10, CO, SO2, NO2, and O3) is obtained when the forecasting error is minimized. Secondly, PSR theory is used to reconstruct the selected optimal subset, extending the time series to a higher-dimensional phase space so that the information contained in the dynamic system is fully revealed. Finally, the reconstructed data is predicted using the ESN model. The embedding dimension and delay time in the PSR process and multiple parameters of the ESN model are optimized simultaneously using the IPSO algorithm.
In theory, when all variables are projected from the same system, each component alone can fully unfold the attractor. Therefore, the above basic extension will inevitably lead to redundancy. Traditional methods determine the delay time and embedding dimension from the perspective of information theory or entropy, separately from the subsequent modeling. This can be very time consuming, and the PSR parameters calculated in this way are not necessarily suitable for time series prediction. In addition, if the chaotic time series contains noise, the reconstruction parameters may also change. In practical applications, the optimal PSR parameters are closely related to the application scenario of the time series. Therefore, how to reconstruct multi-variable time series in a targeted manner is of great significance for dynamic system modeling. A heuristic algorithm can be used to optimize the delay time and embedding dimension, so that the data reconstruction process is combined with the modeling process. Finding the optimal values of the delay time and embedding dimension during predictive modeling may provide a new way of revealing dynamic systems.
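The reconstruction step discussed above can be illustrated with a minimal delay-embedding sketch. It is not the paper's implementation (the experiments were run in MATLAB); the function names `delay_embed` and `embed_multivariate`, and the choice of giving each variable its own embedding dimension `m` and delay `tau`, are assumptions made for illustration.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Return delay vectors [x(t), x(t - tau), ..., x(t - (m - 1) * tau)] as rows."""
    x = np.asarray(x, dtype=float)
    start = (m - 1) * tau                  # first time index with a full history
    if start >= len(x):
        raise ValueError("series too short for this (m, tau)")
    # Column j holds the series lagged by j * tau samples.
    cols = [x[start - j * tau: len(x) - j * tau] for j in range(m)]
    return np.column_stack(cols)

def embed_multivariate(series, ms, taus):
    """Embed each variable with its own (m, tau) and align rows on a common time index.

    Assumes all series have the same length and sampling times.
    """
    blocks = [delay_embed(x, m, tau) for x, m, tau in zip(series, ms, taus)]
    n = min(b.shape[0] for b in blocks)    # keep only times available in every block
    return np.hstack([b[-n:] for b in blocks])
```

For example, `delay_embed(np.arange(10), m=3, tau=2)` yields six rows, the first of which is `[4, 2, 0]`. In a setting like the one described here, `m` and `tau` would be supplied by the optimizer rather than fixed in advance.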
The reserve pool (reservoir) is the core part of the ESN, and its parameters and structure have a great impact on the performance of the ESN [39]. Designing a suitable reserve pool structure according to the characteristics of the prediction object is the primary problem in ESN modeling. The main hyper-parameters of the reserve pool include the size of the reserve pool, the spectral radius of the internal connection matrix, the sparsity, the leakage rate, and the input transform coefficients [40]. When the ESN predicts data with different characteristics, the hyper-parameter settings are often different. At present, these hyper-parameters have no fixed values or calculation methods and need to be analyzed for each specific problem. They are commonly selected empirically or by trial and error, which involves a large element of chance and affects the modeling results. A heuristic algorithm can also be used to automatically optimize the five reserve pool hyper-parameters of the ESN, improving the prediction performance of the ESN model by searching for the global optimum.
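To make the role of the five reserve pool hyper-parameters concrete, the following is a minimal leaky-integrator ESN sketch with a ridge-regression readout. It is an illustration under stated assumptions rather than the model used in the experiments: the class name `ESN`, the argument names, the ridge term, the washout length, and the reading of "sparsity" as the fraction of nonzero internal connections are all choices made here.

```python
import numpy as np

class ESN:
    """Minimal echo state network: fixed random reservoir, trained linear readout."""

    def __init__(self, n_in, size=100, spectral_radius=0.9, sparsity=0.1,
                 leak=0.3, input_scale=1.0, ridge=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        self.leak, self.ridge = leak, ridge
        # Input weights (with a bias column), scaled by the input coefficient.
        self.W_in = input_scale * rng.uniform(-1, 1, (size, n_in + 1))
        W = rng.uniform(-1, 1, (size, size))
        W[rng.random((size, size)) > sparsity] = 0.0   # keep about `sparsity` of the links
        radius = np.max(np.abs(np.linalg.eigvals(W)))
        # Rescale so the largest eigenvalue magnitude equals spectral_radius.
        self.W = W * (spectral_radius / radius) if radius > 0 else W
        self.W_out = None

    def _states(self, U):
        x = np.zeros(self.W.shape[0])
        rows = []
        for u in np.asarray(U, dtype=float):
            pre = self.W_in @ np.concatenate(([1.0], u)) + self.W @ x
            x = (1 - self.leak) * x + self.leak * np.tanh(pre)   # leaky update
            rows.append(np.concatenate(([1.0], u, x)))
        return np.array(rows)

    def fit(self, U, y, washout=20):
        S = self._states(U)[washout:]                  # drop the initial transient
        y = np.asarray(y, dtype=float)[washout:]
        A = S.T @ S + self.ridge * np.eye(S.shape[1])
        self.W_out = np.linalg.solve(A, S.T @ y)       # ridge-regression readout
        return self

    def predict(self, U):
        return self._states(U) @ self.W_out
```

Only the readout weights are trained; the reserve pool itself stays fixed once generated, which is why its hyper-parameters have such a strong influence on the result.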
Therefore, the IPSO algorithm is used to optimize the delay time and embedding dimension in the PSR process and the hyper-parameters of the reserve pool (the size of the reserve pool, the spectral radius of the internal connection matrix, the sparsity, the leakage rate, and the input transform coefficients). With six variables in the selected optimal subset, each contributing a delay time and an embedding dimension, plus the five reserve pool hyper-parameters, the decision vector to be optimized is 17-dimensional (6 × 2 + 5 = 17). The objective function is set to the error function of the ESN training process.
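One way to picture the 17-dimensional decision vector is as a flat array that is decoded before each objective evaluation. The layout below (six delay times, then six embedding dimensions, then the five reserve pool hyper-parameters) is a hypothetical ordering chosen for illustration; the paper does not state the ordering, and the name `decode` and the dictionary keys are likewise assumptions.

```python
import numpy as np

N_VARS = 6  # PM2.5, PM10, CO, SO2, NO2, O3

def decode(z):
    """Split a 17-dimensional particle position into PSR and ESN settings."""
    z = np.asarray(z, dtype=float)
    if z.shape != (2 * N_VARS + 5,):
        raise ValueError("expected a vector of length 17")
    # Delay times and embedding dimensions must be positive integers.
    taus = np.maximum(1, np.rint(z[:N_VARS])).astype(int)
    ms = np.maximum(1, np.rint(z[N_VARS:2 * N_VARS])).astype(int)
    size, radius, sparsity, leak, in_scale = z[2 * N_VARS:]
    return {
        "taus": taus,
        "ms": ms,
        "size": int(np.rint(size)),
        "spectral_radius": float(radius),
        "sparsity": float(sparsity),
        "leak": float(leak),
        "input_scale": float(in_scale),
    }
```

Rounding inside the decoder lets a continuous optimizer such as PSO handle the integer-valued parameters without any change to its update rule.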
Then the IPSO-PSR-ESN hybrid model is established. The flow chart of the hybrid model is shown in Figure 3. The steps of our algorithm are as follows:
Step 1: Select variables by inputting the original data, using the CCM method for causal analysis, and selecting the best subset.
Step 2: Input the selected subset to the hybrid model. Use the IPSO to optimize the 17-dimensional decision variables, which include the parameters of PSR and the hyper-parameters of the ESN model.
Step 3: Initialize the IPSO population. Set parameters such as the population size and the maximum number of iterations, initialize the velocity of each particle to 0, and initialize the position of each particle using good point set theory so that the particles are more evenly distributed in the decision space.
Step 4: Substitute the particle position into the objective function and calculate the target vector.
Step 5: Take the initial position of each particle as its initial personal best (pbest), and take the best of these as the initial global best (gbest) of the population.
Step 6: While the maximum number of iterations is not reached:
(1) Calculate the inertia weight $w$ and the learning factors $c_1$ and $c_2$ in the velocity update formula as in [28], where $t$ is the current number of iterations, $T$ is the maximum number of iterations, and $w_{\max}$ as well as $w_{\min}$ are the maximum and minimum values of the inertia weight $w$, respectively. $w$ decreases as the number of iterations increases. In the early stage of the algorithm, a larger $w$ makes it harder to fall into a local optimum, while in the later stage, a smaller $w$ makes the algorithm converge faster and more stably. The ideal result can be obtained by decreasing $c_1$ and increasing $c_2$ as the iterations proceed.
(2) Update the velocity and position of each particle using the velocity and position update formulas, while limiting their values to a certain range.
(3) Apply a mutation strategy to the particles that satisfy the mutation conditions in order to avoid local optima.
(4) Substitute the particle position into the objective function and calculate the target vector.
(5) Evaluate the particles of the new generation, and update the personal best (pbest) of each particle and the global best (gbest) of the population.
(6) Create a logistic chaotic sequence based on the global best, and then randomly replace the position of one particle in the population with the point in the chaotic sequence that has the smallest target value.
End while
Step 7: After the iteration is completed, the final global best gives the required parameters of PSR and the hyper-parameters of the ESN model.
Step 8: Perform PSR. Map the selected subset of variables to a high-dimensional variable space based on the optimized delay time and embedding dimension.
Step 9: Divide the reconstructed sequence into a training data set and a test data set.
Step 10: Substitute the optimized hyper-parameters into the ESN and train the ESN model on the training set.
Step 11: Feed the test set into the trained model to obtain the test accuracy indicators.
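The optimization loop in Step 6 can be sketched as a particle swarm with a decreasing inertia weight and time-varying learning factors. The exact update formulas of [28] are not reproduced here: the linear schedules, the default values of `w_max`, `w_min`, `c_start`, and `c_end`, and the velocity limit are assumptions. The sketch also uses uniform random initialization instead of a good point set and omits the mutation and chaotic replacement steps, so it is a plain baseline rather than the IPSO algorithm itself.

```python
import numpy as np

def pso(objective, lower, upper, pop=40, max_it=100,
        w_max=0.9, w_min=0.4, c_start=2.5, c_end=0.5, seed=0):
    """Minimize `objective` over a box with a basic time-varying PSO."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    v_max = 0.2 * (upper - lower)                  # assumed velocity limit
    pos = rng.uniform(lower, upper, (pop, dim))    # assumed: uniform random start
    vel = np.zeros((pop, dim))
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    g = np.argmin(pbest_val)
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for t in range(max_it):
        frac = t / max_it
        w = w_max - (w_max - w_min) * frac         # inertia weight decreases
        c1 = c_start + (c_end - c_start) * frac    # cognitive factor decreases
        c2 = c_end + (c_start - c_end) * frac      # social factor increases
        r1, r2 = rng.random((pop, dim)), rng.random((pop, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -v_max, v_max)          # limit the velocity
        pos = np.clip(pos + vel, lower, upper)     # keep positions inside the box
        vals = np.array([objective(p) for p in pos])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        g = np.argmin(pbest_val)
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val
```

In the hybrid model the objective would wrap the whole pipeline: decode the particle, reconstruct the data, train the ESN, and return the training error. Here any function of a position vector can be passed in, for example a sum of squares.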
5. Experimental Results
All the experiments in this paper were completed on a Windows 7 operating system with a 3.70 GHz Intel(R) Core i3-4170m CPU and 12 GB of RAM. All simulation experiments were carried out with MATLAB R2018b (The MathWorks, Inc., Natick, MA, USA). In each simulation experiment, the program parameters were identical, and each simulation experiment group was repeated 10 times.
5.1. Evaluation Indicator
In recent years, many error evaluation indicators have been widely used in related literature, but there is no recognized general standard method. Therefore, this paper uses multiple common error criteria to evaluate the effectiveness of the proposed hybrid model. In this paper, four common prediction criteria (RMSE, MAE, SMAPE, and CR) are used to evaluate the prediction accuracy:
(1) Root mean square error (RMSE), which represents the degree of dispersion between the predicted series and the actual series.
(2) Mean absolute error (MAE), which is a quantity used to measure how close forecasts or predictions are to the eventual outcomes in statistics.
(3) Symmetric mean absolute percentage error (SMAPE), which measures the size of the error in percentage terms, and reflects the relative difference between the predicted series and the actual series.
(4) Correlation coefficient (CR), which is a statistical criterion to reflect the degree of close correlation between variables.
where $\hat{y}_i$, $y_i$, and $\bar{y}$ are the predicted value, the observation, and the average of the observations of the time series, respectively, $N$ represents the number of samples, $\mathrm{Cov}(\cdot)$ represents the covariance, $\mathrm{Var}(\cdot)$ represents the variance, and $\hat{Y}$ as well as $Y$ represent the predicted sequence and the observed sequence, respectively.
In the above evaluation indicators, the smaller the values of RMSE, MAE, and SMAPE, the better the prediction effect of the model. CR = 1 indicates linear correlation, and CR = 0 indicates no correlation. When 0 < CR < 1, there is a correlation between the two sequences, and the larger the CR, the stronger the linear correlation.
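The four criteria can be computed as in the sketch below. RMSE and MAE follow their standard definitions. SMAPE has several variants in the literature; the form used here (absolute error divided by the mean of the absolute observed and predicted values, reported as a fraction) is an assumption and may differ from the paper's formula. CR is taken as the covariance divided by the square root of the product of the variances, which matches the quantities named in the text.

```python
import numpy as np

def rmse(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))

def mae(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y_hat - y)))

def smape(y, y_hat):
    # Assumed variant: undefined when an observation and its prediction are both 0.
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y_hat - y) / ((np.abs(y) + np.abs(y_hat)) / 2)))

def cr(y, y_hat):
    # Pearson correlation; undefined when either sequence is constant.
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    cov = np.mean((y - y.mean()) * (y_hat - y_hat.mean()))
    return float(cov / np.sqrt(y.var() * y_hat.var()))
```

As a small check, observations `[1, 2, 3, 4]` and predictions `[1, 2, 3, 6]` give an RMSE of 1.0, an MAE of 0.5, and an SMAPE of 0.1 under these definitions.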
5.2. Models Comparison
To reveal possible causal relationships, the relations between air pollutants and meteorological variables are assessed here by the CCM method. The resulting correlation values and the optimal variable subset among air pollutants are shown in Table 1, Table 2, and Figure 2. The models proposed in this study were tested on the optimal variable subset to evaluate their performance in forecasting the PM2.5 concentration. According to the CCM causal analysis, the prediction error reaches its minimum when the predictor is six-dimensional, which determines the optimal subset of variables. The PSR process is then performed on the selected optimal subset. In order to improve the overall forecast accuracy, this paper first combines the PSR process with the forecasting model as a whole to avoid the problems of selecting parameters in isolation. Then, according to conditions such as the forecasting step size, the phase space reconstruction parameters of the best subset are dynamically selected by the IPSO algorithm, and the data series is mapped into a high-dimensional space. The optimized delay time and embedding dimension of the Beijing data for the one-step prediction and the ten-step prediction are shown in Table 3.
After the optimal time series subset is reconstructed by the optimized PSR process, the system information contained in the time series is adequately revealed, and the optimized forecasting model is then used to forecast the PM2.5 concentration sequence. In the IPSO algorithm, the maximum number of iterations (MAXIT), the population size (POP), and the dimension of the decision space (DIM) are set as follows: MAXIT = 100, POP = 40, and DIM = 17. The IPSO algorithm uses these parameter settings in every simulation experiment to ensure the validity and fairness of all experiments.
In order to verify the effectiveness of the proposed algorithm, the original ESN model [23], the single-hidden layer feedforward network (SLFN) [41], the extreme learning machine (ELM) [42], the back propagation neural network (BPNN) [43], the least squares support vector machine (LSSVM) [44], and the long short-term memory (LSTM) network [45] are selected as the benchmark models. The first 75% of the data is used as the training set, and the last 25% is used as the test set.
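The chronological 75/25 split can be sketched as follows, together with one possible way of forming multi-step targets. Whether the ten-step forecasts in the paper are direct (the target is the value ten steps ahead) or iterated is not stated; the direct pairing in `make_supervised` is an assumption, and both function names are hypothetical.

```python
import numpy as np

def make_supervised(features, target, horizon=1):
    """Pair the features at time t with the target at time t + horizon (horizon >= 1)."""
    features, target = np.asarray(features), np.asarray(target)
    return features[:-horizon], target[horizon:]

def chrono_split(X, y, train_frac=0.75):
    """Split without shuffling: the first part trains, the remainder tests."""
    n_train = int(len(X) * train_frac)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]
```

Keeping the time order matters for forecasting: shuffling before the split would let information from later samples leak into training.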
In order to evaluate the effectiveness of the proposed method comprehensively and objectively, single-step and multi-step (10-step) forecasting experiments of the PM2.5 concentration were carried out. The one-step forecasting results for the Beijing PM2.5 data are shown in Table 4, and the ten-step forecasting results are shown in Table 5. For example, in Table 4, the RMSE, MAE, SMAPE, and CR of the proposed IPSO-PSR-ESN model for one-step-ahead forecasting of the PM2.5 concentration are 9.4961, 5.7938, 0.1029, and 0.9945, respectively. The two tables indicate that the proposed model outperforms the other state-of-the-art models.
In Table 4 and Table 5, which report the RMSE, MAE, SMAPE, and CR of each model, the best value in each row (the smallest error or the largest CR) is marked in boldface. From the results displayed in Figure 7, Table 4, and Table 5, the developed IPSO-PSR-ESN model has superior forecasting ability compared with the other models in this paper. The RMSE, MAE, and SMAPE of the proposed model are the smallest among all the comparison models, and its CR is the largest, which confirms that the proposed model has the best forecasting performance. The small error indicators (RMSE, MAE, SMAPE) show that the forecast data and the actual data are close and that the model fit is the best. From Table 4 and Table 5, the correlation coefficient of the PM2.5 concentration reaches 0.9946 and 0.74062 for one-step and ten-step forecasting, respectively, which are the maximum values among the compared models. Therefore, it can be concluded that the proposed model is suitable for PM2.5 concentration forecasting, and that it achieves good forecasting results in both the one-step and the ten-step prediction. The single-step forecasting and error curves of the Beijing PM2.5 series generated by the IPSO-PSR-ESN model are shown in Figure 4. The five hyper-parameters of the ESN are shown in Figure 5. Figure 6 shows the prediction and error curves for the ten-step prediction.
A graph showing the prediction error of the different models as a function of the number of prediction steps is given in Figure 7. It can be seen from the error curves and the various accuracy indicators that the LSSVM model obtains good prediction results when the number of prediction steps is small, but its prediction performance deteriorates drastically as the number of prediction steps increases, and its ten-step prediction is the worst among the selected models. LSTM is a deep learning model that has been used in time series prediction in recent years. The LSTM model performs well when the number of prediction steps is large, but its prediction performance is not ideal when the number of prediction steps is small. The prediction performance of the other comparison models lies between these two models. Compared with the other models, the hybrid IPSO-PSR-ESN model proposed in this paper achieves the best prediction results across the 1~10-step predictions. This indicates that the IPSO optimization effectively improves the prediction accuracy: it avoids the randomness of the parameter settings in the model, and it combines the PSR process with the ESN model so that the PSR parameters and the ESN hyper-parameters are selected dynamically according to the specific prediction requirements. Among the seven forecasting models, the IPSO-PSR-ESN model gives the best forecasting results at every forecasting step size, which demonstrates the superiority of the proposed hybrid forecasting model. By integrating the PSR process and the ESN model, and by optimizing the parameters of the reconstruction process and the hyper-parameters of the ESN model with the IPSO algorithm, the model can obtain accurate forecast results.
Thus, the IPSO-PSR-ESN model is worth considering for forecasting atmospheric pollutants in environmental governance and prevention, and it can provide some basis for government regulation.
6. Conclusions
PM2.5 is a major air pollutant, and forecasting the PM2.5 concentration is important for sustainable development and public health. In order to improve the forecast accuracy, this paper developed a novel hybrid model, the IPSO-PSR-ESN model, for PM2.5 concentration forecasting. This paper collected real-world time series for Beijing in 2016 and filtered the original dataset through CCM to obtain the optimal variable subset. Then, the IPSO algorithm was used to optimize the PSR process and the ESN parameters. Finally, the PM2.5 concentration was predicted with the trained hybrid model. Based on the experimental results and comparative analysis, the following four conclusions can be drawn: (1) the proposed hybrid model outperforms the other comparison models in PM2.5 concentration forecasting; (2) the proposed error correction model can significantly improve the prediction ability of the initial forecasting model (ESN); (3) the proposed IPSO algorithm has a positive effect on the forecasting performance of the ESN model; and (4) the proposed hybrid model can therefore forecast the PM2.5 concentration accurately.
Accurate PM2.5 forecasting enables the government to take actions that reduce the severity of episodes of high PM2.5 levels, such as promoting the transformation and upgrading of high-pollution enterprises and encouraging citizens to take public transport instead of driving. Moreover, predictions also enable individuals to take protective actions that limit their own exposure to high levels of PM2.5, such as reducing outdoor activities and staying indoors as much as possible. Although the hybrid model performs well in PM2.5 forecasting, it does not consider potential factors affecting air quality in extreme conditions, such as radon emissions and other pollutants; these situations are very complicated and need to be addressed in future studies. The accuracy requirements and the prediction horizon make forecasting the PM2.5 concentration series a very difficult task. We hope to apply some of the latest methods in the field of artificial intelligence, such as deep learning and multi-task learning, and to analyze more factors that affect the PM2.5 concentration, for instance radon radiation. In addition, we expect to extend the forecast time step in future research to achieve medium- and long-term forecasts. Through interdisciplinary integration, this work can provide a basis for the government’s air pollution control and the sustainable development of society.