1. Introduction
Environmental pollution is one of the most serious issues facing humankind today, and badly polluted air can cause great damage to the economy and to people's lives. According to the World Health Organization (WHO), almost 3 million children die every year from a range of problems caused by air pollution [1]. With industrialization and urbanization, air pollution is becoming increasingly serious and hazy weather has become more frequent, especially in developing countries. In recent years, foggy weather in many areas of China has become increasingly serious. Since the beginning of 2013, sustained haze has turned the Beijing-Tianjin-Hebei (Jing-Jin-Ji) region into a heavily polluted region. Fine particulate matter is one of the key contributors to air pollution and hazy weather, and it has many adverse health effects, such as respiratory diseases and premature death [2].
Recently, an increasing number of countries have set up environmental monitoring systems, which provide large amounts of PM monitoring data. However, PM data are affected by many factors and fluctuate greatly over time, which makes them very challenging to predict. Therefore, many models and tools have been developed to predict PM2.5 and other air pollutant concentrations more accurately. These models can be broadly categorized into physical, statistical and hybrid models. Physical methods simulate the emission, diffusion and transport of pollutants through meteorological, emission and chemical models [3,4,5]. Statistical methods, which mainly include the autoregressive integrated moving average model (ARIMA), artificial neural networks (ANN) and multiple linear regression (MLR) [2,6,7,8,9], have been widely applied to pollutant concentration prediction. For instance, Ref. [10] proposed a forecasting model based on MLR and bivariate correlation analysis to predict the annual and seasonal concentrations of PM10 and PM2.5. Ref. [11] used the ARIMA model to study the effects of meteorological factors on ultrafine particulate matter (UFP) and PM10 concentrations under traffic congestion conditions. However, in practice, most pollutant series are nonlinear and irregular and may arise from nonlinear dynamical systems, so these linear algorithms remain limited in predicting PM concentration. In contrast, artificial neural network models can overcome the limitations of traditional linear models and handle nonlinear problems well. Ref. [12] developed an extended model based on the long short-term memory (LSTM) neural network that takes spatiotemporal correlation into account to predict pollutant concentrations, and it showed excellent performance. Ref. [13] applied cuckoo search (CS) to optimize a back-propagation neural network (BPNN) to predict PM concentrations in four major cities in China.
Recently, to predict air quality more accurately, many hybrid models have been proposed based on ensemble learning paradigms, data preprocessing techniques and heuristic algorithms. For example, Ref. [14] developed a prediction model based on the multidimensional k-nearest neighbor model and the ensemble empirical mode decomposition (EEMD) method. Ref. [15] developed a hybrid model based on wavelet transform (WT), a stacked autoencoder (SAE) and long short-term memory (LSTM) to simulate PM2.5 at six sites in China. Ref. [16] combined WT with a neural network algorithm to decompose the PM2.5 data, predict each sub-series and finally reconstruct the data. Ref. [17] proposed a hybrid PM2.5 prediction model that includes a new preprocessing method (wavelet transform and variational mode decomposition) and uses a BPNN optimized by the differential evolution (DE) algorithm to predict each decomposed sequence. A drawback of such decomposition-based prediction models is that a single method is used to predict all of the decomposed sequences. Since different decomposed sequences have different characteristics, a single model cannot fit all of them [18]. Thus, an ensemble prediction model that integrates multiple single models can help avoid the shortcomings of any one model and further improve prediction accuracy. Furthermore, many heuristic algorithms have been used to optimize the weight coefficients of ensemble models. Ref. [19] developed an ensemble model based on DE to determine the optimal weights for electricity demand forecasting. Ref. [20] employed the cuckoo search algorithm (CSO) to optimize the weight coefficients of an ensemble model. The whale optimization algorithm (WOA), proposed in Ref. [21], is a heuristic algorithm that imitates the hunting behavior of whales. However, when solving more complex problems, WOA can become trapped in local optima and converge slowly. Thus, an improved whale optimization algorithm (IWOA) is proposed in this study to strengthen the local search capability of WOA.
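For reference, the baseline WOA of Ref. [21] can be sketched as follows. This is a minimal sketch of the standard algorithm (encircling the best solution, exploration around a random whale, and the spiral bubble-net update), not the IWOA proposed in this paper, whose modifications are not described in this section. The function name `woa_minimize`, the use of scalar coefficients A and C, and the spiral constant b = 1 are illustrative assumptions.

```python
import numpy as np

def woa_minimize(f, dim, lb, ub, n_agents=30, max_iter=500, seed=0):
    """Sketch of the standard Whale Optimization Algorithm (baseline WOA, not IWOA).

    f: objective to minimise; lb/ub: scalar search bounds applied to every dimension.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_agents, dim))  # initial whale positions
    fit = np.array([f(x) for x in X])
    best = X[np.argmin(fit)].copy()
    best_fit = float(fit.min())
    b = 1.0  # assumed constant defining the logarithmic spiral shape
    for t in range(max_iter):
        a = 2.0 - 2.0 * t / max_iter  # decreases linearly from 2 to 0
        for i in range(n_agents):
            A = 2.0 * a * rng.random() - a  # scalar simplification of the coefficient vector
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                if abs(A) < 1.0:
                    # exploitation: shrink the encircling circle around the best whale
                    X[i] = best - A * np.abs(C * best - X[i])
                else:
                    # exploration: move relative to a randomly chosen whale
                    ref = X[rng.integers(n_agents)].copy()
                    X[i] = ref - A * np.abs(C * ref - X[i])
            else:
                # bubble-net attack: spiral update around the best whale
                l = rng.uniform(-1.0, 1.0)
                X[i] = np.abs(best - X[i]) * np.exp(b * l) * np.cos(2.0 * np.pi * l) + best
            X[i] = np.clip(X[i], lb, ub)  # keep the whale inside the search boundary
            val = f(X[i])
            if val < best_fit:
                best_fit, best = float(val), X[i].copy()
    return best, best_fit
```

For example, `woa_minimize(lambda x: float(np.sum(x ** 2)), dim=5, lb=-2.0, ub=2.0)` searches the same [−2, 2] box used later for the ensemble weights; the sphere function is only a test objective.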
Based on the above analysis, and considering the importance of data preprocessing and the limitations of any single prediction model, a new hybrid decomposition–ensemble learning paradigm based on variational mode decomposition (VMD) and an improved whale optimization algorithm (IWOA) is introduced. First, the original PM sequence is decomposed into different VM sequences using VMD. Then, an ensemble model whose weights are determined by IWOA is employed to forecast each decomposed component. Finally, the component predictions are assembled into the final prediction result.
The paper is structured as follows: Section 2 introduces several single forecasting models, the ensemble prediction theory and VMD. Section 3 presents the proposed decomposition–ensemble model. Section 4 describes the study areas and the evaluation criteria. Section 5 compares the proposed model with other models. Finally, Section 6 presents the conclusions and the main results of this paper.
3. Decomposition–Ensemble Learning Paradigm
In this section, we propose a new hybrid decomposition–ensemble learning paradigm that integrates the VMD method, several prediction models and IWOA optimization. The main process of the developed paradigm is shown in Figure 1. The three main steps are as follows:
Step 1: Decomposition process:
First, the original pollution data must be processed to separate useful features from noise so that an effective prediction model can be built. In this study, VMD was used to decompose the original pollution series into a set of VMs and a residual component, each with its corresponding frequencies.
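The idea of splitting a series into additive components can be illustrated with a simple stand-in. VMD itself solves a constrained variational problem that adaptively estimates the centre frequency of each mode; the sketch below instead splits the spectrum at fixed, hypothetical band edges with an FFT. It keeps the property that matters for the later steps, namely that the components sum back to the original series. The function `band_split`, the band edges and the synthetic series are assumptions for illustration only.

```python
import numpy as np

def band_split(signal, edges):
    """Split a 1-D series into additive components by FFT frequency band.

    Stand-in for VMD, NOT VMD itself: band edges are fixed (in cycles per
    sample, between 0 and 0.5) instead of being estimated adaptively.
    Components are returned from the lowest to the highest frequency band.
    """
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size)  # 0 (DC) .. 0.5 (Nyquist)
    bounds = [0.0, *edges, np.inf]
    components = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        mask = (freqs >= lo) & (freqs < hi)  # bands partition all frequency bins
        components.append(np.fft.irfft(spectrum * mask, n=x.size))
    return components

# Hypothetical daily series: annual cycle + weekly cycle around a mean level.
t = np.arange(365)
series = 50 + 20 * np.sin(2 * np.pi * t / 365) + 10 * np.sin(2 * np.pi * t / 7)
parts = band_split(series, edges=[0.02, 0.1])
```

Because the inverse FFT is linear and the masks cover every frequency bin exactly once, the components add up to the input series up to floating-point error.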
Step 2: Ensemble forecasting and IWOA optimization:
The VMD process yields decomposition sequences with different characteristics. Because these sequences have different properties, a single prediction method cannot effectively adapt to all of the VMs. The ensemble strategy is therefore adopted: given N prediction methods for the same problem, their outputs are combined as a weighted sum with appropriately selected weight coefficients. Let $\hat{y}_i$ denote the prediction of a given VM by the i-th model (Model = "BPNN", "ANFIS", "ANFIS-FCM", "GMDH"). The IWOA-optimized ensemble output can then be expressed as

$$\hat{y} = \sum_{i=1}^{N} w_i \hat{y}_i, \qquad w_i \in [w_{\min}, w_{\max}], \quad i = 1, 2, \ldots, N,$$

where $w_i$ is the weight coefficient of model i and $[w_{\min}, w_{\max}]$ is the range of the weight coefficients determined by NNCT [27].
To determine the optimal weight coefficients $w_i$ (i = 1, 2, …, N), IWOA was employed to search for the optimal solution. Before optimization, the objective function must be specified; in this paper it is given by Equation (18). The optimization terminates when the predefined minimum value of the objective function or the maximum number of iterations is reached. The search boundary of the IWOA is set to [−2, 2], the nesting dimension is 5 and the maximum number of iterations is 500.
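The weighted combination and its RMSE fitness can be sketched as below, assuming the single models' predictions for one VM are stacked into an array of shape (N, T). Weights outside the [−2, 2] boundary are rejected with an infinite penalty. The `random_search` routine is only a placeholder optimizer standing in for IWOA; all function names and the synthetic data are hypothetical.

```python
import numpy as np

def ensemble_predict(preds, weights):
    """Weighted ensemble y_hat = sum_i w_i * y_hat_i; preds has shape (N, T)."""
    return np.asarray(weights, dtype=float) @ np.asarray(preds, dtype=float)

def rmse_fitness(weights, preds, y_true, lb=-2.0, ub=2.0):
    """RMSE of the weighted ensemble; weights outside [lb, ub] get +inf."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < lb) or np.any(w > ub):
        return np.inf
    err = ensemble_predict(preds, w) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def random_search(fitness, dim, lb=-2.0, ub=2.0, n_samples=5000, seed=0):
    """Placeholder optimiser (NOT IWOA): best of uniformly sampled weight vectors."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(lb, ub, size=(n_samples, dim))
    scores = np.array([fitness(c) for c in candidates])
    best = int(np.argmin(scores))
    return candidates[best], float(scores[best])

# Synthetic target and four hypothetical single-model predictions of one VM.
rng = np.random.default_rng(1)
y = np.sin(np.linspace(0.0, 6.0, 50))
preds = np.stack([y + rng.normal(0.0, s, y.size) for s in (0.1, 0.2, 0.3, 0.4)])
w, score = random_search(lambda w: rmse_fitness(w, preds, y), dim=4)
```

Any bounded optimizer, such as the WOA family, can replace `random_search` without changing the fitness function.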
Step 3: Assemble forecasting results:
The above steps yield the ensemble prediction of each VM. These predictions are then combined to obtain the final forecast.
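Assuming that the final forecast is reconstructed by summing the per-VM ensemble forecasts, as is usual for additive decompositions, Step 3 reduces to a weighted sum over models followed by a sum over components. The sketch below is illustrative; the array layout and the name `assemble_forecast` are assumptions.

```python
import numpy as np

def assemble_forecast(component_preds, component_weights):
    """Sum over VMs k of sum over models i of w[k, i] * pred[k, i, t].

    component_preds: shape (K, N, T), K components, N models, T time steps.
    component_weights: shape (K, N), ensemble weights found for each component.
    """
    P = np.asarray(component_preds, dtype=float)
    W = np.asarray(component_weights, dtype=float)
    per_component = np.einsum("kn,knt->kt", W, P)  # ensemble forecast of each VM
    return per_component.sum(axis=0)  # reconstruct the PM2.5 forecast
```

Because the per-component weights are applied before the summation, each VM can favour a different model, which is the point of the ensemble step.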
5. Results and Analysis
5.1. Data Decomposition by VMD
In the proposed VMD-IWOA ensemble model, the original PM2.5 concentration sequence is first decomposed into several independent VMs using VMD. However, too many VMs introduce new problems: each VM contributes its own estimation error during the ensemble prediction process, so a large number of VMs leads to an accumulation of errors and also increases the computation time of each prediction step. To prevent these problems, all of the VMs were regrouped into three VMs and a residual.
5.2. The Process of Ensemble Forecast on VMs
The BPNN, ANFIS, ANFIS-FCM and GMDH prediction models were applied to forecast each VM reconstructed in Section 5.1. The ensemble model then integrates the results of the four prediction models on each VM and optimizes the weights of the four predictions using IWOA. Before the simulation, the parameters of the four neural network models need to be initialized: the number of input nodes is set to four, the number of hidden nodes to nine and the number of output nodes to one. In addition, rolling single-step forecasting on the PM2.5 concentration data of four cities is used to test the predictive performance. The detailed experimental parameters of the four neural networks are shown in Table 2.
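The rolling single-step setup with four input nodes and one output node can be sketched as follows: each sample uses the four most recent observations to predict the next value, and at every test step the window is refilled with actual observations rather than earlier forecasts. A least-squares linear model stands in for the neural networks here (the nine hidden nodes are not represented); `make_windows`, `rolling_one_step` and `fit_linear` are hypothetical names.

```python
import numpy as np

def make_windows(series, n_lags=4):
    """Inputs are the previous n_lags values (4 input nodes); target is the next value."""
    y = np.asarray(series, dtype=float)
    X = np.stack([y[i:i + n_lags] for i in range(len(y) - n_lags)])
    return X, y[n_lags:]

def fit_linear(X, targets):
    """Stand-in model: linear least squares with an intercept; returns a predictor."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return lambda window: float(np.dot(coef[:-1], window) + coef[-1])

def rolling_one_step(series, n_train, fit, n_lags=4):
    """Fit on the first n_train points, then predict each later point
    from the n_lags most recent *observed* values."""
    y = np.asarray(series, dtype=float)
    predict = fit(*make_windows(y[:n_train], n_lags))
    return np.array([predict(y[t - n_lags:t]) for t in range(n_train, len(y))])
```

Feeding observed values back into the window keeps forecast errors from compounding, which is what distinguishes rolling single-step evaluation from multi-step recursive forecasting.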
Table 3 shows the prediction results of the single models and the proposed ensemble model for each VM, with RMSE used as the evaluation index. As can be seen from Table 3, each model performs best on particular VMs. In Beijing, BPNN gives the lowest RMSE among the single models at VM2 and VM3, while GMDH gives the lowest RMSE at VM1 and the residual. In Tianjin, ANFIS gives the best results at VM1, FCM at VM3, and GMDH at VM2 and the residual. In Baoding, ANFIS gives the lowest RMSE among the single models at VM1 and the residual, while GMDH gives the best results at VM2 and VM3. In Shijiazhuang, GMDH performs better than the others at VM2, VM3 and the residual, while ANFIS performs best at VM1.
The above analysis shows that each model has advantages on particular VMs, and no single prediction model can predict all decomposed signals uniformly well. This suggests that an ensemble model, which combines the strengths of multiple individual models, can overcome the limitations of each of them. Therefore, this study uses IWOA to search for the best weight coefficients of the ensemble model. The search boundary is set to [−2, 2] based on NNCT, and the RMSE criterion is used as the fitness function of IWOA.
Table 3 also presents the best weights and the final results of the ensemble model; compared with each single model, the ensemble model gives the desired prediction results.
Compared with BPNN, ANFIS, FCM and GMDH, the ensemble model reduced the average RMSE of the four cities at VM1 by 26.10%, 2.62%, 7.80% and 3.97%, respectively. At VM2, the average RMSE was reduced by 5.81%, 11.15%, 11.51% and 3.59%, respectively; at VM3, by 7.19%, 58.21%, 13.92% and 6.22%, respectively; and for the residual, by 17.79%, 33.09%, 22.27% and 7.51%, respectively. Consequently, compared with the single models BPNN, ANFIS, FCM and GMDH, the ensemble model improves the forecasting results on every VM component.
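The percentage reductions above compare each single model's RMSE with that of the ensemble. The text does not state whether the four-city average is taken over per-city reductions or over the RMSE values themselves; the sketch below assumes the former, and the function name is hypothetical.

```python
import numpy as np

def avg_percent_reduction(single_rmse, ensemble_rmse):
    """Mean over cities of (RMSE_single - RMSE_ensemble) / RMSE_single * 100.

    Assumption: per-city reductions are averaged (not the averaged RMSEs).
    """
    s = np.asarray(single_rmse, dtype=float)
    e = np.asarray(ensemble_rmse, dtype=float)
    return float(np.mean((s - e) / s * 100.0))
```

For two hypothetical cities with single-model RMSEs of 10 and 20 and ensemble RMSEs of 9 and 15, the per-city reductions are 10% and 25%, giving an average of 17.5%.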
5.3. Model Performance Evaluation and Comparison
To evaluate the proposed ensemble model, three comparison experiments were designed to compare it with VMD-based models, individual models and existing benchmark models.
5.3.1. Experiment 1: The Comparison between the Ensemble Model and VMD-Based Models
This experiment compares the developed ensemble model with four VMD-based prediction models, VMD-BPNN, VMD-ANFIS, VMD-FCM and VMD-GMDH, which were constructed to highlight the role of the data decomposition technique. The improvements of the developed ensemble model over the VMD-based models are shown in Table 4 and Figure 2. The ensemble model outperforms VMD-BPNN, VMD-ANFIS, VMD-FCM and VMD-GMDH on all four evaluation criteria. For example, in Beijing, compared with VMD-BPNN, VMD-ANFIS, VMD-FCM and VMD-GMDH, respectively, the ensemble model reduces MAE by 2.3843, 10.6660, 3.6867 and 2.1953, RMSE by 5.4454, 21.3895, 11.6926 and 9.7510, MAPE by 0.3159, 11.9748, 12.3061 and 5.3553, and TIC by 5.2795, 21.3508, 11.6465 and 9.6318. In addition, Figure 2 compares the actual values with the forecast values, showing that the developed ensemble model predicts better than the other VMD-based models.
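For reference, the four evaluation criteria, together with the error mean and error STD used in Experiment 3, can be computed as in the sketch below. Their formal definitions belong to Section 4 and are not reproduced in this section, so this sketch assumes the common forms: MAPE in percent, TIC as Theil's inequality coefficient, errors taken as forecast minus actual, and the population standard deviation. The function name `evaluate` is hypothetical.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Common forms of MAE, RMSE, MAPE (%), TIC, error mean and error STD."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    err = p - y  # assumed sign convention: forecast minus actual
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(rmse),
        "MAPE": float(np.mean(np.abs(err / y)) * 100.0),  # requires y != 0
        # Theil's inequality coefficient: 0 for a perfect forecast, at most 1
        "TIC": float(rmse / (np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(p ** 2)))),
        "error_mean": float(np.mean(err)),
        "error_std": float(np.std(err)),  # population STD (ddof=0)
    }
```

Lower values are better for all of these except the error mean, where values closer to zero indicate less bias.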
5.3.2. Experiment 2: The Comparison between the Ensemble Model and Individual Models
This experiment compares the developed ensemble model with four individual models: BPNN, ANFIS, FCM and GMDH. Table 5 lists the forecasting results of the ensemble model and the single models. As Table 5 shows, the proposed model substantially improves on BPNN, ANFIS, FCM and GMDH. For example, in Beijing, compared with BPNN, ANFIS, FCM and GMDH, respectively, the ensemble model reduces MAE by 66.4829, 71.3965, 67.8848 and 82.5946, RMSE by 65.7865, 73.3943, 7.7401 and 81.1458, MAPE by 67.7547, 71.7270, 67.5358 and 83.5715, and TIC by 65.6355, 73.1804, 67.5553 and 80.7598. In addition, Figure 3 compares the actual values with the forecast values, showing that the developed ensemble model predicts better than the single models.
5.3.3. Experiment 3: The Comparison between the Proposed Model and the Existing Models
This experiment was conducted to further verify that the proposed hybrid decomposition–ensemble method improves prediction performance. Several existing models widely used in environmental prediction were applied in a comparative study to assess the proposed model: two simple algorithms (ARIMA and RBFNN) and three hybrid algorithms (SSA-ENN, EEMD-GRNN and EEMD-WOA-BPNN). The results are given in Table 6 and Figure 4, which show that the MAE, RMSE, MAPE and TIC values of the developed model are all lower than those of the existing models, further confirming the advantage of the developed ensemble model. For example, compared with ARIMA, RBFNN, SSA-ENN, EEMD-GRNN and EEMD-WOA-BPNN, the proposed model reduced the MAPE in Beijing by 94.01%, 91.14%, 90.22%, 86.73% and 69.20%, respectively. For Tianjin, the average MAPE was reduced by 92.13%, 89.47%, 89.39%, 82.92% and 57.96%, respectively. For Baoding, the average MAPE was reduced by 94.01%, 91.14%, 90.22%, 86.73% and 69.20%, respectively. For Shijiazhuang, the average MAPE was reduced by 94.01%, 91.14%, 90.22%, 86.73% and 69.20%, respectively.
In addition, the error mean and error standard deviation (STD) are used to evaluate the accuracy and stability of the models, and the results show that the developed model has higher accuracy and stability than the existing models. Therefore, the proposed ensemble model can be employed successfully and effectively for PM2.5 concentration prediction. Furthermore, compared with previous works [15,16], the proposed ensemble model has the following highlights: 1. data decomposition; 2. multi-model ensemble prediction; and 3. optimized weight coefficients for the ensemble.
6. Conclusions
Reliable and precise PM2.5 concentration forecasting is important for air quality early warning and pollution control. Owing to the uncertainty and instability of PM2.5 data, the original PM2.5 series is very difficult to forecast accurately, and predicting and simulating PM2.5 reasonably remains a challenging task. In this study, a new hybrid decomposition–ensemble learning paradigm based on variational mode decomposition (VMD) and an improved whale optimization algorithm (IWOA) is proposed to predict PM2.5 concentration. In this paradigm, VMD is employed to decompose the original PM2.5 sequence into several VM series for forecasting. The prediction results show that a single prediction model has limited capability for pollution concentration prediction and is not appropriate for all VMs. To this end, an ensemble model based on four individual forecasting approaches, BPNN, ANFIS, FCM and GMDH, is proposed to predict all of the VM components. Furthermore, to determine the best ensemble weight coefficients, an improved whale optimization algorithm, IWOA, is proposed, and the final forecast is obtained by reconstructing the predicted components. The main contributions of this paper are summarized as follows: (1) A new decomposition–ensemble learning paradigm is developed for PM2.5 concentration forecasting. (2) The VMD technique is adopted to decompose the original PM2.5 series. (3) ANFIS, ANFIS-FCM and GMDH are utilized for PM2.5 forecasting. (4) An improved heuristic algorithm, IWOA, is developed to optimize the weight coefficients of the ensemble model.
To evaluate the developed model, daily PM2.5 series from four cities in the Jing-Jin-Ji region of China were collected as test cases for the comparison study. The comparison results indicate that the developed ensemble model is superior to the comparison models, including four VMD-based models, four individual models, two benchmark models and three existing models. Thus, the developed ensemble model provides effective forecasting, especially for highly volatile and irregular data such as PM2.5 concentrations, and can be a powerful tool for decision makers in air quality monitoring and early warning systems.