Research and Application of an Air Quality Early Warning System Based on a Modified Least Squares Support Vector Machine and a Cloud Model

Worsening atmospheric pollution increases the need for air quality early warning systems (EWSs). Although numerous researchers have investigated EWSs in theory and in practice, studies on the quantification of uncertain information and on comprehensive evaluation are still lacking, which impedes further development in the area. In this paper, a comprehensive warning system is proposed that consists of two indispensable modules: effective forecasting and scientific evaluation. For the forecasting module, a novel hybrid model combining data preprocessing and numerical optimization is developed to forecast air pollutant concentrations. In particular, to further enhance the accuracy and robustness of the warning system, interval forecasting is implemented to quantify the uncertainties of the forecasts, which provides decision-makers with significant risk signals in addition to the point forecasts. For the evaluation module, a cloud model based on probability and fuzzy set theory is developed to perform comprehensive evaluations of air quality, which realizes the transformation between qualitative concepts and quantitative data. To verify the effectiveness and efficiency of the warning system, extensive simulations based on air pollutant data from Dalian, China were implemented; they illustrate that the warning system is not only high-performing but also widely applicable.


Motivation
With the high-speed growth of the industrial economy in the past decades, atmospheric pollution has been acknowledged as one of the most serious environmental issues, because it not only threatens environmental security but also has adverse effects on health [1,2]. Additionally, particulate matter (PM) can cause many environmental problems such as corrosion, soiling, damage to vegetation and reduced visibility [3]. Accordingly, modeling, forecasting and evaluating air quality play a pivotal part in early management and warning. However, despite their importance, relevant studies on air quality forecasting and evaluation are still insufficient. Efficient air quality forecasting can help the public take effective initiatives to address air pollution, which can reduce the risk of falling ill and enhance living standards. Additionally, scientific evaluation of forecasting results is an effective means to foresee changes in air quality levels. The assessment of air quality is a multiple criteria decision-making process, which can

Aim and Contributions
In the EWS, we designed two novel models to implement point forecasting and interval forecasting for six air pollutants, respectively. For point forecasting, we propose a hybrid model based on complementary ensemble empirical mode decomposition (CEEMD) and a least squares support vector machine (LSSVM) optimized by a modified biogeography-based optimization (BBO) algorithm. The model is designated CEEMD-BBODE-LSSVM, where BBODE denotes a combination of the BBO and differential evolution (DE) algorithms. For interval forecasting, a novel model based on bias and variance estimation and LSSVM regression was developed, which can capture the uncertainty of future air pollutant levels and greatly reduce the probability of improper decision-making. Additionally, most papers address either forecasting or assessment of air quality, whereas studies concerning both forecasting and comprehensive evaluation are very scarce. This paper not only implements air pollutant forecasting but also performs a comprehensive evaluation applying probability and fuzzy set theory, forming a novel air quality warning system. The workflow of the proposed EWS can be divided into three steps. Firstly, as shown in Figure 1, the original data are decomposed into several intrinsic mode functions (IMFs) by CEEMD, and the first IMF (IMF1), which carries the noise, is removed. The preprocessed data are then reconstructed and split into a training set and a validation set. Secondly, the CEEMD-BBODE-LSSVM model and the interval forecasting model are tested on the aforementioned training and validation sets. Finally, a cloud model is established on the basis of the air quality index and its tiered standards, and the point forecasting results for the six air pollutants are regarded as an evaluation sample for the cloud model. After 2000 numerical simulations, the final degree of certainty that a sample belongs to a certain air quality rating is determined by averaging the degrees of certainty generated by the 2000 simulations.
In summary, the main contributions of this paper are as follows:
(1) A comprehensive warning system is developed, which consists of a forecasting module and an evaluation module. Many numerical experiments show it to be an effective and high-performance warning system;
(2) In the forecasting module, interval forecasting, which can provide more effective and credible information than point forecasting, is implemented;
(3) A modified optimization algorithm based on the theory of biogeography is used to determine the optimal parameters of the LSSVM in order to achieve excellent forecasting performance in the warning system;
(4) A comprehensive evaluation based on probability and fuzzy set theory is implemented in the EWS, which realizes the transformation between qualitative concepts and quantitative data.
The remainder of the paper is organized as follows: Section 2 introduces the methodology used in this paper. In Section 3, modeling preparation is reported, and a detailed case study that includes point forecasting, interval forecasting and comprehensive evaluation of air quality is presented. The forecasting effectiveness, implications and future considerations for the EWS are discussed in Section 4. Finally, conclusions are given in the final section.

Methodology
In this section, the methodologies underlying the comprehensive warning system are introduced. A modified optimization algorithm based on the theory of biogeography is used to optimize the parameters of five distributions for six air pollutants. For the forecasting module, a hybrid model combining a novel decomposition method, the modified optimization algorithm and a classical LSSVM model is developed to implement point and interval forecasting for air pollutants. Additionally, to obtain qualitative conclusions from the forecasting results, we apply an evaluation based on probability and fuzzy set theory to perform an overall assessment of air quality.

Distribution Functions
Statistical distribution functions were used to determine the basic characteristics of air pollutant concentrations, which gives insight into the uncertainty of air pollutants. Five distribution functions, namely Weibull, Gamma, Lognormal, Log-logistic and Inverse Gaussian, were used to study the statistical properties of six air pollutants: PM2.5, PM10, O3, CO, NO2 and SO2. The probability density functions (PDFs) and cumulative distribution functions (CDFs) of these distributions are given in Appendix A.

CEEMD
The empirical mode decomposition (EMD) is an adaptive time-frequency data analysis method designed for nonlinear and nonstationary signal analysis [17]. However, mode mixing, a serious deficiency of EMD, limits its practical application. As a consequence, many modified EMD methods for signal decomposition have been developed [18-22]. The ensemble EMD (EEMD) was developed as a noise-assisted method that addresses the mode mixing problem effectively. However, EEMD is time-consuming because a large ensemble must be processed, and residual added white noise remains in its results. To remove these inherent defects of EEMD and improve its computational efficiency, CEEMD was established by Yeh et al. [23]. As a noise-improved method, CEEMD not only overcomes the mode mixing problem, but also eliminates the residual added white noise persisting in the IMFs and is more computationally efficient than EEMD [24]. To this end, CEEMD adds a pair of white Gaussian noises to the original signal, which saves computing time and lessens the final white noise residue at the same time. The essential steps of CEEMD are as follows:
(1) Given that a single white noise series is not sufficient to handle all intermittent signals, a positive mixture $f_1(t)$ and a negative mixture $f_2(t)$ are established by adding a pair of white noises $\pm\varepsilon_n(t)$ to the original signal $f(t)$:
$$f_1(t) = f(t) + \varepsilon_n(t), \qquad f_2(t) = f(t) - \varepsilon_n(t) \tag{1}$$
(2) Afterward, the positive and negative mixtures are decomposed by EMD, which yields two ensembles of IMFs, $k_{ij}^{+}$ and $k_{ij}^{-}$, where $k_{ij}^{+}$ (or $k_{ij}^{-}$) is the jth IMF obtained after adding the ith positive (or negative) noise.
(3) Then, the final jth IMF is computed by:
$$k_j = \frac{1}{2N}\sum_{i=1}^{N}\left(k_{ij}^{+} + k_{ij}^{-}\right) \tag{2}$$
where N denotes the number of noise pairs.
(4) Accordingly, the original signal $f(t)$ can be expressed as:
$$f(t) = \sum_{j=1}^{n} k_j(t) + r_n(t) \tag{3}$$
where $r_n(t)$ is the n-th residue (i.e., the local trend).
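The averaging over complementary noise pairs described in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this paper: `toy_decompose` is a hypothetical stand-in for EMD sifting (a moving-average split into a fast and a slow component), and the names `ceemd`, `n_pairs` and `eps` are assumptions introduced here. A real EMD may return a different number of IMFs for each noisy mixture, which this sketch does not handle.

```python
import math
import random

def toy_decompose(x, window=5):
    """Stand-in for EMD (NOT real sifting): split x into a fast and a slow part."""
    n = len(x)
    half = window // 2
    slow = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        slow.append(sum(x[lo:hi]) / (hi - lo))  # local moving average
    fast = [xi - si for xi, si in zip(x, slow)]
    return [fast, slow]  # the two components sum back to x

def ceemd(signal, decompose, n_pairs=50, eps=0.2, seed=0):
    """Average the components obtained from positive and negative noise mixtures."""
    rng = random.Random(seed)
    total = None
    for _ in range(n_pairs):
        noise = [rng.gauss(0.0, eps) for _ in signal]
        f_pos = [s + e for s, e in zip(signal, noise)]  # f1(t) = f(t) + noise
        f_neg = [s - e for s, e in zip(signal, noise)]  # f2(t) = f(t) - noise
        for comps in (decompose(f_pos), decompose(f_neg)):
            if total is None:
                total = [[0.0] * len(signal) for _ in comps]
            for j, comp in enumerate(comps):
                for t, v in enumerate(comp):
                    total[j][t] += v
    # Final component j is the mean over all 2 * n_pairs decompositions.
    return [[v / (2 * n_pairs) for v in comp] for comp in total]

signal = [math.sin(0.2 * t) + 0.5 * math.sin(2.0 * t) for t in range(200)]
components = ceemd(signal, toy_decompose)
reconstructed = [sum(c[t] for c in components) for t in range(len(signal))]
max_error = max(abs(r - s) for r, s in zip(reconstructed, signal))
```

Because every noise series is added once with a positive and once with a negative sign, the noise cancels when the averaged components are summed, so `max_error` stays at the level of floating-point rounding. This is the property that lets CEEMD avoid the residual noise of EEMD.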

The Modified BBO Algorithm
Biogeography-based optimization (BBO) was originally proposed by Simon [25]. The algorithm is inspired by a natural process and has been used to address optimization problems in many fields, including sensor selection [25], power system optimization [26,27], groundwater detection [28] and satellite image classification [29]. Based on the geographical distribution of species, BBO builds a probabilistic habitat migration pattern in which individuals share information according to a habitat suitability index, and inferior individuals are improved by obtaining information from superior individuals. BBO is a global optimization algorithm that possesses powerful exploration capability for the current population, while its global exploitation capability is poor. In contrast, differential evolution (DE) possesses commendable exploitation capability, searches the decision variable space effectively and can avoid local convergence. To enhance the global exploitation capability of BBO, this work proposes a modified BBO algorithm in which a DE step is added whenever the iteration number is even. We designate the modified algorithm BBODE, which is essentially a combination of the BBO and DE algorithms. The detailed pseudo-code of the BBODE algorithm is given in Appendix A.
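To make the idea of alternating BBO migration with a DE step concrete, the following Python sketch minimizes a sphere function. It is not the pseudo-code of Appendix A. The rank-based linear migration rates, the DE/rand/1/bin variant, the elitism step and all parameter values (`p_mut`, `F`, `CR`, population size) are assumptions chosen for illustration only.

```python
import random

def sphere(x):
    """Sphere test function; global minimum 0 at the origin."""
    return sum(v * v for v in x)

def bbode(fitness, dim=5, pop_size=20, iters=60, bounds=(-5.0, 5.0),
          p_mut=0.02, F=0.5, CR=0.9, seed=1):
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for it in range(1, iters + 1):
        pop.sort(key=fitness)                  # rank habitats, best first
        elite = pop[0][:]
        # Rank-based linear rates (assumed): good habitats emigrate more.
        mu = [(pop_size - r) / (pop_size + 1) for r in range(pop_size)]
        lam = [1.0 - m for m in mu]
        new_pop = [h[:] for h in pop]
        for i in range(pop_size):
            for d in range(dim):
                if rng.random() < lam[i]:      # immigration into habitat i
                    j = rng.choices(range(pop_size), weights=mu)[0]
                    new_pop[i][d] = pop[j][d]
                if rng.random() < p_mut:       # random mutation
                    new_pop[i][d] = rng.uniform(lo, hi)
        if it % 2 == 0:                        # DE step on even iterations only
            for i in range(pop_size):
                others = [k for k in range(pop_size) if k != i]
                a, b, c = rng.sample(others, 3)
                j_rand = rng.randrange(dim)
                trial = new_pop[i][:]
                for d in range(dim):
                    if rng.random() < CR or d == j_rand:
                        v = new_pop[a][d] + F * (new_pop[b][d] - new_pop[c][d])
                        trial[d] = min(hi, max(lo, v))
                if fitness(trial) <= fitness(new_pop[i]):  # greedy selection
                    new_pop[i] = trial
        worst = max(range(pop_size), key=lambda k: fitness(new_pop[k]))
        new_pop[worst] = elite                 # elitism keeps the best habitat
        pop = new_pop
        history.append(min(fitness(h) for h in pop))
    return min(pop, key=fitness), history

best, history = bbode(sphere)
```

Because the best habitat of each iteration is copied back into the next population, the values stored in `history` can never increase. This makes the convergence curve of such a sketch easy to inspect.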
Additionally, four migration models are available for the islands in the BBODE algorithm: the cosine, quadratic, exponential and linear models, whose formulas are given in Equations (4)-(7), respectively. The linear model is the most commonly used in practice. In the algorithm test section, we discuss which model performs best in the global optimization process. In these equations, I denotes the maximum possible immigration rate, which occurs when there are no species in the habitat, and E represents the maximum possible emigration rate, which occurs when the habitat reaches its maximum capacity. The terms λ and µ express the probability of immigration and emigration, respectively, n denotes the maximum number of species, and k represents the number of species on the kth island.
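For orientation, the Python sketch below implements three of these migration models in the forms commonly given in the BBO literature. These forms are assumptions and may differ from Equations (4)-(7) of this paper. The exponential model is omitted on purpose, because its exact form cannot be inferred from the text. Each function returns the pair (λ, µ) for a habitat with k species.

```python
import math

def linear(k, n, I=1.0, E=1.0):
    """Linear model: immigration falls and emigration rises linearly in k."""
    return I * (1 - k / n), E * k / n

def quadratic(k, n, I=1.0, E=1.0):
    """Quadratic model: rates change slowly near k = n and k = 0, respectively."""
    return I * (k / n - 1) ** 2, E * (k / n) ** 2

def cosine(k, n, I=1.0, E=1.0):
    """Cosine (sinusoidal) model: rates change slowly at both extremes."""
    c = math.cos(k * math.pi / n)
    return I / 2 * (c + 1), E / 2 * (1 - c)

n_species = 10
rates = {model.__name__: [model(k, n_species) for k in range(n_species + 1)]
         for model in (linear, quadratic, cosine)}
```

All three models share the same boundary behaviour: an empty habitat (k = 0) has λ = I and µ = 0, and a saturated habitat (k = n) has λ = 0 and µ = E. They differ only in how quickly the rates change in between.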

LSSVM
Support vector machine (SVM), a significant branch of machine learning, was proposed by Vapnik [30] on the basis of statistical learning theory, and is an effective means to address pattern recognition and classification tasks. The LSSVM, based on the structural risk minimization principle, is an extension of SVM that applies a linear least squares criterion to the loss function and uses equality constraints instead of inequality constraints [31]. In practice, the LSSVM requires less computation time than SVM and performs well in forecasting. More details on LSSVM can be found in [32].
It is noteworthy that different Mercer kernel functions generate different LSSVM models. Sigmoid, polynomial and radial basis function (RBF) kernels are frequently used in LSSVM models. According to [33], the RBF is a prevalent choice because it has fewer parameters to set and performs well in applications. Accordingly, this work adopted the RBF as the kernel function:
$$K(x, x_i) = \exp\left(-\frac{\lVert x - x_i \rVert^2}{2\sigma^2}\right) \tag{8}$$
where σ is the kernel width. Consequently, in this paper the parameters of the LSSVM model (i.e., the kernel width σ and the regularization parameter γ) were optimized by our modified BBO algorithm to achieve high-performance forecasting.
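As a rough illustration of how σ and γ enter the model, the following Python sketch fits an LSSVM regressor by solving its linear system with NumPy. It also evaluates a validation mean square error of the kind an optimizer such as BBODE could minimize. This is not the MATLAB toolbox used in this paper. The function names, the toy sine data and the parameter values are assumptions, and the kernel is written with $2\sigma^2$ in the denominator, whereas some toolboxes use $\sigma^2$.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, sigma, gamma):
    """Solve the LSSVM system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, multipliers alpha

def lssvm_predict(X_train, b, alpha, sigma, X_new):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

def fitness(params, X_tr, y_tr, X_val, y_val):
    """Validation MSE for a candidate (sigma, gamma) pair."""
    sigma, gamma = params
    b, alpha = lssvm_fit(X_tr, y_tr, sigma, gamma)
    pred = lssvm_predict(X_tr, b, alpha, sigma, X_val)
    return float(np.mean((y_val - pred) ** 2))

# Toy data (hypothetical): a sine curve split into training and validation points.
x = np.linspace(0.0, 2.0 * np.pi, 40).reshape(-1, 1)
y = np.sin(x).ravel()
X_tr, y_tr = x[::2], y[::2]
X_val, y_val = x[1::2], y[1::2]
b, alpha = lssvm_fit(X_tr, y_tr, sigma=1.0, gamma=100.0)
mse = fitness((1.0, 100.0), X_tr, y_tr, X_val, y_val)
```

Training reduces to a single linear solve because the LSSVM uses equality constraints, which is the source of its lower computational cost compared with a standard SVM.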

Interval Forecasting Based on LSSVM
The LSSVM not only implements effective point forecasting, but also performs well in interval forecasting, which quantifies the uncertainty of point forecasts. In this paper, the LSSVM toolbox for MATLAB provided by De et al. (http://www.esat.kuleuven.be/sista/lssvmlab/) was used to carry out interval forecasting for air pollutants. The construction of the forecasting intervals is based on the central limit theorem for linear smoothers combined with bias correction and variance estimation. The code for LSSVM interval forecasting can be obtained from the aforementioned website, so here we only give a brief description of its steps:
Step 1: use the original data to train the LSSVM model with an RBF kernel.
Step 2: calculate the smoother matrix of the LSSVM.
Step 3: compute the conditional bias and conditional variance.
Step 4: set the significance level.
Step 5: obtain the forecasting intervals for this fixed significance level.
More details about interval forecasting using LSSVM can be found in [34].

Normal Cloud Model Applied for Air Quality Evaluation
The cloud model, presented by Li et al. [35], is a hybrid model integrating randomness and fuzziness on the basis of probability and fuzzy set theory. It is an effective cognitive model for the conversion between qualitative concepts and quantitative data and has been applied in many fields. Randomness and fuzziness are generally both considered in evaluation. Because the cloud model possesses the joint properties of randomness and fuzziness, it is more effective and comprehensive than a model based on randomness or fuzziness alone [36]. In Figure 2, the x-axis and y-axis of the normal cloud denote one kind of air pollutant and the certainty degree of an air quality level, respectively. Ex denotes the expectation of the quantitative values representing the level of air quality. En indicates the range of the universe that can be accepted by the level of air quality. He measures the variation of the certainty degree in evaluations. The workflow of the cloud model for air quality evaluation is illustrated in Figure 3 and includes five steps. The first step is to determine the air quality criteria (i.e., PM2.5, PM10, O3, CO, NO2, SO2). The second step is to determine the parameters (i.e., Ex, En, He) of the cloud model. The third step is to compute the hybrid entropy-analytic hierarchy process (AHP) weights. The fourth step is to transform the observed data into cloud models repeatedly to obtain the distributions of certainty degrees. The fifth step is to calculate the mean of the certainty degrees and obtain the final air quality level.
The evaluation of air quality is a multi-criteria decision-making process, and the air quality criteria are shown in Table 1. How to properly address steps 2-5 is our primary concern. In this paper, we adopt Equation (9) to compute the cloud model parameters, where $B_{\max}$ and $B_{\min}$ denote the upper and lower bounds of a qualitative concept, which is essentially a grade of an air pollutant criterion. Parameter k determines the degree of atomization of a normal cloud. Herein, k is set to 0.1 to achieve a balance between variation and robustness in the evaluation. It is worth noting that $B_{\max}$ of PM2.5, PM10, O3, CO, NO2 and SO2 at level VI does not exist. Herein, we used a polynomial regression to obtain pseudo-bounds.
It is important to emphasize that the half normal cloud model, i.e., one half of a normal cloud model, was used at the highest and lowest levels for all criteria, as the certainty degree in these intervals is monotonic. When the observed data exceed the pseudo-bound, the corresponding certainty degree is 1.
The AHP method is widely applied in multi-criteria decision-making processes. Olvera et al. applied the AHP method to estimate the weights ($z_i$) of PM2.5, PM10, O3, CO, NO2 and SO2 in the evaluation of air quality in Mexico City, which are 0.3, 0.3, 0.233, 0.1, 0.033 and 0.033, respectively [17]. However, the AHP method is inherently sensitive to subjective uncertainty. To mitigate the influence of subjective uncertainty in AHP and of regional differences, a hybrid weighting method integrating entropy is presented. In the assessment of air quality, the entropy of the air pollutant data for the ith criterion ($e_i$) can be computed by Equation (10). The entropy-based weight of the ith criterion ($\omega_i$) is then obtained from the normalized entropy ($E_i$) [37]; $E_i$ and $\omega_i$ are computed by Equations (11) and (12), respectively. In these equations, $F_t$ denotes the frequency of the tth interval, $e_i$ represents the uncertainty of the observed data for a criterion with T intervals, and C represents the number of criteria.
To balance the latent subjective uncertainty in the AHP method, a novel entropy-AHP weight is proposed, which is calculated via Equation (13). Then, the certainty degree U for a level can be obtained using Equation (14), where $\mu_i$ denotes the certainty degree computed by the cloud model for each criterion.
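The repeated transformation of a sample into certainty degrees can be illustrated with a forward normal cloud generator. The Python sketch below assumes the commonly used parameter forms Ex = (B_max + B_min)/2, En = (B_max − B_min)/6 and He = k, which may differ from Equation (9). The bounds 35 and 75 are hypothetical values chosen only for illustration and are not taken from Table 1. For simplicity, the aggregation uses the AHP weights quoted in the text rather than the hybrid entropy-AHP weights.

```python
import math
import random

def cloud_parameters(b_min, b_max, k=0.1):
    """Assumed parameter forms for one grade of one criterion."""
    ex = (b_min + b_max) / 2.0   # expectation: centre of the grade
    en = (b_max - b_min) / 6.0   # entropy: 3-sigma rule (assumption)
    he = k                       # hyper-entropy
    return ex, en, he

def certainty_degree(x, ex, en, he, n_sim=2000, seed=0):
    """Mean certainty degree of sample x over n_sim cloud drops."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        en_prime = rng.gauss(en, he)          # randomised entropy
        while en_prime == 0.0:                # guard against division by zero
            en_prime = rng.gauss(en, he)
        total += math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))
    return total / n_sim

def aggregate(weights, degrees):
    """Weighted certainty degree over all criteria for one level."""
    return sum(w * m for w, m in zip(weights, degrees))

ex, en, he = cloud_parameters(35.0, 75.0)      # hypothetical grade bounds
mu_center = certainty_degree(55.0, ex, en, he)
mu_near = certainty_degree(60.0, ex, en, he)
mu_far = certainty_degree(70.0, ex, en, he)

ahp_weights = [0.3, 0.3, 0.233, 0.1, 0.033, 0.033]  # AHP weights quoted in the text
u_all_certain = aggregate(ahp_weights, [1.0] * 6)
```

A sample at the centre of a grade receives a certainty degree of 1, and the degree decays as the sample moves toward the grade bounds. Evaluating every grade in this way and choosing the grade with the largest aggregated value gives the final air quality level.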

Simulation Modeling and Analysis
In this section, modeling preparations are briefly introduced. A function test is implemented to verify the performance of the BBO and BBODE algorithms. The distribution function parameters for six air pollutants are estimated using BBO and BBODE, respectively. Point and interval forecasting are performed to infer the trends of air pollutants in the future. A comprehensive air quality evaluation is implemented by applying the cloud model.

Modeling Preparations
In this section, the study site, data source and fitness function are briefly described. Six metrics are employed to evaluate the performance of point forecasting and interval forecasting. Finally, a D-M test is used to test the forecasting performance.

Study Site and Data Source
In this paper, the Chinese city of Dalian (longitude 120°58′-123°31′ E, latitude 38°43′-40°10′ N) was selected as the study site for the EWS. It is located at the southern tip of the Liaodong Peninsula. The area of Dalian is 12,573.85 square kilometers. The population of the city is 6.6904 million, and the population density is 464 per square kilometer. In recent years, with the rapid development of the industrial economy of the city, air pollution has worsened and has become a growing public concern. The deteriorating air quality has increased the incidence of cardiovascular disease, asthma and lung disease among the public, especially the elderly and children, which has increased the need for an air quality EWS. The existing air quality EWS in the city focuses on monitoring and lacks effective forecasting and comprehensive pollution evaluation, which hinders the development of an effective air quality EWS. Additionally, there is little research on air quality EWSs in Dalian, and the existing literature puts particular emphasis on cause analysis and air quality indexes. Therefore, we chose Dalian as the study site for air quality EWS design.
The hourly air pollutant data were collected from a website (http://wat.epmap.org/) that is engaged in the collection of environmental data. Data on six common air pollutants, namely particulate matter (PM2.5, PM10), ozone (O3), carbon monoxide (CO), nitrogen dioxide (NO2) and sulfur dioxide (SO2), were collected for Dalian from this website and were used to validate the performance of the forecasting models and to implement a comprehensive air quality evaluation for the city. Figure 4 shows the study data for the six air pollutants in Dalian, which were divided into a training subset and a testing subset.

The Fitness Function for the CEEMD-BBODE-LSSVM Model
Establishing a proper fitness function is crucial for the BBODE algorithm: it connects the LSSVM model with the BBODE algorithm, which improves the performance of the LSSVM by searching for the optimal LSSVM parameters. The fitness function represents the mean of the forecasting error, which gradually decreases during the search for the optimal LSSVM parameters until the fitness value satisfies the end condition. In this paper, the fitness function was defined as follows:
$$fitness = \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$
where MSE denotes the mean square error between the target and forecasting values, $y_i$ and $\hat{y}_i$ represent the target values and forecasting values, respectively, and N is the number of samples.

The Performance Metric
Determining quantitatively which forecasting model is optimal is our main concern. In this paper, six statistical criteria were used to assess the accuracy and efficiency of point and interval forecasting. Four metrics, shown in Table 2, were used to evaluate the accuracy of point forecasting: mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE) and goodness of fit ($R^2$). Two criteria were adopted to validate the effectiveness of interval forecasting: the coverage probability (CP) and the average width (AW).

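Table 2. Metrics for point forecasting.
Metric | Definition | Equation
MAE | Mean absolute error | $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\lvert y_i - \hat{y}_i \rvert$
MAPE | Mean absolute percentage error | $\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert \times 100\%$
RMSE | Root mean square error | $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$
$R^2$ | Goodness of fit | $R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$

Here $y_i$ and $\hat{y}_i$ denote the actual values and forecasting values, respectively, $\bar{y}$ represents the average of the actual values, and N is the number of samples. The $R^2$ was also used to evaluate the fitting performance in the process of distribution fitting, where $y_i$, $\hat{y}_i$ and $\bar{y}$ represent the observed cumulative probability, the estimated cumulative probability and the average of the observed cumulative probability, respectively.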
CP is a vital metric for interval forecasting, evaluated by counting the target points that fall within the constructed forecasting intervals. It verifies the effectiveness of interval forecasting at the corresponding significance level (α). Theoretically, the forecasting intervals are valid if CP ≥ (1 − α) × 100%; otherwise, the interval forecasting is invalid. AW measures the informativeness of interval forecasting. In theory, a narrow AW provides greater information value than a wide AW:
$$\mathrm{CP} = \frac{1}{N}\sum_{i=1}^{N} c_i \times 100\%, \qquad c_i = \begin{cases} 1, & y_i \in [L_i, U_i] \\ 0, & \text{otherwise} \end{cases}$$
$$\mathrm{AW} = \frac{1}{N}\sum_{i=1}^{N}\left(U_i - L_i\right)$$
where $L_i$ and $U_i$ represent the lower and upper bounds of the ith forecasting interval, respectively, $y_i$ denotes the target points and N is the number of target points.
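The six criteria are straightforward to compute. The Python sketch below uses their standard definitions, and the small data set is hypothetical and serves only to show the arithmetic. MAPE and CP are returned in percent.

```python
import math

def point_metrics(y, y_hat):
    """MAE, MAPE (%), RMSE and R^2 for point forecasts."""
    n = len(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, y_hat))
    y_bar = sum(y) / n
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return {
        "MAE": sum(abs(a - f) for a, f in zip(y, y_hat)) / n,
        "MAPE": 100.0 * sum(abs((a - f) / a) for a, f in zip(y, y_hat)) / n,
        "RMSE": math.sqrt(ss_res / n),
        "R2": 1.0 - ss_res / ss_tot,
    }

def interval_metrics(y, lower, upper):
    """Coverage probability (%) and average width for interval forecasts."""
    n = len(y)
    covered = sum(1 for a, lo, hi in zip(y, lower, upper) if lo <= a <= hi)
    return {
        "CP": 100.0 * covered / n,
        "AW": sum(hi - lo for lo, hi in zip(lower, upper)) / n,
    }

# Hypothetical observations, point forecasts and symmetric intervals.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]
lower = [f - 2.5 for f in y_pred]
upper = [f + 2.5 for f in y_pred]

pm = point_metrics(y_true, y_pred)
im = interval_metrics(y_true, lower, upper)
```

In this toy case three of the four observations fall inside their intervals, so CP is 75%. At a significance level of α = 0.05 these intervals would be judged invalid, because CP is below 95%.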

D-M Test
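The D-M test, first proposed by Diebold and Mariano [38], can be used to determine whether there is a significant difference between the forecasting accuracy of two models. The D-M statistic is defined as follows:
$$\mathrm{DM} = \frac{\frac{1}{n}\sum_{t=1}^{n}\left[F\left(\varepsilon_t^{(1)}\right) - F\left(\varepsilon_t^{(2)}\right)\right]}{\sqrt{S^2/n}}$$
where $\varepsilon_t^{(1)}$ and $\varepsilon_t^{(2)}$ denote the forecasting errors of the two competing models and n is the number of forecasts. The accuracy of each forecast is evaluated via an appropriate loss function F; the prevalent loss functions are the square error function and the absolute deviation function [39]. $S^2$ is a variance estimator of $V_t = F\left(\varepsilon_t^{(1)}\right) - F\left(\varepsilon_t^{(2)}\right)$.
The null hypothesis and alternative hypothesis of the D-M test are as follows:
$$H_0: E\left[F\left(\varepsilon_t^{(1)}\right)\right] = E\left[F\left(\varepsilon_t^{(2)}\right)\right]$$
$$H_1: E\left[F\left(\varepsilon_t^{(1)}\right)\right] \neq E\left[F\left(\varepsilon_t^{(2)}\right)\right]$$
Under the null hypothesis, DM follows the standard normal distribution N(0, 1). The null hypothesis is rejected if $\lvert \mathrm{DM} \rvert > z_{\alpha/2}$, which means that there is a significant difference between the two models.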
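A minimal Python sketch of the test is given below. It assumes one-step-ahead forecasts, so the variance of $V_t$ is estimated by the plain sample variance without any autocorrelation correction. The two error series are hypothetical, and squared error is used as the loss function.

```python
import math
from statistics import NormalDist

def dm_test(e1, e2, loss=lambda e: e * e, alpha=0.05):
    """Return the D-M statistic and whether H0 is rejected at level alpha."""
    v = [loss(a) - loss(b) for a, b in zip(e1, e2)]   # loss differential V_t
    n = len(v)
    v_bar = sum(v) / n
    s2 = sum((d - v_bar) ** 2 for d in v) / (n - 1)   # sample variance of V_t
    dm = v_bar / math.sqrt(s2 / n)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    return dm, abs(dm) > z_crit

# Hypothetical forecasting errors of two competing models.
errors_a = [0.5, -0.4, 0.6, -0.5, 0.4, -0.6, 0.5, -0.5]
errors_b = [1.5, -1.2, 1.4, -1.6, 1.3, -1.5, 1.4, -1.3]

dm_ab, reject_ab = dm_test(errors_a, errors_b)
dm_ba, reject_ba = dm_test(errors_b, errors_a)
```

A negative statistic indicates that the first model has the smaller average loss. Swapping the two error series only flips the sign of the statistic, so the rejection decision is unchanged.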

Numerical Analysis of the BBO and BBODE Algorithms
An excellent optimization algorithm should possess the ability of global exploration and local exploitation. To enhance the efficiency of the BBO algorithm, the BBODE algorithm was proposed in this paper. In order to investigate the performance of the BBODE algorithm, the implemented functions tests are described in this section. Six functions as shown in the Appendix A were exploited to validate the capabilities of exploration and exploitation for the BBO and BBODE algorithms. In order to implement an effective and fair comparison between the BBO and BBODE algorithms, each test function was optimized independently 20 times and we initialized random populations in the same way for the different algorithms. The average of the optimal value in each experiment and standard deviation were computed after numerical experiments. All numerical simulations were performed on the platform of MATLAB R2014b for Windows 7 with a 3.30 GHz Intel Core i5, 64 bit CPU and 8 GB RAM. The experimental parameters of BBO and BBODE are shown in Table 3. The numerical analysis conclusions can be summarized by studying Table 4, which exhibits the results of different test functions with different dimensions, which can sufficiently show that the BBODE algorithm generally has a significant superiority over the BBO algorithm. From the detailed information in Table 4, the BBODE algorithm can search for an optimal solution for a sphere function with dimensions of 5 and 10, a Rosenbrock function with dimensions of 2, a Rastrigin function with the dimensions of 2 and 5, a Shaffer function with dimensions of 2, and a Griewank function with dimensions of 2. Considering the elapsed time, BBODE is slightly more time-consuming than BBO. However, considering comprehensively the elapsed time, accuracy and standard deviation, BBODE is still more superior to BBO. Accordingly, the BBODE algorithm was proven to be an efficient and robust optimization algorithm.
Additionally, four migration strategies in the BBODE algorithm (i.e., the cosine, quadratic, exponential and linear models) are discussed in this section. Six test functions with different dimensions, as shown in Figure 5, are used to validate the efficiency of the four strategies. Figure 5 shows the performance and convergence speed of the four strategies in the migration process, from which it is evident that the cosine model performs best. Consequently, the cosine model was adopted as the migration strategy in our BBODE algorithm.
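For orientation, the migration strategies can be illustrated with immigration and emigration rate curves. The formulas below are the forms commonly cited in the BBO literature and are an assumption here, since this section does not state the exact expressions used; the exponential model is left out because its form varies between sources. The symbols `k` (species count of a habitat), `n` (maximum species count), `I` and `E` (maximum immigration and emigration rates) are illustrative names.

```python
import math


def linear_model(k, n, I=1.0, E=1.0):
    """Return (immigration, emigration) rates that vary linearly with k."""
    return I * (1.0 - k / n), E * k / n


def quadratic_model(k, n, I=1.0, E=1.0):
    """Return (immigration, emigration) rates that vary quadratically with k."""
    return I * (1.0 - k / n) ** 2, E * (k / n) ** 2


def cosine_model(k, n, I=1.0, E=1.0):
    """Return (immigration, emigration) rates that follow a half cosine wave."""
    c = math.cos(math.pi * k / n)
    return 0.5 * I * (c + 1.0), 0.5 * E * (1.0 - c)
```

In each model an empty habitat (`k = 0`) has maximal immigration and no emigration, and a full habitat (`k = n`) has the reverse; the models differ only in how the rates change in between.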

The Distributional Characteristics of the Air Pollutants
Studying the distributional characteristics of air pollutants is an important task that can reveal the nature and statistical properties of the air pollutant data. Six distributions, shown in Appendix A, were adopted to analyze the distribution characteristics of the air pollutants.
The distribution function parameters are commonly estimated by minimum least squares (MLS) or maximum likelihood estimation (MLE). The experimental results in [9] show that artificial intelligence optimization is superior to MLS and MLE in searching for optimal distribution parameters. Accordingly, in this paper, we used artificial intelligence optimization to search for the optimal distribution function parameters.
The function tests above showed that the BBODE algorithm performs well in parameter optimization. Here, the BBODE and BBO algorithms were both used to search for the optimal distribution function parameters, and their performance was compared. Table 5 lists the estimated distribution function parameters obtained for the six air pollutants using the BBODE and BBO algorithms. Goodness of fit (R²) is adopted to evaluate the fitting performance of the different distribution functions and optimization methods; a larger value indicates better fitting performance. Table 6 presents the R² obtained with the different artificial intelligence optimization methods, from which it can be concluded that fitting with BBODE outperforms fitting with BBO. Figure 6 shows the frequency histograms together with the fitted distributions for the six air pollutants. The Inverse Gaussian function gives the best fit for PM2.5, PM10 and SO2 because the corresponding R² is larger than that of the other distributions. For the same reason, the Gamma function is suitable for fitting O3 and NO2, and the Log-logistic distribution is appropriate for fitting the CO data.
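The R² comparison between a frequency histogram and a fitted distribution can be sketched as below. The sketch assumes that R² is computed as one minus the ratio of residual to total sum of squares between the histogram density and the fitted PDF at the bin centers; the paper's exact definition may differ. The Gamma distribution, the synthetic sample and the parameter values are illustrative only, not the Dalian data or the values in Table 5.

```python
import math
import random


def gamma_pdf(x, shape, scale):
    """Gamma probability density with the given shape and scale parameters."""
    if x <= 0:
        return 0.0
    return (x ** (shape - 1) * math.exp(-x / scale)) / (math.gamma(shape) * scale ** shape)


def histogram_density(data, bins):
    """Return bin centers and normalized frequencies (density) of the data."""
    low, high = min(data), max(data)
    width = (high - low) / bins
    counts = [0] * bins
    for v in data:
        idx = min(int((v - low) / width), bins - 1)
        counts[idx] += 1
    centers = [low + (i + 0.5) * width for i in range(bins)]
    density = [c / (len(data) * width) for c in counts]
    return centers, density


def r_squared(observed, fitted):
    """Coefficient of determination between observed and fitted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_quality(data, shape, scale, bins=30):
    """R² of a Gamma PDF with the given parameters against the data histogram."""
    centers, density = histogram_density(data, bins)
    fitted = [gamma_pdf(c, shape, scale) for c in centers]
    return r_squared(density, fitted)


# Synthetic example: data drawn from Gamma(shape=2, scale=3).
rng = random.Random(42)
sample = [rng.gammavariate(2.0, 3.0) for _ in range(5000)]
r2_close = fit_quality(sample, 2.0, 3.0)  # parameters near the truth
r2_far = fit_quality(sample, 5.0, 3.0)    # deliberately poor parameters
```

An optimizer such as BBODE would search over `shape` and `scale` to maximize `fit_quality`; the same scoring function can then be used to compare candidate distribution families.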

The Point Forecasting for Air Pollutants
In this section, the proposed hybrid CEEMD-BBODE-LSSVM model was used to implement point forecasting. CEEMD, a novel ensemble decomposition methodology, was adopted to decompose the original air pollutant data into several IMFs. The parameters of CEEMD were set as follows: the total number of IMFs and residuals to be decomposed is 8, the standard deviation of the added white noise in each ensemble is 0.4, and the ensemble number is 200. In the actual application, the first IMF is removed, and the remaining IMFs are summed to construct a new dataset that is used for training and testing the model. The forecasting itself was carried out by LSSVM, which is an excellent forecasting tool in many fields. Because the performance of LSSVM is very sensitive to its parameters (i.e., σ, γ), the BBODE algorithm was applied to optimize them in order to obtain high forecasting accuracy. The air pollutant data from Dalian, divided into a training subset and a testing subset as shown in Figure 4, were used to test the performance of the proposed hybrid model.
Table 5. Parameters of the different distributions based on the different optimized algorithms. In Table 5, a and b represent the scale and shape parameters of the distribution functions, respectively.
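The reconstruction step (dropping the first IMF and summing the remaining components) can be sketched as follows. The sketch assumes that the decomposition has already been produced by some CEEMD routine and is available as a list of equally long component series, with the first IMF at index 0; the function names and the train fraction are hypothetical, since the actual split is the one shown in Figure 4.

```python
def reconstruct_without_first_imf(components):
    """Sum all decomposed components except the first (highest-frequency) IMF.

    `components` is a list of equally long series: IMF 1, IMF 2, ..., residual.
    """
    remaining = components[1:]
    return [sum(values) for values in zip(*remaining)]


def split_series(series, train_fraction):
    """Split a series chronologically into training and testing subsets."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Toy decomposition with three components of length four.
toy_components = [
    [0.5, -0.5, 0.5, -0.5],  # IMF 1 (treated as noise and removed)
    [1.0, 2.0, 1.0, 2.0],    # IMF 2
    [10.0, 10.0, 11.0, 11.0],  # residual trend
]
denoised = reconstruct_without_first_imf(toy_components)
train, test = split_series(denoised, 0.75)
```

The denoised series, rather than the raw observations, would then be passed to the forecasting model for training and testing.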

Table 6. R² of the different distributions using the different optimized algorithms. The data in bold denote the largest value in each row of Table 6, which represents the optimal R² of distribution fitting.
As shown in Tables 7 and 8, the MAE, MAPE and RMSE decrease as a whole from LSSVM to EEMD-LSSVM, CEEMD-LSSVM and CEEMD-BBODE-LSSVM, which indicates that CEEMD-BBODE-LSSVM performs better than the considered benchmark models. The R² of LSSVM, EEMD-LSSVM, CEEMD-LSSVM and CEEMD-BBODE-LSSVM for PM2.5 forecasting increases progressively, which illustrates that the proposed hybrid model CEEMD-BBODE-LSSVM has a superior forecasting capability to the other benchmark models. Similarly, the forecasting performance of CEEMD-BBODE-LSSVM for PM10, O3, CO, NO2 and SO2 is superior to that of the other benchmark models. As for the decomposition method, the models with CEEMD show significant improvements over the models without CEEMD, which illustrates that CEEMD is an excellent tool for de-noising. For example, in the forecasting of PM2.5, PM10, O3, CO, NO2 and SO2 in July in Table 7, the MAPE of CEEMD-LSSVM shows improvements of 9.48%, 7.41%, 4.54%, 3.29%, 8.17% and 10.46%, respectively, compared with LSSVM, and improvements of 2.77%, 1.71%, 1.29%, 0.75%, 1.96% and 2.21%, respectively, compared with EEMD-LSSVM. As for optimization, a comparison between CEEMD-LSSVM and CEEMD-BBODE-LSSVM for the six air pollutants in Tables 7 and 8 shows that CEEMD-BBODE-LSSVM improves the forecasting accuracy of CEEMD-LSSVM, which indicates that the BBODE algorithm performs well in searching for optimal solutions for forecasting models. The aforementioned comparative analysis demonstrates that the CEEMD-BBODE-LSSVM model is superior to the benchmark models mentioned in this section.
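The four accuracy metrics used in this comparison can be written compactly as below. These are the standard definitions and are assumed to match the ones used in the paper; in particular, R² is taken here as one minus the ratio of residual to total sum of squares, and MAPE is expressed in percent.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def r2(actual, forecast):
    """Coefficient of determination."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

Lower MAE, MAPE and RMSE and a higher R² indicate a better forecast, which is the direction of comparison used for Tables 7 and 8.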
To illustrate the forecasting performance of all models more clearly, we selected the first three days in July for visualization, which contain 35 test samples for each air pollutant. Figure 7 compares the forecasting values of all models and shows that the proposed hybrid CEEMD-BBODE-LSSVM model is more accurate and robust. The similarity of the forecasting results in Figure 7 also indicates a strong correlation between PM2.5 and PM10. From the black dotted line in Figure 7, it can be concluded that the CEEMD-BBODE-LSSVM model has an outstanding capacity for outlier forecasting. Given the superior performance of the hybrid model in different forecasting environments, we conclude that the hybrid forecasting model offers wide applicability, effectiveness and compatibility.

Figure 7. The comparison of forecasting performance for air pollutants in July.

The Interval Forecasting for Air Pollutants
The quantification of uncertainty, namely interval forecasting, plays a significant part in air quality EWSs because it can provide more credible and dynamic forecasting results. In this paper, nonsymmetrical forecasting intervals were constructed by LSSVM, since point forecasting has only a weak capability to address the uncertainties in the forecasting process. Quantitative measures (i.e., AW, CP), which are affected by the significance level setting, are commonly used to evaluate the performance of interval forecasting.
In theory, a constructed forecasting interval is effective if its CP is larger than or equal to the corresponding confidence level. Table 9 reports the numerical results of interval forecasting in terms of CP and AW. From Table 9, the CP is larger than the corresponding confidence level for most constructed intervals, which demonstrates that the constructed intervals are valid. It is noteworthy that the interval forecasting width decreases regularly as the significance level increases, as displayed schematically in Figure 8 as an illustrative example; the smaller the significance level is, the larger the interval forecasting width is. It can be observed that the interval forecasting has the best performance when the significance level is 0.05. However, in this situation, the interval forecasting width is large and it is hard to determine precise forecasting values. The effectiveness of interval forecasting declines as the significance level increases. In principle, the optimal interval forecasting depends on the actual application and the meteorological conditions. For example, the AW can be narrowed if the weather is stable and widened if the weather is unstable.
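The validity check described above can be sketched in a few lines. The sketch assumes that CP is the fraction of actual values falling inside the forecasting interval and that AW is the mean interval width; the paper may normalize AW differently, so these definitions and the function names are illustrative.

```python
def coverage_probability(actual, lower, upper):
    """Fraction of actual values that lie within [lower, upper]."""
    inside = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return inside / len(actual)


def average_width(lower, upper):
    """Mean width of the forecasting intervals."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


def interval_is_valid(cp, significance_level):
    """An interval is valid if CP reaches the confidence level 1 - alpha."""
    return cp >= 1.0 - significance_level


# Toy example with four observations.
actual = [1.0, 2.0, 3.0, 4.0]
lower = [0.0, 1.0, 3.5, 3.0]
upper = [2.0, 3.0, 4.0, 5.0]
cp = coverage_probability(actual, lower, upper)
aw = average_width(lower, upper)
```

In this toy case three of the four observations are covered, so the interval would pass at a significance level of 0.3 but fail at 0.2, which mirrors the trade-off between coverage and width discussed in the text.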
In order to clearly illustrate the interval forecasting results, we adopted the first 100 test samples in July and August to create a visualization, which can be seen in Figure 9.
In Figure 9, weighing the CP and AW values in Table 9 for correctness and informativeness, we used significance levels of 0.2, 0.2, 0.2, 0.1, 0.1 and 0.1 for PM2.5, PM10, O3, CO, NO2 and SO2, respectively, to implement interval forecasting in July. From Figure 9, it can be observed that most of the actual values are located within the forecasting intervals, which indicates that the interval forecasting is valid. Because the forecasting uncertainties are quantified within the forecasting intervals, decision-makers are given a reference for the risk of relying on point forecasts. Accordingly, the proposed interval forecasting model can provide a tradeoff between effectiveness and informativeness, which is of great importance for formulating scientific policy on early air quality warnings.

Comprehensive Evaluation Implementation
Air quality evaluation is a multiple-criteria decision-making process, and the cloud model is well suited to addressing the fuzziness and randomness in that process. In this section, a comprehensive evaluation using the cloud model is performed. The forecasting values generated by CEEMD-BBODE-LSSVM were used as the samples to be evaluated, which plays a vital part in the EWS.

Evaluation Preparation
Before evaluation, several vital elements need to be prepared: the criteria for air quality, the pseudo-boundary for all criteria, the parameters of the cloud model, and the weights. The criteria for air quality evaluation are shown in Table 1. The parameters of the cloud model were calculated by Equation (9) and can be seen in Table 10. It is worth noting that Bmax is missing for all level VI criteria, so in this paper we used polynomial regression to obtain it. The detailed information on the polynomial regression for Bmax in level VI for all criteria is shown in Table 11. The weights generated by the hybrid entropy-AHP method for all criteria are reported in Table 12.

Evaluation Implementation
After preparation of the cloud model, a comprehensive assessment was implemented. For the sake of simplicity, we extracted nine samples from the testing subset, shown in Table 13, to perform a comprehensive assessment using the cloud model. To enhance accuracy and robustness, each sample was evaluated over 2000 times, and the mean of the distribution of the certainty degree was adopted as the final certainty degree. The final air quality level was assigned according to the maximum certainty degree, which represents the most likely membership. The final evaluation results for all cases are reported in Table 14. According to Table 2, air quality can be classified into six levels: excellent, good, light pollution, moderate pollution, heavy pollution and serious pollution. From Table 14, the air quality of A1, A3, A7 and A8 is at level I; A2, A4 and A9 belong to level II; and A5 and A6 belong to levels IV and V, respectively. It is worth noting that a certainty degree of 0 in Table 14 indicates that there is no membership at that level. In order to illustrate the distribution pattern, we took case A4 in Table 13 as an illustrative example. Figure 10 shows the certainty degrees of case A4, with their different distribution patterns, at each level. The certainty degree of case A4 is largest at level II, which indicates that case A4 belongs to level II. Additionally, when cases that belong to the same level are compared, the certainty degree provides more information than the final level alone. For example, although cases A2, A4 and A9 all belong to level II, their certainty degrees are different. The certainty degrees of belonging to level II for cases A2, A4 and A9 are 0.5421, 0.6136 and 0.3589, respectively, which allows us to conclude that case A4 is more likely to be at level II than cases A2 and A9.
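The evaluation procedure described above (repeated cloud drops, mean certainty degree, level with the maximum certainty) can be sketched with a forward normal cloud generator. The cloud parameters (Ex, En, He), the criteria names, the weights and the weighted-sum aggregation across criteria are all illustrative assumptions; the actual values are those in Tables 10 and 12, and the aggregation rule used in the paper may differ.

```python
import math
import random


def certainty_degree(x, ex, en, he, rng):
    """One cloud drop: certainty of x for a normal cloud with (Ex, En, He)."""
    en_prime = rng.gauss(en, he)  # entropy perturbed by the hyper-entropy
    if en_prime == 0.0:
        return 1.0 if x == ex else 0.0
    return math.exp(-((x - ex) ** 2) / (2.0 * en_prime ** 2))


def mean_certainty(x, cloud, rng, repeats=2000):
    """Average certainty degree of x over repeated cloud drops."""
    ex, en, he = cloud
    return sum(certainty_degree(x, ex, en, he, rng) for _ in range(repeats)) / repeats


def evaluate_sample(sample, clouds, weights, seed=0):
    """Return (level, certainty per level) for one sample.

    sample:  {criterion: value}
    clouds:  {criterion: [(Ex, En, He) for each level]}
    weights: {criterion: weight}, assumed to sum to one
    """
    rng = random.Random(seed)
    n_levels = len(next(iter(clouds.values())))
    totals = [0.0] * n_levels
    for name, value in sample.items():
        for level in range(n_levels):
            totals[level] += weights[name] * mean_certainty(value, clouds[name][level], rng)
    best = max(range(n_levels), key=totals.__getitem__)
    return best + 1, totals


# Hypothetical three-level clouds for two criteria.
example_clouds = {
    "PM2.5": [(25.0, 8.0, 0.8), (75.0, 8.0, 0.8), (125.0, 8.0, 0.8)],
    "PM10": [(25.0, 8.0, 0.8), (75.0, 8.0, 0.8), (125.0, 8.0, 0.8)],
}
example_weights = {"PM2.5": 0.6, "PM10": 0.4}
level, totals = evaluate_sample({"PM2.5": 75.0, "PM10": 75.0}, example_clouds, example_weights)
```

Because the full vector of certainty degrees is returned along with the level, two samples assigned to the same level can still be compared by how strongly they belong to it.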
The aforementioned discussion revealed that the cloud model can not only determine the air quality level, but also further express the relative severity of air quality at the same level.

The Forecasting Effectiveness Based on D-M Test
In this paper, a D-M test was used to assess the difference between the error series generated by a benchmark forecasting model and a target forecasting model, which makes it possible to verify the point forecasting performance of different forecasting models.
Table 15 shows the D-M test results on the basis of the MAE loss function, which can be summarized as follows. In the forecasting of all pollutants, the D-M values of LSSVM and EEMD-LSSVM are larger than the upper bound at the 1% significance level, which illustrates that CEEMD-BBODE-LSSVM is significantly superior to the LSSVM and EEMD-LSSVM models. Additionally, the D-M values for CEEMD-LSSVM are generally larger than the upper bound at the 5% significance level, which denotes that the proposed CEEMD-BBODE-LSSVM hybrid model performs better than CEEMD-LSSVM in most cases. Overall, the proposed hybrid model generally outperforms the other benchmark models.
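A D-M comparison of this kind can be sketched as follows for one-step-ahead forecasts with an absolute-error (MAE-type) loss. The sketch assumes the loss differential is the benchmark loss minus the target loss, so a large positive statistic favors the target model, and it uses only the lag-zero variance, which is the usual simplification for one-step-ahead forecasts. The critical values 1.96 and 2.576 are the two-sided standard normal bounds at the 5% and 1% levels.

```python
import math


def dm_statistic(benchmark_errors, target_errors):
    """Diebold-Mariano statistic with absolute-error loss (one-step-ahead).

    Positive values mean the benchmark model has the larger average loss.
    """
    d = [abs(b) - abs(t) for b, t in zip(benchmark_errors, target_errors)]
    n = len(d)
    mean_d = sum(d) / n
    variance = sum((v - mean_d) ** 2 for v in d) / n
    if variance == 0.0:
        raise ValueError("loss differential has zero variance")
    return mean_d / math.sqrt(variance / n)


# Toy error series: the benchmark errors are larger on average.
bench = [3.0, -1.0, 2.0, -2.0]
target = [1.0, -1.0, 1.0, -1.0]
dm = dm_statistic(bench, target)
significant_at_5_percent = dm > 1.96
significant_at_1_percent = dm > 2.576
```

With real data the error series would be the forecast errors of a benchmark model and of the proposed model on the same test samples.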

The Public Health Implications of the EWS
There are few air quality EWS studies in China, and those that exist mainly depend on weather research and forecasting models (WRFs). However, WRFs face many challenges in current applications, such as high costs, heavy workloads and the difficulty of debugging models in a short time. Additionally, WRFs are usually implemented in the form of grids, and their local forecasting capability is poor. The proposed EWS for Dalian is based on artificial intelligence theory, and its high precision and scientific evaluation in practical application were verified by the aforementioned numerical simulations. The forecasting and evaluation modules of the proposed EWS can be integrated into the existing air monitoring system in Dalian, which will promote the development of air quality EWSs and provide more warning information for the public. Furthermore, effective air quality warnings are conducive to lowering the incidence of diseases such as lung disease, asthma and cardiovascular disease.

Future Considerations for the Air Quality EWS
In the comparison of EWSs, effectiveness, efficiency, cost and precision are frequently considered. Although the proposed EWS performs well in forecasting and evaluation, it involves only empirical models and does not involve the deterministic models mentioned in the literature review or WRFs. To obtain better EWS performance, the integration of empirical models, deterministic models and WRFs will be necessary in the future, which would combine the respective merits of the three kinds of models as far as possible and strengthen the scientific basis of the EWS. Additionally, to enhance the practicability of the EWS, it is necessary to establish an information platform for it.

Conclusions
Establishing a comprehensive air quality warning system is particularly crucial given the increasing levels of atmospheric pollution. However, establishing an effective warning system with the best performance is not only a challenging technical task, but also a matter of considerable public concern. In this paper, a comprehensive warning system consisting of effective forecasting and scientific evaluation modules was developed. For the forecasting module, a novel hybrid forecasting model, namely CEEMD-BBODE-LSSVM, is proposed for point forecasting.
To simplify the complexity of the original data, the series of air pollutants are decomposed into several IMFs using CEEMD and then reconstructed by removing the high-frequency signals. However, no theory can so far determine the proper number of IMFs, which may be an aspect for future investigation. The BBODE algorithm, as a modified BBO algorithm, is used to search for the optimal LSSVM parameters in order to achieve a desirable forecasting performance. The simulation results reveal that the hybrid model is superior to all the benchmark models considered on the basis of four metrics (MAE, MAPE, RMSE, R²). However, point forecasting cannot directly provide uncertainty information, which means that the decision-maker must bear great risk when using point forecasting alone. Accordingly, to improve the accuracy and robustness of the forecasting performance, interval forecasting is implemented to quantify the inherent uncertainties, which provides flexible information on the future trends of pollutants. It is therefore important to integrate point forecasting and interval forecasting, which is essential for optimally regulating air quality. For the evaluation module, air quality is evaluated comprehensively by applying a normal cloud model based on entropy-AHP theory, which also plays a vital part in this warning system. Additionally, a multiple-dimension cloud model, as an extension of the one-dimensional cloud model, is a promising evaluation method and a worthy topic for future study. The study of an EWS for air quality in this paper is still at an early stage and involves only one-step-ahead forecasting. More exploration of multi-step-ahead forecasting and combination forecasting, in both theory and practice, should be carried out in the future.

Author Contributions: Jianzhou Wang designed the experiment of the warning system for air quality and wrote the manuscript. Tong Niu wrote the MATLAB program and analyzed the data. Rui Wang provided critical review and manuscript editing. All authors read and approved the final manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Table A1. The PDF and CDF of five kinds of distributions.