A Novel Air Quality Early-Warning System Based on Artificial Intelligence

Air pollution is a persistent problem that has become increasingly serious in recent years and has drawn worldwide attention, so establishing a scientific and effective air quality early-warning system is important. However, previous research has rarely treated air pollutant prediction and air quality evaluation together, and relevant work is still scarce, especially in China. Therefore, a novel air quality early-warning system composed of prediction and evaluation was developed in this study. Firstly, the data preprocessing technique Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) was combined with the swarm intelligence algorithm Whale Optimization Algorithm (WOA) and the artificial neural network Extreme Learning Machine (ELM) to form the prediction model. Then the predictive results were further analyzed by fuzzy comprehensive evaluation, which offered intuitive air quality information and corresponding measures. The proposed system was tested in the Jing-Jin-Ji region of China, a representative research area, and two years of daily concentration data for six main air pollutants in Beijing, Tianjin and Shijiazhuang were used to validate its accuracy and efficiency. The results show that the prediction model is superior to the benchmark models in pollutant concentration prediction and that the air quality levels reported by the evaluation model agree satisfactorily with the actual status. Therefore, the proposed system is expected to play an important role in air pollution control and smart city construction in the future.


Introduction
Air is one of the most basic elements for human survival and good air quality is necessary for human health. Unfortunately, air pollution has become a global problem, which has aroused widespread concern from scholars, governments and the public. Some studies have found that exposure to air pollutants is associated with the occurrence of many diseases such as respiratory disease, cardiovascular disease and even cancer, contributing to as many as 4-9 million human deaths per year globally [1,2]. The situation in China is also grim. With the rapid development of industrialization and urbanization, more and more fossil fuels are being burned, which results in increasing emissions of sulphur, nitrogen and particulate matter, causing deteriorating air quality and frequent hazy weather. As the "Capital Economic Circle" and future world-class urban agglomeration, influenced by adverse geographical and meteorological conditions along with industrial structure, the Jing-Jin-Ji region has become one of the most heavily polluted areas, with frequent long duration, wide range and severe degree regional pollution events. To solve this serious problem, researchers have done a lot of work, including air pollutant prediction and air quality evaluation.
Numerous forecasting models have been proposed, mainly for pollutant concentration. According to their principles, these forecasting models can be divided into three categories: statistical forecasting models, numerical forecasting models and machine learning models.
Statistical forecasting models have been widely used in air quality forecasting since the early days because they are simple and fast, and they remain valuable in application and research. They predict future pollutant concentrations solely from the relationship between pollutant concentrations and meteorological factors in past records, without information about pollution sources. Common statistical models include the multiple linear regression model (MLR) [3], the autoregressive integrated moving average model (ARIMA) [4], the grey model (GM) and the Markov model. For example, Elbayoumi et al. [5] used MLR to predict the annual indoor concentrations of PM2.5 and PM10 by analyzing the meteorological variables (wind speed, temperature and relative humidity) collected from 12 natural ventilation systems. Jian et al. [6] used ARIMA to study the effects of meteorological factors on the concentrations of ultrafine particles and PM10 in Hangzhou under heavy traffic conditions. A first-order variable grey differential equation model was proposed by Pai et al. [7] to predict the hourly PM concentration in Banqiao, Taiwan. Hidden Markov models (HMMs) were used to predict daily average PM2.5 concentrations [8]. Although these statistical forecasting models (linear methods) have been widely used in PM concentration prediction (a non-linear process), their accuracy is largely limited by their linear mapping ability. Most real-world air pollutant time series are non-linear and irregular, so statistical forecasting models may not be suitable for these data.
Since the 1990s, with the development of computer technology and the abundance of air pollution data, numerical forecasting models have developed greatly and are currently in their third generation. Based on the idea of "One Atmosphere", they realize two-way coupling between atmospheric dynamics and atmospheric chemistry, which allows them to simulate atmospheric physical and chemical processes on different scales and therefore to predict the concentrations of different air pollutants [9]. Numerical forecasting models usually consist of meteorological modules, emission modules and chemical modules, following the principle that the weather or climate modules provide the meteorological background fields that drive the chemical transport modules. At present, common numerical forecasting models include Models-3 and WRF-Chem from the U.S., Polyphemus from France and the Nested Air Quality Prediction Modeling System (NAQPMS) from China [10][11][12]. Although numerical forecasting models help to reveal the mechanisms of pollution processes, their accuracy, especially in severe air pollution incidents, is greatly limited by difficulties such as inaccurate atmospheric boundary layer simulation schemes, insufficient emission inventories of pollution sources and limited knowledge of atmospheric physical and chemical processes. Furthermore, they require a lot of computing time.
Machine learning belongs to the field of artificial intelligence. The arrival of the big data era has brought unprecedented opportunities for the development of machine learning. Machine learning performs well in regression and classification problems, and it is usually recognized as one of the most powerful tools in pollutant prediction because of its high robustness and fault tolerance. Therefore, there are an increasing number of studies on pollutant concentration prediction with machine learning models. For example, the support vector machine (SVM) [13] and the artificial neural network (ANN) [14] are commonly selected. Paschalidou et al. [15] used a radial basis function (RBF) network and a multilayer perceptron (MLP) to predict hourly concentrations of PM2.5 in Cyprus. Wu et al. [16] predicted PM10 concentrations using a general regression neural network (GRNN).
Pollutant concentration data are too abstract for the public to understand, and people are eager for simplified and intuitive information to quickly understand the state of ambient air, which means air quality evaluation is indispensable. When it comes to methods of air quality evaluation, the most commonly used method is the air quality index (AQI) originally proposed by the US Environmental Protection Agency (EPA). AQI is widely used worldwide, while the standards vary among countries.
China's standard comes from the "Technical Regulation on Air Quality Index (on trial) (HJ 633-2012)" issued by the Ministry of Environmental Protection. It considers a variety of pollutants including PM2.5, PM10, NO2, SO2, CO and O3. However, as with all environmental quality evaluations, air quality evaluation is ambiguous because the evaluation factors, criteria and objects are vague, which makes it difficult to justify sharp boundaries in classification schemes. The air quality index method therefore has some limitations; for example, a slight increase or decrease of a pollutant concentration near a boundary value changes the evaluation level. Such fuzziness has led many researchers to seek advanced evaluation methods [17], for instance, fuzzy mathematics. Fuzzy mathematics has proved to be a useful tool for air quality evaluation [18,19], and many air quality indicators based on fuzziness [20][21][22][23] have been proposed.
Prediction or evaluation alone is not enough to help us cope with air pollution, so an integrated and complete system is expected to be of greater value. Some early-warning systems that include both prediction and evaluation have gradually been proposed. The problem of air pollution in China has attracted increasing attention, but there are relatively few in-depth and targeted studies of air quality early warning based on artificial intelligence. The pollutants that affect air quality should be considered as comprehensively as possible, but some studies focus on a single pollutant only, mainly PM. Although the selection of experimental sites is important, some scholars do not give sufficient reasons, such as purpose and significance, for their choices. The selection of algorithms and pollutant concentration limits in air quality evaluation also remains to be discussed. Therefore, developing an accurate and robust air quality early-warning system has become an urgent need of society. Such a system should provide not only comprehensive and objective air quality information, but also the necessary preventive measures for citizens to avoid hazards, and should help the relevant departments to better control air pollution and minimize its negative impacts.
Based on the above analyses, this paper proposes a novel air quality early-warning system composed of prediction and evaluation. The prediction part took advantage of advanced improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) and combined whale optimization algorithm (WOA) with extreme learning machine (ELM). The three methods have been proved to be effective in air pollutant forecasting [24][25][26]. Fuzzy comprehensive evaluation (FCE) based on fuzzy mathematics was conducted subsequently.
Generally speaking, the contributions of this paper are as follows:

• A complete air quality early-warning system was established and achieved good results in the Jing-Jin-Ji region, where air pollution problems are of great concern.

• A novel hybrid prediction model, ICEEMDAN-WOA-ELM, was proposed for the main air pollutants in Beijing, Tianjin and Shijiazhuang. Comparison confirms that ICEEMDAN and WOA greatly improve the prediction ability of ELM.

• The predictive results can be transformed into corresponding air quality levels by fuzzy comprehensive evaluation, which means citizens without professional knowledge of atmospheric science can easily understand the current air quality and get scientific advice on avoiding air pollution.

• The air quality early-warning system is feasible and practical in air pollution treatment; it can not only protect the public from air pollution but also support government decision-making on environmental protection.
The rest of this paper is organized as follows: Section 2 briefly introduces the methodologies adopted in this paper. Empirical research is given in Section 3, along with the description of experiment sites, data, evaluation criteria and so forth. Section 4 gives the conclusions.

The Proposed Air Quality Early-Warning System
In this section, the air quality early-warning system, whose core is the hybrid ICEEMDAN-WOA-ELM-FCE model, is introduced in detail. The flow diagram, presented in Figure 1, consists of four steps.

• Step 1: Pollutant concentration data are usually chaotic time series, so a denoising technique is required to eliminate the influence of outliers and improve prediction accuracy. ICEEMDAN is used to decompose the original data into several intrinsic mode functions (IMFs), ordered from high frequency to low frequency, which capture different characteristics of the original data.

• Step 2: The ELM optimized by WOA is applied to build a predictor for each IMF. WOA is used to obtain the best parameters of ELM so that the forecasting model is both fast and accurate. The predictive results of all IMFs are summed to obtain the final predictive result. The optimized ELM model is used to forecast the concentrations of six major air pollutants in Beijing, Tianjin and Shijiazhuang, which are the key inputs for the evaluation model.

• Step 3: Fuzzy comprehensive evaluation converts the predictive results into air quality levels scientifically and objectively, providing crucial information for further research and analysis.

• Step 4: The air quality information can be applied to guide people's daily lives. Different colors are assigned to different levels, so the air quality information can be easily understood. In addition, brief but practical guidance corresponding to each level can be offered to the public. Scientific and precise results also serve government decision-making on environmental protection. Overall, the proposed air quality early-warning system is intended to play a key role in future air pollution prevention.

The individual methods that make up the air quality early-warning system, namely ICEEMDAN, WOA, ELM and FCE, are described in detail below.

Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN)
The empirical mode decomposition (EMD) [27] is a widely used method for analyzing non-linear and non-stationary data. Compared with traditional decomposition algorithms such as the Fourier transform or the wavelet transform, which are more applicable to stationary and linear data, EMD is adaptive and highly efficient. EMD expresses the original data as a sum of intrinsic mode functions (IMFs) and a final monotonic trend, but it may produce oscillations of different scales in one mode or of the same scale in different modes, which is called "mode mixing". The ensemble empirical mode decomposition (EEMD) [28] was proposed to address this problem by adding Gaussian white noise to the original signal, but the added noise cannot be completely neutralized and different noisy copies of the signal may produce different numbers of modes. The complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) [29] provides accurate reconstruction of the original signal, better spectral separation of the modes and computational efficiency, a substantial improvement on EEMD. Furthermore, ICEEMDAN [30] improves some aspects of CEEMDAN, including residual noise and "spurious modes", and is the latest decomposition method of the EMD family. In this study, considering the non-stationary and non-linear characteristics of pollutant concentrations, ICEEMDAN was used as a data preprocessing method to better reveal the patterns behind the pollutant data and to serve the subsequent prediction. The main steps of ICEEMDAN are summarized as follows:

(1) Calculate by EMD the local means of $I$ realizations $x^{(i)} = x + \beta_0 E_1(w^{(i)})$ to get the first residue:

$$r_1 = \left\langle M\left(x^{(i)}\right) \right\rangle$$

(2) Calculate the first mode:

$$d_1 = x - r_1$$

where $x$ is the original signal, $w^{(i)}$ is a realization of white Gaussian noise with zero mean and unit variance, $E_k(\cdot)$ is the operator that produces the $k$th mode obtained by EMD, $M(\cdot)$ is the operator that generates the local mean of the applied signal, and $\langle \cdot \rangle$ denotes averaging over the $I$ realizations.

(3) Calculate the second residue as the average of local means of the realizations $r_1 + \beta_1 E_2(w^{(i)})$ and define the second mode:

$$d_2 = r_1 - r_2 = r_1 - \left\langle M\left(r_1 + \beta_1 E_2\left(w^{(i)}\right)\right) \right\rangle$$

(4) For $k = 3, \ldots, K$, calculate the $k$th residue:

$$r_k = \left\langle M\left(r_{k-1} + \beta_{k-1} E_k\left(w^{(i)}\right)\right) \right\rangle$$

(5) Calculate the $k$th mode:

$$d_k = r_{k-1} - r_k$$

(6) Return to step 4 for the next $k$.
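The recursion in steps (1) to (6) can be sketched in Python as follows. This is only a structural illustration under loudly stated assumptions: the local-mean operator M(.) and the noise-mode operator E_k(.) are replaced by simple moving-average stand-ins (`local_mean`, `noise_mode`) rather than true EMD sifting, and the noise amplitude is taken as `eps` times the standard deviation of the current residue. All function names and parameter values are hypothetical. The sketch shows how the residues and modes are chained and why the modes plus the final residue reconstruct the input exactly.

```python
import numpy as np

def local_mean(signal, window=5):
    """Stand-in for M(.): centred moving average (real ICEEMDAN uses EMD envelopes)."""
    pad = window // 2
    padded = np.pad(signal, pad, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

def noise_mode(noise, k):
    """Stand-in for E_k(.): k-th detail of the noise under repeated smoothing."""
    current = noise
    for _ in range(k - 1):
        current = local_mean(current)
    return current - local_mean(current)

def iceemdan_sketch(x, n_modes=4, n_realizations=20, eps=0.2, seed=0):
    """Chain residues r_k and modes d_k = r_{k-1} - r_k, with r_0 = x."""
    rng = np.random.default_rng(seed)
    noises = [rng.standard_normal(len(x)) for _ in range(n_realizations)]
    residue = np.asarray(x, dtype=float)
    modes = []
    for k in range(1, n_modes + 1):
        beta = eps * np.std(residue)  # assumed form of the amplitude beta_{k-1}
        # Average the local means over all noisy realizations: r_k = <M(...)>
        new_residue = np.mean(
            [local_mean(residue + beta * noise_mode(w, k)) for w in noises], axis=0
        )
        modes.append(residue - new_residue)  # d_k = r_{k-1} - r_k
        residue = new_residue
    return modes, residue
```

Because each mode is the difference of two consecutive residues, the sum of all modes and the last residue telescopes back to the original series, which is the property that lets the per-component forecasts be added up in the hybrid model.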

Whale Optimization Algorithm (WOA)
Inspired by the bubble-net hunting strategy of humpback whales, a nature-inspired meta-heuristic optimization algorithm called WOA [31] was proposed in 2016. Tested on 29 mathematical benchmark functions and six structural engineering problems with respect to exploration, exploitation, local optima avoidance and convergence behavior, WOA proved to be highly competitive with state-of-the-art meta-heuristic algorithms as well as conventional methods. The mathematical model of WOA is as follows [31].

Encircling Prey
Humpback whales can identify the location of their prey and encircle it. After the best search agent is defined, the other search agents try to move towards the best location. This behavior is expressed by the following formulas:

$$\vec{D} = \left| \vec{C} \cdot \vec{X}^{*}(t) - \vec{X}(t) \right| \tag{1}$$

$$\vec{X}(t+1) = \vec{X}^{*}(t) - \vec{A} \cdot \vec{D} \tag{2}$$

where $t$ is the current iteration, $\vec{X}^{*}$ is the best position, $\vec{X}$ denotes the position vector, $\cdot$ is an element-by-element multiplication, and $\vec{A}$ and $\vec{C}$ are coefficient vectors which can be calculated by the following equations:

$$\vec{A} = 2\vec{a} \cdot \vec{r} - \vec{a} \tag{3}$$

$$\vec{C} = 2\vec{r} \tag{4}$$

where $\vec{r}$ is a random vector between 0 and 1, and $\vec{a}$ is linearly reduced from 2 to 0 over the iterations.

Bubble-Net Attacking Method (Exploitation Phase)
Humpback whales usually attack their prey using the bubble-net strategy, for which two approaches are designed:

(1) Shrinking encircling mechanism
This behavior is realized by reducing the value of $\vec{a}$.

(2) Spiral updating position
A spiral equation is established between the whales and the prey to simulate the helix-shaped movement of humpback whales:

$$\vec{X}(t+1) = \vec{D'} \cdot e^{bl} \cdot \cos(2\pi l) + \vec{X}^{*}(t) \tag{5}$$

where $\vec{D'} = \left| \vec{X}^{*}(t) - \vec{X}(t) \right|$ is the distance between the $i$th whale and the best position obtained so far, $b$ is a constant that defines the logarithmic spiral, $l$ is a random number between −1 and 1, and $\cdot$ is an element-by-element multiplication.

WOA assumes a 50% probability of choosing either the shrinking encircling mechanism or the spiral model to update the position of the whales during optimization:

$$\vec{X}(t+1) = \begin{cases} \vec{X}^{*}(t) - \vec{A} \cdot \vec{D}, & p < 0.5 \\ \vec{D'} \cdot e^{bl} \cdot \cos(2\pi l) + \vec{X}^{*}(t), & p \geq 0.5 \end{cases} \tag{6}$$

where $p$ is a random number between 0 and 1.

Search for Prey (Exploration Phase)
Humpback whales can also search for prey randomly according to each other's positions. In the exploration phase, the location of a search agent is updated based on a randomly selected search agent rather than the best search agent found so far. This mechanism emphasizes exploration, allowing WOA to perform a global search. The mathematical model is expressed as follows:

$$\vec{D} = \left| \vec{C} \cdot \vec{X}_{rand} - \vec{X} \right| \tag{7}$$

$$\vec{X}(t+1) = \vec{X}_{rand} - \vec{A} \cdot \vec{D} \tag{8}$$

where $\vec{X}_{rand}$ is a random location vector selected from the current population.

The WOA algorithm (Algorithm 1) starts with a set of random solutions. In each iteration, every search agent updates its location based on either a randomly selected search agent or the best solution obtained so far. A random search agent is selected when $|\vec{A}| \geq 1$, and the best solution is selected when $|\vec{A}| < 1$. Depending on the value of $p$, WOA switches between spiral and circular movement. The algorithm terminates when the termination criterion is satisfied. The pseudo code of the WOA algorithm is as follows:

Algorithm 1: WOA
Input: Maximum number of iterations Iter_Max, fitness function F_i, current iteration number t, a random number l between −1 and 1, a constant number b.
Initialize the whale population X_i (i = 1, 2, 3, ..., n)
for each search agent do
    Calculate the fitness function F_i
end for
X* = the best search agent
while t < Iter_Max do
    for each search agent do
        Update a, A, C, l and p
        if p < 0.5 then
            if |A| < 1 then
                Update the position of the search agent using Eq. (2)
            else if |A| ≥ 1 then
                Select a random search agent X_rand
                Update the position of the search agent using Eq. (8)
            end if
        else if p ≥ 0.5 then
            Update the position of the search agent using Eq. (5)
        end if
    end for
    Calculate the fitness of each search agent and update X* if a better solution is found
    t = t + 1
end while
Output: X*
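The pseudo code can be turned into a compact runnable sketch such as the one below. It is a minimal illustration rather than the authors' implementation, and it makes several assumptions that are not stated in the text: `A` and `C` are drawn as scalars per search agent, positions are clipped to a box given by `bounds`, the best solution is updated immediately after each move, and the function name `woa_minimize` and all default parameter values are hypothetical.

```python
import numpy as np

def woa_minimize(fitness, dim, bounds, n_agents=20, max_iter=200, b=1.0, seed=0):
    """Minimise `fitness` over a box using the WOA update rules (sketch)."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    positions = rng.uniform(low, high, size=(n_agents, dim))
    scores = np.array([fitness(p) for p in positions])
    best = positions[scores.argmin()].copy()
    best_score = float(scores.min())
    for t in range(max_iter):
        a = 2.0 * (1.0 - t / max_iter)       # a decreases linearly from 2 to 0
        for i in range(n_agents):
            A = 2.0 * a * rng.random() - a   # coefficient A (scalar, assumption)
            C = 2.0 * rng.random()           # coefficient C (scalar, assumption)
            p = rng.random()
            l = rng.uniform(-1.0, 1.0)
            if p < 0.5:
                if abs(A) < 1.0:
                    target = best            # exploitation: encircle the best agent
                else:
                    target = positions[rng.integers(n_agents)]  # exploration
                D = np.abs(C * target - positions[i])
                new_pos = target - A * D
            else:
                # Spiral update around the best position found so far
                D_prime = np.abs(best - positions[i])
                new_pos = D_prime * np.exp(b * l) * np.cos(2.0 * np.pi * l) + best
            positions[i] = np.clip(new_pos, low, high)
            score = float(fitness(positions[i]))
            if score < best_score:
                best, best_score = positions[i].copy(), score
    return best, best_score
```

As a usage example, `woa_minimize(lambda x: float(np.sum(x ** 2)), dim=3, bounds=(-5.0, 5.0))` searches for the minimum of a sphere function. In the hybrid model, the fitness function would instead measure the forecasting error of an ELM built from the candidate parameters.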

Extreme Learning Machine (ELM)
ELM [32] is a simple and extremely fast learning algorithm for single-hidden-layer feedforward neural networks (SLFNs). ELM randomly assigns the input weights and hidden layer biases (thresholds) and does not adjust them during training, which makes learning up to thousands of times faster than traditional feedforward network learning algorithms while achieving better generalization performance on most artificial and real benchmark problems. The structure of an SLFN is shown in Figure 2.

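An SLFN with $L$ hidden layer neurons can be expressed as [32]:

$$\sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, 2, \ldots, N \tag{9}$$

where $x_j$ and $t_j$ are the input and expected output of the $j$th of $N$ samples, $w_i = [w_{i1}, w_{i2}, \ldots, w_{in}]^T$ is the weight vector between the input layer neurons and the $i$th hidden layer neuron, $b_i$ is the threshold of the $i$th hidden layer neuron, $g(x)$ is the activation function, and $\beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}]^T$ is the weight vector between the $i$th hidden layer neuron and the output layer neurons. Formula (9) can be expressed as:

$$H\beta = T \tag{10}$$

where $H$ is the output matrix of the hidden layer, $\beta$ is the weight matrix between the hidden layer neurons and the output layer neurons, and $T$ is the expected output of the network, represented as follows [32]:

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}, \quad \beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times m}, \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m}$$

The number of required hidden layer neurons satisfies $L \leq N$ when the activation function $g$ is infinitely differentiable. The least-squares solution of Formula (10) is:

$$\hat{\beta} = H^{+} T$$

where $H^{+}$ is the Moore-Penrose generalized inverse of $H$. ELM generates $w$ and $b$ randomly before training, so $\beta$ can be calculated once $L$ and $g(x)$ have been determined. Generally, the ELM algorithm has the following steps:

(1) Determine the number of neurons $L$ in the hidden layer, and randomly set the connection weights $w$ between the input layer and the hidden layer and the thresholds $b$ of the hidden layer neurons.

(2) Select an infinitely differentiable function $g(x)$ as the activation function of the hidden layer neurons, and then calculate the output matrix $H$ of the hidden layer.

(3) Calculate the weights of the output layer: $\hat{\beta} = H^{+} T$.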
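The three steps can be illustrated with a short NumPy sketch. It is a minimal example under assumptions, not the configuration used in the experiments: the sigmoid is chosen as the activation function, the random weights are drawn uniformly from [−1, 1], and the function names `elm_fit` and `elm_predict` as well as the default of 20 hidden neurons are hypothetical.

```python
import numpy as np

def elm_fit(X, T, n_hidden=20, seed=0):
    """Train an ELM: random hidden layer, output weights by pseudo-inverse."""
    rng = np.random.default_rng(seed)
    # Step 1: random input weights w and hidden thresholds b (never adjusted)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 2: hidden layer output matrix H with a sigmoid activation g
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    # Step 3: output weights from the Moore-Penrose inverse, beta = H^+ T
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Apply the trained network to new inputs."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

For time series forecasting, the rows of `X` would typically hold lagged values of one IMF and `T` the next value. Since training involves only one pseudo-inverse, the model is cheap enough to be retrained many times inside an optimizer such as WOA.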

Fuzzy Comprehensive Evaluation (FCE)
Environmental quality is a huge and ambiguous system with a large number of uncertain factors. Fuzzy mathematics [18] can effectively handle the influence of ambiguous evaluation boundaries and monitoring errors on evaluation. Using membership functions to represent air quality levels reduces subjective and artificial factors in classification and objectively reflects regional air quality. The concrete steps of fuzzy comprehensive evaluation are as follows:

(1) Establish the factor set
A factor set is a set of elements that affect the evaluation object, usually represented by $U = \{u_1, u_2, \ldots, u_m\}$. It is well known that different pollutants cause different hazards to human health, so these parameters should be treated separately. Therefore, six main pollutants are selected as air quality parameters in this study:

$$U = \{\mathrm{PM}_{2.5}, \mathrm{PM}_{10}, \mathrm{NO}_2, \mathrm{SO}_2, \mathrm{CO}, \mathrm{O}_3\}$$

(2) Set up the evaluation set
Because this research is carried out in China, the air pollutant concentration limits from the "Technical Regulation on Air Quality Index (on trial) (HJ 633-2012)" of China are used as a reference. Because values of O3 (8 h) beyond the fifth level are lacking, the evaluation set comprises five levels, $V = \{v_1, v_2, \ldots, v_5\} = \{\mathrm{I}, \mathrm{II}, \mathrm{III}, \mathrm{IV}, \mathrm{V}\}$, and the corresponding air quality categories are "Excellent, Good, Moderate, Poor, Hazardous". The air quality levels and corresponding concentration limits of the different pollutants are given in Table 1.

(3) Establish the fuzzy matrix
The fuzzy matrix can be expressed by the matrix $R$, where $r_{ij}$ is the membership degree of factor $u_i$ with respect to the comment $v_j$:

$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{m1} & r_{m2} & \cdots & r_{mn} \end{bmatrix}$$

The membership function calculates the membership degree of a pollutant concentration to each evaluation grade. There are many membership functions, such as the halved trapezoidal distribution function, the Gauss membership function and the triangular membership function. In this study, the halved trapezoidal distribution function [33], which has often been used in air quality evaluation, is selected.

(4) Determine the factor weights
The weight of a factor is an index that measures the relative impact of a pollutant on air quality. The multi-scale weighting method is commonly used in the fuzzy evaluation of environmental quality, therefore the weight of each pollution factor is obtained by Equation (16).

(5) Evaluation result
By synthesizing the weight vector and the fuzzy matrix with an appropriate operator, the final result of the fuzzy comprehensive evaluation can be obtained. The Zadeh operator $M(\wedge, \vee)$ is commonly used, therefore it is adopted here:

$$B = W \circ R = (w_1, w_2, \ldots, w_m) \circ \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{m1} & r_{m2} & \cdots & r_{mn} \end{bmatrix} = (b_1, b_2, \ldots, b_n) \tag{17}$$

where $W = (w_1, w_2, \ldots, w_m)$ is the weight vector. According to the principle of maximum membership degree, the level corresponding to the maximum value of $B$ is the result of the fuzzy comprehensive evaluation of air quality.
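To make the evaluation procedure concrete, the sketch below walks through the membership, weighting and synthesis steps for two pollutants. Everything specific in it is an assumption made for illustration: the concentration limits in `LIMITS` are placeholders and are not taken from Table 1, the membership function is one common piecewise-linear form of the halved trapezoidal distribution, and the weights use a simple exceedance ratio (concentration divided by the mean of the limits, then normalized), which may differ from the paper's Equation (16). The function names are hypothetical.

```python
import numpy as np

# Illustrative limits for levels I-V (placeholders, NOT the values of Table 1)
LIMITS = {
    "PM2.5": [35, 75, 115, 150, 250],
    "PM10": [50, 150, 250, 350, 420],
}

def membership(concentration, limits):
    """Assumed halved trapezoidal membership of one concentration to each level."""
    s = np.asarray(limits, dtype=float)
    r = np.zeros(len(s))
    if concentration <= s[0]:
        r[0] = 1.0                      # fully in the best level
    elif concentration >= s[-1]:
        r[-1] = 1.0                     # fully in the worst level
    else:
        j = np.searchsorted(s, concentration, side="right") - 1
        frac = (concentration - s[j]) / (s[j + 1] - s[j])
        r[j], r[j + 1] = 1.0 - frac, frac   # shared between adjacent levels
    return r

def fuzzy_evaluate(concentrations, limits=LIMITS):
    """Return (level index, B, W, R) for a dict of pollutant concentrations."""
    factors = list(concentrations)
    R = np.array([membership(concentrations[f], limits[f]) for f in factors])
    ratios = np.array([concentrations[f] / np.mean(limits[f]) for f in factors])
    W = ratios / ratios.sum()           # assumed exceedance-ratio weights
    # Zadeh operator M(min, max): b_j = max_i min(w_i, r_ij)
    B = np.max(np.minimum(W[:, None], R), axis=0)
    return int(np.argmax(B)), B, W, R
```

With the placeholder limits, `fuzzy_evaluate({"PM2.5": 100, "PM10": 120})` gives a PM2.5 membership of 0.375 to level II and 0.625 to level III, and the maximum-membership rule selects level III (index 2). A concentration slightly above or below a limit changes the memberships only gradually, which is the advantage over sharp index boundaries.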

Experimental Results and Analysis
In this section, in order to evaluate the performance of the proposed air quality early-warning system, three datasets from three cities in China (Beijing, Tianjin and Shijiazhuang) were used in case studies (a simple map of the study areas is displayed in Figure 3). The main reasons for this choice are: (1) The Jing-Jin-Ji region is a Beijing-centered world-class urban agglomeration with a developed economy and an important strategic position. It covers 13 cities, 110 million people and 218,000 km² of land area, so air pollution is of real concern here. (2) In this region, the heavy industrial structure, dense population and limited environmental capacity have led to frequent haze events, which seriously disturb people's normal life and social development. How to balance economic development and environmental protection is currently an urgent question, and it is hoped that our system will be beneficial for air pollution control. (3) Influenced by meteorological conditions, pollutant emissions and transport, secondary transformation of particulate matter, and the combined effects of natural and human factors, air pollution here is extremely complex and prominent. This problem not only seriously endangers human health and economic development, but also affects climate and environmental change. Therefore, research conducted in this region is representative and can serve as a reference for air pollution control in other metropolises around the world.


Dataset Description
The datasets used in this study were obtained from the Ministry of Ecology and Environment of China and include the daily concentrations of six main air pollutants in three cities from 1 September 2016 to 30 September 2018. Missing values were replaced by the mean of nearby observations. The sample size for each pollutant in each city was 760, which was divided into a training set (699) and a testing set (61) (see Table 2).
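The sample counts above can be checked with a few lines of Python, and the same sketch illustrates one possible reading of the missing-data rule. The split date of 1 August 2018 follows the two-month forecasting period used in the case studies; interpreting the "nearby mean" as the mean of the nearest valid observations before and after a gap is an assumption, and the helper name `fill_with_nearby_mean` is hypothetical.

```python
from datetime import date, timedelta

# Daily dates covered by the dataset
start, end = date(2016, 9, 1), date(2018, 9, 30)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Chronological split: the last two months form the testing set
test_start = date(2018, 8, 1)
train_days = [d for d in days if d < test_start]
test_days = [d for d in days if d >= test_start]

def fill_with_nearby_mean(values):
    """Replace None by the mean of the nearest valid neighbours (assumed rule)."""
    filled = list(values)
    for i, value in enumerate(values):
        if value is None:
            before = next((values[j] for j in range(i - 1, -1, -1)
                           if values[j] is not None), None)
            after = next((values[j] for j in range(i + 1, len(values))
                          if values[j] is not None), None)
            neighbours = [v for v in (before, after) if v is not None]
            filled[i] = sum(neighbours) / len(neighbours)
    return filled
```

The split is chronological rather than random, so the model is always evaluated on days that come after everything it was trained on.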

Evaluation Criteria
To evaluate the performance of the proposed system in forecast, a set of four criteria [34] are applied: Mean absolute error (MAE), Root mean square error (RMSE), Mean absolute percentage error (MAPE) and Theil's inequality coefficient (TIC). MAE reflects the difference between the predicted and actual value. RMSE reflects the extent of the difference between the predicted and actual values. MAPE is an index to measure the forecasting accuracy of a model in statistics. TIC is an indicator used to measure the predictive capability of a model. For all criteria, the smaller the value is, the better predictive performance the model has.

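• Mean absolute error (MAE):

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{F}_i - F_i \right| \tag{18}$$

• Root mean square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{F}_i - F_i \right)^2} \tag{19}$$

• Mean absolute percentage error (MAPE):

$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{\hat{F}_i - F_i}{F_i} \right| \times 100\% \tag{20}$$

• Theil's inequality coefficient (TIC):

$$\mathrm{TIC} = \frac{\sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{F}_i - F_i \right)^2}}{\sqrt{\frac{1}{N} \sum_{i=1}^{N} \hat{F}_i^2} + \sqrt{\frac{1}{N} \sum_{i=1}^{N} F_i^2}} \tag{21}$$

where $N$ is the number of data points, and $\hat{F}_i$ and $F_i$ are the predicted and actual values at time $i$, respectively.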
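The four criteria are straightforward to compute. The sketch below uses the standard definitions of MAE, RMSE, MAPE (in percent) and TIC; the function name `forecast_metrics` is hypothetical, and the code assumes that no actual value is zero, since MAPE is undefined in that case.

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """Return MAE, RMSE, MAPE (percent) and TIC for two equal-length series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = predicted - actual
    mae = np.mean(np.abs(error))
    rmse = np.sqrt(np.mean(error ** 2))
    mape = np.mean(np.abs(error / actual)) * 100.0   # assumes no zero actual values
    tic = rmse / (np.sqrt(np.mean(predicted ** 2)) + np.sqrt(np.mean(actual ** 2)))
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "TIC": tic}
```

For example, with actual values [100, 200] and predictions [110, 190], both errors have magnitude 10, so MAE and RMSE are 10 and MAPE is (10% + 5%)/2 = 7.5%. TIC is bounded between 0 and 1, which makes it comparable across pollutants with different units.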

Diebold-Mariano (D-M) Test
The Diebold-Mariano test [35] is a hypothesis test employed to evaluate whether the difference in performance between the proposed model and another model is significant. The hypotheses are defined as follows:

$$H_0: E\left[ l\left(\varepsilon^{1}_{n+t}\right) \right] = E\left[ l\left(\varepsilon^{2}_{n+t}\right) \right]$$

$$H_1: E\left[ l\left(\varepsilon^{1}_{n+t}\right) \right] \neq E\left[ l\left(\varepsilon^{2}_{n+t}\right) \right]$$

where $l$ is the loss function, and $\varepsilon^{1}_{n+t}$ and $\varepsilon^{2}_{n+t}$ are the forecast errors of the two forecasting models. Forecast accuracy is evaluated by an appropriate loss function, and a commonly used loss function is the MAE (Equation (18)) [36]. The null hypothesis states that there is no significant difference in predictive performance between the proposed model and the comparison model, that is, the two models have the same predictive accuracy.
The statistic of the DM test is as follows:

$$\mathrm{DM} = \frac{\frac{1}{T} \sum_{t=1}^{T} \left( l\left(\varepsilon^{1}_{n+t}\right) - l\left(\varepsilon^{2}_{n+t}\right) \right)}{\sqrt{s^2 / T}}$$

where $T$ is the number of forecasts and $s^2$ is the estimate of the variance of $D_t = l\left(\varepsilon^{1}_{n+t}\right) - l\left(\varepsilon^{2}_{n+t}\right)$. The DM statistic converges to the standard normal distribution $N(0, 1)$, and the null hypothesis is rejected if $|\mathrm{DM}| > Z_{\alpha/2}$, where $Z_{\alpha/2}$ denotes the critical z-value of the standard normal distribution and $\alpha$ is the significance level.
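A minimal sketch of the test with the absolute-error loss is given below. It assumes one-step-ahead forecasts, so the variance of the loss differential is estimated by the plain sample variance without autocovariance terms; the function name `dm_test` is hypothetical, and the two-sided p-value is obtained from the standard normal distribution through `math.erfc`.

```python
import math
import numpy as np

def dm_test(errors_1, errors_2):
    """Diebold-Mariano statistic and two-sided p-value with absolute-error loss."""
    loss_1 = np.abs(np.asarray(errors_1, dtype=float))
    loss_2 = np.abs(np.asarray(errors_2, dtype=float))
    diff = loss_1 - loss_2                      # loss differential D_t
    n = len(diff)
    variance = np.var(diff, ddof=1)             # sample variance s^2 (must be > 0)
    dm = np.mean(diff) / math.sqrt(variance / n)
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|DM|))
    return dm, p_value
```

A negative statistic indicates that the first model has the smaller average loss. For instance, if the errors of the second model are three times those of the first on every day, the statistic is strongly negative and the null hypothesis of equal accuracy is rejected at the 5% level (|DM| > 1.96).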

Case Studies
In this paper, case studies were carried out to measure the performance of the forecasting model. Single and hybrid models, including ARIMA [37], GRNN [38], ELM [26], GA-ELM, WOA-ELM and EEMD-WOA-ELM, were used as benchmarks to assess the proposed hybrid model. The experiment was first conducted in Beijing to verify the predictive performance of the model in detail, and then experiments in Tianjin and Shijiazhuang were used to test its universality. If the proposed model outperforms the other models in all case studies, we can conclude that it has not only high accuracy but also broad applicability in different environments. The model was also assessed statistically with the DM test. Furthermore, the trial-and-error method was used to determine the best experimental parameters, which are listed in Table 3.

Case Study One: Beijing
The daily concentrations of six air pollutants in Beijing from 1 September 2016 to 30 September 2018 were employed to verify the forecasting performance of the proposed hybrid model. Daily pollutant concentrations for the two months from 1 August 2018 to 30 September 2018 were predicted and compared with the actual data. Figure 4 shows the predictive results and Figure 5 shows the daily relative errors (relative error = (predicted value − actual value)/actual value). In addition, the four performance indicators are given in Table 4, which also lists the results of ARIMA, GRNN, ELM, GA-ELM, WOA-ELM, EEMD-WOA-ELM and ICEEMDAN-WOA-ELM for comparison; the bold values represent the best value for each criterion. The ICEEMDAN-WOA-ELM model performs best among all models, and its predictions are closer to the actual values than those of the other models. Although the relative errors of PM2.5 and PM10 are larger than those of the other pollutants because of their highly non-linear and non-stationary characteristics, the proposed model still gives the most satisfactory results.

Based on the information in Figure 4, Figure 5 and Table 4, the proposed model obtains the best results for all evaluation indicators. Therefore, we can conclude that the proposed ICEEMDAN-WOA-ELM model is superior to the benchmark models in the prediction of air pollutant concentrations. More comparative analyses are presented as follows:
(1) As a time series forecasting model, ARIMA is superior to the single artificial intelligence models in accuracy. The four indexes (MAE, RMSE, MAPE, TIC) of ARIMA are almost always better than those of the single artificial intelligence models, which is attributed to the high volatility and irregularity of air pollutant concentration data. The results show that single artificial intelligence models cannot meet the requirements of air pollutant prediction, which means a hybrid model is needed to improve predictive performance.
(2) The comparison of ELM with GA-ELM and WOA-ELM shows that optimization algorithms do help the neural network model improve its performance. The ELM optimized by GA or WOA provides better predictive results for all six air pollutants. Compared with other optimization algorithms, WOA is not only simple, flexible and effective, but can also achieve a balance between exploration and exploitation.
(3) The data preprocessing algorithm brings a great improvement to the neural network model. Compared with the other models, EEMD-WOA-ELM and ICEEMDAN-WOA-ELM are clearly better in prediction, which supports the concept of "decomposition and integration" or "divide and conquer" as an effective way to establish a robust air pollutant prediction model. The model with ICEEMDAN performs better than its counterpart with EEMD in all cases. For instance, for O3 the four metrics (MAE, RMSE, MAPE, TIC) are 14.1866, 17.2348, 13.9759 and 0.0665 for ICEEMDAN-WOA-ELM, against 21.1571, 24.5963, 21.8809 and 0.0938 for EEMD-WOA-ELM. The results show that the decomposition method can greatly reduce the predictive errors of the model and, moreover, that ICEEMDAN is superior to EEMD in data decomposition.
In the above analyses, MAE, RMSE, MAPE and TIC show that the proposed ICEEMDAN-WOA-ELM hybrid model is clearly superior to all considered benchmark models in accuracy and stability. Compared with the single models, all models based on the "decomposition and integration" framework have better predictive effectiveness, which shows that this framework can effectively improve model performance. The proposed forecasting model fits the highly volatile and irregular data of all air pollutants well, so it is qualified to serve as the prediction part of the air quality early-warning system.
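As an illustration of the relative error and the four indicators used in this case study, the following minimal Python sketch computes them for a pair of actual and predicted series. The relative error follows the definition given above; MAE, RMSE, MAPE and TIC are written in their conventional forms, which are assumed here to match the definitions used for Table 4. The function names are hypothetical and the numbers in the usage lines are made up for illustration only.

```python
import math

def relative_errors(actual, predicted):
    """Daily relative error: (predicted - actual) / actual."""
    return [(p - a) / a for a, p in zip(actual, predicted)]

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    return 100.0 * sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)

def tic(actual, predicted):
    """Theil's inequality coefficient (conventional form, assumed here).

    0 means a perfect forecast; values closer to 1 mean a worse forecast.
    """
    n = len(actual)
    numerator = math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)
    denominator = (math.sqrt(sum(a * a for a in actual) / n)
                   + math.sqrt(sum(p * p for p in predicted) / n))
    return numerator / denominator

# Illustrative values only, not data from the study.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mae(actual, predicted), rmse(actual, predicted),
      mape(actual, predicted), tic(actual, predicted))
```

For all four indicators a smaller value indicates a better forecast, which is why the bold entries in Table 4 are the minima of each column.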

Case Study Two: Tianjin and Shijiazhuang
To verify the accuracy and generality of the proposed model, the daily air pollutant concentration data of Tianjin and Shijiazhuang (from 1 September 2016 to 30 September 2018) were also used in case studies. The main purpose is to test the generality of the model under different environments. The predictive results are shown in Figures 6-9 and Tables 5 and 6. In Tables 5 and 6, bold values represent the best values for each criterion among all models. The predictive results show that the proposed model performs better than the benchmarks and that its predicted values are much closer to the real values.
These experiments lead to the same conclusion as case study one: the hybrid model ICEEMDAN-WOA-ELM is superior to all listed benchmark models. Four typical evaluation indicators (MAE, RMSE, MAPE and TIC) were used to measure the performance of all models, and the hybrid model ICEEMDAN-WOA-ELM performs best because it has the best values for all evaluation criteria. Through these experiments, we can reasonably draw the following conclusion: the data preprocessing algorithm and the optimization algorithm can significantly improve the predictive performance of the model. The proposed model, with its excellent predictive performance, will be the bedrock of the air quality early-warning system. In addition, because of their universality, these methods can be combined with other basic models to meet the needs of other research fields.

Diebold-Mariano Test
In this section, the Diebold-Mariano (DM) test was used to examine the effectiveness of the proposed hybrid model. The DM test determines whether the difference in forecast accuracy between two models is large enough to reject the null hypothesis at a given significance level. The detailed description of the DM test is presented in Section 3.3. The null hypothesis (Equation (22)) here is that there is no significant difference between the two models. Table 7 shows the DM test statistic values based on the MAE (Equation (18)) loss function.
The DM values for all benchmark models are greater than the upper critical value at the 1% significance level, so the null hypothesis is rejected in every case, which indicates that the proposed hybrid model significantly outperforms the comparison models.
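The decision rule of the DM test can be sketched in a few lines of Python. The sketch below is not the implementation used in this study. It assumes one-step-ahead forecasts (so no autocovariance correction is applied), an absolute-error loss in line with the MAE-based statistic of Table 7, a standard normal reference distribution, and a sign convention in which a positive statistic favours the proposed model. The function names and the sample numbers are hypothetical.

```python
import math
from statistics import NormalDist

def dm_statistic(actual, pred_benchmark, pred_proposed):
    """DM statistic with absolute-error loss for one-step-ahead forecasts.

    Loss differential: d_t = |e_benchmark,t| - |e_proposed,t|,
    so a large positive value favours the proposed model.
    """
    d = [abs(a - b) - abs(a - p)
         for a, b, p in zip(actual, pred_benchmark, pred_proposed)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / n  # lag-0 autocovariance
    if var_d == 0.0:
        raise ValueError("loss differential has zero variance")
    return d_bar / math.sqrt(var_d / n)

def rejects_null(dm, alpha=0.01):
    """Two-sided test against the standard normal critical value."""
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 2.576 for alpha = 0.01
    return abs(dm) > critical

# Illustrative values only, not data from the study.
actual = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
benchmark = [13.0, 8.0, 17.0, 11.0, 22.0, 17.0]
proposed = [11.0, 11.0, 15.0, 15.0, 19.0, 19.0]
dm = dm_statistic(actual, benchmark, proposed)
print(dm, rejects_null(dm))
```

For multi-step forecasts the variance term would need the additional autocovariances of the loss differential, which this sketch deliberately omits.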

Fuzzy Comprehensive Evaluation of Air Quality
In this section, the predicted data of September 2018 were used for fuzzy comprehensive evaluation and further analysis. This work makes the predicted results of the three cities easier to interpret. Limited by the length of the paper, only the results of twenty days in September 2018 are taken as examples; they include the evaluation results based on both predicted and actual values, so that the accuracy of the model in level forecasting can be assessed. Firstly, according to the methodology of fuzzy comprehensive evaluation described in Section 2.4, the evaluation set V = {I, II, III, IV, V} was established. Secondly, the membership degree of each factor to each evaluation level was calculated by the membership degree formula, and the fuzzy matrix R was established. Thirdly, the weight of each pollution factor was calculated by the multi-scale weighting method; the weight measures the relative degree of environmental hazard of a factor and strongly affects the evaluation result. Finally, according to the fuzzy matrix and the weights, the membership degree of each evaluation level and the air quality level were obtained. Evaluation results of Beijing are shown in Table 8. Taking the result of one day (1 September 2018) as an example, the membership degree of air quality level "I" is 0.3759, and the membership degrees of "II", "III", "IV" and "V" are 0.3409, 0, 0 and 0, respectively. According to the principle of maximum membership degree, the comprehensive evaluation level of air quality should be "I" and the corresponding category is "Excellent".
The consistency ratio of the two results in Beijing is 27/30 (90%), which not only shows the high accuracy of the proposed hybrid model in level forecasting but also indirectly indicates that the predicted concentration data are accurate enough to satisfy the needs of air quality early warning. Using the same algorithm, the fuzzy comprehensive evaluation of air quality for Tianjin and Shijiazhuang was conducted and the results are shown in Tables 9 and 10. Overall, the evaluation results based on predicted and actual values are basically the same in these two cities. The consistency ratios in Tianjin and Shijiazhuang are 26/30 (87%) and 30/30 (100%), respectively.
Therefore, the evaluation method can effectively link the pollutant concentration prediction with air quality early warning. Nevertheless, precise predictions of pollutant concentration and air quality level are not enough, because the achievements of scientific research are expected to truly serve society. Further work can build on these results so that more intuitive air quality information can be released and public alarms can be issued. Therefore, an air pollution early-warning handbook was compiled and its details are shown in Table 11. This work can not only guide people's daily activities against air pollution but also provide decision-making support for the government, such as evaluating whether the air quality of a city meets the criteria or deciding which temporary but mandatory measures should be taken to address potential air pollution problems.
Tips: the six major air pollutants are PM2.5, PM10, NO2, SO2, CO and O3.
It is necessary to know their characteristics:
• PM2.5: fine particulate matter with a particle size less than or equal to 2.5 µm. Compared with PM10, it has a smaller size, a larger specific surface area and stronger activity, attaches toxic and harmful substances more easily, and has a longer residence time and transportation distance in the atmosphere, which makes it more harmful to human health and air quality. It can enter the bronchioles and alveoli, causing cardiopulmonary disease and even lung cancer.
• PM10: inhalable particulate matter with a particle size less than or equal to 10 µm. It can reduce atmospheric visibility and enter the upper respiratory tract, causing respiratory disease.
• NO2: reddish-brown with an irritating odor. It can promote the formation of acid rain and ozone and damage the respiratory tract.
• SO2: colorless with an irritating odor. It can be oxidized into sulfuric acid mist (acid rain) or sulfate aerosol and can cause respiratory diseases and cancer.
• CO: colorless and odorless, mainly from incomplete combustion. It can cause suffocation and even death.
• O3: light blue with a special odor, a major constituent of photochemical smog. It damages human mucosa and the respiratory tract.
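The four evaluation steps of this section, together with the consistency ratio, can be sketched as follows. This is a minimal illustration and not the implementation of Section 2.4: it assumes piecewise-linear membership functions between consecutive level limits, weights that are simply supplied and normalized (the multi-scale weighting method is not reproduced), and a weighted-average composition operator, which may differ from the operator used for Tables 8-10. All function names, level limits and sample values are hypothetical.

```python
LEVELS = ["I", "II", "III", "IV", "V"]

def membership(concentration, limits):
    """Membership degrees of one pollutant to the five levels.

    `limits` holds one strictly increasing limit value per level
    (hypothetical numbers in this sketch).
    """
    r = [0.0] * len(limits)
    if concentration <= limits[0]:
        r[0] = 1.0
    elif concentration >= limits[-1]:
        r[-1] = 1.0
    else:
        for j in range(len(limits) - 1):
            lo, hi = limits[j], limits[j + 1]
            if lo <= concentration <= hi:
                r[j + 1] = (concentration - lo) / (hi - lo)
                r[j] = 1.0 - r[j + 1]
                break
    return r

def evaluate(fuzzy_matrix, weights):
    """Combine fuzzy matrix R (one row per pollutant) with weights W.

    Returns the membership vector B and the level with maximum membership.
    """
    total = sum(weights)
    w = [x / total for x in weights]  # normalize weights to sum to 1
    b = [sum(w[i] * fuzzy_matrix[i][j] for i in range(len(w)))
         for j in range(len(LEVELS))]
    return b, LEVELS[b.index(max(b))]

def consistency_ratio(levels_predicted, levels_actual):
    """Share of days on which both evaluations give the same level."""
    same = sum(1 for p, a in zip(levels_predicted, levels_actual) if p == a)
    return same / len(levels_actual)

# Illustrative values only: two pollutants with made-up limits and weights.
limits = [10.0, 20.0, 30.0, 40.0, 50.0]
R = [membership(5.0, limits), membership(15.0, limits)]
B, level = evaluate(R, weights=[1.0, 3.0])
print(B, level)
```

With this helper, a consistency ratio such as 27/30 is simply the fraction of days on which the level obtained from predicted values equals the level obtained from actual values.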

Conclusions
Air pollution is a long-standing problem that plagues the whole world, seriously harming human health, social development and the natural environment. A great deal of manpower and material resources have been invested to solve this problem, but the results so far are not satisfactory. Nevertheless, the rapid development of artificial intelligence in recent years has brought new hope for air pollution control. The proposed air quality early-warning system is expected to play a key role in the future owing to its accuracy and effectiveness. The system mainly consists of two parts: a prediction model and an evaluation model.
To establish the prediction model, ELM, which is known for its accuracy and robustness, was employed. Taking ELM as the core, a hybrid model ICEEMDAN-WOA-ELM was proposed. Firstly, according to the theory of "decomposition and integration", the original time series of pollutant concentration were decomposed into IMFs by the decomposition algorithm (ICEEMDAN). Secondly, the ELM optimized by WOA was used to predict each IMF. Finally, all the predictive results were combined to obtain the final predictive result. In this study, six main air pollutants (PM2.5, PM10, NO2, SO2, CO and O3) in Beijing, Tianjin and Shijiazhuang were chosen. The proposed prediction model was used to predict air pollutant concentrations and was compared with six benchmark models: ARIMA, GRNN, ELM, GA-ELM, WOA-ELM and EEMD-WOA-ELM. The simulation results showed that the proposed ICEEMDAN-WOA-ELM model was superior to the other models, and that the ICEEMDAN decomposition algorithm and the WOA optimization algorithm played important roles in improving the prediction accuracy of the neural network.
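The three steps of the "decomposition and integration" framework can be outlined in Python as below. The sketch only shows the control flow. The decomposition is a toy split into a moving-average component and a residual, used as a stand-in for ICEEMDAN, and the per-component predictor is a naive persistence forecast, used as a stand-in for the WOA-optimized ELM. Neither stand-in reflects the actual algorithms of this study, and all names are hypothetical.

```python
def moving_average(series, window):
    """Trailing moving average, used here as a toy smooth component."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def toy_decompose(series, window=3):
    """Stand-in for ICEEMDAN: returns components that sum to the series."""
    trend = moving_average(series, window)
    residual = [x - t for x, t in zip(series, trend)]
    return [residual, trend]

def persistence_forecast(component):
    """Stand-in for the WOA-optimized ELM: next value equals last value."""
    return component[-1]

def decompose_and_integrate(series, decompose, forecast):
    """Step 1: decompose; step 2: forecast each component; step 3: sum."""
    components = decompose(series)
    return sum(forecast(c) for c in components)

# Illustrative values only, not data from the study.
series = [1.0, 2.0, 3.0, 4.0]
print(decompose_and_integrate(series, toy_decompose, persistence_forecast))
```

Because the decomposition and the per-component predictor are passed in as functions, either stand-in can be replaced by a real implementation without changing the integration step.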
In addition to the prediction of air pollutant concentration, air quality evaluation is an indispensable part of the air quality early-warning system. To understand the future state of the air, air quality was evaluated from the predicted data by fuzzy comprehensive evaluation. The evaluation results agreed well with the actual status, which means that the proposed evaluation model can meet the requirements of early warning. Furthermore, an air pollution early-warning handbook was compiled to provide the public with intuitive air quality information and reasonable measures.
The combination of air pollutant prediction and air quality evaluation lays a solid foundation for the establishment and implementation of an air quality early-warning system. The proposed system can offer accurate air pollutant concentration prediction, correct air quality evaluation, reasonable countermeasures and scientific decision-making support, which means it could become an effective tool for air pollution control and even smart city construction in the future.