Air Pollution Forecasts: An Overview

Air pollution is a phenomenon in which the concentrations of certain substances in the atmosphere exceed a threshold level and harm ecological systems and the normal conditions of human existence and development. Faced with increasingly serious environmental pollution problems, scholars have carried out a substantial body of related research, in which the forecasting of air pollution has been of paramount importance. As a precautionary tool, air pollution forecasting is the basis for taking effective pollution control measures, and accurate forecasting has become an important task. Extensive research indicates that air pollution forecasting methods can be broadly divided into three classical categories: statistical forecasting methods, artificial intelligence methods, and numerical forecasting methods. More recently, hybrid models have been proposed that can improve forecast accuracy. To provide a clear perspective on air pollution forecasting, this study reviews the theory and application of these forecasting models. In addition, based on a comparison of different forecasting methods, the advantages and disadvantages of several methods are discussed. This study aims to provide an overview of air pollution forecasting methods that researchers can easily access and refer to in further studies.


Introduction
Air is a basic requirement for the survival and development of all life on Earth. It affects health and influences economic development. Today, due to industrialization, the increasing number of private cars, and the burning of fossil fuels, air quality is declining and air pollution is becoming increasingly serious. There are many pollutants in the atmosphere, such as SO2, NO2, CO2, NO, CO, NOx, PM2.5, and PM10. Internationally, a large number of scholars have conducted research on air pollution and air quality forecasting, concentrating on forecasting pollutant concentrations.
Air pollution affects the life of a society, and even endangers the survival of mankind. During the Industrial Revolution, there was a dramatic increase in coal use by factories and households, and the smog caused significant morbidity and mortality, particularly when combined with stagnant atmospheric conditions. During the Great London Smog of 1952, heavy pollution for 5 days caused at least 4000 deaths [1,2]. This episode highlighted the relationship between air pollution and human health, yet air pollution continues to be a growing problem in cities and households around the world.
Air pollution is made up of a mixture of gases and particles in harmful amounts that are released into the atmosphere by either natural or human activities [3]. The sources of pollutants can be divided into two categories:
(1) Natural sources. Natural pollution sources are natural phenomena that discharge harmful substances or have harmful effects on the environment. Natural phenomena such as volcanic eruptions and forest fires release air pollutants, including SO2, CO2, NO2, CO, and sulfate.
(2) Anthropogenic (man-made) sources. Man-made sources such as fuel burning, industrial production processes, and transportation emissions are the main sources of air pollution. Man-made sources emit many kinds of pollutants, including compounds of hydrogen, oxygen, nitrogen, and sulfur, metal compounds, and particulate matter.
With the increasing world population and the developing world economy, the demand for energy in the world has increased dramatically. The large-scale use of fossil energy globally has also led to a series of environmental problems that have received much attention due to their detrimental effects on human health and the environment [3][4][5]. Air pollution is a fundamental problem in many parts of the world, with two important concerns: the impact on human health, such as cardiovascular diseases, and the impact on the environment, such as acid rain, climate change, and global warming [6]. These environmental impacts are described below.
(1) Climate change. Some chemicals released into the atmosphere by human activities, such as CO2, CH4, N2O, and chlorofluorocarbons (CFCs, exemplified by Freon-12), cause a greenhouse effect [7,8]. The burning of fossil fuels and other human activities increase the concentration of greenhouse gases, leading to global warming. This in turn leads to rising sea levels, more extreme weather, and melting glaciers and ice caps. Further environmental changes are inevitable as temperatures continue to climb [7].
Studies have indicated that the rate of sea level rise was the fastest in the twentieth century, and observational data support this view: sea level rose 14 cm during that century. One study projects that sea level will rise by at least 28 cm and possibly by as much as 131 cm by 2100 [3,7,9], while average global temperature will increase by 3.6 °F to 8.1 °F (2 °C to 4.5 °C) [7].
(2) Ozone hole. The ozone layer is a region of relatively high ozone concentration in the stratosphere, and its main function is to absorb ultraviolet radiation. It has many useful functions for Earth, the most important of which is to protect human beings, animals, and plants from short-wave ultraviolet radiation [10]. It also has a heating effect: ozone absorbs the Sun's ultraviolet rays and converts them into heat energy that warms the atmosphere [11].
Freon (a halohydrocarbon) and N2O can produce the greenhouse effect and can also react with stratospheric ozone, depleting the ozone layer and creating holes in it [10,12].
The decline of stratospheric ozone due to anthropogenic sources is internationally recognized as one of the Earth's most important environmental issues [13]. The ozone hole negatively affects human health and the environment and can cause severe diseases, such as skin cancer, eye damage, and genetic mutations [10,12]. Research results show that if stratospheric ozone concentrations decrease by 1%, ultraviolet radiation increases by 2% and the cataract rate increases by 0.2% to 0.6%. Moreover, the depletion of the ozone layer seriously harms the human body, crops, and forests, and can even disrupt the natural biosphere and the marine ecological balance [12].
In recent years, scientists have discovered that ozone depletion occurs over both the Antarctic and the Arctic [11]. In the spring of 2011, ozone column loss reached 40%. According to observations by Chinese atmospheric physicists and meteorologists over the Qinghai-Tibetan Plateau, the ozone layer there is thinning at a rate of 2.7% per decade.
(3) Particulate matter pollution. Atmospheric particulate matter (PM) consists of solid or liquid particles suspended in the atmosphere. Thick smog laden with PM frequently covers many of the world's cities [4]. According to medical research, PM causes varying degrees of harm to the human respiratory, cardiovascular, central nervous, and immune systems, and to genes [14,15].
China, as the largest developing country, has attracted great attention from all over the world for its rapid economic development and its air pollution. In 2015, China's air pollution situation was very serious, with the air quality of most cities failing to meet the China National Standard. Moreover, some Chinese cities have been ranked among the 10 most polluted cities in the world [16]. In recent years, high concentrations of particulate matter in China have received increasing attention [17].
Generally, air pollutants do not just harm the local or regional environment. They can also cause damage on a global scale. Certain man-made chemicals have damaged the planet's protective ozone layer, allowing more harmful solar radiation to strike the Earth's surface. Although the use of these chemicals is being phased out, their destructive effects will linger for many more decades.
Controlling air pollution and improving air quality are current concerns of scientists worldwide [18]. Urban air pollution forecasting is one of the important outcomes of urban air pollution control: it has enabled the establishment of urban air pollution alarm systems, effectively reducing the cost of air pollution control. A reasonable and accurate forecasting model is the basis for forecasting urban air pollution. Forecasting is a requisite part of the science of big data and can be used to infer the future development of an object from past information. Thus, "pollution forecasting" can be understood as the estimation of pollutant concentrations at a specified future time.
Since the 1960s, with the development of air pollution control and research, it has become urgent to understand the influence of air pollution and its trends, and so air pollution forecasting began. By approach, pollution forecasting can be divided into three types: potential forecasts, statistical models, and numerical models. By forecast content, it is divided into pollution potential forecasting and concentration forecasting [19]. Statistical and numerical modelling methods produce concentration forecasts. A potential forecast is based mainly on the meteorological conditions that govern the atmosphere's capacity for dilution and diffusion: when the weather conditions are expected to meet the criteria for possible serious pollution, a warning is issued. A concentration forecast directly forecasts the concentration of pollutants in a certain area, and its results are quantitative. Air pollution forecasting models can also be divided into parametric and nonparametric models, or into deterministic and nondeterministic models. Parametric models are easy to distinguish from nonparametric models, and deterministic models from nondeterministic ones, but parametric and deterministic models are harder to tell apart. The most significant difference is that the output of a deterministic model is fully determined by its inputs: with the same inputs, it gives the same output no matter how many times it is run. A parametric model, by contrast, assumes a known model form whose parameters are estimated from data, and its output is uncertain.
For example, the diffusion models in this paper are deterministic models: they are based on physical equations describing the chemistry and transport of pollutants and require large amounts of accurate input data [20]. Models based on large amounts of historical data, such as regression and principal component analysis, are usually parametric models.
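To make the deterministic case concrete, the sketch below implements the textbook Gaussian plume equation for a continuous point source with ground reflection, one of the simplest physically based diffusion models. It is an illustration only, not a model used in the cited studies; the function name and the choice to pass the dispersion coefficients sigma_y and sigma_z directly (instead of deriving them from atmospheric stability classes) are assumptions made for brevity. Because the concentration is a closed-form function of the inputs, repeated runs with identical inputs always return the same value, whereas a parametric model such as regression would have its coefficients estimated from historical data.

```python
import math


def gaussian_plume(q, u, y, z, h, sigma_y, sigma_z):
    """Concentration from a continuous point source (textbook Gaussian plume).

    q                -- emission rate (e.g. g/s)
    u                -- mean wind speed at release height (m/s)
    y, z             -- crosswind and vertical receptor coordinates (m)
    h                -- effective release height (m)
    sigma_y, sigma_z -- dispersion coefficients at the receptor's downwind
                        distance (m); supplied directly here for simplicity
    Returns the concentration in units of q per cubic metre (e.g. g/m^3).
    """
    crosswind = math.exp(-y**2 / (2 * sigma_y**2))
    # The second term adds an image source below ground (ground reflection).
    vertical = (math.exp(-(z - h)**2 / (2 * sigma_z**2))
                + math.exp(-(z + h)**2 / (2 * sigma_z**2)))
    return q / (2 * math.pi * u * sigma_y * sigma_z) * crosswind * vertical


# Hypothetical inputs: identical inputs always give identical output.
c1 = gaussian_plume(q=100.0, u=5.0, y=0.0, z=0.0, h=50.0, sigma_y=80.0, sigma_z=40.0)
c2 = gaussian_plume(q=100.0, u=5.0, y=0.0, z=0.0, h=50.0, sigma_y=80.0, sigma_z=40.0)
print(c1 == c2)  # True
```

Operational diffusion models add chemistry, terrain, and time-varying meteorology, but they share this property: fixed inputs give a fixed output.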
The most popular statistical methods use artificial intelligence (AI) models. Neural network (NN) forecasting models are more accurate than other statistical models [21][22][23], but there is still room for improvement. Therefore, some scholars have improved forecast accuracy by other means. Grivas et al. developed an artificial neural network (ANN) that combined meteorological and time-scale input variables [22]. Elangasinghe et al. built an ANN air pollution forecasting tool based on meteorological parameters and the emission patterns of sources [23]. The improved ANN models were found to be more effective given the same input parameters [24][25][26].
A commonly used numerical model is the Community Multiscale Air Quality (CMAQ) modeling system. Since the 1970s, three generations of air quality models have been developed, and CMAQ belongs to the third generation. Lou et al. used the CMAQ modeling system to analyze and evaluate ozone concentrations in China and proposed that the method could also be applied to nitrogen oxides [27].
To date, a large number of methodologies and approaches have been proposed for air pollution forecasting, but their forecast accuracy has not been compared. In the present paper, we discuss various approaches and provide a statistical analysis to identify an accurate method. Figure 1 shows the plan of the study.

The Current Status of Pollution Research
Air pollution is regarded as an unavoidable reality. Over the past few years, many environmental pollution accidents have been reported, especially air pollution events. If environmental problems are ignored in the process of social progress, the Earth's ecological environment will gradually deteriorate, leaving the planet in constant danger, as if every day were "2012". Compared with land pollution and water pollution, the consequences of air pollution are widely considered to be more serious. Scholars have conducted a series of studies on air pollution, from pollution sources to pollution management and pollution forecasts, including emission inventories, pollution assessments, and pollution alarms. These topics lay the foundation for the research on air pollution covered in the following sections.


The Current Status of Pollution Emission Inventory Research
In the words of Seika, the emission inventory (EI) is a comprehensive list of the various types of air pollutants emitted by various sources of pollution in a given area within a given time interval [28]. EIs describe the polluting activities that occur across a specific geographic domain and are widely used as input for air quality modeling and for assessing compliance with environmental legislation [29]. Air pollution control requires complex environmental management, in which clear EIs are the basis for other research.
The United States Environmental Protection Agency (EPA) developed the Emission Inventory Improvement Program (EIIP) in 1993. The program promoted standardized processes for collecting, storing, reporting, and sharing emission data. The EIIP documents were designed to provide standardized approaches for emission estimation; the emission estimation formula is as follows [30]:
$$E = A \times EF \times (1 - C)$$
where E is the emission estimate, A is the activity level, EF is the emission factor, and C is the level of control. For point sources, the activity level represents the operating rate of the facility, estimated at the facility level. For area sources, a surrogate variable, such as the population of a region, is used as the activity level. The correlation between the surrogate activity factor and the emission rate of the source determines the quality of the estimates. The emission factor is the amount of pollutant released into the atmosphere per unit of the activity associated with its release. The level of control is the fraction of emissions removed by control measures, so one minus the level of control represents the fraction emitted after control [30].
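As a worked illustration of this emission estimate, the sketch below multiplies the activity level, the emission factor, and one minus the control level. The function name and the boiler figures are hypothetical, and the control level is assumed to be expressed as a fraction between 0 and 1 rather than as a percentage.

```python
def estimate_emissions(activity, emission_factor, control_level=0.0):
    """Emission estimate E = A * EF * (1 - C).

    activity        -- activity level (e.g. tonnes of coal burned per year)
    emission_factor -- pollutant released per unit activity (e.g. kg SO2 per tonne)
    control_level   -- fraction of emissions removed by controls, 0 <= C <= 1
    """
    if not 0.0 <= control_level <= 1.0:
        raise ValueError("control_level must be a fraction between 0 and 1")
    return activity * emission_factor * (1.0 - control_level)


# Hypothetical boiler: 1000 t coal/yr, 16 kg SO2 per tonne, 90% control.
print(round(estimate_emissions(1000, 16, 0.9), 6))  # 1600.0 (kg SO2/yr)
```

For an area source, the same function applies with a surrogate such as population passed as the activity level.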

The Health Effect of Pollution
Exposure to air pollution has been clearly associated with a range of adverse health effects. A report from the OECD indicated that by 2060 outdoor air pollution could cost the world $2.6 trillion a year, including the costs of sick days, medical bills, and reduced agricultural output. Moreover, welfare costs associated with premature deaths could rise to as much as $25 trillion by 2060 [31]. Lafuente et al. performed a systematic review to assess the effects of air pollutants on sperm quality [32]. They considered four semen quality parameters: DNA fragmentation, sperm count, sperm motility, and sperm morphology. Most of the reviewed studies concluded that air pollution affected at least one of the four parameters.
Wei et al. studied the effects of ambient NO2, SO2, and PM10 on childhood eczema in Shanghai, China. They enrolled 3358 preschool children in a 6-year research program. The study indicated that gestational and lifetime exposures to NO2 were risk factors for atopic eczema in childhood; moreover, exposure to SO2 and PM10 may enhance the effect of NO2 exposure on childhood eczema [33].
Beelen et al. conducted a multicenter cohort study in Europe. The results indicated that long-term exposure to PM2.5 significantly increased the risk of natural-cause mortality [34]. The study also showed a positive correlation between PM2.5 and heart disease mortality: as the PM2.5 concentration increased, the mortality rate of patients with heart disease increased.
Various studies have shown that air pollution is harmful to humans and other living creatures and leads to various diseases and losses, such as respiratory disease, cardiovascular disease, the death of animals and plants, and economic losses.

Air Pollution Assessment
In recent years, air pollution accidents have occurred frequently, damaging the economy and human life. To understand the extent of the damage quantitatively, air pollution must be assessed.
Air pollution assessment identifies and measures the degree and scope of the damage caused by environmental pollution, using economic, legal, technical, and other means as appropriate [35][36][37].
Two of the more mature assessment methods are described below. The market value method is a type of cost-benefit analysis. It uses the changes in product yield and profit caused by changes in environmental quality to measure the associated economic loss.
Environmental functions damaged by air pollution can be protected, restored, or replaced. Therefore, the cost of protecting, restoring, or replacing the original environmental functions can be used to estimate the loss caused by pollution or damage to the environment. This method is called the engineering cost method.
The main equations and the meanings of the variables in these methods are given in Table 1, and the flowchart of the assessment methods is given in Figure 2.

Table 1. List of assessment methods.
Type | Main equation | Meaning of variables
Market value method | $S_1 = \sum_i P_i \, \Delta R_i$ | $S_1$ is the economic loss of environmental quality; $P_i$ is the market price of good i; $\Delta R_i$ is the yield reduction of good i caused by pollution and ecological damage.
Opportunity cost method | $S_2 = V_2 \times W_2$ | $S_2$ is the opportunity cost of the loss; $V_2$ is the unit opportunity cost of the resource; $W_2$ is the amount of the resource that is polluted or damaged.
Engineering cost method | $S_3 = V_3 \times Q$ | $S_3$ is the cost of preventing and controlling pollution or damage; $V_3$ is the unit cost of protecting, restoring, or replacing the original environmental functions; $Q$ is the quantity of environmental function to be protected, restored, or replaced.
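The three equations in Table 1 reduce to simple sums and products; the sketch below evaluates them so that the roles of the variables are explicit. The function names and all input figures are hypothetical, and reading Q as a physical quantity (so that V3 × Q is a total cost) is an assumption based on V3 being a unit cost.

```python
def market_value_loss(prices, yield_reductions):
    """S1 = sum_i P_i * dR_i: market value of output lost to pollution."""
    if len(prices) != len(yield_reductions):
        raise ValueError("need one price per good")
    return sum(p * dr for p, dr in zip(prices, yield_reductions))


def opportunity_cost_loss(unit_opportunity_cost, damaged_amount):
    """S2 = V2 * W2: opportunity cost of a polluted or damaged resource."""
    return unit_opportunity_cost * damaged_amount


def engineering_cost_loss(unit_cost, quantity):
    """S3 = V3 * Q: cost of protecting, restoring or replacing a function."""
    return unit_cost * quantity


# Hypothetical figures, for illustration only.
print(market_value_loss([2.0, 3.5], [100.0, 40.0]))  # 340.0
print(opportunity_cost_loss(50.0, 12.0))             # 600.0
print(engineering_cost_loss(8.0, 250.0))             # 2000.0
```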

Study of Air Pollution Control Efficiency
To address increasingly serious environmental pollution problems, many countries have introduced pollution control policies. In addition, the United Nations has organized international conferences, appealing to all countries to jointly manage global pollution. Are these pollution control methods effective? The efficiency of environmental pollution control is the input-output efficiency of the control process: it relates the inputs to environmental pollution control to the resulting pollution control effect.

Larsson et al. calculated the air pollution control efficiency of different enterprises in Norway, examining both technical efficiency and environmental efficiency [38]. Shi et al. calculated the SO2 governance efficiency of each Chinese province using the data envelopment analysis (DEA) method [39]. Wang et al. used a super-efficiency DEA model to analyze the atmospheric pollution governance efficiency of Chinese provinces from 2004 to 2009 [40]. Xie et al. studied Beijing and built an odd-and-even license plate model using probabilistic modeling and analysis of means to quantify the pollution caused by vehicle exhaust emissions and the actual effect of the license plate restriction rule [41]. Fan et al. indicated that the rate of industrial waste gas treatment is low and that governance efficiency differs significantly among pollutants [42]. Moreover, research by Fan et al. on China's industrial air pollution control showed that air pollution treatment efficiency, and the contributions to it from efficiency change and technology change, differ significantly among sectors, and that the contribution of technological advancement to industrial air pollution treatment efficiency is weak [43].

Remark 1.
There is much research on air pollution control efficiency for different industries and pollutants. The common approach is to take pollution management costs as input variables and pollutant emission reductions as output variables, and to use a DEA model to calculate pollution abatement efficiency.
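A minimal sketch of the DEA calculation described in Remark 1, using the standard input-oriented CCR model (constant returns to scale) solved as a linear program with SciPy's `linprog`. The cited studies may use other DEA variants (for example, the super-efficiency model in [40]); the function name is hypothetical, as are the toy data, in which the single input is pollution-control spending and the single output is emission reduction for three decision-making units (DMUs).

```python
import numpy as np
from scipy.optimize import linprog


def ccr_input_efficiency(inputs, outputs, o):
    """Input-oriented CCR (constant returns to scale) efficiency of DMU `o`.

    inputs  -- array (m inputs, n DMUs), e.g. pollution-control spending
    outputs -- array (s outputs, n DMUs), e.g. emission reductions
    Solves: min theta  s.t.  X @ lam <= theta * x_o,  Y @ lam >= y_o,  lam >= 0.
    """
    X = np.asarray(inputs, dtype=float)
    Y = np.asarray(outputs, dtype=float)
    m, n = X.shape
    s = Y.shape[0]
    # Decision vector z = [theta, lam_1, ..., lam_n]; the objective is theta.
    c = np.zeros(n + 1)
    c[0] = 1.0
    # Input rows: -theta * x_io + sum_j lam_j * x_ij <= 0
    a_in = np.hstack([-X[:, [o]], X])
    # Output rows: -sum_j lam_j * y_rj <= -y_ro
    a_out = np.hstack([np.zeros((s, 1)), -Y])
    a_ub = np.vstack([a_in, a_out])
    b_ub = np.concatenate([np.zeros(m), -Y[:, o]])
    res = linprog(c, A_ub=a_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (n + 1), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.fun


# Hypothetical data: one input row and one output row for three DMUs.
spending = [[2.0, 4.0, 4.0]]
reduction = [[2.0, 2.0, 4.0]]
scores = [ccr_input_efficiency(spending, reduction, o) for o in range(3)]
print([round(v, 3) for v in scores])  # [1.0, 0.5, 1.0]
```

A score of 1 marks a DMU on the efficient frontier; DMU 1 could, in principle, achieve its emission reduction with half of its spending.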

Air Pollution Early Warning and Forecast
The most important function of an air pollution early warning system is to notify the relevant departments when air quality reaches the early warning standard. A complete pollution warning system covers the pollutant, its source, and its scope of influence [44].
Air quality forecasting is an effective way of protecting public health by providing early warning of harmful air pollutants [9]. Urban air pollution events can be forecast from meteorological elements to provide early warning. Therefore, in the face of more and more urban air pollution incidents, air pollution forecasts and warnings should be an important part of the whole emergency system, in addition to risk prevention management and emergency measures. The early warning system is triggered before heavy urban air pollution occurs, according to the forecast of meteorological elements, and corresponding emergency measures are initiated as early as possible to reduce pollutant discharge and mitigate the consequences. Many countries have early warning systems for pollution. For example, in China the Air Quality Index (AQI) is used to classify early warning levels, and the warning level is determined by the upper limit of the pollution forecast. Therefore, air pollution forecasting, as the basis for pollution warning systems and pollution control, should be highly valued by all countries.
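In China, the AQI is derived from concentrations by piecewise linear interpolation: each pollutant's individual index (IAQI) is interpolated between the concentration breakpoints that bracket it, and the overall AQI is the largest IAQI. The sketch below shows this for PM2.5 only. The breakpoint values are assumed to follow the 24-hour PM2.5 limits in China's technical regulation HJ 633-2012 and should be checked against the standard before use; the function names are hypothetical, and the mapping from AQI to warning levels is not shown.

```python
# 24-hour PM2.5 breakpoints (ug/m3) and matching IAQI values, assumed here to
# follow HJ 633-2012; verify against the standard before real use.
PM25_BREAKPOINTS = [0, 35, 75, 115, 150, 250, 350, 500]
IAQI_LEVELS = [0, 50, 100, 150, 200, 300, 400, 500]


def iaqi(conc, breakpoints=PM25_BREAKPOINTS, levels=IAQI_LEVELS):
    """Individual AQI by linear interpolation within the matching band."""
    if conc < breakpoints[0] or conc > breakpoints[-1]:
        raise ValueError("concentration outside the tabulated range")
    for i in range(1, len(breakpoints)):
        bp_lo, bp_hi = breakpoints[i - 1], breakpoints[i]
        if conc <= bp_hi:
            i_lo, i_hi = levels[i - 1], levels[i]
            return (i_hi - i_lo) / (bp_hi - bp_lo) * (conc - bp_lo) + i_lo
    raise AssertionError("unreachable: range already checked")


def aqi(sub_indices):
    """Overall AQI is the largest IAQI; returns (AQI, pollutant with max IAQI)."""
    dominant = max(sub_indices, key=sub_indices.get)
    return sub_indices[dominant], dominant


# Hypothetical forecast: upper limit of the PM2.5 forecast is 55 ug/m3.
print(aqi({"PM2.5": iaqi(55.0)}))  # (75.0, 'PM2.5')
```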
In China, Tang argued that air pollution differs from water pollution because serious pollution incidents may occur within a short time if the weather is unfavorable for pollutant dispersion [45]. However, Hong et al. showed that potential forecasts only forecast the meteorological conditions conducive to air pollution and fail to give exact quantitative results. They therefore developed a numerical forecasting method [46].
Among the various methods for forecasting air pollution quantitatively, AI methods are a classic choice. Grivas et al. developed an artificial neural network (ANN) with meteorological and time-scale input variables [22], selecting the input variables with a genetic algorithm optimization procedure.
In addition, hybrid models have been developed; they perform well and are widely used in various fields [47][48][49][51]. Xu et al. proposed a robust early warning system that includes an evaluation module, a forecasting module, and a characteristics estimation module. This system defines air quality levels and is also used to determine the main pollutants [52].
In the following sections, a thorough analysis and summary of air pollution forecasting is provided. The structure of the second section is summarized in Figure 3.

Abbreviation Explanation and Error Assessment Index
Many methods are described in this paper, and their forecasting results need to be assessed, yet different articles use different assessment indexes. For the convenience of the reader, the abbreviations of the methods and the assessment indexes used are listed below for easy reference.

Abbreviations
Abbreviations of methods are listed in Table 2.


Error Assessment Method and Index
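Many performance indexes have been proposed in the field of error assessment. The definitions and formulas of the indexes involved in this study are shown in Table 3, where $F_i$ represents the forecast value, $A_i$ represents the actual value, and $\bar{F}$ and $\bar{A}$ are their respective means.

Table 3. The definitions and formulas of indexes involved in this paper.
Metric | Definition | Equation
MAE | The mean absolute error of N forecasting results | $MAE = \frac{1}{N}\sum_{i=1}^{N} |F_i - A_i|$
MSE | The mean squared error of N forecasting results | $MSE = \frac{1}{N}\sum_{i=1}^{N} (F_i - A_i)^2$
RMSE | The square root of the average of the squared errors | $RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (F_i - A_i)^2}$
NMSE | The normalized average of the squared errors | $NMSE = \frac{1}{N}\sum_{i=1}^{N} \frac{(F_i - A_i)^2}{\bar{F}\,\bar{A}}$
IA | The index of agreement of forecasting results | $IA = 1 - \frac{\sum_{i=1}^{N} (F_i - A_i)^2}{\sum_{i=1}^{N} (|F_i - \bar{A}| + |A_i - \bar{A}|)^2}$
AE | The absolute error of a forecasting result | $AE_i = |F_i - A_i|$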
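The indexes in Table 3 can be computed directly from paired forecasts and observations. The sketch below is a plain-Python version; the function name is hypothetical, and it assumes the widely used conventions of normalizing NMSE by the product of the forecast and observed means and of using Willmott's index of agreement for IA. Other papers normalize NMSE differently, so compare results only when the definitions match.

```python
import math


def error_metrics(forecast, actual):
    """AE, MAE, MSE, RMSE, NMSE and IA for paired forecasts F_i and actuals A_i."""
    if len(forecast) != len(actual) or not forecast:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(forecast)
    errors = [f - a for f, a in zip(forecast, actual)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    f_mean = sum(forecast) / n
    a_mean = sum(actual) / n
    # Willmott's index of agreement: 1 means perfect agreement.
    denom = sum((abs(f - a_mean) + abs(a - a_mean)) ** 2
                for f, a in zip(forecast, actual))
    return {
        "AE": [abs(e) for e in errors],
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # NMSE normalized by the product of the means (one common convention).
        "NMSE": mse / (f_mean * a_mean),
        "IA": 1.0 - sse / denom,
    }


metrics = error_metrics([2.0, 4.0], [1.0, 3.0])
print(metrics["MAE"], metrics["RMSE"], metrics["IA"])  # 1.0 1.0 0.8
```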

Potential Forecasting Methods
Potential forecasting was widely used in the 1960s. Based on a combination of statistical and numerical prediction, it constructs a pollution potential index (PPI) from the conditions of the pollution source, the weather situation, and meteorological indices, and uses it to produce a qualitative or semi-quantitative forecast of future atmospheric environmental quality. When the weather conditions are expected to meet the criteria for possible serious pollution, a warning is issued [19,45].
Scott et al. applied an air pollution potential forecast model based on a synoptic climatological approach to forecast SO2 concentrations in a heavily industrialized area of Durban, South Africa. Their model successfully identified periods of elevated SO2 [53]. However, potential forecasts do not consider the location of pollution sources or pollutant emissions, and their accuracy is low. Therefore, statistical, artificial intelligence, and hybrid models are more commonly used today.

Statistical Forecast Methods
Statistical forecast methods analyze events without knowledge of the mechanism of change, so they do not depend on physical, chemical, or biological processes. Statistical forecasts include parametric and nonparametric statistical methods [54]. Parametric models are traditional statistical models such as linear regression and principal component analysis; nonparametric models have no predefined functional form. Nonparametric regression generally includes neural network models, Gaussian process regression, and similar methods. A detailed review of the application of statistical prediction models was published in [55].

Regression Methods
Regression analysis is a statistical tool for investigating relationships between variables. Researchers usually seek to ascertain the causal effect of the independent variables $x_i$ on the dependent variable $Y$ [56]. When the model is used to forecast $Y$ for a particular set of values of $x_i$, we also want to measure how large the forecast error might be. All these elements, including the dependent and independent variables and the error, are part of a regression analysis, and the resulting forecast equation is often called a regression model [57]. Regression analysis is a basic technique in air pollution forecasting.
Linear regression plays a practical role among statistical methods. A simple linear regression model is expressed as follows:
$$Y = b_0 + b x + e$$
A multiple linear regression (MLR) model is given as:
$$Y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n + e$$
or:
$$Y = b_0 + \sum_{i=1}^{n} b_i x_i + e$$
where $Y$ is the dependent variable, $x$ and $x_i$ are the independent variables, $b_0$ is the intercept, $b$ and $b_i$ are the regression coefficients, and $e$ is the error term, which is assumed to be normally distributed with a mean of 0. For air pollution forecasting, $Y$ represents the pollutant concentration forecast at time t + 1, $x_i$ represents the pollutant concentrations and meteorological variables at time t, $b_i$ are the regression coefficients, and $e$ is an error term obtained from independent random sampling. The values of $b_i$ can be obtained by the least squares method [58].
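A minimal sketch of fitting the MLR model by least squares with NumPy. The two predictors stand in for pollutant concentrations and meteorological variables at time t, and the response for the concentration at t + 1; the data are synthetic and noise-free, so the fit should recover the coefficients used to generate them. The function names are hypothetical.

```python
import numpy as np


def fit_mlr(x, y):
    """Least-squares estimates [b0, b1, ..., bn] for Y = b0 + sum_i b_i x_i + e.

    x -- array (samples, n) of predictors at time t
    y -- array (samples,) of the pollutant concentration at time t + 1
    """
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones(len(x)), x])  # leading column for b0
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict_mlr(coef, x):
    """Apply fitted coefficients to new predictor rows."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return coef[0] + x @ coef[1:]


# Synthetic, noise-free data from a hypothetical "true" relation.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=(50, 2))
y = 5.0 + 0.8 * x[:, 0] - 1.5 * x[:, 1]
coef = fit_mlr(x, y)
print(coef)  # approximately [5.0, 0.8, -1.5]
```

With real monitoring data the error term is not zero, so the estimates will differ from any "true" values and their uncertainty should be reported.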
Nonlinear regression analysis is an extension of linear regression analysis and of the structural models of traditional econometrics. In economic and social practice, many relationships are nonlinear, so nonlinear regression methods, rather than linear ones, are generally used for analysis and forecasting.
In classical regression analysis, solving a nonlinear regression problem requires transforming the variables into a linear relationship and then using linear regression theory to determine the regression coefficients [59]. This approach has been widely used in practice for many years.
General nonlinear regression models can be written in the following form [59]:

$$Y = f(x_1, x_2, \ldots, x_k; \beta_1, \beta_2, \ldots, \beta_p) + e,$$

where $f$ is a nonlinear function of the independent variables $x_1, \ldots, x_k$ and the parameters $\beta_1, \ldots, \beta_p$, and $e$ is the error. For some special nonlinear relationships, variable transformations can convert the nonlinear relationship into a linear one; the transformations for several common types are shown in Table 4.
Table 4. Transformation of nonlinear regression into linear regression.

Type | Nonlinear function | Transformation | Linear function
Hyperbolic function | | |

Cortina-Januchs et al. used a clustering algorithm to find relationships between PM10 and meteorological variables and then used multilayer regression to forecast the concentration of PM10. The results show that meteorological variables are important in air pollution forecasting [60].
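The variable-transformation idea behind Table 4 can be illustrated with the exponential function $y = a e^{bx}$, a standard case chosen here for illustration: taking logarithms gives the linear model $\ln y = \ln a + b x$, which ordinary least squares can fit. The function name `fit_exponential` and the data are hypothetical.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) via least squares on ln(y) = ln(a) + b * x.

    Illustrative transformation only; every y must be positive.
    """
    if any(v <= 0 for v in ys):
        raise ValueError("all y values must be positive for the log transform")
    ly = [math.log(v) for v in ys]          # transformed dependent variable
    n = len(xs)
    mx, my = sum(xs) / n, sum(ly) / n
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, ly))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx                           # slope of the linearized model
    a = math.exp(my - b * mx)               # back-transform the intercept
    return a, b


xs = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
```

Note that the fit minimizes squared errors on the logarithmic scale, which weights relative rather than absolute errors; this is a property of the transformation approach, not a bug in the sketch.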

Remark 2.
It should be noted that different regression models rest on different assumptions, and if any assumption is violated, the resulting estimates may be biased. Therefore, the applicability of a regression method should be fully considered when solving a specific problem. Moreover, to improve the prediction accuracy of regression equations, researchers often add variables to the regression equation. However, more independent variables increase the computational load, lengthen the regression process, and complicate the prediction and control problems. Therefore, the main problem in regression modeling is choosing the variables for the regression equation, which requires significant experimental investigation.

ARIMA Methods
The autoregressive integrated moving average (ARIMA) model is a linear model that can represent both stationary and non-stationary time series. Constructing an ARIMA model involves three main steps (Rahman et al. [21]):
Step 1. Tentative identification
Step 2. Parameter estimation
Step 3. Diagnostic checking
ARIMA with a seasonal difference is called SARIMA. SARIMA processes data with a seasonal period of length $S$; if $d$ and $D$ are non-negative integers, the differenced series

$$w_t = (1 - B)^d (1 - B^S)^D x_t$$

is a stationary autoregressive moving average process [61]. The SARIMA$(p, d, q) \times (P, D, Q)_S$ model can be written as

$$\phi_p(B)\, \Phi_P(B^S)\, w_t = \theta_q(B)\, \Theta_Q(B^S)\, \varepsilon_t, \quad t = 1, 2, \ldots, N,$$

where $N$ is the number of observations up to time $t$; $B$ is the backshift operator defined by $B^a w_t = w_{t-a}$; $\phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and $\theta_q(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$ are the regular autoregressive and moving average operators; $\Phi_P(B^S) = 1 - \Phi_1 B^S - \cdots - \Phi_P B^{PS}$ is a seasonal autoregressive operator of order $P$; $\Theta_Q(B^S) = 1 - \Theta_1 B^S - \cdots - \Theta_Q B^{QS}$ is a seasonal moving average operator of order $Q$; and the $\varepsilon_t$ are independently and identically distributed normal random variables with mean zero, variance $\sigma^2$, and $\operatorname{cov}(\varepsilon_t, \varepsilon_{t-k}) = 0$ for all $k \neq 0$ [61]. In this definition, $p$ is the order of the autoregressive term; $q$ is the moving average order; $P$ is the order of the seasonal autoregressive term with period $S$; $Q$ is the order of the seasonal moving average term with period $S$; $D$ is the order of seasonal differencing; and $d$ is the order of ordinary differencing [61].
When fitting a SARIMA model to data, the first task is to estimate $d$ and $D$, the orders of differencing needed to make the series stationary and to remove most of the seasonality. The values of $p$, $q$, $P$, and $Q$ are then estimated from the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the differenced series, and the remaining parameters can be estimated by suitable iterative procedures [61]. Rahman et al. (2015) forecasted the API at three different stations [21]. The forecasting accuracy of the candidate SARIMA models is shown in Table 5.
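The differencing step can be sketched as follows, assuming the data are a plain Python list. It applies the operator $(1 - B)^d (1 - B^S)^D$ from the SARIMA definition; identifying and estimating the AR and MA orders afterwards (for example, with a statistics package) is not shown, and the helper names are hypothetical.

```python
def difference(series, lag=1, order=1):
    """Apply (1 - B^lag)^order, where B is the backshift operator.

    Each pass shortens the series by `lag` observations.
    """
    out = list(series)
    for _ in range(order):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


def sarima_difference(series, d, D, S):
    """Return w_t = (1 - B)^d (1 - B^S)^D x_t, the series an ARMA model is fitted to."""
    seasonal = difference(series, lag=S, order=D)   # remove seasonality
    return difference(seasonal, lag=1, order=d)     # remove remaining trend


pattern = [3, -1, 4, -6]                          # hypothetical period-4 seasonal effect
x = [2 * t + pattern[t % 4] for t in range(20)]   # linear trend plus seasonality
w = sarima_difference(x, d=1, D=1, S=4)
```

For this synthetic series, seasonal differencing turns the trend into a constant and the ordinary difference then removes it, so every element of `w` is zero; real pollution series would leave a stationary remainder to be modeled by the ARMA part.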

Remark 3.
The ARIMA model requires the time series to be stationary, either originally or after differencing. Moreover, ARIMA can only model and predict linear relationships between variables; it cannot describe nonlinear ones. Pollution data, however, are complex: geography, weather, and other factors combine to make them non-stationary and nonlinear. The data should therefore be transformed into a stationary, approximately linear form before ARIMA forecasting; if this is not possible, another forecasting model should be chosen.

Projection Pursuit Model (PP)
This method was developed in the 1970s. Conventional statistical methods for air pollution forecasting follow a "hypothesis-simulation-forecast" approach, so they are not suitable for analyzing data with nonlinear relationships or non-normal distributions. In contrast, the projection pursuit (PP) technique offers a new, exploratory "review of data-simulation-forecast" approach to data analysis, which can handle some nonlinear problems to a certain extent [62]. The main idea of projection pursuit is to machine-pick low-dimensional projections of a high-dimensional point cloud by numerically maximizing a certain objective function or projection index [63].
The general form of an order-$K$ PP autoregression model is

$$\hat{X}_t = \sum_{m=1}^{M} \beta_m G_m(Z_m), \qquad Z_m = \sum_{i=1}^{K} a_{mi} x_i,$$

where $\hat{X}_t$ is the estimated value of the time series $\{x\}$ at time $t$; $x_i$ ($i = 1, \ldots, K$) are the $K$ forecast factors, whose selection is ultimately determined by the data structure; $a_m = (a_{m1}, \ldots, a_{mK})$ is the $m$th projection direction, which satisfies $\lVert a_m \rVert = 1$; $Z_m$ is the projection of the forecast factors onto $a_m$; $G_m$ is the optimal piecewise linear function of $Z_m$, called the ridge function, which is a numerical function; and $\beta_m$ is the weight of the contribution of the $m$th ridge function to $X_t$. The optimization of the final model can be divided into two steps [6]:
Step 1. Local optimization process
The maximum number of terms $M$, the optimal parameters $a_m$ and $\beta_m$, and the ridge functions $G_m$ are determined by a stepwise alternating optimization method.

Step 2. Global optimization process
To find a better model, the number of terms $M$ and the number of parameters are optimized further by eliminating the least important terms one by one: the number of terms is decreased from $M_u$ to $M_u - 1, \ldots, 1$, and the smallest $M$ that gives the best solution is selected. Deng et al. (1997) used PP regression to forecast SO2 concentration based on historic data [62]. First, the SO2 concentration data were standardized according to Equation (9). The ranges of the resulting $C_{1i}$ values are listed in Table 6.
Table 6. Division of $C_{1i}$ values.
Sample test results are shown in Table 7. The authors regarded a forecast as qualified when its absolute relative error was less than 20%; on this criterion, the forecast accuracy of the samples in their study was 75%.
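The qualification criterion reduces to a simple count. The sketch below uses made-up observation and forecast values, not Deng et al.'s data; only the 20% threshold comes from the text, and the function name is hypothetical.

```python
def qualification_rate(observed, forecast, threshold=0.20):
    """Share of forecasts whose absolute relative error is below `threshold`."""
    qualified = 0
    for obs, fc in zip(observed, forecast):
        rel_err = abs(fc - obs) / abs(obs)  # assumes obs != 0
        if rel_err < threshold:
            qualified += 1
    return qualified / len(observed)


# Hypothetical SO2 concentrations (mg/m^3) and forecasts.
observed = [0.10, 0.20, 0.05, 0.08]
forecast = [0.11, 0.30, 0.055, 0.07]
rate = qualification_rate(observed, forecast)
```

Here three of the four relative errors (10%, 10%, 12.5%) fall below 20%, while the second (50%) does not.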

Remark 4.
The PP method overcomes the "curse of dimensionality" caused by high-dimensional distributions. It makes few prior assumptions and offers objectivity, robustness, resistance to interference, accuracy, wide applicability, and rapid modeling. It is also flexible: various model forms based on the method can be used for different research objects, and a series of methods, such as PP regression and PP clustering, have been derived from it. However, the method also has many disadvantages, including complex computation, difficulty in finding the optimal projection direction, a tendency to fall into local optima, and difficulty in solving highly nonlinear problems.

Principal Component Analysis Model
A principal component analysis (PCA) is a multivariate statistical technique based on data compression and feature extraction. PCA extracts the dominant patterns in a data matrix in terms of a complementary set of score and loading plots, and these extracted patterns contain most of the information in the original data [64].
PCA reduces the number of predictor variables by transforming them into new variables called principal components (PCs), which retain as much of the variance of the original data as possible. The PCs can be obtained from the correlation matrix of the normalized input data: the eigenvalues of the correlation matrix $C$ are obtained from its characteristic equation, given in Equation (10) [25]:

$$\lvert C - \lambda I \rvert = 0, \tag{10}$$

where $\lambda$ is the eigenvalue and $I$ is the identity matrix. For every eigenvalue $\lambda_i$ there is a non-zero eigenvector $a_i$, defined by

$$(C - \lambda_i I)\, a_i = 0.$$

The variance of the $i$th PC equals the corresponding eigenvalue:

$$\operatorname{Var}(\mathrm{PC}_i) = \lambda_i.$$

After all of the PCs are obtained, the initial data set is transformed into an orthogonal set by multiplying it by the eigenvectors [58,65]. A principal component regression (PCR) model has also been proposed to forecast the AQI in Delhi; the PCR model uses the PCs as the independent variables of a multiple linear regression equation [58].
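A minimal sketch of the eigen-analysis step, assuming a small data set held as a list of rows: it builds the correlation matrix and extracts only the leading principal component by power iteration, a simple substitute for a full eigendecomposition. The function names are hypothetical; power iteration converges slowly when the two largest eigenvalues are close, and it assumes the starting vector is not orthogonal to the leading eigenvector.

```python
import math


def correlation_matrix(data):
    """Correlation matrix of the columns of `data` (rows are observations)."""
    n, k = len(data), len(data[0])
    cols = [[row[j] for row in data] for j in range(k)]
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    # Standardize each column (a zero-variance column would divide by zero).
    z = [[(v - m) / s for v in c] for c, m, s in zip(cols, means, stds)]
    return [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
            for i in range(k)]


def leading_component(C, iters=200):
    """Largest eigenvalue and unit eigenvector of C by power iteration."""
    v = [1.0] + [0.0] * (len(C) - 1)
    for _ in range(iters):
        w = [sum(cij * vj for cij, vj in zip(row, v)) for row in C]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue, i.e. the PC's variance.
    lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(len(v))) for i in range(len(v)))
    return lam, v


data = [[x, 2 * x + 3] for x in range(1, 6)]   # two perfectly correlated variables
C = correlation_matrix(data)
lam, vec = leading_component(C)
share = lam / len(C)   # trace of a correlation matrix equals the number of variables
```

With two perfectly correlated variables, the correlation matrix is [[1, 1], [1, 1]], so the first PC carries all the variance: its eigenvalue is 2 out of a trace of 2.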

Remark 5.
The PCA algorithm reduces the dimensionality of a data set: it converts a number of correlated variables into a small number of uncorrelated variables that retain most of the original information. When applying PCA, the indices should be chosen to be representative, objective, independent, and comprehensive. At the same time, if the data set contains extreme values or nonlinear variables, the analysis becomes much less effective. Nonlinear PCA and independent component analysis methods have therefore been proposed; they are widely used in the forecasting field but rarely in pollution forecasting, where they need to be explored further.

Support Vector Regression
Support vector regression (SVR) applies support vectors to a regression function. Two main types of SVR are used for regression analysis: ε-SVR and ν-SVR. SVR has advantages in high-dimensional spaces because SVR optimization does not depend on the dimensionality of the input space [66].
SVR maps the input data into a high-dimensional feature space through a nonlinear mapping $\varphi(x)$ and then fits a linear function in that space. This linear function is known as the SVR equation [24]:

$$f(x) = w \cdot \varphi(x) + b,$$

where $f(x)$ is the forecast value; $w$ is the $N$-dimensional weight vector, whose dimension equals that of the feature space; and $b$ is the threshold (bias). The specific calculation of $(w, b)$ is given in [26]. Chen et al. used SVR to forecast the concentration of SO2. First, they analyzed the influencing factors. Next, as a key step, they preprocessed the daily average concentrations of SO2 in Xi'an for 2001-2002, using PCA to reduce the dimensionality of the input factors. Finally, they established an SVR model based on the radial basis function (RBF) kernel [67].
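The sketch below shows the main ingredients of ε-SVR with an RBF kernel: the kernel, the ε-insensitive loss, and the dual-form prediction function $f(x) = \sum_i (\alpha_i - \alpha_i^*) K(x_i, x) + b$. Training, i.e. solving the quadratic program for the dual coefficients, is not shown; in practice it is delegated to a solver such as LIBSVM. The default values of `gamma` and `epsilon` and all function names are arbitrary assumptions.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def eps_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """epsilon-SVR loss: errors inside the epsilon tube cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_predict(x, support_vectors, dual_coefs, b, gamma=0.5):
    """Kernel SVR prediction f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b.

    `dual_coefs` holds the differences (alpha_i - alpha_i*) from a trained model.
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + b
```

The kernel lets the model work with inner products in the feature space without computing $\varphi(x)$ explicitly, which is why the optimization does not depend on the input dimensionality.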

Remark 6.
Statistical methods are widely used in forecasting, and many existing models, such as the support vector machine (SVM), are based on them. However, classical statistical forecasting has some problems in application. For example, a single model often gives poor forecasts and is not easily integrated with other methods. Researchers have therefore improved statistical forecasting methods through various channels, such as proposing new hybrid models, changing the form of the input variables, and developing new error-evaluation criteria. These measures have improved prediction accuracy to varying degrees.

Artificial Neural Network
An artificial neural network (ANN) is a mathematical model of distributed, parallel information processing that mimics the behavioral characteristics of animal neural networks. Depending on the complexity of the system, an ANN processes information by adjusting the connections among a large number of internal nodes, and it is capable of self-learning and self-adaptation.
A common multilayer feedforward network consists of three parts: an input layer, a hidden layer, and an output layer. Each layer contains several processing units, called neurons, connected by acyclic links.
Mathematically, the hidden neuron $h_j$ can be described by Equation (14) [68]:

$$h_j = \varphi(z_j), \qquad z_j = \sum_{i=1}^{p} w_{ij} x_i + w_{0j}, \tag{14}$$

where $x_1, \ldots, x_p$ are the inputs, $w_{ij}$ and $w_{0j}$ are the input-to-hidden weights and bias, and $\varphi(z_j)$ is an activation function, usually the sigmoid

$$\varphi(z) = \frac{1}{1 + e^{-z}}.$$

The relationship between the output $f(x)$ and the inputs has the following representation:

$$f(x) = w_0 + \sum_{j=1}^{q} w_j h_j,$$

where $w_j$ is a model parameter, often called a connection weight, and $q$ is the number of hidden nodes. The ANN is representative of AI methods for forecasting air pollution. Wang et al. (2015) used an ANN model to forecast the concentrations of SO2 and PM10 at four stations in Taiyuan for comparison with a hybrid model; the ANN forecast accuracy is shown in Table 8 [68]. In Rahman's study, the results of SARIMA, ANN, and fuzzy time series (FTS) models were compared (Table 9), and the results indicated that the ANN model was capable of modeling and forecasting API values [21]. Elangasinghe et al. built an ANN air pollution forecast model based on meteorological parameters and the emission patterns of the sources. They first identified the various data sets and, after cleaning, normalizing, and randomizing the data, built an ANN model. They then applied forward selection, backward elimination, and genetic algorithms, with sensitivity analysis as the selection tool, to eliminate irrelevant inputs from the network [23].
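A minimal sketch of the forward pass of a one-hidden-layer feedforward network with sigmoid hidden units and a linear output unit, as in the equations above. The weight layout (one row of `W_hidden` per hidden neuron) and the function names are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Logistic activation phi(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, W_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer network.

    h_j  = phi(sum_i W_hidden[j][i] * x_i + b_hidden[j]),  j = 1..q
    f(x) = b_out + sum_j w_out[j] * h_j                     (linear output)
    Returns (f(x), hidden activations).
    """
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
         for row, bj in zip(W_hidden, b_hidden)]
    return b_out + sum(wj * hj for wj, hj in zip(w_out, h)), h


# With all input weights zero, every hidden neuron outputs sigmoid(0) = 0.5.
out, hidden = mlp_forward([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [1.0, 3.0], 0.5)
```

In a forecasting setting, `x` would hold lagged pollutant concentrations and meteorological variables, and the weights would come from training, for example by back propagation.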

Remark 7.
The ANN, as the simplest NN, has been applied to predict air pollution. It has good nonlinear fitting ability and can improve prediction accuracy. However, pollution is affected by many factors whose interrelationships are complicated, which makes it difficult to clarify how these factors act and thus to improve ANN prediction accuracy further.

Back Propagation Neural Network
Back propagation (BP), meaning "error backward propagation", underlies one of the most widely used NN models, which is trained by the error back-propagation algorithm. Training consists of two processes: the forward propagation of information and the backward propagation of error. When the actual output does not match the expected output, the error back-propagation phase begins: the error is propagated backward from the output layer, and the weights of each layer are updated by gradient descent on the error. The repeated cycle of forward propagation and error back-propagation, with constant adjustment of each layer's weights, constitutes the learning and training of the NN; the cycle continues until the network output error falls to an acceptable level or a preset number of learning iterations is reached.
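To make the two phases concrete, here is a small sketch of BP training for a one-hidden-layer network with sigmoid hidden units and a linear output, updated sample by sample with plain gradient descent on the squared error. The learning rate, network size, epoch count, toy data, and function name are illustrative assumptions, not settings from the cited studies.

```python
import math
import random


def train_bpnn(samples, n_hidden=4, lr=0.1, epochs=500, seed=0):
    """Train a one-hidden-layer network by error back propagation.

    samples: list of (input_vector, target) pairs.
    Returns ((W, bh, v, bo), mse_history), one MSE value per epoch.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    bh = [0.0] * n_hidden
    v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    bo = 0.0
    history = []
    for _ in range(epochs):
        sq_err = 0.0
        for x, target in samples:
            # Forward propagation of information.
            h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
                 for row, b in zip(W, bh)]
            y = bo + sum(vj * hj for vj, hj in zip(v, h))
            err = y - target
            sq_err += err * err
            # Backward propagation of error (gradients of 0.5 * err^2),
            # computed with the current output weights before updating them.
            delta_h = [err * vj * hj * (1.0 - hj) for vj, hj in zip(v, h)]
            v = [vj - lr * err * hj for vj, hj in zip(v, h)]
            bo -= lr * err
            for j in range(n_hidden):
                W[j] = [w - lr * delta_h[j] * xi for w, xi in zip(W[j], x)]
                bh[j] -= lr * delta_h[j]
        history.append(sq_err / len(samples))
    return (W, bh, v, bo), history


# Hypothetical toy data: target = 1 + x^2 on [0, 1].
samples = [([k / 10], 1.0 + (k / 10) ** 2) for k in range(11)]
params, history = train_bpnn(samples)
```

`history` records the mean squared error per epoch, so it can serve as the stopping criterion described above: stop when it is small enough or when the epoch limit is reached.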
When modeling a BPNN, the number of hidden nodes is the primary variable to be determined. The trial-and-error method and an empirical formula (Equation (16)) have recently been applied to this issue [24]. Bai et al. improved the BPNN model with wavelet decomposition to improve multi-scale feature representation and weaken randomness. The model operates as follows [24]:
Step 1: Collect the modeling data, which contain historical air pollutant concentrations $C$ and meteorological data $M$.
Step 2: Perform the stationary wavelet transform (SWT) to decompose the time series of $C$.
Step 3: Normalize the meteorological parameters and each level of wavelet coefficients into [0, 1] according to Equation (17):

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \tag{17}$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the series.
Step 4: Calculate the $t$th wavelet coefficients of the $z$th scale using BPNN$_z$ ($z = 1, 2, \ldots, l, l + 1$), with the $t$th meteorological data and the $(t - 1)$th wavelet coefficients as inputs.
Step 5: Perform the inverse SWT to generate the estimated daily pollutant concentrations.
Step 6: Output the forecasting result.
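Step 3's min-max normalization, together with the inverse mapping needed to return forecasts to concentration units, might look like this. It is a sketch: the function names are hypothetical, and reusing the training-period minimum and maximum for test data is an assumption about good practice rather than a detail from [24].

```python
def minmax_scale(values, lo=None, hi=None):
    """Map values into [0, 1] with x' = (x - x_min) / (x_max - x_min).

    Pass lo/hi from the training period to scale test data consistently.
    Returns (scaled_values, lo, hi).
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    span = hi - lo
    if span == 0:
        raise ValueError("a constant series cannot be min-max scaled")
    return [(v - lo) / span for v in values], lo, hi


def minmax_inverse(scaled, lo, hi):
    """Undo the scaling so that forecasts are reported in the original units."""
    return [s * (hi - lo) + lo for s in scaled]


scaled, lo, hi = minmax_scale([20.0, 35.0, 50.0])
restored = minmax_inverse(scaled, lo, hi)
```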
The comparison between the results of W-BPNN and BPNN is shown in Table 10. The mean absolute percentage error (MAPE) and root mean square error (RMSE) of W-BPNN are lower than those of BPNN, which indicates that W-BPNN has the better forecasting performance. Wang et al. improved the BPNN from another angle: they added an SSA algorithm to reduce the effect of the chaotic nature of pollution sequences and improve BPNN forecasting performance [69].
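The two error measures used in this comparison can be computed as follows, using the usual definitions $\mathrm{MAPE} = \frac{100}{n} \sum_{t} \lvert (o_t - f_t)/o_t \rvert$ and $\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t} (o_t - f_t)^2}$, where $o_t$ is the observation and $f_t$ the forecast. The data below are made up for illustration.

```python
import math


def mape(observed, forecast):
    """Mean absolute percentage error in percent (observations must be non-zero)."""
    return 100.0 * sum(abs((o - f) / o) for o, f in zip(observed, forecast)) / len(observed)


def rmse(observed, forecast):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, forecast)) / len(observed))


observed = [100.0, 200.0]   # hypothetical daily concentrations
forecast = [110.0, 190.0]
```

Here both forecasts miss by 10 units, so the RMSE is 10, while the MAPE is 7.5% because the same absolute miss counts for less on the larger observation.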

Remark 8.
The convergence of the BPNN is slow, and it cannot guarantee convergence to the global optimum. At the same time, the operational parameters of the BPNN are generally selected by experience and lack theoretical guidance. Therefore, BP is best combined with other optimization algorithms to improve its prediction accuracy.

Wavelet Neural Network
Another commonly used NN is the wavelet NN. As the name suggests, the term wavelet means a small wave: "small" means that it decays, and "wave" refers to its oscillation. Wavelet analysis gradually refines the signal (function) through dilation and translation operations.
As a result, high frequencies are finely subdivided in time and low frequencies are finely subdivided in frequency. Wavelet analysis can automatically adapt to the requirements of time-frequency signal analysis, so it can focus on any detail of the signal.
A wavelet function $\psi(t)$ belongs to a class of oscillating functions that decay quickly to zero [70]; in particular, it has zero mean:

$$\int_{-\infty}^{\infty} \psi(t)\, dt = 0.$$

If $\psi(t)$, with Fourier transform $\hat{\psi}(\omega)$, satisfies the admissibility condition (Equation (21)), $\psi(t)$ is called a basic wavelet or mother wavelet:

$$C_\psi = \int_{-\infty}^{\infty} \frac{\lvert \hat{\psi}(\omega) \rvert^2}{\lvert \omega \rvert}\, d\omega < \infty. \tag{21}$$

Dilating and translating $\psi(t)$ gives Equation (22):

$$\psi_{a,b}(t) = \frac{1}{\sqrt{\lvert a \rvert}}\, \psi\!\left(\frac{t - b}{a}\right), \quad a, b \in \mathbb{R},\ a \neq 0. \tag{22}$$

This is called a wavelet sequence, where $a$ is the dilation (scale) factor and $b$ is the translation factor. The wavelet transform of a function $f(t)$ is

$$W_f(a, b) = \frac{1}{\sqrt{\lvert a \rvert}} \int_{-\infty}^{\infty} f(t)\, \overline{\psi\!\left(\frac{t - b}{a}\right)}\, dt,$$

where $W_f(a, b)$ are the wavelet coefficients, which reflect the characteristics of the signal with respect to the frequency-domain parameter $a$ and the time-domain parameter $b$. When $a$ is small, the frequency-domain resolution is lower but the time-domain resolution is higher; when $a$ is large, the frequency-domain resolution is higher but the time-domain resolution is lower. The wavelet transform therefore achieves time-frequency localization with a window of fixed area but variable shape. Chen applied wavelet analysis and neural networks to forecast the concentrations of atmospheric pollutants. The steps in the study were [70]:
Step 1: After wavelet decomposition, the low-frequency coefficients of the highest level are reconstructed, clearly revealing the annual change in atmospheric pollutant concentration. The two lowest levels, which carry the high-frequency signals, are also reconstructed, so that abrupt change points in the concentration time series can be clearly identified.
Step 2: The concentration time series is decomposed into different frequency channels by wavelet decomposition, and a corresponding time series model is considered for each channel. The predicted values of the different channels are then combined to obtain the predicted value of the original time series.
Step 3: The input samples of the NN prediction model are studied, and its input variables are analyzed using the principles of atmospheric pollution meteorology. PCA is then used to reduce the dimension of the input variables.
Step 4: The annual variation trend of the concentration time series is segmented by wavelet decomposition and reconstruction, and an NN prediction model is designed for each segment.
Step 5: The decomposed wavelet coefficients are reconstructed to the original scale. An NN that includes the meteorological elements is applied to the low- and medium-frequency wavelet coefficients, while for the high-frequency coefficients, the coefficients of the preceding few days are used as the NN inputs. Finally, the forecasts of all wavelet coefficient sequences are synthesized to obtain the forecast of the original sequence.
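The decomposition and reconstruction that these steps rely on can be illustrated with the Haar discrete wavelet transform, the simplest orthonormal case. This is an illustrative choice: Chen's study does not necessarily use the Haar wavelet, and Bai et al.'s W-BPNN uses the stationary (undecimated) transform rather than the decimated one sketched here. All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT (even-length input).

    Returns (approximation, detail): low- and high-frequency coefficients.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt exactly (perfect reconstruction)."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def haar_wavedec(signal, levels):
    """Repeatedly split the approximation; returns [a_L, d_L, ..., d_1]."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    return [approx] + details[::-1]


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]   # hypothetical daily concentrations
a, d = haar_dwt(x)
rec = haar_idwt(a, d)
coeffs = haar_wavedec(x, 2)
```

In a wavelet-NN scheme, a separate forecaster would be trained on each coefficient sequence, and the per-channel forecasts would then be recombined with the inverse transform.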
A summary of commonly used wavelets is shown in Table 11.
Table 11. Short summary of commonly used wavelets.

Wavelet | Main equations | Description
Haar | | The earliest wavelet used in wavelet analysis and also the simplest; the function itself is a step function.
Mexican Hat | | The second-order derivative of the Gaussian function, with the sign reversed.
Morlet | | Neither orthogonal nor compactly supported, so it satisfies only the conditions of the continuous wavelet transform and cannot be used for the discrete or orthogonal wavelet transform.
Daubechies | $P(y) = \sum_{k=0}^{N-1} \binom{N-1+k}{k} y^k$, where $\binom{N-1+k}{k}$ is the binomial coefficient | An orthonormal wavelet, which makes discrete wavelet transform analysis possible.
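Two of the wavelets in Table 11 are simple enough to write down directly. The sketch below uses the standard Haar step function and an unnormalized Mexican hat (the usual normalizing constant $2/(\sqrt{3}\,\pi^{1/4})$ is omitted), and checks numerically that each has the zero mean required of a wavelet. The function names and the trapezoidal check are illustrative assumptions.

```python
import math


def haar_psi(t):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def mexican_hat_psi(t):
    """Unnormalized Mexican hat: the negative second derivative of exp(-t^2 / 2)."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)


def integrate(f, a, b, n=20000):
    """Trapezoidal rule, used here only to check the zero-mean property."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))


haar_mean = integrate(haar_psi, -1.0, 2.0)
hat_mean = integrate(mexican_hat_psi, -10.0, 10.0)
```

Both integrals come out close to zero; the Haar estimate is slightly less precise because the trapezoidal rule handles its jump discontinuities only approximately.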

Remark 9.
Wavelet analysis was proposed to overcome the shortcomings of the Fourier transform in analyzing non-stationary signals. It can effectively extract local information from signals and has good time-frequency localization. However, selecting the wavelet basis is difficult.
ANNs have the advantages of self-organized learning, adaptivity, and good fault tolerance; however, traditional NNs also have shortcomings, such as slow convergence and a tendency to fall into local minima. Therefore, Zhang et al. proposed the concept and algorithm of the wavelet NN in 1992. Wavelet NNs inherit the merits of wavelet analysis and NNs, with good approximation and pattern classification abilities, and are therefore widely used in prediction [71].

Support Vector Machine (SVM)
Support vector machine (SVM) is a new generation of machine learning technique based on the statistical learning theory developed by Vapnik; it can effectively solve practical problems involving small samples, nonlinearity, high dimensions, and local minima. SVMs have been used successfully in classification, regression, time series forecasting, and other fields. SVMs were originally used for classification, but their principles extend easily to regression and time series forecasting [61].
When building an SVM-based forecast model of atmospheric pollutant concentration, the key issues are determining the input mode, selecting the training samples, and selecting the model structure parameters [72]. The steps to build such a model are as follows:
1. Select the kernel function and parameter values.
2. Train on the samples to obtain the SVM forecast model with optimized parameters, obtain the support vectors, and then determine the structure of the SVM.
3. Use the trained support vector predictor to forecast the test samples.
Chen used SO2 as an example and established a forecast model for atmospheric pollutant concentration. The author compared the mean relative error (MRE) and RMSE obtained with different kernel functions and found that different kernel functions give different prediction results. A model combining wavelet decomposition with SVM was then established to forecast urban atmospheric pollutant concentrations [67]. Wang et al. improved the forecast accuracy of SVM by using a Taylor expansion forecasting model to revise the residual series [68]. The forecast accuracies are shown in Table 12.

Remark 10.
SVMs were initially a pattern recognition method based on statistical learning theory and have good predictive ability for small samples and nonlinear data. However, the SVM algorithm is difficult to apply to large training sets: when the number of training samples is large, storing and computing the data matrix consumes a great deal of memory and computation time. SVMs are also sensitive to missing data; if the data sequence contains many missing values, the accuracy of the forecast will suffer. Moreover, SVMs have no general solution for nonlinear problems; because the choice of kernel function is key, it must be made carefully. In addition, the basic SVM algorithm is a two-class algorithm, which limits its use for multi-class problems.

Fuzzy Time Series (FTS) Analysis
In 1993, Song and Chissom proposed the definition of fuzzy time series (FTS) based on fuzzy sets [73]. FTS has since been used in air pollution forecasting. The method for forecasting the API using FTS can be summarized as follows [21,74]:
Step 1: Define the universe of discourse $U = [D_{\min} - D_1, D_{\max} + D_2]$, where $D_{\min}$ and $D_{\max}$ are the minimum and maximum of the data and $D_1$ and $D_2$ are suitable positive numbers, and partition it into several equal intervals $u_1, u_2, \ldots, u_m$.
Step 2: Based on the SARIMA model, determine the fuzzy logical relationships (FLRs).
Step 3: To select the best input for the FLRs, try different input combinations, from a single input to two, three, and four inputs.
Step 4: Calculate the optimum interval length using the average-based length method.
Step 5: Calculate the forecast outputs.
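Steps 1 and 4 can be sketched as follows. The interval partition follows Step 1 directly; the interval-length function is a simplified version of the average-based length (half the mean absolute first difference), and the published procedure additionally rounds this value to a base chosen by its magnitude, which is omitted here. Assigning each value to the interval containing it, which identifies its fuzzy set in Chen-style FTS models, is likewise a simplification. All names and numbers are hypothetical.

```python
def average_based_length(series):
    """Simplified average-based interval length: half the mean absolute first difference.

    The base-dependent rounding step of the full method is omitted.
    """
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    return 0.5 * sum(diffs) / len(diffs)


def partition_universe(d_min, d_max, d1, d2, m):
    """Split U = [d_min - d1, d_max + d2] into m equal intervals u_1..u_m."""
    lo, hi = d_min - d1, d_max + d2
    width = (hi - lo) / m
    return [(lo + k * width, lo + (k + 1) * width) for k in range(m)]


def fuzzify(value, intervals):
    """1-based index k of the interval u_k containing value (its fuzzy set A_k)."""
    for k, (a, b) in enumerate(intervals, start=1):
        if a <= value < b or (k == len(intervals) and value == b):
            return k
    raise ValueError("value lies outside the universe of discourse")


# Hypothetical API data ranging from 40 to 100, with D1 = D2 = 5 and m = 7.
intervals = partition_universe(40, 100, 5, 5, 7)
label = fuzzify(62, intervals)
```

With these settings, $U = [35, 105]$ is split into seven intervals of width 10, and the value 62 falls into $u_3 = [55, 65)$.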
Rahman et al. forecasted the API for three different stations in Malaysia [21]. The forecasting accuracy in the testing period of the FTS model is shown in Table 13.

Fuzzy Recognition
Fuzzy pattern recognition identifies a given object, and fuzzy identification usually involves two processes: a recognition process and a learning process. Figure 4 shows the steps of the two processes. Xiong et al. proposed a fuzzy recognition theory and model for forecasting air pollution concentrations and conducted an empirical study based on measured SO2 concentration data, showing that fuzzy recognition can be used for forecasting [75]. The forecast model contains an index weight matrix, which provides a new way of improving forecast accuracy.

Remark 11.
Theoretically, fuzzy methods have high fault tolerance, do not require accurate mathematical models, and allow man-machine interaction; however, they have relatively low accuracy and lack systematization. The computation of fuzzy identification is complex, and the performance of fuzzy time series forecasting is affected when outliers exist.
The fuzzy method should be further optimized during application, for example, by combining it with subtractive clustering, optimizing the parameters of the membership functions, using a BPNN to optimize the fuzzy rules, or optimizing the quantization and proportionality factors with genetic algorithms.

Adaptive Neural Network Fuzzy Inference System
ANFIS is a fuzzy inference system based on an adaptive network structure: it uses neural network algorithms to obtain fuzzy rules and membership functions from data and uses neural networks to implement the fuzzy inference process. The general principles and methods of ANFIS are systematically studied and summarized in Ref. [76]. ANFIS is composed of two parts, a premise part and an inference (consequent) part, which are connected by a network of fuzzy rules [77].
The most commonly used ANFIS structure is based on the Sugeno fuzzy model. For a first-order Sugeno model with two inputs $x$ and $y$, the basic structure can be expressed as a five-layer feedforward NN [78]:
Layer 1: Every node $i$ in this layer is an adaptive node whose node function is a membership function, which determines the degree to which the input satisfies the node's linguistic label:

$$O_i^1 = \mu_{A_i}(x),$$

where $x$ is the input to node $i$, $A_i$ is the linguistic label of node $i$, and $O_i^1$ is the membership grade of $A_i$. All parameters in this layer are called antecedent parameters.
Layer 2: Every node in this layer is a fixed circle node whose output $O_i^2$ is the product of all incoming signals, i.e., the firing strength of rule $i$ [79]:

$$O_i^2 = \omega_i = \mu_{A_i}(x)\, \mu_{B_i}(y).$$

Layer 3: The output of every node $i$ is called the normalized firing strength. Each node calculates the ratio of the $i$th rule's firing strength to the sum of the firing strengths of all rules [78]:

$$O_i^3 = \bar{\omega}_i = \frac{\omega_i}{\sum_j \omega_j}.$$

Layer 4: This is the conclusion layer; every node $i$ is a square (adaptive) node with the node function

$$O_i^4 = \bar{\omega}_i f_i = \bar{\omega}_i (p_i x + q_i y + r_i),$$

where $\bar{\omega}_i$ is the output of Layer 3 and $(p_i, q_i, r_i)$ is the parameter set of this node. The parameters in this layer are referred to as consequent parameters [79].
Layer 5: The single node in this layer is a fixed node that computes the overall output as the summation of all incoming signals [77]:

$$O^5 = \sum_i \bar{\omega}_i f_i = \frac{\sum_i \omega_i f_i}{\sum_i \omega_i}.$$
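A minimal sketch of the forward pass through the five ANFIS layers for a two-input, first-order Sugeno model. Gaussian membership functions are an assumption (generalized bell functions are also common), training of the premise and consequent parameters is not shown, and the function names and parameter layout are hypothetical.

```python
import math


def gauss_mf(x, c, sigma):
    """Gaussian membership function centred at c with width sigma (assumed choice)."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-input first-order Sugeno ANFIS.

    premise:    one ((cA, sA), (cB, sB)) pair per rule, for mu_A(x) and mu_B(y)
    consequent: one (p, q, r) triple per rule, giving f_i = p*x + q*y + r
    """
    # Layers 1-2: membership grades, then firing strengths w_i = mu_A(x) * mu_B(y).
    w = [gauss_mf(x, *a) * gauss_mf(y, *b) for a, b in premise]
    # Layer 3: normalized firing strengths (inputs far from every centre can underflow).
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: weighted rule outputs w_bar_i * f_i.
    layer4 = [wb * (p * x + q * y + r) for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: the overall output is the sum of all incoming signals.
    return sum(layer4)


# Two rules with identical premises fire equally, so the output averages f_1 and f_2.
same = ((0.0, 1.0), (0.0, 1.0))
result = anfis_forward(2.0, 4.0, [same, same], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])
```

In this toy case, $f_1 = x = 2$ and $f_2 = y = 4$ receive equal normalized weights of 0.5, so the output is 3.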
Remark 12.
Studies have shown that AI forecasts are generally more accurate than traditional statistical forecasts. More recently, researchers have selected meteorological or geographic factors as input variables, and these adjusted models have been shown to improve forecast accuracy. The comparison by Rahman (2015) shows that the ANN can predict fluctuating series that contain trends and seasonality, such as air quality data. However, ANN and SVM have limitations inherent in their input variables; their main defect is a failure to obtain complete information about the research question relevant to the learning goals. These shortcomings of ANNs have motivated the development of hybrid models.

Three Dimensional Models
Over the past few years, studies of pollutant concentration forecasting have expanded from two-dimensional to three-dimensional space. Compared with two-dimensional models, three-dimensional models are more accurate in handling complicated terrain, boundaries, and similar features. They account for pollutant movement in both the horizontal and vertical directions and are more consistent with actual emission conditions, so the simulated distribution of pollutant concentrations is more realistic. A detailed review of three-dimensional models can be found in Ref. [80]; here, we mainly introduce some representative models.

Emissions Methods
The atmospheric emissions model is also known as the air quality model. It has undergone three generations of development and improvement in the past 50 years.
The first generation of air quality model mainly includes the box model based on the mass conservation law, the Gaussian model based on the turbulence diffusion statistical theory, and the Lagrange trajectory model [81]. The first generation models had no or only simple chemical reaction modules, which limits their application in practice. However, these models are still widely used in the simulation of conventional pollutants due to their simple structure, fast calculation speed, and high accuracy of long-term concentration simulation.
From the early 1980s to the early 1990s, with advances in the study of cloud and precipitation physics and of atmospheric chemical mechanisms, air quality models developed correspondingly: more complex meteorological models, meteorological parameters, and detailed nonlinear chemical reaction mechanisms were added. Gradually, the second-generation air quality models, based on the Eulerian grid model, were formed.
Since the 1990s, in order to address all atmospheric problems together, third-generation air quality models have been developed on the basis of the so-called "One Atmosphere" concept. The "One Atmosphere" approach treats the entire atmosphere as the research object and simulates all atmospheric physical and chemical processes at various spatial scales. The current mainstream models include the Community Multiscale Air Quality (CMAQ) model, the Comprehensive Air Quality Model with Extensions (CAMx), and the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem).
At present, there are two main calculation approaches for emission models: (1) Air quality forecasting based on statistical methods. These methods use statistics to analyze existing data, explore changes in the atmospheric environment, and predict air pollutant concentrations by establishing statistical forecast models that relate pollutant concentrations to meteorological parameters. (2) Numerical forecasting based on atmospheric dynamics theory. These methods rely on an understanding of the physical and chemical processes of the atmosphere and use computers to forecast the dynamic distribution of air pollutant concentrations by solving a numerical model of pollutant transport and diffusion.

The Atmospheric Dispersion Modelling System
The Atmospheric Dispersion Modelling System (ADMS) was developed by Cambridge Environmental Research Consultants (CERC) and is currently one of the mainstream atmospheric dispersion models internationally. ADMS uses a three-dimensional Gaussian model to calculate pollutant concentrations and accounts for the distribution of pollutants under neutral and unstable conditions. ADMS requires as input the pollution source intensity, the meteorological conditions, and the topographic data at the calculation points. The output includes the average concentration of contaminants at a given point or over an area, with averaging times ranging from 10 min to the annual average. The output also includes dry and wet deposition and radioactivity effects. In addition, the number of exceedances of the air quality standard can be calculated. The meteorological preprocessing module also generates an output file of meteorological data, which differs from conventional meteorological data.
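The Gaussian kernel at the heart of ADMS-type models can be illustrated with the textbook steady-state Gaussian plume equation with ground reflection. This is a minimal sketch, not ADMS's own, more elaborate formulation; the Briggs open-country, neutral-stability (class D) dispersion curves used for σ_y and σ_z are an assumed choice for illustration.

```python
import math


def briggs_rural_neutral(x):
    """Briggs open-country sigma_y, sigma_z (m) for neutral (class D) stability."""
    sigma_y = 0.08 * x / math.sqrt(1 + 0.0001 * x)
    sigma_z = 0.06 * x / math.sqrt(1 + 0.0015 * x)
    return sigma_y, sigma_z


def gaussian_plume(q, u, x, y, z, h):
    """Steady-state Gaussian plume concentration (g/m^3) with ground reflection.

    q: emission rate (g/s); u: wind speed at release height (m/s);
    x, y, z: downwind, crosswind and vertical receptor coordinates (m);
    h: effective release height (m).
    """
    if x <= 0:
        return 0.0  # no concentration upwind of the source in this simple model
    sy, sz = briggs_rural_neutral(x)
    lateral = math.exp(-y**2 / (2 * sy**2))
    vertical = (math.exp(-(z - h)**2 / (2 * sz**2))
                + math.exp(-(z + h)**2 / (2 * sz**2)))  # image source for reflection
    return q / (2 * math.pi * u * sy * sz) * lateral * vertical
```

For a ground-level release (h = 0) observed at ground level on the plume axis, the expression reduces to q/(π u σ_y σ_z), a handy check on any implementation.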
Riddle et al. compared the atmospheric dispersion modeling capabilities of ADMS and a model based on computational fluid dynamics. The results showed that ADMS performed better under neutral stability, owing to its fast calculation speed and simple model setup [82].
Manar et al. coupled the regional mesoscale model WRF with the local-scale model ADMS to build a one-way coupled modeling system, WRF-ADMS. The main function of ADMS was to resolve atmospheric stability and turbulence quickly with a Gaussian dispersion model, and in their study ADMS was driven by WRF in offline mode [83]. In this system, the authors fed large-scale weather data (initial and boundary conditions) and static data (topography, land use, and soil data) into the WRF framework, which then output three-dimensional weather forecast predictors. The input data of ADMS were very fine grid data, consisting of detailed stack properties, hourly averaged meteorological data, the mesoscale wind field, sensible heat flux, boundary layer height, etc. The output of the ADMS system included wind/turbulence flow field data and pollutant concentrations/depositions. The experimental results indicated that the ADMS model allowed simulations driven by the mesoscale forecast, and the WRF-ADMS dispersion modeling system could simulate the observed spatial distributions of perfluoromethylcyclohexane plumes [83].

The California Puff Model
The California Puff model (CALPUFF) is a non-steady-state, three-dimensional Lagrangian puff transport and dispersion model that simulates dispersion and transformation processes along the paths of advected "puffs" of material emitted from modeled sources. The puff approach is a relatively simple and flexible dispersion method that can deal with severe weather conditions and with pollution source parameters that vary in time and space, and it is more widely applicable than the Gaussian plume approach.
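The puff idea can be sketched in a few lines: each time step releases a new puff, every puff is advected by the wind of that step (which may change over time), and the concentration at a receptor is the sum of Gaussian puff contributions. This is a simplified illustration, not CALPUFF's algorithm; the linear growth of σ with travel distance (`growth`) and the equal horizontal and vertical spread are hypothetical choices.

```python
import math


def puff_concentration(mass, dx, dy, z, h, sigma_h, sigma_z):
    """Concentration (g/m^3) from one Gaussian puff with ground reflection.

    mass: puff mass (g); dx, dy: receptor offset from the puff centre (m);
    z: receptor height (m); h: puff release height (m).
    """
    norm = mass / ((2 * math.pi) ** 1.5 * sigma_h**2 * sigma_z)
    horizontal = math.exp(-(dx**2 + dy**2) / (2 * sigma_h**2))
    vertical = (math.exp(-(z - h)**2 / (2 * sigma_z**2))
                + math.exp(-(z + h)**2 / (2 * sigma_z**2)))
    return norm * horizontal * vertical


def simulate_puffs(winds, q, dt, h, receptor, growth=0.1):
    """Release one puff per step and advect all puffs with that step's wind.

    winds: sequence of (u, v) in m/s, one pair per step (may vary in time);
    q: emission rate (g/s); growth: hypothetical sigma growth per metre travelled.
    Returns the concentration at receptor (x, y, z) after the last step.
    """
    puffs = []  # each puff: [x, y, distance_travelled, mass]
    for u, v in winds:
        puffs.append([0.0, 0.0, 0.0, q * dt])  # new puff at the source
        for p in puffs:
            p[0] += u * dt
            p[1] += v * dt
            p[2] += math.hypot(u, v) * dt
    rx, ry, rz = receptor
    total = 0.0
    for x, y, dist, mass in puffs:
        sigma = max(growth * dist, 1.0)  # same sigma used horizontally and vertically
        total += puff_concentration(mass, rx - x, ry - y, rz, h, sigma, sigma)
    return total
```

Because each puff keeps its own position, a wind shift halfway through `winds` bends the puff train, which a steady-state plume cannot represent.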
Tartakovsky et al. calculated and forecasted concentrations of particulate emissions from a quarry located in hilly terrain using CALPUFF and AERMOD (the American Meteorological Society/United States Environmental Protection Agency (US EPA) Regulatory Model). Comparing the two models, they found that CALPUFF performed better when the terrain was simple and the data were good. Moreover, the model was more sensitive to the quality of the meteorological data [84].
Abdul-Wahab et al. coupled CALPUFF and WRF to investigate the transport and dispersion patterns of SO 2 from a refinery in Oman and to forecast SO 2 concentrations. In the first phase of their study, meteorological data and geographical information were input to WRF to obtain meteorological fields for CALWRF (an interface program). In the second phase, the meteorological fields generated by CALWRF and geophysical data were input to the California Meteorological Model (CALMET). In the final step, meteorological parameters were extracted from the CALMET output file and fed into the CALPUFF dispersion model to obtain predicted concentrations [85]. The process of the WRF-CALMET-CALPUFF model is shown in Figure 5.
Falke et al. designed a system to locate fires, forecast smoke, and calculate population statistics. They initialized a CALPUFF smoke model using fire locations derived from satellite and surface observations and reconciled through the US forecast service. By combining the smoke forecasts with a web service, they obtained information on the population expected to be affected by wildfire smoke, which can be used for preventive work [86].

CMAQ Model
CMAQ was designed from the start as a community model. "Community modeling" refers to the concept that air quality model development should be a collective effort by a broad community of developers. For more details, please visit the website, https://www.cmascenter.org/cmaq/.
The CMAQ consists of several processors and a chemical-transport model:
Meteorology-Chemistry Interface Processor (MCIP);
Photolysis rate processor (JPROC);
Initial conditions processor (ICON);
Boundary conditions processor (BCON);
CMAQ Chemical-Transport Model (CCTM).
The relationships among these modules are as follows: (1) The core of CMAQ is the chemical transport module CCTM, which simulates the transport, chemical, and deposition processes of pollutants. (2) The initial conditions module ICON and the boundary conditions module BCON provide the initial and boundary fields of pollutants for CCTM. (3) The photolysis module JPROC calculates photolysis rates. (4) The meteorology-chemistry interface module MCIP links the meteorological model and CCTM, converting meteorological data into a format that CCTM can read.
The CCTM can be extended, for example, by adding a cloud process module, a diffusion and transfer module, or an aerosol module. Operators can choose which of these modules to include in CMAQ to facilitate simulation in different regions. The meteorological fields needed for CMAQ's numerical calculations are provided by meteorological models such as the Mesoscale Meteorological Model 5 (MM5) and the Weather Research and Forecasting (WRF) model. The required emission inventory is provided by an emission processing model such as the Sparse Matrix Operator Kernel Emissions (SMOKE) model. However, with MM5 version 3, the output can be processed by MCIP, and MCIP is no longer needed with SMOKE [87]. CMAQ requires two primary types of input: meteorological information and emission rates from the sources that affect air quality [27].
Models-3/CMAQ is made up of three parts: CMAQ, MM5, and SMOKE. CMAQ is the core, and MM5 and SMOKE provide the necessary data. In this system, the meteorological background field is first produced by MM5, and then MCIP processes the meteorological field and provides data to CMAQ and SMOKE. The relationship between MM5, CMAQ, and SMOKE is shown in Figure 6.
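The data flow among these components can be expressed as a small dependency graph and executed in topological order. The sketch below encodes only the links stated in the text (MM5/WRF output feeds MCIP; MCIP feeds SMOKE and CCTM; ICON, BCON, JPROC, and SMOKE feed CCTM). The dictionary name and the node label "meteorological model" are illustrative assumptions, and real CMAQ workflows involve further inputs not shown here.

```python
from graphlib import TopologicalSorter

# Each key lists the upstream components whose output it needs.
# "meteorological model" stands for MM5 or WRF output (illustrative label).
CMAQ_DEPENDENCIES = {
    "MCIP": {"meteorological model"},
    "SMOKE": {"MCIP"},
    "ICON": set(),
    "BCON": set(),
    "JPROC": set(),
    "CCTM": {"MCIP", "ICON", "BCON", "JPROC", "SMOKE"},
}


def run_order(dependencies):
    """Return one valid execution order in which every component runs after its inputs."""
    return list(TopologicalSorter(dependencies).static_order())
```

`run_order(CMAQ_DEPENDENCIES)` always ends with CCTM, because every other component is upstream of it; the relative order of independent processors such as ICON and JPROC is arbitrary.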
Glahn et al. put forward the model output statistics (MOS) method [88]. The CMAQ-MOS model combines earlier atmospheric monitoring data with CMAQ output, which can correct deviations in the air quality forecast caused by subjectively estimated pollutant emission sources. CMAQ-MOS requires atmospheric monitoring data for the meteorological elements and multiple pollutants, while CMAQ supplies the simulated output for multiple pollutants. The mathematical expression of CMAQ-MOS (Equation (29)) is given in [65], where $S_n$ is the forecast object, i.e., the concentration of a particular pollutant on the forecasting day (the nth day); the (n − 1)th day is the initial day; $\hat{G}_m$ and $\hat{A}_m^{M+N+L}$ are coefficient matrices; m is the time stage; and $\hat{S}^{M+N+L}$ has three forecast factors [65]. The variables for the CMAQ-MOS experiment include the wind field (U, V), temperature field (TS), and relative humidity (RH).
Zhao et al. used this information to build forecast equations for 12 observation sites to forecast the concentrations of PM 2.5, PM 10, SO 2, NO 2, and O 3 in Tianjin [65]. In these equations:
(1) PC is the forecast value of the pollutant concentration.
(2) "[]" represents the monitoring values on the initial forecasting day (subscript 1 denotes the average concentration over the whole area, and subscript 2 the average concentration at a single observation site) and the monitoring values of the meteorological elements.
(3) "{}" represents the CMAQ forecast products.
(4) a_1, …, a_n, b_1, …, b_m, and c_1, …, c_l are coefficients that can be calculated by mathematical methods.
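Since the forecast equations combine the bracketed observations and the braced CMAQ products with coefficients, one common way to estimate such coefficients is ordinary least squares. The sketch below assumes a linear form PC = a·[pollutant observations] + b·[meteorology] + c·{CMAQ} + intercept and uses synthetic training data; it illustrates the MOS idea of correcting a biased model output and is not the exact procedure of [65]. The column layout and the "true" relationship are hypothetical.

```python
import numpy as np


def build_predictors(obs_prev, met_prev, cmaq_forecast):
    """Stack [] monitoring values and {} CMAQ products into a design matrix.

    obs_prev: (days, n) pollutant observations on the initial day;
    met_prev: (days, m) meteorological observations on the initial day;
    cmaq_forecast: (days, l) CMAQ products for the forecast day.
    A column of ones adds an intercept term.
    """
    ones = np.ones((obs_prev.shape[0], 1))
    return np.hstack([obs_prev, met_prev, cmaq_forecast, ones])


def fit_mos(predictors, observed_next_day):
    """Least-squares estimate of a_1..a_n, b_1..b_m, c_1..c_l and the intercept."""
    coef, *_ = np.linalg.lstsq(predictors, observed_next_day, rcond=None)
    return coef


rng = np.random.default_rng(1)
days = 200
obs_prev = rng.uniform(20, 120, size=(days, 2))   # area average, site average
met_prev = rng.uniform(0, 10, size=(days, 1))     # e.g. wind speed
cmaq = rng.uniform(20, 150, size=(days, 1))       # raw CMAQ forecast
# Hypothetical "truth": CMAQ over-predicts and ignores persistence and wind.
truth = 0.6 * cmaq[:, 0] + 0.3 * obs_prev[:, 1] - 2.0 * met_prev[:, 0] + 5.0

X = build_predictors(obs_prev, met_prev, cmaq)
coef = fit_mos(X, truth)
raw_rmse = np.sqrt(np.mean((cmaq[:, 0] - truth) ** 2))
mos_rmse = np.sqrt(np.mean((X @ coef - truth) ** 2))
```

On this noise-free synthetic set the fitted coefficients recover the assumed relationship, so the MOS error is far below the raw CMAQ error; with real data, the improvement depends on how systematic the model bias is.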

Remark 13.
The forecast accuracy of air quality models depends largely on the accuracy of the pollution source and meteorological data. Therefore, these models are more suitable for short-term pollution forecasting.
Moreover, the box model, Gaussian model, and K model, which are commonly used for air pollution forecasting, have some shortcomings. For example, the assumptions of the box model deviate from reality, so its forecast concentrations in the Earth's boundary layer are too low. The Gaussian model is simple and practical and has high spatial resolution, but it has the following deficiencies [89]: (1) When the simulation scale reaches tens of kilometers, or when the underlying surface is uneven, the flow field becomes more complex, and the Gaussian plume model has difficulty meeting accuracy requirements. (2) The Gaussian model can treat deposition and chemical transformation only roughly, so it is not applicable when these processes are important to the study.
In addition, because the K model is based on the gradient-transport assumption, i.e., turbulent transport is modeled by analogy with molecular diffusion, it has some limitations: (1) The gradient-transport assumption requires certain scale conditions; the diffusion equation is valid only when the plume scale is larger than that of the dominant eddies. (2) Under convective conditions, the relationship between gradient and transport does not hold, so the K model cannot be applied. (3) The K model has very high requirements for basic information and input parameters.

Atmospheric Pollution Forecasts in China
The Hawaii Regional Climate Model (HRCM) system, which uses an Eulerian model, was developed by the Institute of Atmospheric Physics of the Chinese Academy of Sciences. The system is composed of the following parts [70]: (1) a mesoscale meteorological model; … Its flow chart is shown in Figure 6. Among these parts, (1)-(4) provide the input parameters, the initial field, and the boundary conditions. The HRCM model is the core of the system, and its governing equations are given in [70], where $c = c_j P_L$, $c_j$ is the volume mixing ratio of a chemical species, and $P_L = P_S - P_t$; $\sigma = (P - P_t)/(P_S - P_t)$, where P is the air pressure, $P_t$ is the pressure at the model top ($P_t$ = 100 hPa), and $P_S$ is the surface pressure; $w = d\sigma/dt$ is the vertical velocity in the σ coordinate system; $P_c$ and $L_c$ are the production and consumption rates due to chemical reactions; $W_t$ is the rate of change in concentration caused by clouds; $D_y$ is the rate of change in concentration caused by dry deposition; and $E_s$ is the pollution source. The required data are the hourly three-dimensional wind, temperature, moisture, and turbulent diffusion fields; the underlying surface; hourly ground rainfall; and the pollution emission inventory.
The Monte-Carlo multi-source model system is a multi-source Lagrangian model that includes transport, diffusion, migration, and transformation processes. This forecast system is similar to the HRCM system: it includes a mesoscale meteorological model, a dry and wet deposition model (DSDM), a planetary boundary layer model (PBLM), and, as its core, the Monte-Carlo multi-source model [70].
The Monte-Carlo multi-source model is developed from a statistical point of view: the probability density distribution function P is derived from the results of trajectory tracking [70], where r is the coordinate of the center of a small volume $dx_r\,dy_r\,dz_r$ at time t, and $t_{0j}$ is the time at which particle j leaves the source. The average pollutant concentration is then calculated from P [70]. The Monte-Carlo multi-source model requires the following input parameters: the temporal and spatial scope; the time and space steps; the three-dimensional wind and temperature fields forecasted by the meso-β-scale meteorological model; the variation of temperature with altitude; PBL turbulence statistical parameters; the temporal and spatial distribution of precipitation; underlying surface types; and pollution source data [70].
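The core Monte Carlo idea, estimating a concentration from the fraction of tracked particles found in a small volume, can be sketched with a simple random walk. This is a minimal illustration under stated assumptions (uniform wind along x, independent Gaussian turbulent displacements each step, reflection at the ground, uniform release over time), not the model of [70].

```python
import numpy as np


def particle_concentration(q, u, sigma, h, dt, n_steps, n_particles, box, seed=0):
    """Monte Carlo estimate of the mean concentration (g/m^3) in a receptor box.

    q: emission rate (g/s); u: mean wind along x (m/s); sigma: turbulent
    velocity scale (m/s); h: release height (m); box: ((x0, x1), (y0, y1), (z0, z1)).
    Particles are released uniformly over the run; each carries an equal mass.
    """
    rng = np.random.default_rng(seed)
    mass = q * n_steps * dt / n_particles                  # grams per particle
    release = rng.integers(0, n_steps, size=n_particles)   # release step of each particle
    pos = np.zeros((n_particles, 3))
    pos[:, 2] = h                                          # source at height h
    for step in range(n_steps):
        active = release <= step                           # particles already emitted
        k = int(active.sum())
        pos[active, 0] += u * dt                           # mean advection
        pos[active] += rng.normal(0.0, sigma * dt, size=(k, 3))  # turbulent random walk
        pos[:, 2] = np.abs(pos[:, 2])                      # reflect at the ground
    (x0, x1), (y0, y1), (z0, z1) = box
    inside = ((pos[:, 0] >= x0) & (pos[:, 0] < x1)
              & (pos[:, 1] >= y0) & (pos[:, 1] < y1)
              & (pos[:, 2] >= z0) & (pos[:, 2] < z1))
    volume = (x1 - x0) * (y1 - y0) * (z1 - z0)
    return mass * inside.sum() / volume
```

The estimate is noisy; increasing `n_particles` reduces the sampling error roughly in proportion to one over its square root.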
The CAPPS model system uses a grid-box atmospheric advection-diffusion model that starts from the advection-diffusion equation of atmospheric pollutants without considering chemical reactions [70], where c is the concentration of air pollutants; K is the turbulent exchange coefficient; $\nu_d$ is the dry deposition velocity; $\nu_w$ is the wet deposition velocity; and $\sum q_i \delta(r - r_i)$ represents several sources within the volume τ, with strengths $q_i$ located at $r_i = (x_i, y_i, z_i)$.
Integrating this equation over the volume τ and then taking the volume average yields the forecast equation for the average concentration in the box (Equation (34)) [70]. Applying the definitions of the pollution index and the potential pollution index to this equation yields the corresponding expressions. The flow charts are shown in Figure 6.
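A box-averaged advection-diffusion update of this kind can be sketched on a one-dimensional row of grid boxes. The sketch below is a generic explicit scheme, not the CAPPS equations: it assumes first-order upwind advection for a positive wind, central differencing for the K-diffusion term, and dry plus wet deposition treated as a first-order loss over an assumed layer depth h.

```python
import numpy as np


def advect_diffuse(c, u, k, dx, dt, n_steps, source, vd=0.0, vw=0.0, h=1000.0):
    """Explicit update of box-averaged concentrations on a 1-D row of grid boxes.

    c: initial concentrations (ug/m^3); u: wind speed (m/s, must be > 0);
    k: turbulent exchange coefficient K (m^2/s); source: emission term per box
    (ug m^-3 s^-1); vd, vw: dry and wet deposition velocities (m/s), applied
    as a first-order loss over a layer of depth h (m).
    """
    courant = u * dt / dx + 2 * k * dt / dx**2 + (vd + vw) * dt / h
    if u <= 0 or courant > 1:
        raise ValueError("time step too large (or wind not positive) for this scheme")
    c = np.asarray(c, dtype=float).copy()
    for _ in range(n_steps):
        padded = np.pad(c, 1, mode="edge")                     # zero-gradient boundaries
        advection = -u * (padded[1:-1] - padded[:-2]) / dx     # first-order upwind
        diffusion = k * (padded[2:] - 2 * padded[1:-1] + padded[:-2]) / dx**2
        deposition = -(vd + vw) / h * c
        c = c + dt * (advection + diffusion + deposition + source)
    return c
```

The stability check keeps every update coefficient non-negative, so concentrations never become negative; adding deposition can only lower the result relative to the no-deposition run.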

Meteorological Models
Meteorological models, as part of the air quality modeling system, are commonly used to provide meteorological parameters for air quality models, or they are coupled with other models to simulate the dispersion and trajectories of pollutants. In short, they are rarely used alone. Given specific input data, meteorological models calculate the weather data needed for the prediction.

CALMET Model
CALMET provides the three-dimensional meteorological field for the CALPUFF dispersion model and includes a diagnostic wind field module and a micrometeorological module. The diagnostic wind field module generates a first wind field by adjusting the initial-guess wind field for kinematic terrain effects, slope flows, and terrain blocking effects; it then introduces observational data and generates the final wind field through interpolation, smoothing, vertical velocity calculation, and divergence minimization. The micrometeorological module uses surface heat flux, boundary layer height, friction velocity, convective velocity, and other parameters to describe the boundary layer structure with parameterization methods.
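The step in which observational data are introduced by interpolation can be illustrated with inverse-distance-squared weighting of station wind components. This is a minimal sketch of that single step under our own assumptions (2-D coordinates, u and v interpolated separately); it omits the terrain adjustments, smoothing, and divergence minimization that a diagnostic model such as CALMET also performs.

```python
import numpy as np


def idw_wind(stations, winds, grid_points, power=2.0):
    """Interpolate station wind components (u, v) to grid points.

    stations: (s, 2) station coordinates; winds: (s, 2) observed (u, v);
    grid_points: (g, 2) target coordinates. Weights are 1 / distance**power.
    """
    d = np.linalg.norm(grid_points[:, None, :] - stations[None, :, :], axis=2)
    d = np.maximum(d, 1e-6)                    # avoid division by zero at a station
    w = 1.0 / d**power
    w /= w.sum(axis=1, keepdims=True)          # normalize weights for each grid point
    return w @ winds                           # weighted average of u and v
```

At a grid point that coincides with a station, the result is essentially the station observation; midway between two stations, it is their average.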
Cartellea et al. established the PrOlor system to forecast environmental odor. The system was based on the WRF, CALMET, and CALPUFF models, with CALMET producing very high-resolution meteorological fields over the study domain [90].

WRF and MM5 Model
MM5 is the fifth-generation National Center for Atmospheric Research (NCAR)/Penn State mesoscale model, and WRF is the Weather Research and Forecasting model. MM5 and WRF provide the meteorological input fields for various air quality models; moreover, most studies couple WRF with a chemistry model to simulate and forecast pollutant concentrations (for more details, see Section 6.3). In principle, MM5 and WRF are similar, but they have different characteristics. Cheng et al. compared the simulation results of MM5 and WRF [91]; the comparisons are shown in Table 14. Both MM5 and WRF can simulate the high- and low-temperature areas, but the temperatures forecast by WRF are more consistent with the measured data, whereas the values simulated by MM5 are lower than the measured values. Comparatively, the positions and intensities of the high- and low-pressure centers simulated by WRF are closer to the measured pressure field, whereas the pressure forecast by MM5 is higher than the measured pressure. The wind vector fields of MM5 and WRF are both consistent with the measured wind field, although the simulated values are higher than the measured values.
In summary, the distributions of high- and low-temperature areas, pressure, and wind field simulated by MM5 differ from the measured values, and its simulations are worse than those of WRF; the relative humidity simulated by both MM5 and WRF is higher than the actual data [91].
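Statements such as "lower than the measured values" are usually quantified with simple evaluation statistics. The sketch below computes the mean bias and root-mean-square error between simulated and measured values; the choice of these two metrics is our assumption, since the comparison of [91] is summarized only qualitatively here.

```python
import math


def mean_bias(simulated, observed):
    """Mean of (simulated - observed); negative values indicate underestimation."""
    return sum(s - o for s, o in zip(simulated, observed)) / len(observed)


def rmse(simulated, observed):
    """Root-mean-square error between simulated and observed values."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed))
```

A model that is systematically too cold, as reported for MM5 temperatures, shows a negative mean bias, while RMSE captures the overall error magnitude regardless of sign.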

Chemical Models
In the real atmosphere, chemical and physical processes affect each other. For example, aerosols can affect the atmospheric radiation balance, and they can also act as cloud condensation nuclei, further affecting precipitation. Weather phenomena such as precipitation, wind, and turbulence in turn affect chemical transport and deposition processes [92]. Therefore, chemical models are often coupled with other models. The most popular coupled model is WRF with online-coupled chemistry (WRF-Chem). In this coupled model, the air quality component is fully consistent with the meteorological component: it uses the same transport scheme (mass and scalar preserving), the same grid (horizontal and vertical), and the same physics schemes for subgrid-scale transport [92]. Chuang et al. used WRF/Chem-MADRID to forecast real-time air quality; WRF/Chem-MADRID is the WRF model with Chemistry combined with the Model of Aerosol Dynamics, Reaction, Ionization, and Dissolution [93]. However, the forecast results were not satisfactory: O 3 concentrations were overpredicted and PM 2.5 concentrations were underpredicted, and the authors proposed improvements from a meteorological perspective. Werner et al. applied the online WRF-Chem model to forecast PM 10 concentrations over Poland. Based on the forecast results, the authors indicated that WRF-Chem performed better for O 3, confirming the importance of the nonlinear processes taken into account in an online-coupled Eulerian model; however, WRF-Chem had difficulty capturing the peaks and would need higher-resolution, sector-based emission data and temporal emission profiles [94]. Table 15 lists the main recent studies on three-dimensional models in different urban areas.

Remark 14.
In recent years, air quality simulation technology has developed rapidly, particularly models that combine geographic information and meteorological data. Various air quality models are now widely used in environmental impact assessment, major scientific research, and environmental management and decision making, but they have also encountered many problems in practical application. Although complex advanced models produce good simulation results, their heavy computational burden makes them infeasible in many practical applications. Therefore, finding alternative models or simplifying the simulations remains an important problem.

Hybrid Systems
A hybrid system (HS) combines two or more of the methods described above [96]. The purpose of an HS model is to exploit the advantages of each method and to improve forecast accuracy as much as possible.

PCA-ANN
Mishra et al. proposed a hybrid model that combined statistical regression with a computational intelligence method to forecast hourly NO 2 concentrations at the Taj Mahal in Agra, India [25]. First, they used principal component analysis (PCA) to identify the correlations between meteorological forecast variables and air pollutants. Then, the significant variables were used as input parameters to build an ANN multilayer perceptron (MLP) model for forecasting air pollution in Agra. The forecast results are given in Table 16. The results indicated that the ANN-MLP model could not forecast well during periods of high pollutant concentrations. The study also showed that anthropogenic activities are the most important variables for forecasting.
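The PCA step that precedes the ANN can be sketched with NumPy's singular value decomposition: standardize the candidate predictors, keep the leading components that explain a chosen share of the variance, and pass their scores to the network. The 90% threshold and the function name are assumptions for illustration, not the settings of [25].

```python
import numpy as np


def pca_reduce(X, var_threshold=0.9):
    """Standardize X (samples x variables) and keep the leading principal components.

    Returns the component scores, the loadings of the kept components and the
    fraction of variance explained by every component.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)       # zero mean, unit variance
    _, s, vt = np.linalg.svd(Xs, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    k = min(k, len(s))                               # guard against rounding near 1.0
    scores = Xs @ vt[:k].T                           # candidate inputs for an ANN
    return scores, vt[:k], explained
```

The loadings show which original meteorological or pollutant variables dominate each retained component, which is how significant inputs can be identified before training the MLP.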

Multilayer Perceptron Neural Network and Clustering Algorithm
Cluster analysis, also called group analysis, is a multivariate statistical method for classifying samples or indicators. It deals with a large number of samples and classifies them reasonably according to their respective characteristics; the classification requires no prior knowledge or model, and its results can be used as a reference.
K-means algorithm. K-means is an unsupervised learning algorithm for clustering problems with a given number of clusters k. The main idea is to randomly select k objects as the initial cluster centers, compute the distance between each object and each cluster center, and assign each object to the nearest center; each center is then recomputed as the mean of the objects assigned to it, and the process is repeated until the assignments no longer change. The algorithm aims to minimize a squared-error objective function [60]:

$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2$$

where $\| x_i^{(j)} - c_j \|^2$ is a chosen distance measure between a data point $x_i^{(j)}$ and the cluster center $c_j$; J is an indicator of the distance of the n data points from their respective cluster centers [60].

Fuzzy c-means (FCM) algorithm. This clustering method allows one piece of data to belong to two or more clusters, and each element is associated with a set of membership levels. The algorithm is based on minimizing the objective function given by Equation (36) [60]:

$$J_{FCM}(Z; U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m D_{ikA}^2 \qquad (36)$$

where $U = [\mu_{ik}] \in M_{FCM}$ is a fuzzy partition matrix of the data set Z, $V = [\nu_1, \nu_2, \ldots, \nu_c]$ is the vector of cluster prototypes (centers), $m > 1$ is the weighting exponent that controls the fuzziness, and $D_{ikA}^2 = \| z_k - \nu_i \|_A^2$ is a squared inner-product distance norm. The optimal partition U* of Z for the FCM algorithm is reached through the pair (U*, V*) that locally minimizes $J_{FCM}$ via alternating optimization.
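Both algorithms can be sketched in NumPy. The K-means function follows the random-initialization procedure described above and, as an added safeguard of our own, keeps the best of several restarts; the FCM function alternates the standard prototype and membership updates with the Euclidean norm (A = identity). Parameter defaults such as m = 2 and the iteration limits are assumptions, not values from [60].

```python
import numpy as np


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """K-means with random initial centers; returns the run with the lowest J."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]  # k random objects
        for _ in range(n_iter):
            labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        J = ((X - centers[labels]) ** 2).sum()                  # squared-error objective
        if best is None or J < best[2]:
            best = (labels, centers, J)
    return best


def fuzzy_cmeans(Z, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Alternating optimization of J_FCM with the Euclidean norm."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(Z)))
    U /= U.sum(axis=0)                                   # memberships of each point sum to 1
    for _ in range(n_iter):
        Um = U ** m
        V = (Um @ Z) / Um.sum(axis=1, keepdims=True)     # cluster prototypes
        D2 = ((Z[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # squared distances
        D2 = np.maximum(D2, 1e-12)                       # avoid division by zero
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)                    # membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V
```

K-means returns hard labels, whereas FCM returns a membership matrix whose columns sum to one; a pollution pattern near a cluster boundary therefore receives partial membership in both clusters.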
Cortina-Januchs et al. [60] applied clustering algorithms (K-means and FCM) to build patterns of the form $P = [C_{PM_{10}}, WS, WDI, T, HR]$, where $C_{PM_{10}}$ is the PM 10 concentration, WS is wind speed, WDI is the wind direction index, T is temperature, and HR is relative humidity. The best forecast results for the three stations are shown in Table 17, where the time window indicates the number of hours of data needed to make the forecast. The authors indicated that clustering algorithms can add useful information to the ANN by identifying groups with similar data characteristics and finding relationships between them that would not be obtained by other methods.

Hybrid Artificial Neural Network and Hybrid Support Vector Machine
Grivas et al. developed a model that uses a combination of meteorological data and time-scale variables as inputs to the ANN [22]. Four methods were used in this study: a feed-forward multilayer perceptron (FFMLP) NN; an MLP whose input variables were selected by a genetic algorithm (GA) optimization procedure (GA-MLP); an MLP developed without meteorological input variables (MLPnomet); and multiple linear regression (MLR). They then compared the forecasting ability of these models; the results are shown in Table 18. The differences in Table 18 indicate that FFMLP and GA-MLP are more effective than MLPnomet and MLR. In other words, a forecasting model works better when more factors are considered or when a hybrid model is proposed and applied.

CS-EEMD-BPANN Model
Qin et al. proposed the CS-EEMD-BPANN model for forecasting PM concentrations [97]. Its steps are as follows:
Step 1. Select appropriate predictors based on gray correlation analysis. Some air pollutants (CO, NO 2, O 3, and SO 2) and meteorological factors (WS/D, T, H, and P) may affect PM concentrations, and using gray correlation analysis to determine how they influence PM is a primary concern.
Step 2. Use the EEMD technique to filter out white noise and useless information from the selected influencing factors and the PM concentrations.
Step 3. The denoised data sets are input into the BPANN model to obtain the forecast values. In this study, the forecast model is a novel BPANN-based multi-step-ahead forecasting model, in which the cuckoo search (CS) algorithm is used to optimize the connection weights and thresholds of the BPANN to make it more stable.
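The predictor-selection idea of Step 1 can be sketched with the standard grey relational analysis formula: normalize each series, take absolute differences from the reference (PM) series, convert them to relational coefficients with a distinguishing coefficient ρ, and average over time. Mean normalization and ρ = 0.5 are common choices assumed here, not necessarily those of [97].

```python
import numpy as np


def grey_relational_grades(reference, factors, rho=0.5):
    """Grey relational grade of each factor series with respect to the reference.

    reference: shape (n,), e.g. PM concentrations; factors: shape (k, n),
    e.g. CO, NO2, wind speed. All series are normalized by their own mean.
    """
    ref = reference / reference.mean()
    fac = factors / factors.mean(axis=1, keepdims=True)
    delta = np.abs(fac - ref)                             # absolute differences
    dmin, dmax = delta.min(), delta.max()
    coeff = (dmin + rho * dmax) / (delta + rho * dmax)    # relational coefficients
    return coeff.mean(axis=1)                             # grade = average over time
```

Factors with the highest grades follow the shape of the PM series most closely and would be kept as predictors; a factor that is a scaled copy of the reference series reaches the maximum grade of 1.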
The authors used data that were measured in the winter of 2013-2014 in Beijing, Shanghai, Guangzhou, and Lanzhou, and the results are shown in Table 19.
It can be found from the comparisons that the CS-EEMD-BPANN model performs better.

ICEEMD-SVM-WOA
Xu et al. proposed a hybrid air quality early-warning system that combines ICEEMD, SVM, and the whale optimization algorithm (WOA). The steps of the hybrid method can be summarized as follows [52]:
Step 1. ICEEMD is used to decompose the original time series into several intrinsic mode functions (IMFs) to eliminate the negative influence of noise and to explore the inner characteristics of the data. Compared with CEEMD, ICEEMD mainly improves on two aspects: (a) CEEMD modes contain some residual noise; and (b) the signal information appears "later" than in EEMD, with some "spurious" modes in the early stages of the decomposition [98].
Step 2. An SVM optimized by WOA is used to build a predictor for each IMF: the SVM forecasts each IMF, and WOA is used to obtain the proper weight coefficient of each predictor. A leave-one-out strategy is then applied to integrate all forecasted IMFs and obtain the final forecast.
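To show how WOA searches for parameters such as these weight coefficients, the sketch below implements the basic whale optimization algorithm (encircling the best solution, exploring around a random whale, and the bubble-net spiral) and applies it to a toy objective. The toy objective, bounds, population size, and iteration count are assumptions; in the hybrid system the objective would instead be a forecast error.

```python
import numpy as np


def woa_minimize(f, lb, ub, n_whales=20, n_iter=200, b=1.0, seed=0):
    """Basic whale optimization algorithm minimizing f within box bounds."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    X = rng.uniform(lb, ub, size=(n_whales, lb.size))
    fit = np.array([f(x) for x in X])
    best, best_f = X[fit.argmin()].copy(), float(fit.min())
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter                     # decreases linearly from 2 to 0
        for i in range(n_whales):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                if abs(A) < 1:                         # exploitation: encircle the best whale
                    D = np.abs(C * best - X[i])
                    X[i] = best - A * D
                else:                                  # exploration: move relative to a random whale
                    rand = X[rng.integers(n_whales)]
                    D = np.abs(C * rand - X[i])
                    X[i] = rand - A * D
            else:                                      # bubble-net spiral around the best whale
                l = rng.uniform(-1.0, 1.0)
                D = np.abs(best - X[i])
                X[i] = D * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lb, ub)
            fx = f(X[i])
            if fx < best_f:
                best, best_f = X[i].copy(), float(fx)
    return best, best_f
```

For example, `woa_minimize(lambda x: float(np.sum(x**2)), [-10.0] * 3, [10.0] * 3)` drives the toy objective toward its minimum at the origin; swapping in a validation-error function turns the same loop into a parameter tuner.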
The results of the ICEEMD-SVM-WOA model in the study areas are shown in Table 20. They show that the hybrid ICEEMD-SVM-WOA model is superior to the other four benchmark models used in the study. For ease of comparison, the hybrid models mentioned above are summarized in Table 21.
Table 21. A short summary of hybrid systems for air pollution.

Author Main Contribution
Chen et al. [17] Combined numerical forecasting (WRF) with statistical analysis (a temporal synoptic index) to forecast high PM 10 concentrations in Beijing. This hybrid system forecasts high-PM pollution events more accurately than current forecast methods, combining the strengths of various methods while avoiding the disadvantages of statistical forecast methods used alone.
Zhou et al. [99] Established a hybrid EEMD-GRNN model to forecast the concentration of pollutants in Xi'an, which was shown to be superior to other conventional models.
Qin et al. [97] Proposed the CS-EEMD-BPANN model for forecasting PM concentrations in Beijing, Shanghai, Guangzhou, and Lanzhou. The forecasting results improved, and the method is more stable than BPNN and EEMD-BPANN.
Qin et al. [100] Used the Apriori algorithm to mine the spatial and temporal associations of intercity PM, and also mined cross spatial and temporal associations between PM 10 and PM 2.5 in the Jing-Jin-Ji region (China).
Wang et al. [68] Used HANN, HSVM, and a Taylor expansion forecasting model in Taiyuan. The innovation of this approach is that it fully and validly utilizes the useful residual information under incomplete input variable conditions.
Feng et al. [101] 1. Used a trajectory-based geographic parameter as an extra input to the ANN model; 2. Applied forecasting strategies at different scales and then summed the results; 3. Used backward trajectories from the Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model to track the transport corridors of air masses.
Xu et al. [52] Proposed the ICEEMD-SVM-WOA model and an FE model. This system not only forecasts air pollutant concentrations but also evaluates the effectiveness of the new forecast system with a fuzzy evaluation method.
Wongsathan et al. [102] Proposed a fundamental hybrid forecast model that can improve the performance of forecast models; exogenous variables and modifications of the hybrid algorithm could also be considered.

Remark 15.
With the growing popularity of hybrid systems (HS), more and more scholars build air quality forecasting systems based on them. Generally, an air quality forecasting system contains three modules: a data pre-processing module, an optimization module, and a forecasting module. The data pre-processing module eliminates chaotic noise and extracts the effective features in the original series; the optimization module optimizes the parameters used in the forecasting module to improve forecasting accuracy; and the forecasting module produces the forecast. The research process for an air quality forecasting system is usually divided into three steps.
Step 1. Decompose the original series. Signal processing tools such as the wavelet transform, the short-time Fourier transform, and EEMD are used in this step. Their main purpose is to suppress redundant content in the signal, remove mixed-in noise and interference, and transform the signal into a form that is easier to process and analyze in subsequent steps.
Step 2. Optimize the forecast model. Many methods are available for this step, such as genetic algorithms, ant colony optimization, and the whale optimization algorithm.
Step 3. Construct the forecast model. This is an important step in the study, and several kinds of methods can be chosen, such as statistical methods (regression, principal component analysis, etc.) and AI methods (ANN, wavelet NN, etc.).
When constructing a hybrid model, the models used for signal decomposition, forecasting, and optimization should be chosen according to the specific conditions of the study area. The resulting hybrid model is then used for prediction.
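The three steps above can be illustrated with a minimal Python sketch. It does not reproduce any of the cited hybrid models. As stand-ins (assumptions), a trailing moving average splits the series into a trend and a residual in place of EEMD or wavelets, a least-squares autoregressive (AR) model forecasts each component in place of an ANN or SVM, and an exhaustive grid search over the window length and AR order replaces GA or WOA. The names `decompose`, `fit_ar`, `predict_next`, `hybrid_forecast`, and `optimize` are hypothetical, and the input series is synthetic.

```python
import numpy as np


def decompose(series, window):
    """Step 1 (stand-in for EEMD/wavelets): split a series into trend + residual.

    A trailing moving average is used so each trend value depends only on past data.
    The two components sum back to the original series.
    """
    series = np.asarray(series, dtype=float)
    trend = np.array([series[max(0, t - window + 1):t + 1].mean() for t in range(len(series))])
    return [trend, series - trend]


def fit_ar(x, p):
    """Least-squares AR(p) fit; returns [intercept, phi_1, ..., phi_p]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Column k holds lag k+1, i.e. x[t-k-1] for targets t = p..n-1.
    lags = [x[p - k - 1:n - k - 1] for k in range(p)]
    design = np.column_stack([np.ones(n - p)] + lags)
    coef, *_ = np.linalg.lstsq(design, x[p:], rcond=None)
    return coef


def predict_next(x, coef):
    """One-step-ahead AR forecast from the most recent p values of x."""
    p = len(coef) - 1
    recent = np.asarray(x, dtype=float)[::-1][:p]  # x[-1], x[-2], ..., x[-p]
    return float(coef[0] + coef[1:] @ recent)


def hybrid_forecast(history, window, p):
    """Step 3: forecast each component separately and sum the component forecasts."""
    return sum(predict_next(c, fit_ar(c, p)) for c in decompose(history, window))


def validation_rmse(series, window, p, n_val):
    """Rolling one-step-ahead RMSE over the last n_val points of the series."""
    errors = [hybrid_forecast(series[:t], window, p) - series[t]
              for t in range(len(series) - n_val, len(series))]
    return float(np.sqrt(np.mean(np.square(errors))))


def optimize(series, windows, orders, n_val=20):
    """Step 2 (stand-in for GA/WOA): exhaustive grid search minimizing validation RMSE."""
    candidates = [(w, p) for w in windows for p in orders]
    return min(candidates, key=lambda wp: validation_rmse(series, wp[0], wp[1], n_val))


# Usage on a synthetic hourly series with a daily cycle (illustrative data only).
rng = np.random.default_rng(0)
t = np.arange(200)
pm25 = 60 + 20 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 5, t.size)
window, p = optimize(pm25[:180], windows=[3, 6, 12], orders=[1, 2, 3])
print("chosen window:", window, "AR order:", p)
print("next-hour forecast:", hybrid_forecast(pm25[:180], window, p))
```

Keeping the three steps in separate functions mirrors the modular structure described above: swapping in a different decomposition, optimizer, or forecaster only changes one function.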

Other Methods of Air Pollution Forecasting
In addition to the common models described above, some scholars forecast with newer models. The most popular is the hybrid system: more and more scholars apply hybrid models to air pollution forecasting and report that they are more accurate than individual models. Other researchers apply common models while taking geographic factors into account. This section describes these models in detail.

Geographic Methods
Kurt et al. argued that using the geographical factors of the study area can improve the accuracy of air pollution forecasts [26]. Therefore, when forecasting air pollution, the pollution situation in nearby areas can be used as a reference. The interactions between pollutants are determined by the geographic terrain, the geometric characteristics of buildings, and other geographical and environmental attributes, and these complex interactions play a significant role in air pollution forecasting. In addition, the locations of and distances between districts are important: generally, the closer two districts are, the more similar their pollution levels [26]. Kurt et al. therefore presented three geographic models of increasing complexity to forecast SO2, PM10, and CO concentrations in Istanbul; these models are described in Table 22.

Table 22. Description of the three geographic models.

Model Description
Single-site neighborhood model The main idea of this model is to use the air pollution index of one or more neighboring regions as the input variables of the forecast area.
Two-site neighborhood model This model considers two neighboring districts. The rationale for this model is that using more predictor variables should achieve higher accuracy.
Distance-based model In this model, a weighted average of the air pollutant levels of the neighboring districts is calculated, with weights based on the distance between each neighboring district and the forecast district. The model is based on the idea that the effect of a neighboring district's air pollutant levels is inversely proportional to the distance between the two districts.
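The distance-based model in Table 22 amounts to an inverse-distance weighted average. A minimal Python sketch follows; the function name, the input format, and the plain 1/d weighting are assumptions for illustration. In the geographic models of Kurt et al., such neighborhood values feed a forecasting model; that step is not reproduced here, and the numbers in the example are made up.

```python
from typing import Sequence, Tuple


def distance_weighted_level(neighbors: Sequence[Tuple[float, float]]) -> float:
    """Inverse-distance weighted average of neighboring districts' pollutant levels.

    neighbors: (concentration, distance_km) pairs, one per neighboring district.
    Each weight is 1 / distance, so a district twice as far away counts half as much.
    """
    if not neighbors:
        raise ValueError("at least one neighboring district is required")
    if any(d <= 0 for _, d in neighbors):
        raise ValueError("distances must be positive")
    weights = [1.0 / d for _, d in neighbors]
    weighted_sum = sum(w * c for w, (c, _) in zip(weights, neighbors))
    return weighted_sum / sum(weights)


# Hypothetical PM10 levels (ug/m3) and distances (km) of two neighboring districts.
print(distance_weighted_level([(40.0, 2.0), (80.0, 4.0)]))
```

The closer district dominates: the estimate (about 53.3) lies nearer to 40 than the plain mean of 60 does.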

Remark 16.
Few forecasts have considered the geographical factors of neighboring cities. However, Kurt et al. [26] showed that the error of the geographic models was always lower than that of a non-geographic model. The best neighborhood, and the minimum error it produces on a given experimental date, can be determined experimentally. Geographic models also have drawbacks; for example, it is difficult to choose the proper neighboring districts and model, because a pair of districts may yield a higher error in the single-site neighborhood model but a lower error in other models, so many comparative analyses are needed.

Grey System (GM)
Grey system theory was initiated by Deng in 1982 for modeling under uncertainty and limited information, making full use of the available historical data. A grey prediction model is expressed as a differential equation [62]; in general, a grey model (GM) is written GM(m, n), where m is the order of the differential equation and n is the number of variables in the model. Thus GM(1, 1), short for "grey model, first order, one variable," is the dominant model of grey forecasting theory. GM(1, 1) has been widely used in forecasting studies because it requires few data points to build a forecasting model and offers higher forecast accuracy than other forecasting methods [6].
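For reference, GM(1, 1) is usually written as follows. This is the standard textbook formulation, not a reproduction of any cited study. Starting from a positive sequence x^{(0)}, the model fits a first-order differential equation to its accumulated sequence x^{(1)}; a is the development coefficient and b the grey input.

```latex
\begin{aligned}
x^{(1)}(k) &= \sum_{i=1}^{k} x^{(0)}(i), \qquad k = 1,\dots,n
  && \text{(accumulated generating operation, AGO)}\\
z^{(1)}(k) &= \tfrac{1}{2}\left(x^{(1)}(k) + x^{(1)}(k-1)\right), \qquad k = 2,\dots,n
  && \text{(background value)}\\
x^{(0)}(k) + a\,z^{(1)}(k) &= b
  && \text{(grey differential equation)}\\
\frac{dx^{(1)}}{dt} + a\,x^{(1)} &= b
  && \text{(whitening equation)}\\
\begin{bmatrix} a \\ b \end{bmatrix} &= (B^{\top}B)^{-1}B^{\top}Y,\quad
B = \begin{bmatrix} -z^{(1)}(2) & 1\\ \vdots & \vdots\\ -z^{(1)}(n) & 1 \end{bmatrix},\quad
Y = \begin{bmatrix} x^{(0)}(2)\\ \vdots\\ x^{(0)}(n) \end{bmatrix}\\
\hat{x}^{(1)}(k+1) &= \left(x^{(0)}(1) - \frac{b}{a}\right)e^{-ak} + \frac{b}{a}
  && \text{(time response)}\\
\hat{x}^{(0)}(k+1) &= \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k)
  && \text{(inverse AGO)}
\end{aligned}
```

Because the background value z^{(1)}(k) is only a trapezoidal approximation of the integral of x^{(1)} over each step, GM(1, 1) carries a small built-in bias even for an exactly exponential series.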
There are four kinds of commonly used grey forecasting models: (1) Sequence forecast: a grey forecast model that reflects the characteristics of the forecast object is constructed from observations of the time series. Pan et al. employed a grey dynamic model group and grey relational analysis to forecast the trend of air quality change in Tianjin, and the results indicated that the model group had high accuracy [103]. Tsai et al. proposed a first-order multivariable grey model (GM(1, N)) optimized by a grey-genetic algorithm and used it to forecast air pollution in Taiwan; the results indicated that the grey-genetic algorithm can improve the prediction accuracy of GM models [104]. The modeling procedure is as follows. First, the nonlinear prediction model GM(1, 1) is established, because the relationships among variables in the atmospheric environment system are nonlinear. The data sequence is then accumulated, and the forecast values are obtained through the corresponding calculations. Finally, the model is diagnosed and its reliability is analyzed. If the test statistic is within the allowable range, the predicted values can be calculated; otherwise, the model must be corrected by analyzing the residual sequence before making a prediction.
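The procedure just described (accumulate, fit, forecast, then check) can be sketched in Python for GM(1, 1). This is a minimal illustration, not the implementation used by Pan et al. or Tsai et al. The function names are hypothetical, a and b are estimated by ordinary least squares on the background values, and the diagnostic step uses the posterior variance ratio C (standard deviation of the residuals divided by that of the data), one commonly used test statistic. The 0.35 acceptance threshold is a frequently quoted rule of thumb and is an assumption here. Residual correction after a failed check is not shown.

```python
import numpy as np


def gm11_fit(x0):
    """Estimate the GM(1,1) parameters (a, b) from a short positive series x0."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                        # accumulated generating operation (AGO)
    z1 = 0.5 * (x1[1:] + x1[:-1])             # background values z1(k), k = 2..n
    B = np.column_stack([-z1, np.ones_like(z1)])
    (a, b), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)  # least squares for x0(k) + a*z1(k) = b
    return float(a), float(b)


def gm11_predict(x0, a, b, steps=0):
    """Fitted values for the n observations followed by `steps` out-of-sample forecasts."""
    if abs(a) < 1e-12:
        raise ValueError("development coefficient a is zero; GM(1,1) time response undefined")
    x0 = np.asarray(x0, dtype=float)
    k = np.arange(len(x0) + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a   # time response, x1_hat[k] = x1_hat(k+1)
    return np.concatenate([[x0[0]], np.diff(x1_hat)])    # inverse AGO; x0_hat(1) = x0(1)


def posterior_variance_ratio(x0, fitted):
    """C = std(residuals) / std(data); smaller values indicate a more reliable model."""
    x0 = np.asarray(x0, dtype=float)
    residuals = x0 - fitted[:len(x0)]
    return float(residuals.std() / x0.std())


# Illustrative (synthetic) annual mean concentrations, not measured data.
series = [52.0, 55.1, 57.9, 61.2, 64.0, 67.5]
a, b = gm11_fit(series)
fitted = gm11_predict(series, a, b, steps=2)
C = posterior_variance_ratio(series, fitted)
if C < 0.35:  # assumed acceptance threshold
    print("model accepted; next two forecasts:", fitted[len(series):])
else:
    print("model rejected (C = %.3f); residual correction would be needed" % C)
```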

Remark 17.
In practice, a GM prediction model sometimes gives good results and sometimes inaccurate ones. Scholars attribute this to researchers ignoring the preconditions for GM prediction: the original data sequence must approximately follow an exponential law, and it must change slowly. Moreover, GM prediction models have an inherent, unavoidable error, because this biased exponential model is a small-sample prediction method whose precision depends on how the background value is constructed and how the initial condition is chosen.
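The precondition that the series change slowly and approximately exponentially is often screened with a level-ratio (class-ratio) test before GM(1, 1) is applied. The sketch below uses the admissible interval (e^{-2/(n+1)}, e^{2/(n+1)}) that is commonly quoted in the grey-modeling literature; treat the interval and the function name as assumptions rather than part of the cited studies.

```python
import math
from typing import List, Sequence, Tuple


def level_ratio_check(x0: Sequence[float]) -> Tuple[bool, List[float]]:
    """Level-ratio test for GM(1,1): sigma(k) = x0(k-1) / x0(k), k = 2..n.

    Returns whether every ratio lies inside (exp(-2/(n+1)), exp(2/(n+1))),
    together with the ratios themselves.
    """
    n = len(x0)
    if n < 2 or any(v <= 0 for v in x0):
        raise ValueError("need at least two strictly positive observations")
    lower, upper = math.exp(-2 / (n + 1)), math.exp(2 / (n + 1))
    ratios = [x0[k - 1] / x0[k] for k in range(1, n)]
    return all(lower < r < upper for r in ratios), ratios


# Hypothetical series: a smooth, slowly rising one and an erratic one.
print(level_ratio_check([100, 104, 108, 113, 117])[0])  # True
print(level_ratio_check([10, 40, 12, 50, 15])[0])       # False
```

A series that fails the test violates the slow-change premise, which is one reason GM forecasts can be inaccurate in practice.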

Natural Source Pollution Forecasting
In addition to man-made air pollution, natural sources must not be ignored. Fires, such as forest fires and agricultural burning, produce heavy smoke that is harmful to the human respiratory system. Volcanic ash contains not only water-insoluble particulate matter but also heavy metals, which are harmful to the human body and disrupt daily life and productive activities. If the transport and dispersion of wildfire smoke and volcanic ash can be simulated, adverse effects and losses can be reduced. Because the development and spread of these pollutants depend heavily on the weather, most fire-smoke and volcanic-ash dispersion models are driven by weather forecasts [95].
Bhoi et al. used a forest fire case in the eastern United States to forecast PM2.5 and CO emissions from wildfires. In their proposed framework, the Operational Multiscale Environment model with Grid Adaptivity (OMEGA), real-time remote sensing data were used to detect fire pixels automatically, and the output was generated in GIS format. This system can assist those involved in wildfire management, improving work efficiency and reducing fire damage [105].
Kochanski et al. coupled the atmosphere-fire model WRF-SFIRE with WRF-Chem to simulate and forecast smoke emission and dispersion. First, WRF-SFIRE converts the standard fuel categories to Moderate Resolution Imaging Spectroradiometer (MODIS) land cover types; then the fuel consumption rate in each fire grid cell is calculated from the mass of fuel burnt in one time step. Next, the emission fluxes are calculated as the products of the consumption rates and the fuel-specific emission factors. Finally, these results are passed to WRF-Chem to obtain the simulation and forecast. This model tightly couples atmosphere-fire and atmosphere-chemistry processes, offering an opportunity to study the complex interactions between fire and the atmosphere [95].
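The last two emission steps described above are simple per-cell arithmetic, sketched below in Python. This is not WRF-SFIRE code: the function name, the division by cell area to obtain a per-area flux, the units, and the emission-factor values in the example are placeholder assumptions. In the described workflow, the emission factors are fuel specific, tied to the land cover type assigned to each cell.

```python
import numpy as np


def emission_fluxes(fuel_burnt_kg, dt_s, cell_area_m2, emission_factors_g_per_kg):
    """Per-cell emission flux of each species for one time step.

    fuel_burnt_kg: mass of fuel burnt in each fire grid cell during the time step.
    Returns {species: flux array in g m-2 s-1}.
    """
    fuel_burnt_kg = np.asarray(fuel_burnt_kg, dtype=float)
    # Fuel consumption rate per unit area (kg m-2 s-1) in each cell.
    consumption_rate = fuel_burnt_kg / (dt_s * cell_area_m2)
    # Emission flux = consumption rate x fuel-specific emission factor (g per kg burnt).
    return {species: consumption_rate * ef for species, ef in emission_factors_g_per_kg.items()}


# Placeholder numbers: three cells, the third not burning.
fluxes = emission_fluxes([50.0, 20.0, 0.0], dt_s=10.0, cell_area_m2=100.0,
                         emission_factors_g_per_kg={"CO": 100.0, "PM2.5": 10.0})
print(fluxes["CO"], fluxes["PM2.5"])
```

In a coupled system, fluxes like these would then be handed to the chemistry model as emission input at each time step.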
Another popular framework is BlueSky, a smoke modeling framework that simulates the cumulative smoke impacts of fires [106]. It consists of input and output modules. Fire information and meteorological information are first entered as the initial input data; fuel loadings and moisture conditions are then determined, and fuel consumption is calculated. The resulting emissions are speciated and allocated diurnally, and these emissions then drive the dispersion and trajectory models. In this framework, WRF and MM5 provide the meteorological parameters, CALPUFF simulates dispersion and trajectories, and CALMET provides the meteorological field for CALPUFF [106].
Goodrick et al. summarized smoke transport models, noting that models for forecasting the effects of wildfire smoke consist of four basic components. The first describes the emission sources, such as pollutant emissions and heat release. The second determines the vertical extent of the plume from the atmospheric stability, the wind profile, and the heat release rate of the fire. The third handles the actual movement of the smoke (transport and dispersion) by ambient winds. The fourth addresses the chemical transformation of smoke constituents, which underlies a range of air quality issues [107]. They also introduced several prediction models; for details, see [107].
To facilitate comparison, we summarize studies using different methods in Table 23.

Table 23. Different models for air pollution forecasting.

Method Types Authors Models Main Conclusions
Statistical methods
Silibello et al. [108] Kalman filter (KF) and hybrid forecast (HF) Used two adjustment techniques, HF and KF, to improve the accuracy of the forecasts supplied by an air quality forecast system.
Huebnerova et al. [109] Generalized linear models with log link and gamma distribution A comparative analysis of two models showed that predicted meteorological variables can be used to forecast well.

Artificial intelligence methods
Catalano et al. [110] ANN and ARIMAX Forecast extreme concentrations by integrating the two models into an ensemble.
Feng et al. [111] SVM-GABPNN Proposed a hybrid model in which SVM was used to classify the data and GA was used to optimize the BPNN model.
Bai et al. [24] W-BPNN Used the wavelet transform for feature extraction and characterization of air pollutants.
Siwek et al. [112] Wavelet transformation, multilayer perceptron, radial basis function network, Elman network, SVM, and linear ARX model Decomposed the data into wavelet coefficients, used different neural networks for the individual predictions, and then combined several predictors in an ensemble. This approach does not require very exhaustive information about air pollutants, and it can capture nonlinear relationships between very different predictor variables.

Hybrid methods
Feng et al. [101] Hybrid ANN Used a trajectory-based geographic parameter as an extra input to the ANN model, and used the wavelet transformation to decompose the original series into a few sub-series with lower variability.
Fu et al. [113] RM-GM-FFNN Enhanced the FFNN model with RM and GM to assess the possible correlation between different input variables and improve forecast accuracy.
Song et al. [4] ANF and distribution functions Proposed an interval prediction method and ANF to address the uncertainty of PM according to the pollutant emission distribution.
Three-dimensional models
Luo et al. [27] Models-3/CMAQ Provided a method of analyzing changes in pollutant concentrations when practical pollution data are lacking.
Grell et al. [92] Fully coupled online chemistry within the WRF model Explored the forecasting accuracy of the meteorological and chemical modules under separated and coupled conditions. The results indicate a slight increase in predictive ability.
Other methods
Kurt et al. [26] Neural networks based on geographic forecasting models The models that considered geographic factors performed better than those that did not.
Pan et al. [103] GM and grey relational analysis Selected 30 indices in 5 categories, identified the main influencing factors by grey relational analysis, and then used the GM(1, 1) model to forecast pollutant concentrations.

Remark 18.
From Table 23, it can be observed that many approaches have been developed for air pollution forecasting, each with its own characteristics. Moreover, because the pollutants studied differ and the factors included in the models vary considerably, it is difficult to select the most suitable approach for forecasting air pollution. Researchers should consider not only the advantages of each method but also its disadvantages. Statistical models require a large amount of historical data and depend heavily on the data time series. AI methods are unstable and highly data-dependent. Building hybrid models is relatively complex. Therefore, it is necessary to compare these methods thoroughly and determine the most appropriate method for forecasting.
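Kalman-filter adjustment, such as the KF technique listed for Silibello et al. [108] in Table 23, is commonly implemented as a recursive estimate of the systematic forecast bias, which is then subtracted from the next raw forecast. The Python sketch below shows that generic scalar scheme under stated assumptions (the bias follows a random walk, and the noise variances q and r are chosen arbitrarily); it is not the configuration used in [108], and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KalmanBiasCorrector:
    """Scalar Kalman filter tracking the systematic bias (forecast - observation)."""
    q: float = 0.01    # process-noise variance: how quickly the bias may drift (assumed)
    r: float = 1.0     # observation-noise variance of the measured error (assumed)
    bias: float = 0.0  # current bias estimate
    p: float = 1.0     # variance of the bias estimate

    def correct(self, raw_forecast: float) -> float:
        """Adjusted forecast = raw model forecast minus the estimated bias."""
        return raw_forecast - self.bias

    def update(self, raw_forecast: float, observation: float) -> None:
        """Assimilate one new forecast error once the matching observation arrives."""
        p_pred = self.p + self.q                 # predict: bias follows a random walk
        gain = p_pred / (p_pred + self.r)        # Kalman gain
        error = raw_forecast - observation       # measured bias for this time step
        self.bias += gain * (error - self.bias)  # update the bias estimate
        self.p = (1.0 - gain) * p_pred           # update its variance


# Illustrative run: a model that persistently over-predicts by 10 units.
kf = KalmanBiasCorrector()
for obs in [40.0, 45.0, 50.0, 42.0, 38.0] * 20:
    kf.update(raw_forecast=obs + 10.0, observation=obs)
print(round(kf.bias, 2), kf.correct(60.0))
```

A larger q lets the estimated bias adapt faster to changing conditions at the cost of noisier corrections.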

Conclusions
As a serious concern, air pollution in the 21st century has received great attention in recent years, and various air pollution forecasting methodologies and approaches have been advanced. This work mainly reviewed the methods of air pollution forecasting. At the beginning of this work, we reviewed the current state of air pollution research from the perspectives of emission inventories, health effects, air pollution assessment, control efficiency, and early warning systems. We then reviewed the methods of air pollution forecasting, which are the core of this work. Based on the relevant literature, these methods can be roughly divided into three categories: potential forecast models, three-dimensional forecast methods, and hybrid systems. Each has advantages and disadvantages. Based on their applications, our conclusions about these methods and models are summarized below:
• Statistical models are widely applicable and require less time to build, but they require a large amount of historical data and depend heavily on the data time series.
• AI methods, such as NNs, perform well and can handle nonlinear data, but the models are unstable and highly data-dependent. Moreover, most optimization algorithms are easy to understand and to combine with other methods; however, they easily become trapped in local optima.
• As the most popular method, a hybrid system has good robustness, low risk, and strong adaptability, and can take advantage of other models. However, the process of building such models is relatively complex.
• Traditional AI methods perform better than statistical methods but worse than hybrid models.
• Pre-processed original series perform better than unprocessed original series in air pollution forecasting.
• Forecast performance is better when meteorological variables and geographic factors are considered.
In conclusion, because the atmospheric environment is a complex system, many factors affect its quality, and the relationships among them are complicated. Therefore, different forecasting methods should be chosen for different areas and pollutants. Moreover, there is no single best approach that always produces the most accurate forecast.