Statistical Modeling Approaches for PM10 Prediction in Urban Areas: A Review of 21st-Century Studies

PM10 prediction has attracted special legislative and scientific attention due to its harmful effects on human health. Statistical techniques have the potential for high-accuracy PM10 prediction. Accordingly, previous studies on statistical methods for temporal, spatial and spatio-temporal prediction of PM10 are reviewed and discussed in this paper. The review demonstrates that Support Vector Machines, Artificial Neural Networks and hybrid techniques show promise for suitable temporal PM10 prediction. A review of the spatial predictions of PM10 shows that the LUR (Land Use Regression) approach has been successfully utilized for spatial prediction of PM10 in urban areas. Of the six introduced approaches for spatio-temporal prediction of PM10, only one approach is suitable for high-resolution prediction (spatial resolution < 100 m; temporal resolution ≤ 24 h). In this approach, based upon the LUR modeling method, short-term dynamic input variables are employed as explanatory variables alongside typical non-dynamic input variables in a non-linear modeling procedure.


Introduction
Ambient respirable particles [1], or particulate matter with a diameter of less than 10 µm (PM10), have attracted special legislative and scientific attention due to their effects on human health. Particles with a diameter of less than 10 µm constitute the so-called inhalable fraction of particles, which are able to reach the bronchi-tracheal area [2]. PM10 is made up of a variety of solid and liquid substances derived from natural sources (e.g., volcanoes, dust storms, forest and grassland fires, living vegetation, and marine salts) and human activities (e.g., central heating, industry, construction works, vehicular traffic, domestic heating, and incinerators) [2][3][4]. From a chemical point of view, a complex mixture of organic and inorganic carbon, metals (lead, arsenic, mercury, cadmium, chromium, nickel, and vanadium), nitrates, sulphates and phosphates is present in the particulates [2].
PM10 has primary and secondary origins [5]. The major primary sources of PM10 in urban areas are road traffic (e.g., carbonaceous compounds from exhaust emissions [6], re-suspension of road dust [7], and tyre abrasion [8]) and combustion processes. Secondary particles are mainly formed by the condensation of vapors, or by chemical reactions such as atmospheric oxidation of SO2 to H2SO4, and NO2 to HNO3 [9]. PM10 concentration in urban areas is the result of a combination of regional background, urban and traffic concentrations [8,10,11,12,13]. For example, Figure 1 shows the source apportionment of PM10 in Berlin.
Particulate matter significantly affects aspects of atmospheric chemistry and air quality, such as dry and wet deposition, visibility, solar radiation, and cloud formation [5,15,16]. PM10 also has a direct effect on health via inhalation [5]. Glinianaia et al. [17] and Šrám et al. [18] have reported the effects of particulate matter on infant birth weight. Samet et al. [19] found that an elevated PM10 concentration can increase mortality rates the following day. In some studies, a significant relationship between health effects (mortality and morbidity) and elevated concentration of particulate matter was found (e.g., [20][21][22][23][24][25][26]). Moreover, it has been demonstrated by a number of epidemiological studies (e.g., [27][28][29]) that low concentrations of particulate matter can also have a large effect on human health. To address the PM10 problem, the European Union has set two limit values on PM10. According to these limits, the mean daily PM10 concentration may not exceed 50 µg/m³ more than 35 times per year, and the mean annual PM10 concentration may not exceed 40 µg/m³ [30].
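The two limit values quoted above can be expressed as a simple check on one year of daily mean concentrations. The following is a minimal sketch only: the function name, the returned keys and the example data are hypothetical, and the regulation's data-completeness and rounding rules are not modeled.

```python
from statistics import mean

DAILY_LIMIT = 50.0     # µg/m³, daily mean limit value quoted in the text
MAX_EXCEEDANCES = 35   # permitted number of exceedance days per year
ANNUAL_LIMIT = 40.0    # µg/m³, annual mean limit value quoted in the text


def check_eu_pm10_limits(daily_means):
    """Summarize one year of daily mean PM10 values against both limits."""
    exceedances = sum(1 for c in daily_means if c > DAILY_LIMIT)
    annual_mean = mean(daily_means)
    return {
        "exceedance_days": exceedances,
        "annual_mean": annual_mean,
        "daily_limit_ok": exceedances <= MAX_EXCEEDANCES,
        "annual_limit_ok": annual_mean <= ANNUAL_LIMIT,
    }


# Hypothetical year: 325 days at 30 µg/m³ and 40 episode days at 60 µg/m³
example_year = [30.0] * 325 + [60.0] * 40
result = check_eu_pm10_limits(example_year)
```

In this made-up year the annual mean stays below 40 µg/m³ while the daily limit is exceeded on 40 days, so the two limit values can disagree; this is one reason why short-term episodes need their own forecasts.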
Most major European urban areas experience severe short-term PM10 episodes that are harmful to the environment and human health [31]. The public must be informed when high PM10 concentration conditions are present [5]. Furthermore, administrations must attempt to reduce pollutant concentrations by limiting vehicular traffic on some days (e.g., alternating circulation of even and odd number plates and "Sundays on foot") [2,32], industrial emission restriction, and urban planning [33].
Long-term forecasting is employed for urban planning and design, as well as for the management of transportation networks, industrial sites and residential areas, in order to minimize unacceptable risks to public health [31].
In order to diminish or prevent the risk of critical concentration levels, abatement actions (such as traffic reduction) should be planned at least one or two days in advance [31]. Therefore, a short-term forecasting platform must be developed and used as a rapid alert system to inform the public of harmful air pollution events, as well as to adapt air pollution control strategies [2,31,32,33,34]. When air pollution concentration exceeds imposed (limit) values, the use of forecasting models could inform decisions on the enforcement of regulations. This would prevent unnecessary inconvenience to the city's residents [2].
Spatial and temporal variations of PM10 concentration are related to complex interactions among many parameters [5], and PM10 prediction in urban areas is more difficult than the prediction of other air pollutants (e.g., NO2) [35]. Therefore, despite the need for an accurate air quality forecasting model to alert the public and activate pollution control activities [31,34], often no effective action can be imposed during elevated PM10 conditions because of non-existent or inadequate forecasting models [31].
There are two main approaches to the prediction of PM10 in urban areas: mechanistic and statistical models.
A mechanistic (deterministic) approach involves numerically solving a set of differential equations. This approach does not require a large amount of measured data, but it does require a complete knowledge of pollutant sources, the temporal variation of the emission quantity, the chemical composition of emissions, and physical processes in the atmosphere [36]. Detailed information about the source of pollutants and other parameters is often unavailable. These parameters must be estimated or simply ignored due to a lack of available information [37], and these simplifications increase the uncertainty of the results. One of the major properties of mechanistic models is causality. Mechanistic models predict more frequent events reasonably accurately, but they are not able to accurately predict extreme events, due to the complexity and inherent uncertainty associated with turbulent flow [38]. Because of the complex transportation and transformation of PM10, the development of mechanistic models that can accurately predict the spatio-temporal variation of air pollutants is not easy [39], and these models are not capable of the accurate prediction of time series for short and medium time ranges. Therefore, these models are not suitable for planning and regulation [37,38].
Insufficient knowledge of pollutant sources and emission inventories, and inaccurate description of physico-chemical processes, can lead to significant bias and error in air quality estimations of mechanistic models [40][41][42][43]. However, assimilation and mapping techniques (e.g., bias correction, model output statistics, ensemble Kalman filtering, statistical approaches, geostatistical approaches) can improve the air quality estimations of mechanistic models (e.g., [44][45][46][47][48]). Some studies have generated high-resolution maps of air quality in urban areas by combining regional mechanistic and local dispersion models, which estimate regional (or background) and near-road concentrations, respectively (e.g., [49,50]). Furthermore, the strong physical and mechanical bases of mechanistic models should not be ignored. In the future, as the structure of mechanistic models improves, and computational power increases, these models will have more potential for accurate air quality predictions and short-term warnings. The current models are complex, time-consuming and inaccurate [51]. Accordingly, a suitable methodology should be adopted for PM forecasting in urban areas [51].
To overcome the limitations of mechanistic models, statistical models are employed for air pollution predictions [37], and they have been successfully developed for the spatio-temporal prediction of air pollutants [39]. Statistical models are suitable for the description of the complex site-specific relationship between air pollutants and explanatory variables, and they often make predictions with a higher accuracy than mechanistic models [36]. However, one drawback of statistical models is that they do not consider the physics behind the data and consequently, a model developed for one study area is not applicable to other sites [5,36]. Fernando et al. [51] compared a mechanistic model with a statistical modeling approach for daily PM10 forecasting in 2005 at Central Phoenix station in Phoenix, Arizona, USA. A neural network was employed as the statistical model. The utilized mechanistic model was MODEL3-CMAQ, which consists of the MM5 (Mesoscale Meteorological Model) model for the simulation of meteorological parameters, the SMOKE (Sparse Matrix Operator Kernel Emission) model for the simulation of emission processing, and the CMAQ (Community Multi-scale Air Quality) model for the simulation of PM10 concentration. The statistical model was found to be easier, faster, and more accurate than the mechanistic model, and it did not require costly emission inventories and computer resources.
Given the importance of PM10 prediction in urban areas, this paper reviews the existing statistical approaches for PM10 prediction: (1) temporal, (2) spatial and (3) spatio-temporal prediction.

Search Strategy and Study Selection
We systematically identified and reviewed the statistical approaches to PM10 prediction in urban areas. The Google Scholar database was searched for relevant literature published since 2000. Search terms included "Particulate matter", "PM10", "urban area", "city", "prediction", "forecasting", "simulation", "modeling", "geostatistical" and "statistical" in different combinations. Titles and abstracts were read to select relevant articles, and the full texts were then downloaded. In addition, the bibliographies of the selected articles and relevant review papers were investigated to identify further reading. We only selected articles published in English. Most were published in peer-reviewed journals as full publications. Some conference papers and books were also selected. We selected articles in which the air pollutant, modeling approach and case study were outdoor PM10, a statistical approach and an urban area, respectively. The articles were then categorized according to the presentation of the detailed necessary information (e.g., input variables, modeling procedure, time scale of forecasting, and period of observations).

PM10 Predictors in Statistical Models
Statistical modeling techniques require input (explanatory) variables. In this section, PM10 predictors that have often been employed in PM10 prediction studies are introduced, and their relation to PM10 is briefly explained. High-accuracy input data is very important in the prediction of PM10. The utilization of the results of numerical weather forecast models as input variables in statistical PM10 models can add some uncertainties to PM10 predictions because of the uncertainties associated with numerical weather forecast models [33]. Consequently, the results of numerical weather forecasts are rarely used as the input variables of statistical models. PM10 values in preceding time steps are similar to the initial conditions for the PM10 prediction in the following time steps [5], and they have often been considered as an explanatory variable in forecasting models. Including lagged PM10 in the set of input variables is expected to improve modeling results [52]. Stadlober et al. [32] showed that lagged PM10 is a more important variable than temporal or meteorological variables in the forecasting of PM10 in urban areas.
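As an illustration of how lagged PM10 enters a forecasting dataset, the sketch below pairs each target value with its preceding values and with the other predictors available for the same time step. The function name, the argument layout and the toy numbers are assumptions made for this example, not taken from any reviewed study.

```python
def make_lagged_samples(pm10, other_predictors, n_lags=1):
    """Build (features, target) pairs for a forecasting model.

    pm10:             list of PM10 values ordered in time
    other_predictors: list of tuples, one per time step (e.g., wind speed)
    n_lags:           number of preceding PM10 values used as features
    """
    samples = []
    for t in range(n_lags, len(pm10)):
        # Most recent lag first: pm10[t-1], pm10[t-2], ...
        lags = [pm10[t - k] for k in range(1, n_lags + 1)]
        features = lags + list(other_predictors[t])
        samples.append((features, pm10[t]))
    return samples


# Toy daily series (µg/m³) with wind speed (m/s) as the only other predictor
pm10_series = [20.0, 25.0, 40.0, 35.0]
wind_speed = [(3.0,), (2.5,), (1.0,), (1.5,)]
samples = make_lagged_samples(pm10_series, wind_speed, n_lags=2)
```

The first `n_lags` time steps are dropped because their lagged values are not available, which is why models with lagged inputs need a continuous record.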
Wind speed is a major meteorological parameter that determines the horizontal transport, dispersion and re-suspension of air pollutants. Low wind speed is associated with high PM10 concentration [53][54][55][56][57]. Wind speed is a suitable indicator for the transport of PM10; it has a direct relation to the atmospheric dispersion processes, and is a principal factor in the control of air pollution levels [58].
Wind direction can be related to the PM10 concentration under non-homogeneous spatio-temporal PM10 emission conditions [5], and it has a major role in the transport, dilution and re-suspension of PM10 [53,57].
Solar radiation, cloud cover and air temperature are the effective parameters in the formation of secondary PM10 [5,56,58]. In addition, high air temperature in an area leads to slow-moving high-pressure atmospheric systems, clear and sunny skies, stable atmospheric conditions with subsiding air, accumulation of air pollutants, and high PM10 concentration [56]. Hence, temperature is considered one of the strongest predictors of PM10 concentration [56]. Furthermore, air temperature change is related to incoming solar radiation, and enhances turbulence kinetic energy, influencing the mixing layer height. A shallow mixing layer leads to an increase in PM10 concentration at ground level. Perez and Reyes [59] observed that high PM10 concentration occurs when the difference between maximum and minimum daily temperature in winter is large, and this temperature difference is an important meteorological parameter for the forecasting of daily maximum PM10 concentration.
Motor vehicle exhaust is a source of PM10 [60], and vehicular traffic re-suspends PM10 [8]. Road transport is one of the major sources of primary PM10, and annual average daily traffic and other derived traffic variables are often suitable parameters for incorporation into long-term models. However, these parameters may not always be suitable for short-term spatial modeling, as traffic has short-term variations and its variability is unpredictable [39].
When data on traffic flow and speed are not available, CO and NOx can be employed as surrogates for traffic variables [57]. Moreover, SO2 and NOx are considered to be sources of secondary PM10 concentration [61].
Land use patterns can also influence air pollution. For example, plants reduce PM10 [62] and water bodies prevent PM10 re-suspension [60]. Population density and meteorological parameters can influence the spatial pattern of PM10 [60].
Short-term variations of city activities, such as traffic intensity, influence the short-term variation of PM10 emission [56]. In addition, the long-term (monthly or seasonal) variation of PM10, with winter months showing higher PM10 values than other months, can be attributed to long-term variations of PM10 sources such as central heating. Temporal variables (e.g., hour of day, day of week, month of year and day of year) can be used to represent information on the intensity of PM10 emission sources [58].

Temporal Prediction (Forecasting) of PM10 in Urban Areas
In temporal prediction studies, the PM10 concentration in one or more stations in the urban area is simulated and forecast. Different statistical techniques have been employed for the temporal prediction of PM10 in urban areas, and they are introduced in this section.

Multi-Variate Linear Regression (MLR)
MLR has been widely used for PM10 forecasting in urban areas. Although it has an accuracy problem due to linear representation of non-linear systems, and it is not able to capture extreme values (episodes) [37], it does not require continuous historical data [63]. Table 1 exhibits the main recent studies on the temporal prediction of PM10 in different urban areas. Although Table 1 implies that MLR has been widely used for PM10 forecasting, comparison between the results of MLR and other techniques demonstrates the weakness of the MLR approach. The stepwise input variable selection technique is often used in MLR for the determination of suitable explanatory variables for regression. Collinearity among the input parameters has often occurred in different MLR studies, and sometimes PCRA (Principal Component Regression Analysis) is used to overcome the problem of collinearity [33]. Another linear regression technique, which has been applied for temporal prediction of PM10 in urban areas, is Lazy Learning (LL). LL is a local linear forecasting technique, which employs the local learning algorithm when a forecast is required. This technique showed better performance than MLP (Multi Layer Perceptron) and PNN (Pruned Neural Network) in a study on the daily forecasting of PM10 in Milan, Italy [68].

Artificial Neural Networks
Artificial neural networks (ANNs) have been used for the forecast of a wide range of pollutants and their concentrations at various time scales with very good results. Nagendra and Khare [75] pointed out that ANNs have recently become an alternative to conventional methods, and in the near future they will become an important tool for modeling air pollutant distribution. The most widely used non-linear techniques for the temporal prediction of PM10 are neural networks with different structures (MLP: Multi Layer Perceptron; ELMAN; PNNs: Pruned Neural Networks; RBF: Radial Basis Function). MLP has been utilized more than the other structures of ANNs. Table 1 presents the major recent studies on the application of ANN for temporal prediction of PM10 in urban areas.
The main advantages of ANNs are their application in cases in which a full theoretical approach is not available, incorporation of a large number of heterogeneous variables, implementation speed, and their ability to simulate complex problems with non-linear behavior [64,66]. In addition, the main advantages of an ANN forecasting system, compared to mechanistic atmospheric modeling systems, are that less input data and computational time are required (in operation mode) [5]. Moreover, ANN models, unlike stochastic models, do not require pre-assumptions about the data distribution [70].
Chaloulakou et al. [66] compared the MLR and ANN techniques for daily forecasting of PM10 in one station in Athens, Greece. They evaluated the developed models (Model 1: without lagged PM10; Model 2: with lagged PM10 data as an input variable) for the forecasting of high PM10 (>75 µg/m³). They demonstrated that the utilization of lagged PM10 as an input variable can significantly improve the forecasting results (See Table 2). In total, ANNs performed better than MLR in forecasting high PM10 concentration events.
[...] and "mutual information and false nearest neighbor" for determination of suitable meteorological and hourly PM10 variables, respectively. Grivas and Chaloulakou [58] used a genetic algorithm for input feature selection. MLP models have some problems in dealing with high dimensional input variables (the curse of dimensionality) [67]. Another problem with the use of MLP is local minima. A computational penalty for the mitigation of this problem is unavoidable [67]. Lu et al. [67] used PCA to decrease the dimension of the input variables and utilized an RBF (Radial Basis Function) neural network for PM10 forecasting in Hong Kong. They stated that the RBF has a lower computational cost than MLP. However, Paschalidou et al. [33] showed that the MLP performs better than RBF at the hourly forecasting of PM10. In addition, they showed that the developed MLP model is suitable for extreme events forecasting in a case over Cyprus (POD = 0.68-0.71, FAR < 30%).
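The skill scores quoted for extreme events can be computed from a contingency table of observed and forecast threshold exceedances. The sketch below assumes the usual definitions, POD (probability of detection) = hits / (hits + misses) and FAR (false alarm ratio) = false alarms / (hits + false alarms); the reviewed studies may define them slightly differently, and the example values are made up.

```python
def pod_far(observed, predicted, threshold):
    """Return (POD, FAR) for exceedances of a concentration threshold."""
    hits = misses = false_alarms = 0
    for obs, pred in zip(observed, predicted):
        obs_event = obs > threshold
        pred_event = pred > threshold
        if obs_event and pred_event:
            hits += 1
        elif obs_event:
            misses += 1
        elif pred_event:
            false_alarms += 1
    # Scores are undefined (NaN) when their denominator is zero
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = (false_alarms / (hits + false_alarms)
           if hits + false_alarms else float("nan"))
    return pod, far


# Made-up daily PM10 values (µg/m³) evaluated at the 75 µg/m³ threshold
observed = [80.0, 60.0, 90.0, 70.0, 100.0, 50.0]
predicted = [85.0, 78.0, 70.0, 60.0, 95.0, 40.0]
pod, far = pod_far(observed, predicted, threshold=75.0)
```

Here two of the three observed episodes are forecast (POD = 2/3) and one of the three forecast episodes does not occur (FAR = 1/3).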
In total, ANNs have performed better than MLR. Inclusion of lagged PM10 in the input variables list improves the forecasting results significantly. Although MLP has been employed more than other ANN structures for PM10 forecasting, the best ANN structure is still unknown. In addition, the frequency distribution of PM10 in the training dataset may strongly influence the modeling results, and the utilization of PM10 with a uniform frequency distribution may lead to an appropriate model for the forecasting of extreme events. Therefore, combining two PM10 forecasting models, developed using two training datasets with different frequency distributions, may lead to a suitable model for the forecasting of low to high PM10 concentrations.

Other Techniques
Besides linear and ANN techniques, other techniques have occasionally been employed for temporal prediction of PM10 in urban areas, obtaining results comparable with the two widely used techniques (MLR and ANNs). Some of these techniques are ARIMA (Auto-Regressive Integrated Moving Average), hybrid models, BRT (Boosted Regression Tree), CART (Classification And Regression Trees), GAM (Generalized Additive Model), QRM (Quantile Regression Model), fuzzy techniques, and SVM (Support Vector Machines). Table 3 presents the recent studies on temporal PM10 prediction using these techniques and their comparison with MLR and ANNs. ARMA (Auto-Regressive Moving Average) and ARIMA (Auto-Regressive Integrated Moving Average) [80] have been employed in some studies for PM10 forecasting (e.g., see [34,37] in Table 3). However, these techniques have an accuracy problem due to the linear representation of non-linear systems, and are not able to capture extreme concentrations [34]. In addition, these methods require continuous historical data [63]. ARMA and ARIMA models have the capability to include external explanatory variables and in this case, they are named ARMAX and ARIMAX, respectively.
In some studies, ARMA or ARIMA models have been coupled with other methods such as ANNs, and the developed models are called hybrid models. In hybrid ANN and ARIMAX, an ARIMAX model is first developed, and then the ANN model is used to describe the residuals of ARIMAX [37]. Diaz-Robles et al. [37] evaluated the performance of ARMAX, ANNs, MLR and hybrid ARIMAX-ANN for the forecasting of daily maximum PM10 moving average values in one station in Temuco, Chile. The hybrid ARIMAX-ANN was the best model (See Table 3), and its input variables were not only the inputs employed by an ANN, but also the ARIMAX model outputs and the residuals of ARIMAX.
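The two-stage structure of such hybrid models (a linear time-series model first, then a non-linear model of its residuals) can be sketched with deliberately simple stand-ins. In the example below, a first-order linear autoregression replaces ARIMAX and a k-nearest-neighbour average replaces the ANN; both substitutions, all function names and the toy data are assumptions made to keep the sketch dependency-free, and this is not the model of Diaz-Robles et al. [37].

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def knn_predict(train_x, train_y, query, k=2):
    """Average the targets of the k training points closest to query."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: abs(pair[0] - query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def hybrid_forecast(pm10, wind, next_wind, k=2):
    """One-step forecast = linear part + non-linear residual correction."""
    # Stage 1 (stand-in for ARIMAX): regress pm10[t] on pm10[t-1]
    previous, current = pm10[:-1], pm10[1:]
    a, b = fit_line(previous, current)
    residuals = [c - (a + b * p) for p, c in zip(previous, current)]
    linear_part = a + b * pm10[-1]
    # Stage 2 (stand-in for the ANN): model the residuals from wind speed
    residual_part = knn_predict(wind[1:], residuals, next_wind, k)
    return linear_part + residual_part


# Toy series that is exactly linear in its lag, so the residuals vanish
forecast = hybrid_forecast([10.0, 20.0, 40.0, 80.0],
                           [3.0, 2.0, 1.0, 1.5], next_wind=2.0)
```

When the linear stage already explains the series, as in this toy case, the residual stage contributes nothing; the hybrid only helps when the residuals carry structure that the linear model cannot represent.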
Goyal et al. [34] presented a hybrid of MLR and ARIMA, whose structure is almost the same as the ARIMAX model. They also demonstrated that the hybrid model performs better than ARIMA and MLR (See Table 3).
Support Vector Machines (SVM) are a machine learning technique that was initially developed for classification problems by Cortes and Vapnik [81]. A regression technique based upon SVM was then developed [82]. SVM is able to perform linear and non-linear regressions.
The advantages of SVM, compared with MLP, are its better generalization ability and its capability to learn from a small number of training data with a large number of input variables [77]. Suárez Sánchez et al. [77] employed SVM for simulation of PM10 in Avilés, Spain, and they showed that SVM with different kernel functions performs better than MLP (See Table 3). Suárez Sánchez et al. [79] reported similar findings (See Table 3).
Raimondo et al. [83] compared SVM and ANN for temporal prediction of PM10 in a station in Goteborg, Sweden, and they found that the SVM produced better hourly PM10 forecasting results than an ANN. In addition, Sotomayor-Olmedo et al. [84] demonstrated the appropriate performance of SVM with a Gaussian kernel function for the forecasting of monthly PM10 in Mexico City.
GAM (Generalized Additive Model) [85] is an extension of the Generalized Linear Model (GLM). This technique has also been employed in a few studies for modeling PM10 in urban areas (See [57,76,78] in Table 3).
QRM (Quantile Regression Model) has occasionally been employed for modeling PM10 in urban areas. Sayegh et al. [57] showed that QRM outperforms MLR, GAM, and BRT in the modeling of daily PM10 in Makkah, Saudi Arabia (See Table 3). For details about QRM refer to Koenker [86].
CART (Classification And Regression Trees) analysis is a statistical procedure introduced by Breiman et al. [87]. The methodology used by CART is known as binary recursive partitioning. The CART technique splits the data into two groups (nodes), and this binary splitting is repeated until some conditions are satisfied [52]. Slini et al. [52] showed that the CART technique performs better than linear regression techniques (See Table 3). The CART technique has led to new methods for analysis and forecasting such as BRT, which was employed for daily PM10 simulation in Makkah, Saudi Arabia [57].
Yetilmezsoy and Abdul-Wahab [88] utilized CH4, CO, wind speed and direction, relative humidity and solar radiation as the input variables of fuzzy and linear regression techniques for the estimation of PM10 in Khaldiya, Kuwait. The fuzzy method outperformed the linear technique (MLR: R² = 0.756; fuzzy technique: R² = 0.997).
In general, among the rarely used approaches, the SVM and hybrid models outperform ANNs, and these techniques are promising for suitable temporal PM10 prediction. Hence, based upon previous studies, ANN, SVM and hybrid techniques can be considered as suitable techniques for temporal prediction of PM10 in urban areas.

Spatial Prediction (Spatial Distribution) of PM10 in Urban Areas
One technique for the spatial prediction of air pollution is spatial interpolation. Some air pollution studies have employed deterministic (e.g., Inverse Distance Weighting (IDW), nearest-neighbor, and splines) and geostatistical (e.g., Kriging) interpolation techniques [89]. Kanakiya et al. [90] used Kriging, IDW, nearest-neighbor and splines for spatial prediction of PM10 in Pune, India. Kriging and IDW were employed for spatial prediction of PM10 in Istanbul, Turkey [91] and Phoenix, Arizona, USA [92]. In total, Kriging has been applied more often in urban areas (e.g., metropolitan areas of Barcelona and Bilbao, Spain [93], an urban scale in Europe [94], Phoenix metropolitan region, Arizona [95] and Mumbai, India [96]) than have other typical deterministic interpolation techniques.
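Of the deterministic techniques listed above, IDW is the simplest to state: the estimate at a location is a weighted mean of the station values, with weights proportional to 1/d^p for distance d. The sketch below assumes planar coordinates and the common default p = 2; the function name and the station data are hypothetical.

```python
from math import hypot


def idw(stations, x, y, power=2.0):
    """Inverse Distance Weighting estimate at (x, y).

    stations: list of (station_x, station_y, pm10_value) tuples
    power:    distance exponent p (2 is a common default, assumed here)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = hypot(x - sx, y - sy)
        if distance == 0.0:
            # The query point coincides with a station: return its value
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical stations 2 km apart (coordinates in km, PM10 in µg/m³)
stations = [(0.0, 0.0, 20.0), (2.0, 0.0, 40.0)]
midpoint = idw(stations, 1.0, 0.0)       # equal weights
near_first = idw(stations, 0.5, 0.0)     # dominated by the first station
```

Because every estimate is a weighted mean of station values, IDW can never exceed the highest measured concentration, which illustrates why sparse networks miss steep gradients near local sources.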
Although spatial interpolation techniques (relying on conventional geostatistical techniques) are suitable for spatial prediction of air pollutants at national, regional and global scales [97,98], they are not suitable for spatial prediction at smaller scales such as urban areas [99,100]. Conventional geostatistical techniques consider the spatial autocorrelation information with or without broad scale variations (or trends) (e.g., ordinary Kriging, universal Kriging and Kriging with external drift). Air pollution sources in urban areas are extremely varied and complex. There are many local emission sources, and there is a steep gradient of pollutant concentration away from these sources [100,101]. Accordingly, a dense air pollution monitoring network must be employed if conventional geostatistical techniques are to be used. This is rarely available in urban areas [102][103][104][105], and interpolation using an inadequate number of monitoring stations can lead to highly biased and smoothed results [89,100].
Consequently, the number of published applications of Kriging on spatial prediction of air pollutants is relatively low [100,105].
Recently, some promising new approaches to spatial interpolation, based upon geostatistical techniques, have been introduced and applied to spatial prediction of PM10 in urban areas. The co-kriging technique, using PM10 predictions of the Transport Chemical Aerosol Model (TCAM) [106] as a secondary variable, was successfully applied for spatial prediction of PM10 over the Milan metropolitan area, Italy [107,108], and it forecast with higher accuracy than the Kriging technique [107]. Pollice and Lasinio [109] employed the Bayesian Kriging-based technique [110] for spatial prediction of daily PM10 in Taranto, Italy. This technique is characterized by the utilization of time-varying weather covariates [109]. Park [111] employed a spatio-temporal Kriging technique for spatial prediction of monthly PM10 in the Seoul metropolitan area, South Korea, and demonstrated that this technique outperforms conventional Kriging. In addition, Functional Kriging (an alternative to spatio-temporal Kriging) was successfully applied for spatial prediction of PM10 in Madrid, Spain [112].
LUR (Land Use Regression) has been introduced as a credible alternative technique for the spatial prediction of air pollutants in urban areas with a small number of monitoring stations [89,104]. LUR has frequently been used for air pollution exposure assessment, and modeling of small-scale spatial variation of air pollutants in urban areas using different predictor variables [113][114][115]. There is no standard method for conducting LUR, but some explanations about the general approach can be found in the literature (e.g., [98,116,117,118]). In LUR, a statistical relationship between air pollutants and some urban characteristics (e.g., land use characteristics, traffic intensity, and population density) is established [39,119]. In some studies, air pollution emission sources (i.e., traffic and industrial point sources data) and the concentration of some pollutants at particular locations are also included in the list of predictor variables (e.g., [39,120,121,122]). In total, the explanatory variables in different LUR models are not unique, due to the city characteristics and data availability [74]. LUR models have seldom employed morphological parameters, which may account for the dispersion field near the pollution sources [123]. Tang et al. [123] added four morphological parameters to the traditional LUR model and improved the performance of the LUR model.
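In its most common linear form, the LUR idea amounts to fitting a multiple regression between station concentrations and site characteristics and then applying the fitted coefficients at unmonitored locations. The sketch below solves the least-squares normal equations directly; the two predictors (traffic intensity and green-space fraction), the synthetic station values and all function names are assumptions for illustration, and real LUR studies add variable selection and validation steps that are not shown.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_lur(site_predictors, pm10):
    """Return [intercept, b1, b2, ...] minimizing squared error."""
    X = [[1.0] + list(p) for p in site_predictors]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, pm10)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_lur(coefficients, predictors):
    """Apply fitted coefficients at a location without a monitor."""
    return coefficients[0] + sum(
        c * v for c, v in zip(coefficients[1:], predictors))


# Synthetic stations: (traffic in 1000 vehicles/day, green-space fraction),
# generated from PM10 = 20 + 0.5 * traffic - 10 * green (no noise)
sites = [(10.0, 0.1), (30.0, 0.2), (5.0, 0.6), (20.0, 0.4), (40.0, 0.05)]
annual_pm10 = [24.0, 33.0, 16.5, 26.0, 39.5]
coefs = fit_lur(sites, annual_pm10)
estimate = predict_lur(coefs, (25.0, 0.3))
```

With noise-free synthetic data the fit recovers the generating coefficients, so the intercept equals the value that the synthetic model assigns to a site with no traffic and no green space.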
Hoek et al. [113] reviewed the studies on the application of LUR for spatial modeling of air pollutants. They showed that the main studies have focused on PM 2.5 and NOx, and that the LUR approach has rarely been employed for the spatial modeling of PM 10 in urban areas. Table 4 shows the main studies on the application of LUR for modeling the spatial distribution of PM 10 in urban areas. There is no specific method for determining the optimum number of monitoring stations for developing LUR models [113,125,127]. The studies use data measured at 20-100 monitoring stations [89], and most studies have a small to medium number of sampling stations [130]. Table 4 shows that the number of stations employed in the spatial modeling of PM 10 in urban areas is between 14 and 52. Basagaña et al. [130] employed 147 stations for LUR modeling of NO 2 in the cities of Girona and Salt, Spain, and tried to determine the effect of the number of monitoring stations on the modeling results. They found that a larger number of sampling stations leads to better performance in LUR modeling. However, the adjusted R 2 and Leave-One-Out Cross Validation (LOOCV) may mask this effect [130]. In addition, a large number of measurement sites (more than 80 stations for Girona and Salt) is required for the characterization of local air pollution levels in complex urban settings [130]. European cities typically have a limited number of air pollution monitoring stations, so some additional in-situ measurements are necessary for appropriate spatial prediction of PM 10 [131]. However, obtaining data from additional stations is time-consuming, costly and resource-intensive [113,132-134]. This is the main constraint on the development of dense air pollution monitoring networks in urban areas [135].
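The two performance measures named above can be made concrete with a short sketch. For brevity it assumes a single-predictor linear model, which is a simplification of real LUR models; the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor is constant")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2: float, n_sites: int, n_predictors: int) -> float:
    """Penalise R^2 for the number of predictors relative to the number of sites."""
    return 1.0 - (1.0 - r2) * (n_sites - 1) / (n_sites - n_predictors - 1)


def loocv_r2(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Refit with each site held out once and score the held-out predictions."""
    xs, ys = list(xs), list(ys)
    held_out: List[float] = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        held_out.append(a + b * xs[i])
    return r_squared(ys, held_out)
```

With few sites, each held-out site is still predicted by a model trained on nearly the same data, so both scores can look favorable even when prediction at truly independent locations is poor.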
Taheri Shahraiyni et al. [136,137] presented a new technique for the development of dense air pollution monitoring networks for urban areas, by generating virtual stations. They successfully implemented their technique in the development of a dense PM 10 monitoring network in Berlin, Germany. The technique presented by Taheri Shahraiyni et al. [136,137] reduces the need for additional in-situ measurement data, and enables a low-cost method for spatial prediction of PM 10, which is suitable for policy making. Although the MLR technique is often utilized for LUR model development, Zhang et al. [126] used MLP for spatial simulation of the annual PM 10 concentration in the urban core area of Taiyuan, China. The intercept of MLR in the LUR approach represents the background concentration [60,113], but Chen et al. [120] found that the intercept of the MLR model for PM 10 in Jinan, China, is higher than the background values.
Previous studies showed that the initial input variables utilized for PM 10 modeling are often collinear. A reduction of collinearity is sought (e.g., [74,125,127]), as a model developed using collinear input variables is not robust and is sensitive to small changes in the data [138,139]. Hence, a collinearity reduction technique is applied to the input variables. Different thresholds for the correlation coefficient have been used for collinearity reduction in different LUR studies on PM 10 (e.g., 0.6 [74]; 0.67 [125]; 0.75 [127]). Li et al. [129] introduced LUR modeling with a semi-circular buffer that is able to incorporate wind direction into the LUR model, and it performs better than the LUR model with a circular buffer in the modeling of seasonal PM 10 in Changsha, China.
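A correlation-threshold filter of the kind mentioned above might look like the following sketch. The greedy rule (keep a predictor only if it is weakly correlated with every predictor already kept) and the example variable names are assumptions; the cited studies may select among correlated variables differently.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        raise ValueError("constant series has no defined correlation")
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / sqrt(saa * sbb)


def drop_collinear(predictors: Dict[str, Sequence[float]], threshold: float = 0.6) -> List[str]:
    """Greedy filter: predictors are visited in dict order (assumed priority order)."""
    kept: List[str] = []
    for name, values in predictors.items():
        if all(abs(pearson(values, predictors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical site-level predictors at five stations
example = {
    "traffic": [1, 2, 3, 4, 5],
    "road_length": [2, 4, 6, 8, 10.5],  # nearly proportional to traffic
    "green_space": [5, 1, 4, 2, 3],
}
selected = drop_collinear(example, threshold=0.6)
```

Because the filter is order-dependent, the variables judged most important a priori should come first in the dictionary.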
In general, the performance of the LUR modeling approach varies widely between urban areas (R 2 = 0.3-0.88).
LUR studies have mainly focused on the spatial modeling of seasonal or annual PM 10. Accordingly, the developed PM 10 models have high spatial resolution, but they have no temporal variation [74,134]. Short-term PM 10 prediction models are important for the evaluation of short-term exposure in human health studies [74], for rapid decision-making to inform and alert the public of harmful air pollution events, and for adapting air pollution control strategies [2,31-34]. Short-term variations of PM 10 have often been ignored in previous studies, and many studies have assumed that they have no impact on long-term exposure. This assumption is only valid, however, if temporal changes in PM 10 are equal across the whole study area [134], which is not the case in urban areas.
It is possible to develop an LUR model for any given time frame. A few studies that have focused on modeling the spatio-temporal variations of air pollutants using LUR models [134] are presented in the next section.

Spatio-Temporal Prediction (Forecasting of Spatial Distribution) of PM 10 in Urban Areas
Given the importance of improving the temporal resolution of spatial PM 10 models, some studies have tried to develop spatio-temporal models for air pollutants in urban areas.
The methods developed for spatio-temporal modeling of PM 10 can be categorized into the following groups:
1. Temporal trend: The simplest method is to use the temporal trend of the air pollutant, derived from the local background monitoring stations, to adjust LUR results for past years. The disadvantage of this technique is that the trend at the monitoring station is assumed to hold throughout the urban area [134,140]. This approach is easy, and it can be suitable if a representative fixed station is employed. However, a fixed station that is affected by local air pollution sources (a non-representative station) cannot properly calibrate the pollutant concentration [74]. Taheri Shahraiyni et al. [141] presented a technique for the determination of the most representative stations in urban areas. Combining this new technique with the temporal trend may lead to an appropriate spatio-temporal model for long-term variations of PM 10.

2. Temporal adjustment: Another approach is the temporal adjustment of the values of the model's predictors. This approach has some disadvantages. Many predictors change very slowly over time (e.g., land use) and, consequently, this approach only predicts long-term variations in PM 10 levels. In addition, the temporal changes of the predictors do not necessarily reflect the temporal changes of the pollutants [134], and this method does not account for changes in the relationship between predictors and air pollutants [140].

3. Temporal adjustment and trend: The combination of the previous two approaches (temporal adjustment and temporal trend) can be considered as an approach for spatio-temporal prediction of PM 10. In this approach, the spatio-temporal PM 10 concentration is first calculated by the temporal adjustment technique, and then the temporal trend is added to the developed model [140].

4. Temporal recalibration: Changing, or recalibrating, the coefficients of the existing model is another approach for developing a suitable model for other time periods [134,140]. Mölter et al. [134] recalibrated the LUR model for the calculation of annual spatial variations of PM 10 in Manchester, UK, over a long period. They concluded that this technique allows for the extrapolation of the LUR model over a long period. Wang et al. [140] compared approaches 1-4 for the hindcast and forecast of NO and NO 2 in Vancouver, Canada, and showed that the best approach is the recalibration technique.

5. Temporal model development: Some studies develop several models for different time steps (e.g., [142]). However, this approach requires a huge amount of human and material resources for data collection and model development [74]. Consequently, it is neither time- nor cost-effective.

6. Employment of temporal predictors: Although the previous approaches (approaches 1-5) yield a spatio-temporal PM 10 model, they have been utilized for long-term variations of air pollutants, and accordingly are not useful for deriving short-term variations of PM 10. Employment of temporal predictors enables the estimation of short-term variations of air pollutants, through the use of some short-term dynamic input variables in the spatio-temporal model. For example, Gryparis et al. [143] and Maynard et al. [144] modeled daily black carbon in Boston, USA, using temporal, meteorological, location and traffic variables.

In conclusion, further studies into the development of high-resolved spatio-temporal PM 10 prediction in urban areas are necessary. Future studies may attempt to develop new techniques for high-resolved spatio-temporal PM 10 prediction in urban areas. Non-linear approaches seem better than linear models for the development of spatial and spatio-temporal models with higher accuracy. Therefore, future studies may attempt to develop different non-linear spatial and spatio-temporal PM 10 prediction models.
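To make the temporal-trend approach (approach 1) concrete, the sketch below rescales an LUR estimate for a base year using the change observed at a background station. The source does not state whether the adjustment is multiplicative or additive, so both variants are offered as assumptions, and the function name `trend_adjust` is hypothetical.

```python
def trend_adjust(
    lur_base: float,
    background_target: float,
    background_base: float,
    method: str = "ratio",
) -> float:
    """Transfer an LUR estimate from the base period to a target period.

    lur_base:          LUR-predicted concentration at a location, base period
    background_target: background-station concentration, target period
    background_base:   background-station concentration, base period
    """
    if method == "ratio":
        if background_base == 0:
            raise ValueError("base-period background concentration must be non-zero")
        return lur_base * background_target / background_base
    if method == "difference":
        return lur_base + (background_target - background_base)
    raise ValueError(f"unknown method: {method}")
```

Either variant applies one station's change to every location, which is exactly the limitation noted for this approach when the chosen station is not representative.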
In order to diminish or prevent the risk of critical concentration levels, abatement actions should be planned at least one or two days in advance [31]. Consequently, PM 10 forecasting with daily or smaller time steps can be considered as high temporal resolution (short-term forecasts). In order to perform suitable monitoring at the urban scale, the grid size should be comparable with urban blocks (often less than 100 m) [146]. In addition, Merbitz et al. [147] showed that PM 10 concentration in urban areas decreases exponentially from the emission source, and the effect of the emission source is dampened at a very short distance (about 100 m). Therefore, to suitably consider short-range PM 10 variations, the grid size for high-resolved spatial PM 10 prediction in urban areas should be less than 100 m.

Summary and Conclusions
Figure 2 depicts the summary of this study. The review of previous studies on the statistical modeling of PM 10 in urban areas showed that non-linear techniques outperform linear techniques for temporal prediction of PM 10. Among the introduced techniques, ANN, SVM and hybrid models have the most potential for better performance. In addition, including PM 10 in the input variables significantly improves the forecasting results. Although MLP has been employed more than other ANN structures for the temporal prediction of PM 10, the best ANN structure is still unknown.
The frequency distribution of PM 10 in the training dataset may strongly influence the modeling results, and the use of PM 10 data with a uniform distribution may lead to an appropriate model for the forecasting of extreme events. However, using such a training dataset reduces the accuracy of low and normal PM 10 concentration forecasts. Accordingly, combining two PM 10 forecasting models, developed using two training datasets with different frequency distributions, may lead to a suitable model for forecasting low to high PM 10 concentrations.
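One way to obtain a training set with a roughly uniform PM 10 distribution is to bin the samples by concentration and draw the same number from each bin. The sketch below illustrates this under stated assumptions: the bin width, the per-bin sample count, sampling with replacement, and the function name `uniform_resample` are all illustrative choices rather than the method of any reviewed study.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (input features, observed PM10 in µg/m3)


def uniform_resample(
    samples: Sequence[Sample], bin_width: float, per_bin: int, seed: int = 0
) -> List[Sample]:
    """Draw `per_bin` samples (with replacement) from every non-empty PM10 bin."""
    rng = random.Random(seed)
    bins: Dict[int, List[Sample]] = defaultdict(list)
    for sample in samples:
        bins[int(sample[1] // bin_width)].append(sample)
    resampled: List[Sample] = []
    for key in sorted(bins):
        resampled.extend(rng.choices(bins[key], k=per_bin))
    return resampled
```

A model trained on the resampled set sees rare high-concentration episodes as often as ordinary days, while a second model trained on the original set keeps the natural distribution; how the two forecasts are then combined is left open here, as it is in the text.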
Linear approaches are often used for the development of LUR models for spatial prediction of PM 10. However, non-linear approaches have recently been employed and they can improve results. Consequently, future studies may develop non-linear LUR models for spatial and spatio-temporal PM 10 prediction in urban areas. Although LUR modeling with a large number of sampling stations leads to better performance, there is no specific method for determining the optimum number of monitoring stations for the development of the LUR model. Recently, a new technique has been developed for the generation of virtual PM 10 stations, and it can be employed in the densification of the PM 10 monitoring network. This approach reduces the need for additional in-situ measurement data and enables a low-cost method for spatial prediction of PM 10.
LUR studies have mainly focused on the spatial modeling of seasonal or annual PM 10, but it is possible to develop an LUR model for any given time frame. A few studies have focused on modeling the spatio-temporal variations of air pollutants using LUR models. Among six different approaches to the spatio-temporal modeling of PM 10, only one approach (employment of temporal predictors) enables the estimation of short-term variation in the levels of air pollutants. This is achieved by the use of some short-term dynamic input variables in the spatio-temporal model. This approach has rarely been used for high-resolved spatio-temporal prediction of PM 10 in urban areas, and future studies may focus on the development of a high-resolved spatio-temporal statistical model for PM 10 prediction in urban areas.

* Abbreviations of the parameters: BLH: Boundary layer height; CC: Cloud cover; CDAY: Cosine of hour of the day; DOW: Day of week; DT: Difference between daily maximum and minimum temperatures; DRC: Distance to road center; HOD: Hour of day; MOY: Month of year; MVs: Meteorological variables; P: Atmospheric pressure; PCs: Principal components; Rf: Rainfall; RH: Relative humidity; SD: Street direction; SAR: Street aspect ratio; SDAY: Sine of hour of the day; SEA: Binary seasonal index; SR: Solar radiation; Ta: Air temperature; Tmin: Minimum temperature; Tmax: Maximum temperature; TrV: Traffic volume; TVs: Temporal variables; WD: Wind direction; WDI: Wind direction index; WS: Wind speed; ** Abbreviations of the methods: ANN: Artificial Neural Networks; LL: Lazy Learning; MLR: Multivariate Linear Regression; MLP: Multi-Layer Perceptron; NPR: Non-Parametric Regression; PCRA: Principal Component Regression Analysis; PNN: Pruned Neural Networks; RBF: Radial Basis Function; *** Unit of the presented RMSE and MAE values is µg/m 3; **** Real time simulation has been performed.

Gryparis et al. [143] and Maynard et al. [144] employed temporal, meteorological, location (latitude and longitude) and traffic variables, along with black carbon levels measured at one monitoring station, for the development of a daily black carbon model for Boston, USA. However, they did not consider land use parameters. Su et al. [145] incorporated meteorological parameters into LUR models and utilized them for hourly NO 2 estimation in Vancouver, Canada. Alam and McNabola [39] utilized daily traffic and meteorological parameters, a temporal parameter, transboundary air pollution (derived from back-trajectory analysis) and population density as input variables of different statistical techniques (MLR, NPR (Non-Parametric Regression), ANN) within the LUR conceptual framework for the spatial simulation of daily PM 10 concentration in Vienna (Austria) and Dublin (Ireland). The results showed that ANN (Dublin: R 2 = 0.51; Vienna: R 2 = 0.66) outperforms MLR (Dublin: R 2 = 0.38-0.43; Vienna: R 2 = 0.35-0.39) and NPR (Dublin: R 2 = 0.45; Vienna: R 2 = 0.51). They showed that the utilization of a non-linear technique, instead of linear techniques, can lead to an acceptable level of accuracy.
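The studies above share one idea: each training row pairs the static, site-specific LUR predictors with dynamic inputs observed at a given time. The sketch below shows one possible way to assemble such a row. The function name and the example variables are assumptions; the sine/cosine encoding of the hour follows the SDAY and CDAY parameters listed in the abbreviations, and the exact formula used here is a common convention rather than one confirmed by the source.

```python
import math
from typing import Dict, Mapping


def build_feature_row(
    static: Mapping[str, float], dynamic: Mapping[str, float], hour: int
) -> Dict[str, float]:
    """Merge static site predictors with time-varying inputs for one site and hour."""
    angle = 2.0 * math.pi * (hour % 24) / 24.0
    row: Dict[str, float] = dict(static)   # e.g. land use, population density
    row.update(dynamic)                    # e.g. wind speed, traffic volume
    row["SDAY"] = math.sin(angle)          # cyclic encoding keeps 23:00 close to 00:00
    row["CDAY"] = math.cos(angle)
    return row


# Hypothetical values for a single site at 06:00
row = build_feature_row(
    {"population_density": 4200.0, "distance_to_road_m": 35.0},
    {"wind_speed": 3.1, "traffic_volume": 850.0},
    hour=6,
)
```

Rows built this way for all sites and time steps can be passed to any regression technique, linear or non-linear, so the same feature table supports the MLR, NPR and ANN comparisons reported above.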

Figure 2. Summary of the review.

Table 1. Recent studies on PM 10 forecasting at one or more stations in urban areas, using MLR and ANN techniques.

Table 3. Results of the application of other techniques for PM 10 forecasting, and their comparison with MLR and ANN techniques.
* For abbreviations of the input parameters refer to Table 1; ** Abbreviations of the methods: ARIMA: Auto-Regressive Integrated Moving Average; BRT: Boosted Regression Tree; CART: Classification And Regression Trees; GAM: Generalized Additive Model; QRM: Quantile Regression Model; SVM: Support Vector Machines; *** Unit of the presented RMSE and MAE values is µg/m 3; **** Real time simulation has been performed.

Table 4. Recent studies on the spatial modeling of PM 10 in urban areas.