Prediction of Air Pollutant Concentrations via Random Forest Regressor Coupled with Uncertainty Analysis: A Case Study in Ningxia

Abstract: Air pollution received little attention until recent years, when its severe impacts on human health became better understood. Using air pollution and meteorological monitoring data collected in Ningxia from 1 January 2016 to 31 December 2017, we analyzed the impact of ground surface temperature, air temperature, relative humidity and wind power on air pollutant concentrations. We also modeled the relationships between air pollutant concentrations and meteorological variables with a decision tree regressor (DTR), a feedforward artificial neural network with back-propagation algorithm (FFANN-BP) and a random forest regressor (RFR), based on air-monitoring station data. For all pollutants, the RFR increases the R² of FFANN-BP and DTR by up to 0.53 and 0.42 respectively, reduces the root mean square error (RMSE) by up to 68.7 and 41.2, and reduces the mean absolute error (MAE) by up to 25.2 and 17. The empirical results show that the proposed RFR displays the best forecasting performance and could provide local authorities with reliable and precise predictions of air pollutant concentrations. The RFR effectively establishes the relationships between the influential factors and air pollutant concentrations, suppresses the overfitting problem and improves the accuracy of prediction. In addition, the limitation of machine learning to single-site prediction is overcome.


Introduction
With industrialization and urbanization, air pollution in most countries has worsened over the years. Many areas, including North China and the region south of the Yangtze River, have suffered severe and persistent haze. High air pollutant concentrations play an important role not only in degrading the environment but also in causing respiratory diseases [1][2][3][4][5][6]. To enable the government to put forward reasonable measures for mitigating air pollution, it is necessary to predict the concentrations of air pollutants accurately in real time or near real time.
Forecasting techniques can generally be divided into deterministic and stochastic approaches. Deterministic models are suitable for wide-range trend forecasting, whereas stochastic models are suitable for single-site prediction. Deterministic air quality models based on numerical simulation mainly include Chem models, the Community Multiscale Air Quality (CMAQ) model [7] and the Nested Air Quality Prediction Model System (NAQPMS) [8]. They use meteorological data and emission source data to estimate the diffusion of air pollutants through physical and chemical processes. Such models have a solid theoretical foundation and a relatively transparent structure. However, their accuracy is strongly influenced by the boundary and initial conditions. Furthermore, historical data are not used in the model. At the same time, the computations are complex and demand substantial computing resources, so the models are difficult to fully understand and quantify [9][10][11].
Stochastic methods mine the relationships between air pollutant concentrations and the influential factors, including meteorological variables and human activities, by means of machine learning, and then predict future air pollutant concentrations [12][13][14]. Statistical methods, including principal component analysis (PCA), kriging, inverse distance weighting [21,22], land-use regression (LUR) and artificial neural networks (ANNs) [23][24][25][26], are considered more reliable tools for predicting air pollutant concentrations than deterministic approaches [15][16][17][18][19][20]. Regression methods can learn the intrinsic relationships between the influential factors and air pollutant concentrations [27]. Harishkumar [28] proposed using geographically weighted regression (GWR) to study the relationships between air pollutant concentrations and the influential factors, and achieved good results. LUR is technically simple, easy to fit and offers high spatial resolution. Since its emergence in 1997, it has been applied to the prediction of air pollutant concentrations. However, regression methods do not consider the spatial correlation in air pollution data and overestimate the importance of covariates. In addition, because the errors do not satisfy the assumption of being independent and identically distributed, the predictive ability of regression methods is low in the space-time domain. ANNs generally perform better than the numerical air quality models CMAQ and NAQPMS. They require fewer sample data, are simple to build and convenient to operate, and yield small relative errors [17,20]. However, ANNs also have disadvantages, such as poor generalization ability, overfitting and a tendency to fall into local optima.
Geostatistics is based on the principle that observations close to each other in the space-time domain are more similar than observations far apart [29]. Geostatistics does not assume sample independence, but it requires the data to obey a normal distribution in order to obtain a good fit. However, adding the time dimension introduces spatiotemporal heterogeneity, which makes the visualization and analysis of spatiotemporal data quite challenging. In addition, spatiotemporal data usually contain long time series of air pollution measurements [18], so it is necessary to impose strong assumptions on the process [21,22].
In this work, RFRs are employed to predict air pollutant concentrations. RFRs are characterized by adaptive training and tuning; they effectively establish the relationships between the meteorological variables and air pollutant concentrations, suppress the overfitting problem and improve the accuracy of prediction. In addition, the limitation of machine learning to single-site prediction is overcome.
The remainder of the paper is organized as follows. The next section presents the study area and the data collected. Section 3 presents the basic concepts of FFANN-BP, DTR and RFR and describes how the validity indices are used to compare the predicted results. A critical analysis follows, in which air pollutant concentrations are predicted from the 2016-2017 data. Finally, we discuss the experimental results and conclude the paper.

Area Description

Study Area
Ningxia is located in the inland area of northwest China, between latitudes 35°14′ N and 39°14′ N and longitudes 104°17′ E and 109°39′ E. It is adjacent to Shaanxi in the east, Inner Mongolia in the west and Gansu in the north, and has a total area of 66,400 square kilometers and a permanent population of 6.8179 million. The topography of Ningxia gradually inclines from southwest to northeast. The region is divided into three parts: the Yellow River irrigation area in the north, the arid zone in the middle and the mountain area in the south. Located within the Yellow River system, Ningxia has a temperate continental arid and semi-arid climate, with high terrain in the south and low terrain in the north. The southern Liupan Mountains are wet and rainy, with low temperatures and a short frost-free period. The northern part has abundant sunshine, strong evaporation and a large temperature difference between day and night, and its annual sunshine reaches 3000 h.
Ningxia lies on the western margin of China's monsoon area. In summer it is affected by the southeast monsoon and receives little precipitation; July is the hottest month, with an average temperature of 24 °C. In winter it is strongly affected by the northwest monsoon, with large fluctuations in temperature and an average lowest temperature of −9 °C. The annual precipitation across the region ranges from 150 mm to 600 mm. The average annual water surface evaporation in Ningxia is 1250 mm, ranging from 800 to 1600 mm. Furthermore, the prevailing north wind lowers the humidity level [30].
In Ningxia, the extremely hot and dry climatic conditions play an important role in the resuspension of fine particles, and both sand storms and domestic fuel are sources of air pollution. According to the Ningxia annual reports on air quality, O3 and particulate matter (PM) are the most important air pollutants in the city [31]. There are 15 air monitoring stations of the China National Environmental Protection Agency and 12 meteorological stations in Ningxia.

Data Preprocessing
Records with concentrations below 0 µg/m³ or above 1000 µg/m³ are eliminated. If one item of meteorological data is missing or abnormal, all data for that day are eliminated. Outliers are data points that lie far from the other data points; here they are defined as values that deviate from the mean by more than 3 times the standard deviation. They are problematic for many statistical analyses because they can cause tests to miss significant findings or distort real results, and they strongly influence the output of a machine learning model. In this paper, the mean value of the data is used to replace abnormal and missing values.
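The cleaning rules above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' pipeline: the function name `clean_series` and its parameters are hypothetical, and out-of-range readings are treated here as missing values that are then filled with the mean, which is only one possible reading of the text.

```python
import numpy as np

def clean_series(values, low=0.0, high=1000.0, n_sigma=3.0):
    """Mask out-of-range values and n-sigma outliers, then fill gaps with the mean.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    x = np.asarray(values, dtype=float).copy()
    # Readings outside [low, high] are treated as invalid (set to NaN).
    x[(x < low) | (x > high)] = np.nan
    # Flag points further than n_sigma standard deviations from the mean.
    mu, sigma = np.nanmean(x), np.nanstd(x)
    x[np.abs(x - mu) > n_sigma * sigma] = np.nan
    # Replace every missing or flagged value with the mean of the remaining data.
    x[np.isnan(x)] = np.nanmean(x)
    return x
```

Because the mean and standard deviation are themselves affected by extreme values, a single pass may leave some moderate outliers in place; whether the original study iterated this step is not stated.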
In our experiment, the concentrations and the raw meteorological data were scaled to a fixed range from 0 to 1 by using the min-max normalization method. We standardize the data by using scikit-learn with the StandardScaler class. The normalization formula is as follows [32]:
$$y_i = \frac{x_i - \min_{1 \le j \le n} x_j}{\max_{1 \le j \le n} x_j - \min_{1 \le j \le n} x_j} \qquad (1)$$
where $y_i$ is the normalized data, $x_i$ is the data before normalization, and $n$ is the number of observations.
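As a small worked illustration of min-max scaling to the range from 0 to 1, the sketch below applies the transformation to one variable. The function name `min_max_normalize` is hypothetical; scikit-learn's `MinMaxScaler` performs the same column-wise operation, and the handling of a constant series (mapped to zeros) is an assumption added to avoid division by zero.

```python
import numpy as np

def min_max_normalize(x):
    """Scale a 1-D series linearly so its minimum maps to 0 and its maximum to 1."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # Assumption: a constant series carries no information, so map it to 0.
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

# Example: three daily temperature readings.
scaled = min_max_normalize([2.0, 4.0, 6.0])
```

In a forecasting setting the minimum and maximum would normally be taken from the training period only and reused for the test period, so that no information leaks from the data being predicted.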

FFANN-BP
In an FFANN-BP, the weighted sum of the inputs plus a bias term is passed through the transfer function to produce the output of each neuron. The network is trained in an iterative process. A single hidden layer is used to reduce network complexity and increase computational efficiency. Figure 1 shows the architecture of the FFANN-BP [33]. The inputs are fed into the input layer and propagate through the activation functions; different layers may perform different transformations on their inputs. The mean squared error between the outputs and the actual target values is then backpropagated from the output layer to the input layer, and it is minimized by adapting the connection weights in a supervised way. The most important problem is to decide the number of layers and the number of neurons in the hidden layers.
Without loss of generality, let there be $n$ neurons in the input layer, $p$ neurons in the hidden layer and $q$ neurons in the output layer. The $k$-th input vector is $x(k) = (x_1(k), x_2(k), \ldots, x_n(k))$. The $k$-th input vector of the hidden layer is $hi(k) = (hi_1(k), hi_2(k), \ldots, hi_p(k))$, and the $k$-th output vector of the hidden layer is $ho(k) = (ho_1(k), ho_2(k), \ldots, ho_p(k))$. The $k$-th input vector of the output layer is $yi(k) = (yi_1(k), yi_2(k), \ldots, yi_q(k))$, and the $k$-th output vector of the output layer is $yo(k) = (yo_1(k), yo_2(k), \ldots, yo_q(k))$. The desired output vector is $d_o(k) = (d_1(k), d_2(k), \ldots, d_q(k))$. The weight between the $i$-th neuron in the input layer and the $h$-th neuron in the hidden layer is $w_{ih}$, and the weight between the $h$-th neuron in the hidden layer and the $o$-th neuron in the output layer is $w_{ho}$, where $i = 1, 2, \ldots, n$, $h = 1, 2, \ldots, p$ and $o = 1, 2, \ldots, q$. The biases of the hidden layer and the output layer are $b_h$ and $b_o$, respectively. The number of samples is $m$, and $f$ is the activation function. The commonly used activation function is the sigmoid function:
$$f(x) = \frac{1}{1 + e^{-x}} \qquad (2)$$
Each connection weight is assigned a random number in the interval $(-1, 1)$. $E$, $\varepsilon$ and $M$ are the error function, the calculation accuracy value and the maximum number of learning iterations, respectively. The $k$-th input sample $x(k) = (x_1(k), x_2(k), \ldots, x_n(k))$ is randomly selected together with the corresponding expected output $d_o(k) = (d_1(k), d_2(k), \ldots, d_q(k))$, and the input and output of each neuron in the hidden layer are calculated.
Then the total error is computed. The partial derivatives of the error function with respect to each neuron in the output layer, $\delta_o(k)$, are calculated from the expected output and the actual output of the network. The partial derivatives of the error function with respect to each neuron in the hidden layer, $\delta_h(k)$, are then calculated from the connection weights between the hidden layer and the output layer, $\delta_o(k)$ and the output of the hidden layer [33].
The algorithm terminates when the error reaches the preset accuracy or the number of learning iterations exceeds the prespecified maximum. Otherwise, the next learning sample and its expected output are selected and the next round of learning begins.
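The training procedure described in this section can be sketched as a compact NumPy routine. This is an illustrative sketch under stated assumptions, not the authors' implementation: the bias is added to the weighted sum (the sign convention is not recoverable from the text), a sigmoid is used in both layers, the global error is taken as half the mean squared error over all samples, and the function name `train_bp` and its default values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, D, p=4, lr=0.5, eps=1e-3, max_epochs=5000, seed=0):
    """One-hidden-layer network trained by sample-wise back-propagation."""
    rng = np.random.default_rng(seed)
    n, q = X.shape[1], D.shape[1]
    # Weights and biases start as random numbers in (-1, 1).
    w_ih = rng.uniform(-1, 1, (n, p)); b_h = rng.uniform(-1, 1, p)
    w_ho = rng.uniform(-1, 1, (p, q)); b_o = rng.uniform(-1, 1, q)
    history = []
    for _ in range(max_epochs):                  # at most M passes
        for k in rng.permutation(len(X)):        # samples visited in random order
            ho = sigmoid(X[k] @ w_ih + b_h)      # hidden-layer output
            yo = sigmoid(ho @ w_ho + b_o)        # network output
            delta_o = (D[k] - yo) * yo * (1 - yo)          # output-layer term
            delta_h = (delta_o @ w_ho.T) * ho * (1 - ho)   # hidden-layer term
            w_ho += lr * np.outer(ho, delta_o); b_o += lr * delta_o
            w_ih += lr * np.outer(X[k], delta_h); b_h += lr * delta_h
        # Global error E over all m samples after this pass.
        Y = sigmoid(sigmoid(X @ w_ih + b_h) @ w_ho + b_o)
        E = 0.5 * np.mean(np.sum((D - Y) ** 2, axis=1))
        history.append(E)
        if E < eps:                              # preset accuracy reached
            break
    return (w_ih, b_h, w_ho, b_o), history
```

Since the output neuron is a sigmoid, targets need to lie in the open interval (0, 1), which is one practical reason for normalizing the concentrations before training.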

DTR
A decision tree corresponds to a partition of the feature space, constructed by recursive segmentation, together with an output value on each partition unit; the feature with the highest information gain is split first. The training process consists of feature selection, tree generation and pruning. All values of each feature are traversed, and the space is divided at the value that minimizes the loss function, which yields a partition point.
The optimal segmentation is used as a node of the decision tree. When generating leaf nodes, the most important consideration is whether the growth of the tree should stop. The process continues iteratively until a prespecified stopping criterion is reached, such as a maximum depth, which allows only a certain number of splits from the root node to the terminal nodes. The procedure breaks a dataset down into smaller and smaller subsets while the associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. The topmost decision node, called the root node, corresponds to the best predictor.
A primary advantage of DTR is that it is easy to follow and understand. It does not require any transformation of the features to handle nonlinear data. To reduce storage requirements, the size of a decision tree is controlled by setting parameters such as the maximum depth and the minimum number of leaf nodes. The output value of a leaf is the average of all samples in that leaf node. At each segmentation, the features are randomly permuted; therefore, even if the same training data set is used, the optimal segmentation may differ. DTRs tend to overfit very easily.
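The following sketch illustrates two properties mentioned above using scikit-learn's `DecisionTreeRegressor`: the tree size is bounded by a depth limit, and each prediction equals the average target of the training samples in the corresponding leaf. The synthetic data and the specific settings (`max_depth=3`, `min_samples_leaf=5`) are assumptions chosen for illustration only, not the study's configuration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic one-feature regression problem (illustrative data only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)

# Limit tree growth with a maximum depth and a minimum leaf size.
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=5, random_state=0)
tree.fit(X, y)

leaf_ids = tree.apply(X)   # index of the leaf each training sample falls into
pred = tree.predict(X)     # prediction for each training sample

# Mean training target per leaf; these should coincide with the predictions.
leaf_means = {leaf: y[leaf_ids == leaf].mean() for leaf in np.unique(leaf_ids)}
```

Fixing `random_state` makes the split selection reproducible, which matters because ties between equally good splits are otherwise broken at random.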

Random Forest Regression
RFR is one of the most popular algorithms for regression problems because of its simplicity and high accuracy. It is an ensemble technique that combines multiple decision trees with a voting mechanism. It is usually trained with the bagging method, which combines the predictions of multiple models to make predictions more accurate than those of an individual model. Owing to this randomness, it generalizes better than DTR, which helps to decrease the model's variance. RFRs are less sensitive to outliers in the dataset and do not require much parameter tuning. The only parameter that typically needs experimentation is the number of trees in the ensemble. The predictions are calculated as the average prediction over all decision trees. The key lies in the low correlation between the individual models.
RFR is a regressor that adopts a voting mechanism to obtain prediction results based on decision trees. RFRs build multiple decision trees by dividing the training samples. Following the bootstrap sampling method, part of the data is randomly drawn from the data set as the training sample of each decision tree, and the remaining data are used as its validation sample. When regressing unknown samples, each decision tree first outputs its prediction, and all the predictions are then combined by simple voting to obtain the final prediction.
The most apparent benefit of RFR is its built-in ability to correct the tendency of decision trees to overfit their training data sets. By using the bagging method and random feature selection, the overfitting problem, which often leads to inaccurate outcomes, is almost completely resolved.
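To make the ensemble mechanism concrete, the sketch below fits a small forest with scikit-learn's `RandomForestRegressor` and checks that the forest prediction is the plain average of the individual tree predictions, which is what "voting" amounts to in regression. The synthetic data and the settings (20 trees, 2 candidate features per split, out-of-bag scoring) are illustrative assumptions rather than the study's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data with four predictors (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 4))
y = 3 * X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(0, 0.1, 300)

# Each tree is grown on a bootstrap sample and considers a random
# subset of features at every split; the left-out samples give an
# out-of-bag estimate of generalization.
forest = RandomForestRegressor(
    n_estimators=20, max_features=2, oob_score=True, random_state=0
)
forest.fit(X, y)

# Average the per-tree predictions by hand and compare with forest.predict.
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
ensemble_mean = per_tree.mean(axis=0)
```

The out-of-bag score stored in `forest.oob_score_` plays the role of the per-tree validation sample described above, without requiring a separate hold-out set.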

Statistical Indexes
The performances of DTR, FFANN-BP and RFR are evaluated with four commonly used statistical indices computed between the predicted and observed air pollutant concentrations: the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). The indices are defined as in [34], where $o_i$ and $p_i$ are the observed and predicted concentrations, $\bar{o}$ and $\bar{p}$ are their respective means, and $n$ is the number of observations. The coefficient of determination indicates how closely the overall trend of the model predictions follows the observed values. The mean absolute error and root mean square error reflect the deviation of the predicted values from the observed values. The higher the value of R², the better the model performance. Correspondingly, the lower the values of RMSE, MAE and MAPE, the better the model.
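The four indices can be computed as in the sketch below. The exact formulas from [34] are not reproduced in the text, so the standard definitions are assumed here; in particular, because both the observed and the predicted means appear in the symbol list, R² is taken as the squared Pearson correlation between observations and predictions, and MAPE is expressed in percent. The function name `evaluate` is hypothetical.

```python
import numpy as np

def evaluate(observed, predicted):
    """Return R2, RMSE, MAE and MAPE (percent) for paired series."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = o - p
    # Squared Pearson correlation (assumed form of R2, see lead-in).
    num = np.sum((o - o.mean()) * (p - p.mean()))
    den = np.sqrt(np.sum((o - o.mean()) ** 2) * np.sum((p - p.mean()) ** 2))
    return {
        "R2": (num / den) ** 2,
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAE": np.mean(np.abs(err)),
        # MAPE is undefined when an observation is exactly zero.
        "MAPE": 100.0 * np.mean(np.abs(err / o)),
    }

scores = evaluate([10, 20, 30, 40], [12, 18, 33, 39])
```

For this small example the absolute errors are 2, 2, 3 and 1, giving an MAE of 2 and an RMSE of about 2.12.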

Data Used
We obtained the air pollutant concentrations from the Ministry of Ecology and Environment of the People's Republic of China. The meteorological variables were obtained from the China Meteorological Administration. The monitored pollutants are carbon monoxide (CO), nitrogen dioxide (NO2), ozone (O3), particulate matter (PM) and sulfur dioxide (SO2). The data set consists of daily values for the 2-year period between January 2016 and December 2017.
Furthermore, the air monitoring stations do not record meteorological variables. We therefore use the nearest meteorological station to represent the meteorological conditions at each air monitoring station. However, some air monitoring stations are too far from any meteorological station. Consequently, only three air monitoring stations, Ma Lian Kou, Sha Po Tou and Ma Yuan, are selected for this research. Figure 2 and Table 1 show the geographical regions of Ningxia and the locations of the air monitoring stations. Ma Lian Kou is located at the foot of Helan Mountain in Yinchuan city and belongs to the northern Yellow River diversion irrigation area. Sha Po Tou is located in the central arid zone, near the Tengger Desert and on the bank of the Yellow River. Ma Yuan is located in Guyuan city and belongs to the southern mountainous area. Descriptive statistics for the three monitoring stations over the study period are presented in Table 2, including the minimum, mean, median and maximum levels of the 12 input parameters and of the air pollutant concentrations during 2016-2017. Figures 3-5 show the diurnal variations of air pollutant concentrations at the three selected monitoring stations. At Ma Lian Kou, the concentrations of CO fluctuated enormously, with maxima of 38.24 µg/m³ in January 2016 and 4.59 µg/m³ in August 2016. These maxima correspond to the winter months. Concentration minima of CO occurred during the summer months, with values of 0.0722 µg/m³ and 0.1053 µg/m³ in April and August 2016, respectively.
Similarly, the concentrations of NO2 fluctuated significantly, with maxima of 54 and 51 µg/m³ in December 2016. These maxima correspond to the days of highest household energy consumption due to heating and to a greater density of cars on the roads during the winter season. Likewise, the minima in the concentrations corresponded to the spring months.
The concentrations of O3 also fluctuated considerably, with maxima of 213 µg/m³ in June 2016 and 208 µg/m³ in May 2016. These maxima corresponded to the summer months. Concentration minima of O3 occurred in September 2016, with values of 13.375 µg/m³ and 20.3 µg/m³. This trend holds throughout the studied years, since the formation of O3 is associated with photochemical reactions, which require the presence of strong sunlight.
In a similar way, the concentrations of PM2.5 fluctuated slightly but remained quite stable at around 70 µg/m³, with two spikes at 307 µg/m³ in May 2016 and March 2016 and minima of 6.9 µg/m³ in August 2016 and 11.4 µg/m³ in September 2016.
Similarly, the concentrations of PM10 fluctuated slightly but remained quite stable at around 33 µg/m³, with two spikes at 195 µg/m³ in May 2016 and September 2016 and a minimum of 6.1 µg/m³ in August 2016. SO2 fluctuated slightly but remained quite stable at around 32 µg/m³, with two spikes at 182 µg/m³ and 143 µg/m³ in January 2016.
The trends of air pollutant concentrations at the three monitoring stations are generally different, and the concentrations at Ma Yuan are the lowest. This suggests that the differences in air pollutant concentrations are closely related to geographical location.
The meteorological variables comprise the minimum, maximum and mean ground surface temperature, the minimum and maximum relative humidity, the minimum, maximum and mean air temperature, the minimum, mean and maximum wind speed, and the sunshine duration. They were supplied by the China Meteorological Data Service Center, and their units are °C, m/s and hour, respectively. The meteorological variables were recorded on a daily basis. Table 3 shows that the minimum air temperature (T_min) ranged from −15 °C in January to 27 °C in July, while the maximum air temperature (T_max) varied from −10 °C in January to 41 °C in July. At Ma Lian Kou station, the average sunshine duration (ssd) was 6.7 h, with minimum and maximum values of 0 h and 14 h appearing in November and June, respectively.

Selection of the Influential Factors
The selection of input parameters is generally based on prior knowledge of the formation of air pollutants and on correlation analysis. Through descriptive analysis of the air pollutant concentrations and the meteorological variables, we can select the most important input parameters and identify the dominant factors in the formation and diffusion of air pollutants. Generally, the levels of air pollutant concentrations are associated with emission sources, the formation of secondary pollutants, wind speed, air temperature, ground surface temperature, etc. Air pollutants and weather conditions are known to be associated with each other in a complex relationship. As air temperature increases, atmospheric convection becomes stronger and the air stratification more unstable, which is conducive to the diffusion and dilution of pollutants. The air pollutant concentrations were closely related to changes in meteorological factors. Furthermore, relative humidity shows a significant negative effect on the concentrations of O3, PM2.5 and SO2, because precipitation washes out atmospheric particles. Table 4 shows a strong negative correlation between wind speed and air pollutant concentration, which reflects the fact that low concentrations are linked with high wind speed in Ningxia. Figure 4a shows that the concentration of O3 was higher in the hot summer, owing to high radiation and temperature, and lower in winter. The ground surface temperatures have the strongest correlations with the air pollutant concentrations. This is due to the enhancement of ultraviolet radiation, the increase in temperature, the enhanced decomposition of oxygen molecules and the increased rate of the photochemical reactions that form O3, which result in higher air pollutant concentrations. The results show strong relationships between ground surface temperatures and the concentrations of the majority of pollutants in Ningxia.
Moreover, air pollutant concentrations are closely related to the concentrations at the previous time step. There is a high possibility of mutual conversion between PM2.5 and other pollutants, especially PM10. PM2.5 and PM10 are negatively correlated with air temperature. Furthermore, the concentrations of NO2 may have a notable influence on the concentrations of O3. High levels of particulate matter in Ningxia are mostly caused by sand storms and construction activities near the monitoring stations. High temperature can enhance the re-suspension of road dust. Meteorological variables are therefore used for the prediction of air pollutant concentrations.
We only consider variables with a correlation coefficient greater than 0.30 as inputs [35]. According to the correlation coefficient matrix in Table 4, the concentrations of NO2 are negatively related to both air temperature and wind speed. Hence, for each air pollutant, the input dataset combines the concentrations of the other air pollutants, the air pollutant concentrations of the previous day and the selected meteorological variables. The meteorological variables selected for every pollutant are listed in Table 5.
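The selection rule can be illustrated with the following sketch. It assumes that the 0.30 threshold applies to the absolute value of the Pearson correlation (since negatively correlated variables are also retained in the text), and it shows one way to append the previous day's concentration as an extra input. The function names `select_inputs` and `with_previous_day` are hypothetical.

```python
import numpy as np

def select_inputs(features, target, names, threshold=0.30):
    """Keep the columns whose |Pearson r| with the target exceeds the threshold."""
    selected = []
    for j, name in enumerate(names):
        r = np.corrcoef(features[:, j], target)[0, 1]
        if abs(r) > threshold:
            selected.append((name, r))
    return selected

def with_previous_day(features, target):
    """Append yesterday's concentration as an extra input column.

    The first day has no predecessor, so one row is dropped.
    """
    lagged = target[:-1].reshape(-1, 1)
    return np.hstack([features[1:], lagged]), target[1:]
```

A call such as `select_inputs(X, y, ["t_air", "wind", "rh"])` returns the retained names together with their correlation coefficients, so the sign of each relationship remains visible.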

Experimental Results and Interpretations
For comparison, FFANN-BP, DTR and RFR models are trained to predict air pollutant concentrations at the three monitoring stations of Ningxia at a local scale. In this study, DTR, FFANN-BP and RFR were used to evaluate the ability of the two-layer random forest model to estimate air pollutant concentrations. The data from 1 January 2016 to 30 June 2017 are used for model training, and the remaining data are used for model prediction. The DTR, FFANN-BP and RFR models are trained on the training data, and their parameters are fine-tuned according to the experimental results. The flowchart of our method is shown in Figure 6.
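The chronological split described above can be sketched as follows, assuming one record per calendar day over the two-year study period. The function name `chronological_split` is hypothetical; the cutoff date is taken from the text.

```python
import numpy as np

def chronological_split(dates, cutoff="2017-06-30"):
    """Return boolean masks for the training period (up to and including
    the cutoff) and the prediction period (after the cutoff)."""
    dates = np.asarray(dates, dtype="datetime64[D]")
    train_mask = dates <= np.datetime64(cutoff)
    return train_mask, ~train_mask

# One entry per day from 1 January 2016 to 31 December 2017.
days = np.arange("2016-01-01", "2018-01-01", dtype="datetime64[D]")
train_mask, test_mask = chronological_split(days)
```

With complete daily data this yields 547 training days and 184 prediction days; the actual counts in the study will be smaller wherever days were removed during preprocessing. Splitting by date rather than at random keeps future observations out of the training set.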
The initial values of the parameters are set according to the algorithmic characteristics of each model and to parameter-tuning experience, and the grid search provided by scikit-learn is used for hyperparameter optimization. In this paper, the base model of the random forest is DTR, and the candidate values for the number of DTRs are 10, 20, 30, 40 and 50. Other hyperparameters, such as the maximum number of samples and the minimum number of segmented samples of the leaf nodes, use the default minimum value. The final number of DTRs is 20. The stopping criterion is met if there is no improvement in R² after ten iterations, in combination with a maximum of 500 iterations. The optimal parameters of FFANN-BP are a least mean square error of 0.001, a maximum training time of 1000 and a learning rate of 0.15. The size of the network and the learning parameters greatly affect prediction performance. The best network structure found has 5 input nodes and 12 hidden nodes. The output layer has only one neuron, corresponding to the air pollutant concentration. It has been demonstrated that the BFGS algorithm is the most efficient method for optimizing the objective function because of its speed and robustness.
Due to space constraints, this paper only shows the experimental results for the Ma Lian Kou air monitoring station. To verify the performance of the DTR, FFANN-BP and RFR models, Table 6 lists the RMSE, R², MAE and MAPE between the measured and predicted air pollutant concentrations of the three models at the Ma Lian Kou, Sha Po Tou and Ma Yuan air monitoring stations. The R² of the three machine learning models lies between 0.44 and 0.99, so the values of this statistic are all within the recommended range. The RMSE of the models lies between 0.25 and 126.7, and the RMSE of the RFR model is the lowest. In terms of MAE, the RFR model has the lowest value, 6.93, followed by the FFANN-BP model with 7.74, and the DTR model has the highest, 10.6. For MAPE, the RFR model is also the lowest of the three models, at 17.56. RFR thus shows good experimental results. The time series plots in Figure 7 depict the relationships between the observed and predicted data. These results indicate the goodness of fit of the RFR to the observed data. Following the same methodology, fits were also made for the other air pollutants as dependent variables using DTR, FFANN-BP and RFR. The results show that RFR is the best model for predicting air pollutant concentrations at the three air monitoring stations at a local scale, with a correlation coefficient of 0.99.
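The search over the number of trees mentioned above can be sketched with scikit-learn's `GridSearchCV`. Only the candidate values 10 to 50 and the use of R² come from the text; the synthetic data, the time-ordered cross-validation scheme (`TimeSeriesSplit` with three folds) and the fixed random seed are assumptions made for this illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in for the normalized training data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 5))
y = 2 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.1, 200)

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [10, 20, 30, 40, 50]},  # candidate tree counts
    scoring="r2",                                        # select by R2
    cv=TimeSeriesSplit(n_splits=3),                      # folds respect time order
)
search.fit(X, y)
best_n = search.best_params_["n_estimators"]
```

After fitting, `search.best_estimator_` holds a forest refitted on all the training data with the selected number of trees, and `search.cv_results_` lists the mean score of every candidate.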
The time series plots of the ground-measured air pollutant concentrations and the predictions by DTR, FFANN-BP and RFR are shown in Figures 7-9. There is a high level of agreement between the observed and predicted data. The concentrations predicted by RFR are closer to the observed data than those of DTR and FFANN-BP, meaning that the RFR improves the prediction of air pollutant concentrations. Histograms are also used in Figures 10-12 to provide further insight into the relationship of the predictors with air pollutant concentrations. The RFR performs very well, since the histogram of RMSE is very steep, and its performance is also considerable for the other pollutants in Figures 10-12. In addition, the models are evaluated in terms of construction time, RMSE, MAE and MAPE, so that both the prediction accuracy and the construction efficiency of the different machine learning models are compared. Appropriate variables are selected for the prediction of air pollutant concentrations. In terms of prediction accuracy, the RFR model has the best predictive ability, followed by the FFANN-BP model and then the DTR model. RFRs have stable accuracy and good prediction capability. The results show that RFR not only improves the prediction of air pollutant concentrations in Ningxia, but also discriminates between the influential factors and reduces the dimension of the data, thereby reducing the time complexity of the algorithm.
RFR uses the average reduction of node impurity to describe the importance of the variables. The greater the reduction of node impurity attributable to a factor, the more important the factor is. The importance of variables in the decision tree model is measured in the form of a weight. The greater the weight of a factor, the stronger its influence on the concentration of air pollutants. In this research, the importance of each factor for the prediction of air pollutant concentration is further analyzed. Figures 13-15 and Table 7 show the most important features of DTR and RFR for the six air pollutants at Ma Lian Kou. The characteristic variables considered include the meteorological factors and the air pollutant concentrations of the previous day.
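The impurity-based ranking described above is exposed in scikit-learn through the `feature_importances_` attribute, which reports the normalized mean decrease in node impurity for each input. The sketch below uses synthetic data in which the first column dominates by construction; the column names are hypothetical placeholders, not the study's variables or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: the first column drives the target most strongly.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
y = 5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 300)
names = ["pollutant_prev_day", "ground_temp", "wind_speed"]  # hypothetical labels

forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# Importances are non-negative and sum to 1; sort from most to least important.
ranking = sorted(
    zip(names, forest.feature_importances_), key=lambda item: item[1], reverse=True
)
```

Impurity-based importances tend to favor inputs with many distinct values and can be split between strongly correlated predictors, so rankings among closely related variables such as PM2.5 and PM10 are best read with some caution.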
For CO, the concentration of NO2 ranks first and contributes the most. For NO2, the PM10 concentration ranks first and contributes the most. For O3, the ground surface temperature ranks first and contributes the most. For SO2, the NO2 concentration ranks first and contributes the most. For PM2.5, the PM10 concentration ranks first, and for PM10, the PM2.5 concentration ranks first, so the two are the most important influencing factors for each other. As shown in Table 5, the weight importance of temperature, relative humidity and air pressure are 14 and 25 in turn, indicating that ground surface temperature and relative humidity have the greatest impact on the concentration of air pollutants predicted by DTR, followed by air pressure and precipitation, while wind speed has the least impact. Figures 13-15 and Table 7 show the importance analysis of the various influencing factors when the decision tree and random forest algorithms are used to predict the concentrations of the various air pollutants in 2016; for CO, NO2 is the most important factor in both methods. Table 8 shows that the running time of DTR is the shortest owing to its simple structure; the FFANN-BP model takes the longest time to build, followed by the RFR model. The running time of RFR is much lower than that of FFANN-BP, which reflects the low time complexity of RFR.
Because all three methods involve randomness, their accuracy cannot be evaluated from a single experimental result. Therefore, this paper runs 1000 Monte Carlo experiments and averages the results to evaluate the accuracy of the three methods. The results in Table 9 show that the accuracy and prediction stability of RFR are better than those of the other two methods. The results also show that DTR does not perform well for extreme concentrations of air pollutants. The reason is that construction activity in Ningxia was particularly high during this period. However, RFR still performs acceptably even when such events occur suddenly. For particulate matter, the performance of DTR and FFANN-BP decreases, which makes the variance of the particulate matter concentrations larger, whereas RFR remains more adaptable than FFANN-BP and DTR. This shows that the DTR model has a poor ability to predict air pollutant concentrations from the meteorological elements, and the RFR model is recommended for predicting air pollutant concentrations.
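The Monte Carlo procedure described above can be sketched as follows. The function `run_experiment` is a hypothetical stand-in for one training and prediction run of a randomized model; here it only simulates a noisy predicted concentration. The 95% interval uses a normal approximation for the mean, which is an assumption, since the text does not state how the confidence intervals in Table 9 were computed.

```python
import random
import statistics


def run_experiment(seed):
    """Hypothetical stand-in for one randomized train/predict run.

    Returns a simulated predicted concentration: a fixed value of 50.0
    plus Gaussian noise. A real experiment would fit the model with this
    seed and return its prediction or error metric.
    """
    rng = random.Random(seed)
    return 50.0 + rng.gauss(0.0, 2.0)


def monte_carlo_summary(n_runs=1000, z=1.96):
    """Mean, sample variance and normal-approximation CI over n_runs runs."""
    results = [run_experiment(seed) for seed in range(n_runs)]
    mean = statistics.fmean(results)
    variance = statistics.variance(results)  # unbiased sample variance
    half_width = z * (variance / n_runs) ** 0.5  # z times the standard error
    return mean, variance, (mean - half_width, mean + half_width)


mean, variance, ci = monte_carlo_summary()
print(f"mean={mean:.2f}, variance={variance:.2f}, "
      f"95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

A smaller variance across runs corresponds to what the text calls better prediction stability, so the same summary computed for each method allows a like-for-like comparison.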

Conclusions
In this study, Ningxia Province, where air pollution has been increasing in recent years, is selected as the research area. The concentrations of CO, PM10 and PM2.5 were higher in the cold and dry winter than in summer because of the combustion of fossil fuels for heating. The aim of this study was to propose a modelling procedure that yields satisfactory predictions of ambient air pollutant concentrations. To this end, DTR, FFANN-BP and RFR models were proposed for predicting the air pollutant concentrations in Ningxia, China. The air pollutant concentrations were observed at three air monitoring stations, in the capital of Ningxia and in the rural areas of Ningxia.
The collected data on air pollutant concentrations and meteorological variables were used to develop the DTR, FFANN-BP and RFR models. The data were prepared by averaging the air pollutant concentrations for each day of the study period. The comparison with DTR and FFANN-BP shows that RFR is superior to the other two methods. We also conducted an uncertainty analysis based on Monte Carlo experiments. The proposed method worked well in predicting the air pollutant concentrations and was successfully applied to the analysis of the importance of the predictors, which reveals a close relationship between air pollutant concentrations and meteorological variables. Hence, the developed model is capable of better forecasting performance for air pollutant concentrations. Because of the generality of the algorithm, it can be applied to other areas and databases.
It can be incorporated into control and management efforts for cleaner air and a better environment in many cities. Furthermore, we will consider other ways of using the spatial and meteorological conceptions. Our future research will focus on the improvement and optimization of machine learning models. Multimodal analysis can effectively decompose the time-periodic trend and the noise of air pollutant concentrations. Therefore, the introduction of multimodal analysis into the random forest regression model can effec-

Figure 1. The architecture of the FFANN-BP.

Figure 2. The locations of the air monitoring and meteorological stations in Ningxia, where the purple and blue solid points represent the air monitoring stations and the meteorological stations, respectively.

Figure 6. The flowchart of our method.

Figure 7. The observed concentrations and the concentrations predicted by DTR, FFANN-BP and RFR for CO and NO2 at Ma Lian Kou, where the red line represents the observations and the black, green and light cyan lines represent the predictions of DTR, FFANN-BP and RFR, respectively.

Figure 8. The observed concentrations and the concentrations predicted by DTR, FFANN-BP and RFR for O3 and SO2 at Ma Lian Kou, where the red line represents the observations and the black, green and light cyan lines represent the predictions of DTR, FFANN-BP and RFR, respectively.

Figure 9. The observed concentrations and the concentrations predicted by DTR, FFANN-BP and RFR for PM2.5 and PM10 at Ma Lian Kou, where the red line represents the observations and the black, green and light cyan lines represent the predictions of DTR, FFANN-BP and RFR, respectively.

Figure 10. The error histogram of the prediction for CO and NO2 at Ma Lian Kou.

Figure 12. The error histogram of the prediction for PM2.5 and PM10 at Ma Lian Kou.

Figure 13. The importance of predictor variables of RFR for (a) CO and (b) NO2 at Ma Lian Kou.

Table 1. Coordinates of the air monitoring stations in Ningxia.

Table 2. Basic descriptive statistics of the observed concentrations: minimum, maximum, mean and standard deviation of the six output parameters during 2016-2017 used in this study, all in µg/m3.

The variations of the concentrations for CO and NO2 at Ma Lian Kou, Sha Po Tou and Ma Yuan in 2016.
The variations of the concentrations for O3 and SO2 at Ma Lian Kou, Sha Po Tou and Ma Yuan in 2016.
The variations of the concentrations for PM2.5 and PM10 at Ma Lian Kou, Sha Po Tou and Ma Yuan in 2016.

Table 3. Basic descriptive statistics of the measured meteorological variables at the three stations. GST, RHU, TEM, WIN and SSD represent ground surface temperature, relative humidity, air temperature, wind speed and sunshine duration, respectively.

Table 4. The Pearson correlation coefficients between the meteorological variables and air pollutant concentrations in 2016. *** and ** indicate that the Pearson correlation coefficient test is significant at the 1% and 5% levels, respectively. x_a represents the variable x one day in advance.

Table 5. The selected influential variables for every pollutant, where x_a represents the air pollutant concentration one day in advance.

Table 6. The prediction performance of the DTR, FFANN-BP and RFR models for the concentrations of six air pollutants at Ma Lian Kou, Sha Po Tou and Ma Yuan in 2016.

Table 7. The importance of the influential factors at Ma Lian Kou in 2016. Mean GST, Mean RHU and SSD represent the average ground surface temperature, average relative humidity and sunshine duration, respectively.

Table 8 shows the running time of the three algorithms for the concentrations of six pollutants at Ma Lian Kou in 2016.

Table 8. The runtime of the three algorithms at Ma Lian Kou in 2016, all in seconds (s).

Table 9. The mean, variance and confidence interval of the predicted concentrations for six air pollutants based on 1000 Monte Carlo experiments at Ma Lian Kou in 2016.