Forecasting Particulate Matter Concentration Using Linear and Non-Linear Approaches for Air Quality Decision Support

: Air quality status on the East Coast of Peninsular Malaysia is dominated by Particulate Matter (PM 10 ) throughout the years. Studies have a ﬃ rmed that PM 10 inﬂuence human health and the environment. Therefore, precise forecasting algorithms are urgently needed to determine the PM 10 status for mitigation plan and early warning purposes. This study investigates the forecasting performance of a linear (Multiple Linear Regression) and two non-linear models (Multi-Layer Perceptron and Radial Basis Function) utilizing meteorological and gaseous pollutants variables as input parameters from the year 2000–2014 at four sites with di ﬀ erent surrounding activities of urban, sub-urban and rural areas. Non-linear model (Radial Basis Function) outperforms the linear model with the error reduced by 78.9% (urban), 32.1% (sub-urban) and 39.8% (rural). Association between PM 10 and its contributing factors are complex and non-linear in nature, best captured by an Artiﬁcial Neural Network, which generates more accurate PM 10 compared to the linear model. The results are robust enough for precise next day forecasting of PM 10 concentration on the East Coast of Peninsular Malaysia.


Introduction
Air quality in a developing country such as Malaysia has decreased gradually because of rapid urbanization, industrialization and population growth [1]. Southeast Asia cities, including Malaysia are notified as surrounded by particulate matter (PM 10 ) in air quality problems [2][3][4][5]. PM 10 or known as coarse particle is defined as the particulate matter that having an aerodynamic diameter of less than (≤) 10 µm [6]. PM 10 had received special attention especially in Peninsular Malaysia, as it was proven to have the highest index through the Air Pollutant Index (API) compared to other criteria pollutants annually. Moreover, the status of API in the East Coast of Peninsular Malaysia was noted as having a good to moderate level, where only a few days were recorded to have unhealthy levels of PM 10 concentration during the dry season months (May to September) [3]. The main sources of PM 10 in Malaysia are emissions from the motor vehicles, heat and power plants, industries and open combustion. High concentrations of PM in the atmosphere (for instance African dust) happen when there exists a low amount of rainfall combined with high temperatures, which prompts the re-suspension of dust. concentration [15][16][17][18]. In spite of their obvious valuable contributions, no earlier conclusion can be drawn regarding the appropriate model without considering distinctive meteorological factors and gaseous pollutants. Understanding the interaction between these factors and PM 10 concentration is useful in forecasting strategies in Malaysia.
The relationship between PM 10 concentration, meteorological factors and gaseous pollutants has been proven statistically by using several multivariate analyses, especially the development of Multiple Linear Regression (MLR) models to forecast PM 10 concentration [19][20][21]. Unfortunately, MLR relies on several assumptions, such as that the independent variables are linearly independent (multi-collinearity), that there is homogeneity of variance (homoscedasticity) and that the variables are normally distributed [22][23][24]. However, in a real world situation, the PM 10 concentration data that is measured in terms of temporal variation does not demonstrate linear characteristics, and therefore, are hard to examine and forecast accurately. The development of traditional modeling techniques such as MLR in nonlinear situation is proven to give less accuracy in forecasting [25]. Thus, for the purpose of exploration of real data PM 10 concentration temporal data, it seems crucial that non-linear models; for instance, Artificial Neural Networks (ANN) were developed for more accurate and precise PM 10 concentration forecasting [26]. Furthermore, ANN quickly processes the information with no particular presumptions of the way of the non-linearity [27]. Therefore, the development of ANN models in forecasting PM 10 concentration is important as ANN models can capture the complexity and non-linear character of PM 10 in the atmosphere under the influence of meteorological and gaseous pollutants.
Undeniably, various studies have been performed on PM 10 forecasting, worldwide and in Malaysia. However, this effort has less been reported especially in the eastern part of Peninsular Malaysia, rather than the western part of Peninsular Malaysia. The increment of year by year of industrialization, urbanization and population on the East Coast of Peninsular Malaysia is believed to reduce the air quality and affect human health [28]. Hence, it is advantageous to develop three forecasting algorithm techniques, which comprise linear and non-linear algorithms. In the real world situation, an area is distinctly differentiated through land use type, and therefore, different land use that is known as the majority notified in the East Coast of Peninsular Malaysia composed of the urban, suburban and rural areas which representing the cities of Kuala Terengganu and Kota Bharu, Kuantan, and Jerantut, respectively, were selected as the field of interest. Moreover, no comprehensive modeling study has been undertaken that attempts to research the variation of PM 10 concentration caused by different meteorological factors and gaseous pollutants in the East Coast of Peninsular Malaysia. The combination of this knowledge is important in building significant PM 10 forecasting tools for beneficial information in air quality management. In this study, the predictive ability of the linear model, for instance the Multiple Linear Regression (MLR), and that of non-linear models approaches, such as the Multi-Layer Perceptron (MLP) and Radial Basis Function (RBF), for PM 10 concentration forecasting were investigated at the study areas. These models were critically assessed through performance indicators keeping in mind the end goal to choose the best-fitted model for each study areas for accurate forecasting.

Study Area
This research was performed in the East Coast of Peninsular Malaysia, which covers the states of Pahang, Terengganu and Kelantan. Two urban sites were selected in this research, located at Kuala Terengganu, Terengganu and Kota Bharu, Kelantan. One sub-urban site was located at Kuantan, Pahang and one rural site was located at Jerantut, Pahang. Two urban areas were selected in this study due to their future rapid development as planned by the Malaysian government through the National Physical Plan, more specifically the East Coast Economic Region (ECER). Moreover, construction of the East Coast Rail Line (ECRL) will transform several areas to urban areas. Furthermore, the complexity of urban areas is another reason for the selection of two urban areas for better validating and testing the robustness of models being developed. The study areas are depicted in Figure 1. The details information in terms of station ID, the location of the air quality monitoring station, the classification or land use, as well as longitude and latitude of each selected air quality monitoring stations, is stated in Table 1.
Atmosphere 2019, 10, x FOR PEER REVIEW 4 of 25 station, the classification or land use, as well as longitude and latitude of each selected air quality monitoring stations, is stated in Table 1.

Data Acquisition
The secondary data has a 15-year time span, from 1 January 2000 to 31 December 2014. This long term historical data was needed, as it can represent the variation of pollutants comprehensively [29]. The PM10 data used in this research was recorded as a major aspect of a Malaysian Continuous Air Quality Monitoring (CAQM) program, using the β-ray attenuation mass monitor (BAM-1020), as made by Met One Instruments Inc. The instrument basically is built-in with a cyclone and PM10 head

Data Acquisition
The secondary data has a 15-year time span, from 1 January 2000 to 31 December 2014. This long term historical data was needed, as it can represent the variation of pollutants comprehensively [29]. The PM 10 data used in this research was recorded as a major aspect of a Malaysian Continuous Air Quality Monitoring (CAQM) program, using the β-ray attenuation mass monitor (BAM-1020), as made by Met One Instruments Inc. The instrument basically is built-in with a cyclone and PM 10 head particle trap, fiberglass tape, flow control and a data logger. This instrument delivers a fairly high resolution of 0.1 µg/m 3 at a 16.7 L/min flow rate, with lower detection limits of 4.8 µg/m 3 and 1.0 µg/m 3 for 1h and 24 h, respectively [30]. The installation, operation and maintenance of Air Quality Monitoring Stations (AQMS) are performed by Alam Sekitar Malaysia Sdn. Bhd (ASMA) on behalf of the Malaysian Department of Environment [31]. The forecasting of PM 10 concentration in this study is on a daily basis [32]. In order to gain a better understanding of PM 10 concentration, 10 daily averaged parameters were taken into consideration. This study used a daily average of particulate matter with an aerodynamic diameter less than 10 µm (PM 10 , µg m −3 ), ambient temperature (AT, • C), relative humidity (RH, %), wind speed (WS, m s −1 ), global radiation (GR, MJ m −2 ), Mean Sea Level Pressure (MSLP, hPa), rainfall amount (RA, millimeter), carbon monoxide (CO, ppm), nitrogen dioxide (NO 2 , ppm) and sulphur dioxide (SO 2 , ppm). The data were acquired from the Air Quality Division, Department of Environment (DOE), Ministry of Natural Resources and Environment of Malaysia, as well as from the Malaysian Meteorological Department (MMD), Ministry of Energy, Science, Technology, Environment and Climate Change. The reliability of data from DOE has consistently assured through quality assurance and quality control [33]. Moreover, the procedures used to follow the standard drawn by United States of Environmental Protection Agency (USEPA) [30].
The data were divided into two sets; 70% (N = 3835) for the development of the linear model (Multiple Linear Regression (MLR)). In the case of non-linear models, the Multi-Layer Perceptron (MLP) and the Radial Basis Function (RBF) were used, and the remaining 30% (N = 1644) for model validation [34,35]. The separation of this data is important in ANN as a method of early stopping. This early stopping technique is important, as it avoids any underfitting and overfitting of an ANN network [36] and allows the network to stop training the model itself when the verification errors starts to increase when compared to the training error. The lowest squared error on the verification is supposed to have the best generalization ability [37]. During the training of ANN models, the stopping criteria are the number of epochs and the decrease in the training error [38]. The model validation for the linear model, in the case for non-linear models, known as model testing, is aimed at measuring the agreement of the models, generalizing with an independent data set [39].

Imputation of Missing Values
Data in the dataset might missing due to the instruments problem, weather, maintenance and changing of siting monitor. This can introduce errors while developing the prediction model. Missing data in terms of percentage (%) were calculated by dividing the numbers of missing data and numbers of total data; then, data were multiplied with 100%. Linear interpolation technique was applied in the imputation of missing values, as the missing percentage was less than 25%. Linear interpolation imputed the missing values using the mean value of the last and first available data in the dataset in SPSS ® version 23 [40]. The formulation is given as: where x = independent value, x 1 and x 0 = known values of the independent variable and f (x) = value of dependent variable for a value of the independent variable.

Data Normalization
Units of measurement for the variables are different; thus, data normalization is needed. Min-max technique of data normalization is adapted in this study which in the end, the values are between 0 and 1 [0, 1] [41,42] for an accurate predictive model [43]. The technique is based on the following [44]: where x = (x 1 , . . . , x n ) and z i is the ith normalized data.

Multiple Linear Regression (MLR)
The regression analysis is most frequently used for forecasting. The objective is to build up a mathematical model that can be utilized to forecast the dependent variable based on the inputs of independent variables [45,46]. The coefficient of determination (R 2 ) is a marker to decide if the information gives adequate proof to show that the general models contribute to the overall variance in data. In the MLR model, the error term signified by ε is thought to be regularly disseminated with the mean 0 and change σ 2 (which is constant). Similarly, ε is thought to be uncorrelated. In this research, it was accepted that the MLR model has k independent variables and that there are n observations. In this way, the regression model can be composed as [47]: where b i is the regression coefficients, x i is independent variables and ε is a stochastic error associated with the regression.
Collinearity occurs when two model covariates have a linear relationship. This makes the individual contribution of each variable difficult to discern, introduces redundancy and makes the model excessively sensitive to the data. The multi-collinearity assumption was verified by Variance of Inflation Factor (VIF) accompanied with the regression output, where as long as the VIF is under 10 the conducted regression should be fine, where there is no multi-collinearity between the independent variables [24]. The VIF is given by: where VIF i is the variance inflation factor related with the ith predictor and R 2 i is the multiple coefficients of determination in a regression of the ith predictor on all other predictors.
By performing the Durbin-Watson (D-W) Test, autocorrelation can be recognized. Autocorrelation basically uncovers the capability of PM 10 concentration of current day to estimate the following day of PM 10 concentration. The test values can change in the vicinity of 0 and 4 with an estimation of 2 implying that the lingering is uncorrelated [19]. The D-W is given by: where n is number of data and e i = y i − y i (y i = observed data and y i is the forecasted data).

Multi-Layer Perceptron
MLP models consist of several types, but the most commonly used in air pollution forecasting is the Feed Forward topologies of Multi-Layer Perceptron (MLP) as shown in Figure 2 [37,48,49].
The MLP starts when the interested input parameters are fed into the network. These input parameters provide input signals and these signals are sent to the network starting from the input layer to the hidden layer and from the hidden layer to the output layer. The scaled input vector, which introduces neurons to the input layer, is multiplied by weights, which is a real number quantity. The neuron in the hidden layer sums up this information, including bias.

of 24
This weighted sum information is still in its linear model. The non-linearity of information or model occurs when it is passing through the activation or transfer function. Then: where y o is the output, W i is weight vector, X i is the scaled input vector, b is the bias, f is the transfer function and x represents the total sum of weighted inputs.
individual contribution of each variable difficult to discern, introduces redundancy and makes the model excessively sensitive to the data. The multi-collinearity assumption was verified by Variance of Inflation Factor (VIF) accompanied with the regression output, where as long as the VIF is under 10 the conducted regression should be fine, where there is no multi-collinearity between the independent variables [24]. The VIF is given by: where is the variance inflation factor related with the th predictor and is the multiple coefficients of determination in a regression of the th predictor on all other predictors.
By performing the Durbin-Watson (D-W) Test, autocorrelation can be recognized. Autocorrelation basically uncovers the capability of PM10 concentration of current day to estimate the following day of PM10 concentration. The test values can change in the vicinity of 0 and 4 with an estimation of 2 implying that the lingering is uncorrelated [19]. The D-W is given by: where is number of data and = − ( = observed data and is the forecasted data).

Multi-Layer Perceptron
MLP models consist of several types, but the most commonly used in air pollution forecasting is the Feed Forward topologies of Multi-Layer Perceptron (MLP) as shown in Figure 2 [37,48,49]. The MLP starts when the interested input parameters are fed into the network. These input parameters provide input signals and these signals are sent to the network starting from the input Once the error signal is computed, the process of model fitting ends. The difference between target and output is used to compute the error signal in the model, which corresponds to the input [50]. Mathematically, the equation of MLP with several numbers of neurons is given as: where WI ij and WO k j are the weights of input and output layers, respectively, and b 1 and b 2 are biased in the input and output layer, respectively. The determination of various neurons in the hidden layer is an exceptionally basic assessment and is known to be user-specified [51]. Furthermore, the excessive number of neurons in the hidden layer may bring about underfitting or overfitting, which builds the noteworthy blunders of the created models [34]. This situation may happen amid the training stage. The number of neurons in the hidden layer is found by the trial and error approach, which starts with one neuron and progressively includes more neurons one by one [36,52]. Keeping in mind the end goal to decide the optimum number of neurons in the hidden layer in the most targeted way, this trial and error method was rehashed 10 times at arbitrarily appointed picked data points [43], and the optimum number of neurons was selected in view of the least performance error (average of all trials) as measured by Root Mean Square Error (RMSE) [41,52] and R 2 values [52].
There are no standard rules on the minimum and maximum number of neurons in the hidden layer [53]; however, the scope of an optimum number of neurons in the hidden layer has been recommended by a few researchers. The proper number of neurons in hidden layer ranges from 2 √ M + N to 2M + 1; M is the number of input and N is the number of output node [54]. The diverse number of neurons were tried in MLP models by utilizing M 2 − 2 up to M 2 + 2, where M is the number of input variables [55]. It was recommended that the number of neurons in the hidden layer as M + 1 [56], and 2 3 × (M + N) [57]. Other than that, a previous study proposed and effectively demonstrated that the number of neurons in the hidden layer is not bigger than twice the number of inputs [58].
The activation or transfer function played an important role in ANN by producing a non-linear decision through non-linear blends of weighted inputs. This activation function then transforms input signals into output signals. This activation function presents the non-linearity in the MLP model, and subsequently contrasts it from the linear model. Sigmoid units of activation functions map the input to a value in the range of 0 to 1. This research utilizes three sigmoid functions that are generally utilized as a part of MLP; for example, purelin (linear), log-sigmoid (logsig) and hyperbolic tangent sigmoid (tansig) [34,52]. The tansig transfer function in the hidden layer is recommended to be utilized by previous studies [55,59], while the logsig transfer function is proposed by [43,60] and a linear (purelin) activation function is used in the output layer [55].
In order to train the optimum configuration of the MLP network, the learning rate was set to 0.05 [59]. This prevents the network from diverging from the target output; additionally, it enhances the general execution. The bias is defined as 1 [34], both at the hidden layer and output layer for the activation function (sigmoid) employment. The initialization of the MLP network was trained with 5000 epochs [37]. Therefore, considering the above-stated topology and criteria of MLP, several networks consisting three-layered MLP were trained in order to optimize or to find the best activation function of hidden and output layer, as well as the number of nodes or neurons in the hidden layer. The development of MLP is generally composed of three main steps: (1) define the training sets; (2) train the MLP; (3) apply the network to a new set of data (Testing). All implementations and computations were performed using Matrix Laboratory (MATLAB) (MathWorks, Inc. Massachusetts USA) [61].

Radial Basis Function
MLP networks inherent several drawbacks which include slow convergence speed, network topology specification and poor generalization performance [62]. Although several of its optimization techniques can be employed on the network, most will result in computationally high demands, and therefore, it is difficult to put this MLP model into practice. The RBF network was developed, which visualizes a more effective algorithm for a faster convergence speed and good generalization ability [63]. The architecture of RBF model is shown in Figure 3.

Radial Basis Function
MLP networks inherent several drawbacks which include slow convergence speed, network topology specification and poor generalization performance [62]. Although several of its optimization techniques can be employed on the network, most will result in computationally high demands, and therefore, it is difficult to put this MLP model into practice. The RBF network was developed, which visualizes a more effective algorithm for a faster convergence speed and good generalization ability [63]. The architecture of RBF model is shown in Figure 3. RBF consists of three layers; input layer, hidden layer and output layer. In terms of training time, RBF has a shorter training time as compared with other models [64]. Its specialty is that the hidden layer is composed of a transfer function known as Gaussian (radial basis). These hidden neurons work in measuring the weight distance between the input layer and output layer [65]. Two RBF parameters, known as center and spread, are important as the first weight connecting the input layer and hidden layer. The second weight connects the radial basis function to the output layer. The Gaussian is given as [66]: where represents the input, the center of the RBF unit and is the spread of the Gaussian basis function.
The optimum number of nodes in hidden layer depends on the behavior of training dataset. The training process concludes the optimum number of nodes in the hidden layer. A different spread number is used in this study [36] and the number of nodes is equivalent to the number of epochs in for the trained model. Training error is constant at 0.001 [62], while the suitable spread is determined through a trial and error method [42], with a scale of 0.1 from 0 to 1 [67]. The performance is based RBF consists of three layers; input layer, hidden layer and output layer. In terms of training time, RBF has a shorter training time as compared with other models [64]. Its specialty is that the hidden layer is composed of a transfer function known as Gaussian (radial basis). These hidden neurons work in measuring the weight distance between the input layer and output layer [65]. Two RBF parameters, known as center and spread, are important as the first weight connecting the input layer and hidden layer. The second weight connects the radial basis function to the output layer. The Gaussian is given as [66]: Atmosphere 2019, 10, 667 9 of 24 where x represents the input, µ the center of the RBF unit and σ j is the spread of the Gaussian basis function. The optimum number of nodes in hidden layer depends on the behavior of training dataset. The training process concludes the optimum number of nodes in the hidden layer. A different spread number is used in this study [36] and the number of nodes is equivalent to the number of epochs in for the trained model. Training error is constant at 0.001 [62], while the suitable spread is determined through a trial and error method [42], with a scale of 0.1 from 0 to 1 [67]. The performance is based on the lowest RMSE and highest R 2 values for the best predictive model. The output layer is the linear function for executable results. The output layer is given with [68]: where M = number of basis functions, x = output data vector, W k j = weighted connection between the basis function and output layer, φ j = nonlinear function of unit j in the RBF and W ko = weighted connection in the output layer.

Performance Indicators
The selection of the best-fitted model, MLR, MLP or RBF at each site, was based on the performance indexes or performance indicators. Performance Indicator (PI) measures two things: (1) accuracy measure and (2) error measure. The accuracy measures evaluate values from 0 to 1, whereby the best model is considered when the evaluated values are close to 1, while the best model for error measures are selected if the evaluated value is close to 0 [69,70].

Descriptive Statistics
The descriptive statistics and boxplot showing the temporal variation of PM 10 concentrations during the study period (2000-2014) for each site are summarized in Table 2.

Multiple Linear Regression Model
The MLR models were developed and the models' summary is depicted in Table 3. The range of the Variance Inflation Factor (VIF) for the independent variables of the MLR model was in the range of 1.012-1.926. The VIF values were lower than 10, which demonstrates that the multi-collinearity issue does not exists in the model. Durbin Watson (D-W) statistics demonstrate that the models are able to cater the autocorrelation, as the values were in the range of 2.007-2.150. The residual (error) is critical in choosing the ampleness of the factual model. On the off chance that the error demonstrates any sort of pattern, it is interpreted that the model does not deal with all of the methodical data (Figures 4 and 5). Figure 4 indicates histograms of the residuals of PM 10 models. The residual analysis shows that the residuals were normally distributed with zero mean and constant variance. The plots of fitted values with residuals for the PM 10 model are shown in Figure 5, indicating that the residuals are uncorrelated because the residuals are contained in a horizontal band and hence obviously that variance is constant.

Multiple Linear Regression Model
The MLR models were developed and the models' summary is depicted in Table 3. The range of the Variance Inflation Factor (VIF) for the independent variables of the MLR model was in the range of 1.012-1.926. The VIF values were lower than 10, which demonstrates that the multicollinearity issue does not exists in the model. Durbin Watson (D-W) statistics demonstrate that the models are able to cater the autocorrelation, as the values were in the range of 2.007-2.150. The residual (error) is critical in choosing the ampleness of the factual model. On the off chance that the error demonstrates any sort of pattern, it is interpreted that the model does not deal with all of the methodical data (Figure 4,5). Figure 4 indicates histograms of the residuals of PM10 models. The residual analysis shows that the residuals were normally distributed with zero mean and constant variance. The plots of fitted values with residuals for the PM10 model are shown in Figure 5, indicating that the residuals are uncorrelated because the residuals are contained in a horizontal band and hence obviously that variance is constant.

Multi-Layer Perceptron Model
The number of neurons in the hidden layer is assessed through a few references as expressed in Sub-Section 2.6 (Materials and Methods). The range of the tried number of neurons in the hidden layer is organized and appeared in Table 4.

Multi-Layer Perceptron Model
The number of neurons in the hidden layer is assessed through a few references as expressed in Sub-Section 2.6 (Materials and Methods). The range of the tried number of neurons in the hidden layer is organized and appeared in Table 4.

Multi-Layer Perceptron Model
The number of neurons in the hidden layer is assessed through a few references as expressed in Section 2.6 (Materials and Methods). The range of the tried number of neurons in the hidden layer is organized and appeared in Table 4. The best combination of different sigmoid functions, which includes Logsig, Tansig and Purelin, were determined through several trials in the training phase. It should be noted that the best results are marked with bold ( Table 5). The selection of the best model during the training phase is strictly based on the R 2 and RMSE. Site 1, Site 2 and Site 4 have the best combination of Logsig-Purelin activation function while Site 3 has the best combination of Tansig-Purelin activation function. The combination of Logsig/Tansig-Purelin activation functions is similar to the results deliberated by [43,55,59,60] and has been proposed by several researchers in training the MLP model [34,52]. The optimum number of neurons in hidden layer varies among 17-20 with 18 neurons (Site 1), 20 neurons (Site 2), 17 neurons (Site 3) and 18 neurons (Site 4). The inputs that are fed into the MLP models are able to explain 69% (Site 1), 72% (Site 2), 77% (Site 3) and 79% (Site 4) of the variance in data. The forecasted and observed PM 10 concentrations for all sites are depicted in Figure 6. There is a very good agreement between the predicted and observed concentration of PM 10 during the training phase of MLP models.

Radial Basis Function Models
Trial and error method for different spread numbers are used and the best predictive model is marked with bold ( Table 6). The optimum number of nodes in the hidden layer was automatically determined during the training of the model. The best spread number was 1 with 3620 neurons in the hidden layer, 0.5 with 2254 neurons, 0.1 with 1181 neurons and 0.1 with 730 neurons for Site 1, Site 2, Site 3, and Site 4, respectively. In terms of accuracy, Site 1 was able explain 93% of the total variance in data, Site 2 could explain 92%, Site 3 89% and Site 4 83%. The error was measured via RMSE, which were 4.08 µg/m 3 , 7.11 µg/m 3 , 6.56 µg/m 3 and 9.19 µg/m 3 , for Site 1, Site 2, Site 3 and Site 4, respectively. Overall, the predictive model for rural site has slightly lower accuracy as compared with urban and suburban sites. Figure 7 depicts the forecasted and observed concentration of PM 10 at all areas. The remaining 30% of the data were used for the model testing by using the selected combination of activation functions and the optimum number of neurons, as during the training stage. The forecasted daily PM 10 concentrations for the model derived for all sites were plotted against the observed values to determine a goodness-of-fit of the models as validation for the linear models. The regression lines showing 95% confidence interval were also drawn. Most of the points fall in the range of 95% confidence interval. Lines A and C are the upper and lower 95% confidence limit for the regression model. The R 2 is between 0.549-0.665 for the MLR models (Figure 8), 0.680-0.812 for the MLP models ( Figure 9) and 0.693-0.886 for the RBF models ( Figure 10).  The remaining 30% of the data were used for the model testing by using the selected combination of activation functions and the optimum number of neurons, as during the training stage. The forecasted daily PM10 concentrations for the model derived for all sites were plotted against the observed values to determine a goodness-of-fit of the models as validation for the linear models. The regression lines showing 95% confidence interval were also drawn. Most of the points fall in the range of 95% confidence interval. Lines A and C are the upper and lower 95% confidence limit for the regression model. The R 2 is between 0.549-0.665 for the MLR models (Figure 8), 0.680-0.812 for the MLP models ( Figure 9) and 0.693-0.886 for the RBF models ( Figure 10).

Models Evaluation and Selection
Several performance indicators, such as Root Mean Square Error (RMSE), Normalized Absolute Error (NAE), Correlation Coefficient (R 2 ), Prediction Accuracy (PA) and Index of Agreement (IA), were used in the process of the evaluation and selection of the best-fitted model for each site. The error measures such as RMSE and NAE show the best model if the evaluated values are near zero, while R 2 , PA and IA are known as accuracy measures; when these values approach one, it indicates a better model. It should be noted that the best results are marked in bold (Table 7).

Models Evaluation and Selection
Several performance indicators, such as Root Mean Square Error (RMSE), Normalized Absolute Error (NAE), Correlation Coefficient (R 2 ), Prediction Accuracy (PA) and Index of Agreement (IA), were used in the process of the evaluation and selection of the best-fitted model for each site. The error measures such as RMSE and NAE show the best model if the evaluated values are near zero, while R 2 , PA and IA are known as accuracy measures; when these values approach one, it indicates a better model. It should be noted that the best results are marked in bold (Table 7).
In terms of the non-linear model, RBF was chosen as the best model rather than the MLP model. , respectively. Generally, the results show that both MLP and RBF models are close to each other. Unfortunately, the initial weight values are randomly assigned; thus, repeated simulations or training are needed to obtain the best representation of that particular model for MLP. In contrast, RBF has unique training without repeated training simulations [71]. The small area of input space in RBF provides better approximation than MLP, which uses a sigmoid function, which consumes a larger space [42].
This study looks in the profundity of the comparison between the two models used for the prediction of PM 10 concentration on the next day. It was proven that for the non-linear model, in particular, RBF is a better model in the prediction of PM 10 concentration than the linear model, namely MLR, since the statistical indices of the first were methodically better contrasted with the ones of the reference model. The non-linear model was able in reducing the error of the models by 78.9% (Site 1), 73.8% (Site 2), 32.1% (Site 3) and 39.8% (Site 4). Interestingly, the non-linear model proved an increase in the accuracy of forecasting by 62.3% (Site 1), 48.2% (Site 2), 16.2% (Site 3) and 15.0% (Site 4). This study was able prove that there was an improvement of the non-linear model in terms of reducing the model's error and increasing the model's accuracy for PM 10 forecasting. Moreover, the execution of the RBF-ANN display is exceptionally practical, and in this manner, it can be considered for operational utilization. Results from previous, similar studies are in agreement with these findings, as shown in Table 8. It is proven that, the development of stepwise MLR model executed with high error and low accuracy as compared to the MLP model in the prediction of PM 10 and PM 2.5 [71]. The MLP model, if properly trained with suitable and optimum neuron numbers in the hidden layer, might provide a meaningful air quality prediction model. Another study proved that the ANN model is better in the prediction of particulate matter in Hong Kong, which is in agreement with our findings [72]. Prediction of PM 10 in Sakarya City in Turkey has proved that the nonlinear model (MLP) outperforms the linear model (MLR) with vast differences in the R 2 of 0.84 and 0.32, respectively [73]. Similarly, a previous study in the development of prediction model in industrial area confirmed that the MLP model is able to increase the accuracy by 29.9%, and can reduce the error by 69.3% as compared to the MLR model [35]. Another study in Malaysia clarified that there is an improvement of the linear model in R 2 as compared to nonlinear model [58]. Findings have shown that the nonlinear models outperform the linear model in the prediction of PM 2.5 at the USA-Mexico border. Furthermore, the RBF model is more suited for the prediction process as compared to MLP [74]. Great performance of non-linear models was because of the way that it can display very non-linear functions and can be prepared to precisely conclude when given new concealed data. Moreover, ANN does not require any earlier assumptions with respect to the conveyance of training data, and no decision with regards to the relative significance of the different info estimations should be made [75]. These highlights of the neural networks make them the alluring contrasting option when creating numerical models and picking between statistical methodologies. Moreover, they have some outstanding effective applications in tackling issues of different orders [36,42,76].

Conclusions
This study was able to prove that among the forecasting algorithms, the non-linear algorithm is better in forecasting the next day's PM 10 concentrations, with RBF giving the best prediction value. The RBF model was able to forecast PM 10 concentration successfully, explaining 93% and 92% (urban), 89% (suburban) and 83% (rural) of the variance in the data. Thus, we recommend the local authority and the DOE to use the RBF models in forecasting PM 10 concentration in areas without AQMS with similar land use for improving air quality at specific locations. Moreover, with this model, authorities can warn communities of dangerous levels of PM 10 sooner, so that they can reduce outdoor activities and decrease their exposure to unhealthy levels of air quality. This result can also be adopted at regions on the West Coast of Peninsular Malaysia and East Malaysia (Sabah and Sarawak), provided that they follow the methods highlighted in this study, using local meteorological and gaseous pollutants data.