Article

Improving Air Pollution Prediction Modelling Using Wrapper Feature Selection

by Ahmad Zia Ul-Saufie 1,*, Nurul Haziqah Hamzan 1, Zulaika Zahari 1, Wan Nur Shaziayani 1, Norazian Mohamad Noor 2, Mohd Remy Rozainy Mohd Arif Zainol 3, Andrei Victor Sandu 4,5,6,*, Gyorgy Deak 6 and Petrica Vizureanu 4,7
1 Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam 40450, Selangor, Malaysia
2 Faculty of Civil Engineering Technology, Universiti Malaysia Perlis, Kompleks Pengajian Jejawi 3, Arau 02600, Perlis, Malaysia
3 School of Civil Engineering, Engineering Campus, Universiti Sains Malaysia, Nibong Tebal 14300, Pulau Pinang, Malaysia
4 Faculty of Material Science and Engineering, Gheorghe Asachi Technical University of Iasi, 61 D. Mangeron Blvd., 700050 Iasi, Romania
5 Romanian Inventors Forum, St. P. Movila 3, 700089 Iasi, Romania
6 National Institute for Research and Development in Environmental Protection INCDPM, Splaiul Independentei 294, 060031 Bucharest, Romania
7 Technical Sciences Academy of Romania, Dacia Blvd 26, 030167 Bucharest, Romania
* Authors to whom correspondence should be addressed.
Sustainability 2022, 14(18), 11403; https://doi.org/10.3390/su141811403
Submission received: 7 June 2022 / Revised: 5 September 2022 / Accepted: 7 September 2022 / Published: 11 September 2022

Abstract

Feature selection is considered one of the essential steps in data pre-processing. However, previous studies on predicting PM10 concentration in Malaysia have been limited to statistical feature selection methods, and none has used machine-learning approaches. Therefore, the objectives of this research are to investigate the variables that influence the PM10 prediction model by using wrapper feature selection, to compare the prediction model performance of different wrapper feature selection methods and to predict the concentration of PM10 for the next day. This research uses 10 years of daily data on pollutant concentrations (2009 to 2018) from two stations (Klang and Shah Alam) obtained from the Department of Environment Malaysia (DOE). Six wrapper methods (forward selection, backward elimination, stepwise, brute-force, weight-guided and genetic algorithm evolution) and two predictive analytics methods, multiple linear regression (MLR) and artificial neural network (ANN), were implemented in this study. This study found that brute-force is the dominant wrapper method in most of the best models in selecting important features for MLR. Moreover, compared to MLR, ANN provides more advantages regarding model accuracy and permits feature selection in predicting PM10. The overall results revealed that the RMSE value for next-day prediction in Klang is 20.728, while the AE value is 15.69. Furthermore, the RMSE value for next-day prediction in Shah Alam is 10.004, while the AE value is 7.982. Finally, all of the prediction models in Klang and Shah Alam can be used to predict PM10 concentrations. The proposed model can be used as a tool for an early warning system that gives air quality information to local authorities in order to formulate air-quality-improvement strategies.

1. Introduction

Malaysia is an increasingly developed country. In line with this progress, there are plenty of advances in technology that indirectly contribute to air pollution. Moreover, open burning, power plants, motor vehicle emissions and industrial process emissions are the major sources of particulate matter less than or equal to 10 micrometers (PM10) in Malaysia [1].
In observing air quality, Malaysia has been following the Malaysian Ambient Air Quality Standard for allowable air pollutant levels. According to the Malaysian Ambient Air Quality Standard, the acceptable threshold levels of PM10 are 50 µg/m3 per year and 100 µg/m3 per 24 h, which are considered to be safe [2]. These particulate matters can become dissolved and absorbed into the bloodstream, which can later trigger serious biological effects. In addition, it is also one of the factors that cause lung cancer and cardiopulmonary deaths. Thus, in facing this hazardous situation, building optimized forecasting models of PM10 is the best solution in controlling these particle concentrations, and this also helps to prepare for the worst circumstances.
Feature selection is considered one of the essential data pre-processing steps and is important in solving the problems of high-dimensional datasets. This method is significant in discovering correlated features and in removing uncorrelated or redundant features from the original dataset. Implementing feature selection improves the performance of the model, as it reduces the error by removing irrelevant and redundant features. However, all of the previous studies in Malaysia were limited to statistical methods, such as backward, forward and stepwise analysis, and none of these studies used machine-learning approaches, such as brute-force, weight-guided and GA evolution. Therefore, this study investigates which approaches are better at selecting features for predicting the PM10 concentration.
Various methods have been used by previous researchers in predicting PM10 concentrations in Malaysia. For instance, a study by [3] determined the best loss function in boosted regression trees (BRT) for the prediction of the PM10 concentration in Alor Setar, Klang and Kota Bharu, Malaysia. A study conducted by [4] suggested that the prediction of PM10 concentrations can be made by considering the conditions of the previous day. In China, [5,6] applied deep-learning-network models to predict air pollution. Most studies do not focus on optimizing the number of inputs in predicting the PM10 concentration. Therefore, this study investigates the optimal number of inputs, identifies the influential factors and compares the performance of predictive analytics methods suitable for predicting PM10.
In summary, this study investigates which variables influence the PM10 concentration and which approaches are better in selecting features for predicting the PM10 concentration. Next, this study will investigate which predictive analytics for statistical and machine-learning methods are commonly used in predicting the PM10 and compare which method is best in predicting PM10.
According to [7], the goal of feature selection is to discover features that can precisely and concisely describe the original dataset and later generate new features based on the original dataset. Feature selection is a method using an algorithm or procedure to retain the most vital features and their application domain. Feature selection is beneficial in performance accuracy and complexity reduction as this method removes irrelevant features from the model. It also reduces the integration time and produces a simpler model, which is much easier to debug [8,9,10].
In machine learning, feature-selection techniques are mainly divided into supervised techniques and unsupervised techniques. The difference between these two techniques is whether to select features based on the target variable. The supervised techniques use the target variable in choosing its features. On the other hand, the unsupervised techniques ignore the target variable in selecting its features [11]. Filter methods, wrapper methods and embedded methods are among the feature-selection techniques.
A study conducted by Ibrahim et al. [12] aimed to compare the wrapper and filter methods to maximize the classifier accuracy. Correlation-based and information gain are the filter methods used in this study. The wrapper methods are sequential forward and sequential backward elimination. The study [12] applied the selected feature selection methods obtained from the UCI Machine Learning Repository to measure its performance and the datasets are Pima Indians Diabetes, Breast Cancer Wisconsin and Spam base.
As a result, all of the datasets showed that the wrapper method had higher significant features compared with the filter method. The results also indicated that the logistic regression performed the best with the highest accuracy, specificity and sensitivity using the wrapper methods features. Thus, based on the evidence provided by the previous study, the wrapper method performs better in selecting the features compared to the filter method.
Lastly, a gap remains in predicting PM10 concentrations using different types of feature selection. The common method is still limited to statistical model approaches in feature selection, such as forward, backward and stepwise selection. In contrast to studies overseas, none of the studies in Malaysia used machine-learning approaches, such as brute-force, weight-guided or GA evolution, for feature selection in predicting the air pollutant concentration. On the other hand, MLR is the most commonly used statistical method for predicting PM10 concentrations, whereas ANN is the most commonly used machine-learning method. Therefore, this study implements both common statistical and machine-learning approaches in selecting its features and uses MLR and ANN in predicting PM10 concentrations.

2. Materials and Methods

As shown in Figure 1, data preparation comprises data acquisition, exploration, cleaning, transformation and partitioning. For feature selection, the partitioned data are passed to six wrapper methods: forward selection, backward elimination, stepwise, brute-force, weight-guided and genetic algorithm evolution. The significant variables obtained from each method are then used to develop the predictive models, MLR and ANN, which are evaluated using performance indicators. The MLR and ANN models are then compared and ranked according to their performance. The best MLR model and the best ANN model are compared again between the two predictive analytics methods. The best model obtained is used to predict the concentration of PM10 for the next day.
This section consists of data acquisition, data exploration, data cleaning, data transform and partitioning the dataset. The data acquisition will explain the information of data and parameters included in this study. Second, this study will conduct descriptive analysis in data exploration. Third, data cleaning will explain the technique involved in imputing the data. Next, data transform will explain the transformation on the data before being analyzed. Lastly, data partitioning will explain the partition of the dataset.
As for the data acquisition, this research used ten years of daily data on pollutant concentrations from two stations obtained from the Department of Environment Malaysia (DOE) from 2009 until 2018. The stations included in this study are Klang station located at Sekolah Menengah Perempuan Raja Zarina, Klang and Shah Alam station located at Sekolah Kebangsaan TTDI Jaya, Shah Alam. These two stations were selected because they are surrounded by major roads that experience heavy traffic, particularly during the morning rush hour.
Based on the Exploratory Data Analysis (EDA), this analysis measured the central tendency, dispersion, skewness and graphical representation. This study measured the central tendency of the data by estimating the mean, mode and median. This study also computed the variance, standard deviation and range in measuring the dispersion. Moreover, this research evaluated the skewness to check for the probability distribution of the data.
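The descriptive measures listed above can be illustrated with a minimal sketch. The function name `describe`, the toy values and the choice of the moment-based skewness estimator are assumptions made for illustration; the study does not state which estimator or software was used.

```python
import statistics

def describe(values):
    """Return central tendency, dispersion and skewness for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population central moments, used for the moment-based skewness (assumed estimator).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),
        "range": max(values) - min(values),
        "skewness": m3 / m2 ** 1.5,
    }

# Toy values with one large spike, standing in for a haze episode.
sample = [1, 1, 1, 1, 10]
summary = describe(sample)
```

A single extreme value is enough to push the skewness above 1, which is the pattern a few haze days would produce in a PM10 series.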
In this study, linear interpolation and the series mean are used to impute the missing values in the data, as suggested in [13]. Linear interpolation is an interpolation method for one-dimensional data: it estimates the value at a missing point from the two data points adjacent to it in the sequence, as given in Equation (1). The series mean method replaces missing values with the mean of the observed data. The data were imputed using linear interpolation first, and the remaining missing values were imputed using the series mean.
$$y = y_1 + (x - x_1)\,\frac{y_2 - y_1}{x_2 - x_1} \qquad (1)$$
where $(x_1, y_1)$ and $(x_2, y_2)$ are the known points adjacent to $x$.
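The two-stage imputation described above could be sketched as follows. This is an illustration only: the function name `impute` and the sample readings are hypothetical, and it assumes that the values left over after interpolation are leading or trailing gaps that have only one known neighbour.

```python
def impute(series):
    """Fill None gaps by linear interpolation, then by the series mean."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    # Interior gaps: interpolate between the nearest known neighbours.
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            filled[i] = series[left] + frac * (series[right] - series[left])
    # Remaining gaps (assumed to be at the ends): use the mean of observed values.
    mean = sum(series[i] for i in known) / len(known)
    return [mean if v is None else v for v in filled]

# Hypothetical readings with missing values at the start, middle and end.
readings = [None, 40.0, None, None, 70.0, None]
imputed = impute(readings)
```

Here the two interior gaps are filled along the line from 40 to 70, and the two end gaps receive the mean of the observed values.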
Based on the data retrieved, the readings of each parameter were recorded hourly. As this study predicts the PM10 concentration by day, the data is transformed into a daily format. This study used the average PM10 concentration of hourly data as the daily data. Next, the wind direction parameters in this study were split into two variables following [14]. The variables were the sinusoidal (sinWD) and the cosinusoidal (cosWD).
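As a rough illustration of these two transformations, the sketch below averages hourly readings into a daily value and splits a wind direction into its sine and cosine components. The function names and sample values are hypothetical, and it assumes the wind direction is recorded in degrees; the study does not say whether the components were computed before or after daily averaging.

```python
import math

def daily_mean(hourly_values):
    """Average hourly readings into a single daily value."""
    return sum(hourly_values) / len(hourly_values)

def wind_components(direction_deg):
    """Split a wind direction in degrees into (sinWD, cosWD)."""
    rad = math.radians(direction_deg)
    return math.sin(rad), math.cos(rad)

pm10_day = daily_mean([42.0, 48.0, 51.0, 39.0])  # hypothetical hourly PM10 readings
sin_wd, cos_wd = wind_components(90.0)            # hypothetical direction of 90 degrees
```

The split avoids the discontinuity of a circular variable: directions of 359 and 1 degrees are numerically far apart but map to nearly identical (sinWD, cosWD) pairs.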
Before developing the model, the original dataset was divided into three datasets for training, validation and verification so that there would be new data to assess the model. The dataset used for model development was divided chronologically into 80% for training data (2009–2016) and 20% for validation data (2016–2017). The training data were used to estimate the parameters of the predictive methods, while the validation data were used to analyze their accuracy. The proposed model was verified using the new dataset for 2018.
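A chronological partition of this kind might look like the sketch below. The records are synthetic and the helper name `chronological_split` is hypothetical; the point is only that the rows are cut in time order rather than shuffled, and that 2018 is held out entirely.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows into training and validation sets without shuffling."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

# Synthetic records: (year, daily PM10) pairs already sorted by date.
records = [(2009 + i // 10, 50.0 + i) for i in range(100)]
development = [r for r in records if r[0] < 2018]    # 2009-2017 for model building
verification = [r for r in records if r[0] == 2018]  # 2018 held out as new data
train, validation = chronological_split(development)
```

Keeping the order intact means the validation rows always come after the training rows, which mirrors how a next-day forecast is used in practice.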

2.1. Feature Selection

Feature selection is the process of minimizing the number of input variables when building a predictive model [15]. In this research, wrapper methods were used to develop the air pollution prediction models: forward selection, backward elimination, stepwise selection, brute-force feature selection, Genetic Algorithm (GA) evolution and weight-guided selection.
Forward selection is a type of stepwise regression that begins with a null model. The approach initiates with no variables in the model and step by step adds variables to the model until no variable not included in the model can make a significant contribution to the model’s conclusion. The variable with the highest test statistic that is more than the cut-off value or the lowest p-value with less than the cut-off value is chosen and added to the model.
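The greedy structure of forward selection can be sketched as below. Note the swapped-in criterion: the description above uses a test statistic or p-value cut-off, whereas this sketch assumes a generic `score` callable (for example, validation performance of a fitted model) and stops when the gain falls below `min_gain`. The toy scoring function and its weights are hypothetical.

```python
def forward_selection(features, score, min_gain=1e-6):
    """Greedy wrapper: add the feature that most improves score(subset) until no gain."""
    selected, best = [], score([])
    remaining = list(features)
    while remaining:
        # Score every candidate subset formed by adding one more feature.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates)
        if top_score - best <= min_gain:
            break  # no remaining feature contributes enough
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best

# Toy stand-in for "fit a model and return its performance" (hypothetical weights).
USEFUL = {"PM10": 0.5, "CO": 0.2, "RH": 0.05}

def toy_score(subset):
    return sum(USEFUL.get(f, 0.0) for f in subset) - 0.01 * len(subset)

chosen, chosen_score = forward_selection(["PM10", "CO", "RH", "WS", "T"], toy_score)
```

Backward elimination follows the same loop in reverse, starting from the full set and dropping the feature whose removal hurts the score least.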
Backward elimination is the most basic approach to variable selection. This technique begins with a complete model that includes all of the variables in the model. Variables are subsequently removed from the whole model one by one until all remaining variables are sure to have a meaningful impact on the result. The variable with the lowest test statistic or the highest p-value more than the cut-off value is removed from the model. This procedure is repeated until every remaining variable is statistically significant at the cut-off value.
The stepwise selection method is the mixture of forward selection and backward elimination procedures that allow one to go in both directions while adding and eliminating variables at various stages. Forward selection and backward elimination can be applied to begin the process. If stepwise selection begins with forward selection, variables are added to the model one at a time according to the statistical significance. After each step, the model is analyzed. Any variable that is not significant will be removed from the model. The process repeats until every variable in the model is statistically significant.
If stepwise selection begins with backward elimination, the variables are removed from the full model based on statistical significance and then re-added if they show statistical significance afterward.
Brute-force is a straightforward approach to solving a problem. It involves iterating through all possible features until the best feature selection is found. Brute-force feature selection tries every potential combination of the variables and provides the highest performing subset. The best subset is chosen by maximizing a defined performance metric in the presence of an arbitrary regressor or classifier. The algorithm will choose each combination and compute its score before selecting the optimal combination based on its score.
For this study, the number of possible subsets is calculated using the best subset regression formula, $2^p$, where $p$ equals 11 (the number of predictors). As a result, this method will generate 2048 subsets. Essentially, this method starts with the generation of possible subsets, beginning with one variable, two variables, three variables and so on until all eleven predictors are included. Each subset has its own regression equation, which is evaluated using the adjusted R square ($\bar{R}^2$). The reason for using $\bar{R}^2$ rather than $R^2$ to compare the performance of subsets is that $R^2$ values are often artificially inflated as more variables are chosen. The formula for $\bar{R}^2$ is stated below:
$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \qquad (2)$$
where $p$ is the number of predictors and $n$ is the number of samples. The best subset, which contains the most significant factors to predict PM10 concentration, will be the subset with the highest $\bar{R}^2$ value.
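The subset count and the adjusted R square penalty can be checked with a short sketch. The predictor names follow the MLR inputs used in this study; the helper names and the R2 values passed to `adjusted_r2` are hypothetical and only illustrate the penalty.

```python
from itertools import combinations

def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors fitted on n samples."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def all_subsets(predictors):
    """Yield every subset of the predictors, from size 0 up to all of them."""
    for size in range(len(predictors) + 1):
        yield from combinations(predictors, size)

predictors = ["PM10", "CO", "NO2", "NOx", "NO", "SO2",
              "RH", "T", "WS", "cosWD", "sinWD"]
subset_count = sum(1 for _ in all_subsets(predictors))  # 2 ** 11, counting the empty subset

# A larger model must raise R2 enough to offset the penalty term (hypothetical values).
small_model = adjusted_r2(0.60, n=101, p=3)
large_model = adjusted_r2(0.61, n=101, p=11)
```

In this example the eleven-predictor model has the higher raw R2 but the lower adjusted value, so the three-predictor subset would be preferred.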
Next, GA evolution is a type of optimization technique that mimics the concepts of natural evolution. There are three basic concepts in this process, which are selection, crossover and mutation. The first step of an evolutionary algorithm is an initialization phase where it creates a population of air pollution models, each with their unique set of chromosomes. The chromosomes are binary strings; 1 means the feature is included, and 0 means the feature is excluded. The models for the starting population are randomly generated.
A good rule of thumb is to use between 5% and 30% of the total number of features as the population size [16]. For the second step, each model in the population will have their fitness calculated. Models with better fitness have a higher chance of being chosen for recombination. After calculating the fitness value, the third step is that the models will be selected randomly using the roulette wheel method and selected according to their fitness level. The number of selected models is half of the population size.
After selecting the models, the fourth step is the crossover, in which the selected models are recombined to create a new population. In this step, two models will be chosen at random and their features will be combined to produce offspring for the new population until the new population is the same size as the old one. In the crossover, offspring that are genetically identical to their parents may be produced, resulting in a low-diversity new generation. Therefore, in step five, the mutation is done by changing the value of some features in the offspring at random. Lastly, the process is looped to step two until a stopping criterion is met and the best feature selection is obtained.
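The five steps above can be condensed into a minimal sketch. Several details are assumptions not stated in the text: single-point crossover, a per-bit mutation probability `p_mut`, a fixed number of generations as the stopping criterion, and a toy fitness function that stands in for model performance.

```python
import random

def ga_select(n_features, fitness, pop_size=10, generations=30, p_mut=0.05, seed=0):
    """Evolve binary masks (1 = feature kept) towards higher fitness."""
    rng = random.Random(seed)
    # Step 1: random initial population of binary chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Step 2: fitness of every chromosome (must be positive for the wheel).
        scores = [fitness(c) for c in pop]
        # Step 3: roulette wheel picks half the population, weighted by fitness.
        parents = rng.choices(pop, weights=scores, k=pop_size // 2)
        children = []
        while len(children) < pop_size:
            # Step 4: single-point crossover of two randomly chosen parents.
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)
            child = a[:cut] + b[cut:]
            # Step 5: mutation flips each bit with a small probability.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)
    return best

# Toy fitness: agreement with a hypothetical "ideal" mask, offset to stay positive.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1]

def toy_fitness(mask):
    return 1 + sum(m == t for m, t in zip(mask, TARGET))

best_mask = ga_select(len(TARGET), toy_fitness)
```

In a real wrapper, `fitness` would train the MLR or ANN model on the features marked 1 and return a performance score.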
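Finally, the weight-guided wrapper method assigns weights to the variables via correlation, using the formula in Equation (3). Each variable is given a weight, and the variables with the highest weights are considered.
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \qquad (3)$$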
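A correlation-based weighting along the lines of Equation (3) might be sketched as follows. Using the absolute value of r as the weight is an assumption, and the column values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient computed from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def rank_by_weight(columns, target):
    """Order feature names by |r| with the target, strongest first (assumed weighting)."""
    weights = {name: abs(pearson_r(col, target)) for name, col in columns.items()}
    return sorted(weights, key=weights.get, reverse=True)

target = [10.0, 20.0, 30.0, 40.0]           # hypothetical next-day PM10 values
columns = {"CO": [1.0, 2.0, 3.0, 4.0],      # perfectly correlated with the target
           "RH": [80.0, 75.0, 78.0, 70.0]}  # negatively and imperfectly correlated
order = rank_by_weight(columns, target)
```

A weight-guided wrapper would then add features to the model in this order and keep those that improve performance.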

2.2. Model Development and Model Evaluation

In this model development, the process of developing MLR and ANN models is explained to conduct predictive modelling. Figure 2 shows the process of the wrapper method in developing a predictive model.
There are four steps in developing a multiple linear regression (MLR) model. First, the development of the MLR model will be based on 80% of the data. Second, the assumption of the MLR models is checked using certain methods and tests, such as histograms and scatter plots. Next, the model is validated based on the performance indicator value using 20% of the data. Finally, the best model of MLR is obtained. The expected MLR models are as shown in Equation (4). The past PM10 daily average concentration was used to predict the next day’s PM10 concentration.
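$$\mathrm{PM}_{10,D+1} = \beta_0 + \beta_1 \mathrm{PM}_{10,D} + \beta_2 \mathrm{CO}_{D} + \beta_3 \mathrm{NO}_{2,D} + \beta_4 \mathrm{NO}_{x,D} + \beta_5 \mathrm{NO}_{D} + \beta_6 \mathrm{SO}_{2,D} + \beta_7 \mathrm{RH}_{D} + \beta_8 \mathrm{T}_{D} + \beta_9 \mathrm{WS}_{D} + \beta_{10} \mathrm{cosWD}_{D} + \beta_{11} \mathrm{sinWD}_{D} \qquad (4)$$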
where
PM10,D+1 = Next day prediction of PM10 concentration.
PM10,D = Particulate matter (µg/m3).
COD = Carbon monoxides (ppm).
NO2,D = Nitrogen dioxide (ppm).
NOx,D = Nitrogen oxide (ppm).
NOD = Nitric oxide (ppm).
SO2,D = Sulfur dioxide (ppm).
RHD = Relative humidity (%).
TD = Temperature (°C).
WSD = Wind speed (km/h).
cosWDD = Cosine Wind direction (units).
sinWDD = Sine Wind direction (units).
β0 = regression constant.
β1, …, β11 = regression coefficient for each predictors used.
A feed forward backpropagation neural network (FFBP) was used in this study. The structure of FFBP was composed of three layers of neurons called the input, hidden and output layers. The first layer of neurons consisted of an input layer, representing independent variables. The input layer contained 12 independent variables—namely, O3, CO, NO2, SO2, NO, sinWD, cosWD, NOx, PM10, T, WS and RH. The second layer was the hidden layer, which is responsible for processing the input weight from the input layer and transferring it to the output layer. The third layer was the output layer, which represents the PM10,D+1 concentrations.
The maximum error used as a stopping criterion was set at 0.05. In this study, training ran for 10,000 iterations or until the maximum error was reached, as suggested by [17]. Levenberg–Marquardt optimization was used as the network training function to update the weight and bias values. Sigmoid units were used, as proposed by [18], because they are easier to train than other activation functions. The hidden layer size was (number of attributes + number of classes)/2 + 1 = 8 hidden nodes, as recommended by [19]. The models were fitted with a learning rate (lr) of 0.01, which [20] proposed for the study of air pollution datasets.
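The hidden-layer size rule and the sigmoid activation can be written out as a small sketch. Rounding up is an assumption, chosen because with 12 attributes and 1 output it reproduces the 8 hidden nodes reported here; the function names are hypothetical.

```python
import math

def hidden_layer_size(n_attributes, n_classes):
    """(attributes + classes) / 2 + 1, rounded up to a whole number of nodes."""
    return math.ceil((n_attributes + n_classes) / 2 + 1)

def sigmoid(z):
    """Logistic activation applied by each hidden unit."""
    return 1.0 / (1.0 + math.exp(-z))

nodes = hidden_layer_size(n_attributes=12, n_classes=1)  # (12 + 1) / 2 + 1 = 7.5
```

With fewer selected features the same rule gives a smaller hidden layer, for example 3 nodes for a three-input model.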
Furthermore, [21] stated that changing the momentum rate and learning rate from 0.05 to 1 had no effect on the training and prediction networks’ errors. Performance indicators in this research are used to identify the best method to predict the concentration of PM10,D+1. The Root Mean Square Error (RMSE), Normalized Absolute Error (NAE), Absolute Error (AE) and Relative Error (RE) are the error measures used to determine the error of the model, while the Coefficient of Determination (R2) is the accuracy measure used to determine the accuracy of the model outcome.
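The five indicators can be computed as in the sketch below. The exact formulas are not given in this section, so the definitions here are common ones and should be read as assumptions: AE as the mean absolute error, NAE as total absolute error over total observed, RE as the mean of absolute error relative to each observation, and R2 as the squared correlation between observed and predicted values. The sample values are hypothetical.

```python
import math

def indicators(observed, predicted):
    """Return RMSE, AE, NAE, RE and R2 under the assumed definitions."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "AE": sum(abs(e) for e in errors) / n,
        "NAE": sum(abs(e) for e in errors) / sum(observed),
        "RE": sum(abs(e) / o for e, o in zip(errors, observed)) / n,
        "R2": cov ** 2 / (var_o * var_p),
    }

result = indicators([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

For the error measures a value closer to 0 is better, while for R2 a value closer to 1 is better.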
Regarding the model deployment, the dataset from 2009 until 2017 was used to produce the prediction models, which were later deployed on the 2018 dataset. However, there were extreme outliers in the ozone concentration in the 2018 dataset for the Klang station. In the 2009 to 2017 dataset, the maximum ozone concentration value was 0.056 ppb, whereas some of the ozone concentrations in the 2018 dataset exceeded 0.7 ppb. This may be due to technical errors. Therefore, a total of 29 observations were removed before deployment.

3. Results and Discussion

3.1. Descriptive Analysis

Table 1 presents the descriptive statistics of each parameter in Klang and Shah Alam, Selangor. Based on the table, the maximum concentration of PM10 was 551.542 µg/m3 (Klang) and 1332.814 µg/m3 (Shah Alam). The maximum in Klang was recorded on 25 June 2013, while the maximum in Shah Alam occurred in April 2017. Next, the skewness value of PM10 is 4.62 (Klang) and 17.11 (Shah Alam). Since both values are greater than 1, the PM10 data are highly skewed to the right. This may be due to the presence of extreme outliers in the data.
Figure 3 and Figure 4 below are heatmaps of the average PM10 concentrations according to the month and year for Klang and Shah Alam. The greenish parts have the lowest average concentration, and the reddish parts have the highest average concentration. Based on both Figures, it is indicated that October 2015 had the highest average concentration compared to the others. Moreover, the concentration of PM10 in September 2015 was the second-highest concentration. Referring to haze incidents in Malaysia, this supports the outline of the heatmap as there was a hazing incident in 2015 in August and September due to massive land and forest fires in Indonesia [22]. The high PM10 concentration in October 2015 may be due to the backlash of this incident.
In March 2014, there was a high concentration of PM10 in both locations where Klang was slightly higher than Shah Alam. It is also proved by the chronology of haze incidents in Malaysia as haze incidents happened between February and March 2014. This incident occurred due to forest and peatland fires. High PM10 concentration incidents were also detected in June 2013 from both locations. However, Klang had a higher concentration value compared to Shah Alam. This incident may be due to haze incidents that happened from 15 to 27 June 2013 [22]. In addition, both Figures show a low average of PM10 concentration starting from May until December 2017. This situation may be due to zero cases of haze incidents happening in 2017.
In conclusion, the heatmaps of both locations align with the haze chronology in Malaysia. The heatmaps also make it easier to observe the condition of air pollution in Malaysia.

3.2. Correlation of PM10 Concentration with Other Parameters

Figure 5 and Figure 6 show heatmaps of the Spearman’s rank correlation coefficient (r) of the PM10 concentration with other parameters in Klang and Shah Alam, respectively. Figure 5 shows that all of the parameters have a positive correlation with PM10 except for WD, RH and NO in Klang. It also indicates that all of the parameters have moderate-to-very-weak correlations with PM10 for Klang. CO has the highest correlation with the PM10 concentration, a positive moderate correlation (r = 0.498), while WD has the lowest correlation, a negative very weak correlation (r = −0.075).
Figure 6 shows that all of the parameters have a positive correlation with PM10 except for WD and RH in Shah Alam. All of the parameters have moderate, weak and very weak correlations with the PM10 concentration. It also indicates that NO2 has the highest correlation with PM10 concentration with a positive moderate correlation (r = 0.437), while WD has the lowest correlation with PM10 concentration with a positive very weak correlation (r = 0.151).

3.3. Performance Model and Feature Selection

Performance measures in this section were computed on the validation dataset (20%). Table 2 compares the performance of all MLR models to find the best model for next-day prediction. Based on Table 3, for Klang, brute-force has the lowest values of RMSE, AE, RE and NAE, while the backward method has the highest R2 value.
Therefore, it is shown that brute-force is the best model for Klang as it had the lowest value of error and the lowest total rank with WS, RH, SO2, O3 and PM10 as the parameters selected to predict the next day’s PM10 concentration. Furthermore, the performance for all models in predicting PM10 in Shah Alam also shows that brute-force had the lowest error measures for RMSE, AE, RE and NAE and the highest accuracy for R2. Therefore, brute-force is the best model with T, RH, SO2, NO2, sinWD, NO and PM10 as the parameters selected to predict the PM10,D+1.
Referring to Table 4, a tick in the table means that the parameter is selected in that model. For the best PM10,D+1 model for Klang (MLR-Brute-Force), RH, SO2, NO2, O3 and PM10 were used to predict the next day. For the best PM10,D+1 model for Shah Alam (MLR-Brute-Force), T, RH, SO2, NO2, sinWD, NO and PM10 are the parameters selected to predict the next day.
Based on Table 5, the performances of all ANN models for next-day prediction in Klang and Shah Alam were compared to determine the best model. For Klang, brute-force had the lowest value of RMSE and RE, and backward had the lowest value AE and highest R2 value compared to the others. Evolution had the lowest value of NAE. However, backward is the best model for Klang station as it had the lowest total rank compared to the others with T, RH, SO2, NO2, O3, sinWD, cosWD, NOx, NO and PM10 as the parameters selected to predict the PM10,D+1.
Next, the performance of all ANN models in predicting PM10,D+1 in Shah Alam shows that brute-force had the lowest value for RMSE, weight-guided had the lowest values for AE and RE, and forward had the lowest value for NAE and the highest value for R2. However, forward is the best model in predicting PM10,D+1 since it had the lowest total rank, with WS, NOX and PM10 as the parameters selected to predict the PM10,D+1, as shown in Table 6.
Referring to Table 7, the ticked (/) table means that the parameter is selected in that model. For the best model PM10,D+1 for Klang (ANN-Backward), RH, SO2, NO2, O3, PM10, sinWD, cosWD, NOX and NO were analyzed to predict the next day. For the best model PM10,D+1 for Shah Alam (ANN-Forward), WS, NOX and PM10 are the parameters selected to predict the next day for Shah Alam using ANN.
As a conclusion, brute-force is the best feature selection to predict the next day’s PM10 concentration in Klang and Shah Alam by using MLR, and the models fulfil the assumptions of MLR. The backward for Klang and forward for Shah Alam are the best feature selections for predicting the next day’s PM10 concentration using the ANN model.
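The "lowest total rank" criterion used throughout this section can be sketched as below. The indicator values are hypothetical and are not taken from the study's tables; the helper name `total_ranks` is also an assumption, and ties are broken simply by sort order.

```python
def total_ranks(results, higher_is_better=("R2",)):
    """Sum each model's rank across indicators (rank 1 = best); lowest total wins."""
    models = list(results)
    totals = dict.fromkeys(models, 0)
    for name in next(iter(results.values())):
        # Error measures rank ascending; accuracy measures rank descending.
        reverse = name in higher_is_better
        ordered = sorted(models, key=lambda m: results[m][name], reverse=reverse)
        for rank, model in enumerate(ordered, start=1):
            totals[model] += rank
    return totals

# Hypothetical indicator values for three wrapper methods.
results = {
    "forward":     {"RMSE": 14.0, "AE": 10.0, "R2": 0.66},
    "backward":    {"RMSE": 13.5, "AE": 10.4, "R2": 0.68},
    "brute-force": {"RMSE": 13.0, "AE": 9.8, "R2": 0.70},
}
totals = total_ranks(results)
best_model = min(totals, key=totals.get)
```

Summing ranks lets a model win overall even when it is not the best on every single indicator, which is how backward and forward were chosen for the ANN models above.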

3.4. The Best Model

The best model to predict the PM10,D+1 for each station was obtained by comparing the performance of the MLR and ANN models. Overall, the ANN model performed better than the MLR model for both the Klang and Shah Alam stations. This result is supported by Table 8, in which the ANN model for each station has the lowest total score. In Klang, ANN with backward elimination is the best model selected, while for Shah Alam, ANN with forward selection is the best model.
Furthermore, Table 9 summarizes the comparison results with other research. This indicates that the results in this study are similar to those in other studies. Regression is involved with linear dependencies, whereas neural networks are involved with nonlinearities. As a result, if the data contains nonlinear dependencies, neural networks should outperform regression.
According to studies [23,24,25], the ANN method predicts the dependent variable more accurately than MLR. Although ANN is regarded as a powerful technique for non-linear models [26], some researchers have also applied it to linear problems and reported that it performed better than the regression model [27,28,29]. This shows that our ANN model can be used to predict PM10 concentrations since it improved the performance of the model.
Table 9. Performance indicator results gained from other research.
Authors | Method | Result
[30] | MLR | R2 = 0.347–0.614
[31] | MLR | RMSE = 126.728–164.978; NAE = 0.325–0.429; PA = 0.359–0.668
[32] | MLR | R2 = 0.3239
[33] | MLR | R2 = 0.586–0.715
This Study | ANN-Forward; ANN-Backward; MLR-Brute-Force | R2 = 0.63–0.74; RMSE = 12.33–15.08

3.5. Model Verification

For the model verification, the dataset from 2009 until 2017 was used to develop prediction models. The proposed prediction models were used to predict the PM10 concentration using the 2018 dataset [30,31,32,33,34,35]. Figure 7 and Figure 8 below show line charts of the observed and predicted values of the PM10 concentration in Shah Alam and Klang.
This predictive model used ANN with backward elimination using RH, SO2, NO2, O3, PM10, sinWD, cosWD, NOX and NO as a parameters in Klang. For the best model PM10,D+1 for Shah Alam (ANN-Forward), WS, NOX and PM10 are the parameters selected to predict the next day for Shah Alam using ANN.
Figure 7 shows the comparison of the line chart between the observed and predicted value for PM10,D+1 for model ANN-forward selection and predicted ANN using all variables. Referring to the line chart, it shows that, on average, the observed and predicted values have a slight gap. Most of the prediction values exceed the observed value; however, in some cases, the observed value exceeds the prediction value. Furthermore, the enter method has a large gap since, in 2018, there was a slight increase in the value of ozone, causing the prediction using all parameters to be higher than the observed value. Therefore, it shows that the predicted values of the PM10 concentration were not notably affected by the ozone concentration.
Overall, the RMSE and AE of this model are 20.728 and 15.69, respectively. Hence, this model can be used for unseen data since there is no large difference between the observed and predicted values.
Figure 8 compares the observed PM10,D+1 values with the values predicted by the ANN-forward selection model and by the ANN using all variables. On average, the gap between the observed and predicted values is small, with an RMSE of 10.004 and an AE of 7.982. Most predicted values exceed the observed values, and in only a few cases does the observed value exceed the predicted value. Hence, this model can also be used for unseen data since there is no great difference between the observed and predicted values.
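As a reminder of what the two indicators measure, the sketch below computes them for a small set of placeholder values. It assumes that AE denotes the mean absolute error between observed and predicted values; the exact definitions used in the study are given in its methods section, which is not reproduced here.

```python
import math

def rmse(observed, predicted):
    """Root mean square error: penalises large misses more heavily."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mean_absolute_error(observed, predicted):
    """Average size of the error, ignoring its sign (assumed meaning of AE)."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)

# Placeholder PM10 values, not taken from the 2018 dataset.
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [55.0, 58.0, 75.0, 76.0]

rmse_value = rmse(observed, predicted)
ae_value = mean_absolute_error(observed, predicted)
print(round(rmse_value, 3), ae_value)
```

Because RMSE squares each error before averaging, it is never smaller than the mean absolute error, which matches the pattern in the reported values for both stations.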
The prediction error in Klang is higher than in Shah Alam because the industrial area of Klang suffers from severe haze, while Shah Alam is only a residential area. Furthermore, if all variables used in previous studies are selected to predict PM10 concentrations for the next day, it takes more time to determine the best model, whereas selecting fewer variables reduces the future cost of data maintenance.

4. Conclusions

In this study, six wrapper feature selection methods were analyzed and compared to determine the best one: forward selection, backward elimination, stepwise, brute-force, weight-guided and GA evolution. These methods were analyzed together with the predictive analytics methods MLR and ANN, and model performance was used to determine the best model for next-day prediction. This study found that the best feature selection methods for predicting the PM10 concentration in Malaysia were backward elimination, forward selection and brute-force.
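The idea behind a wrapper method can be illustrated with a minimal forward selection loop around a linear regression. This is a sketch under stated assumptions, not the configuration used in the study: the data are synthetic, the feature names and helper functions are hypothetical, the regression is fitted with plain normal equations, and selection stops as soon as the validation RMSE no longer improves.

```python
import math
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(a[i][c]))
        a[c], a[p] = a[p], a[c]
        for i in range(k):
            if i != c:
                f = a[i][c] / a[c][c]
                a[i] = [u - f * v for u, v in zip(a[i], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]

def rmse(X, y, beta):
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y))

def columns(data, names):
    return [[row[n] for n in names] for row in data]

def forward_selection(train, valid, target, candidates):
    """Greedily add the feature that most reduces validation RMSE."""
    selected, best = [], float("inf")
    remaining = list(candidates)
    y_train = [r[target] for r in train]
    y_valid = [r[target] for r in valid]
    while remaining:
        scores = {}
        for name in remaining:
            trial = selected + [name]
            beta = fit_ols(columns(train, trial), y_train)
            scores[name] = rmse(columns(valid, trial), y_valid, beta)
        winner = min(scores, key=scores.get)
        if scores[winner] >= best:
            break  # no candidate improves the model any further
        selected.append(winner)
        best = scores[winner]
        remaining.remove(winner)
    return selected, best

def make_rows(n):
    """Synthetic data: the target depends on PM10 and WS, not on 'noise'."""
    rows = []
    for _ in range(n):
        pm10 = random.uniform(20, 120)
        ws = random.uniform(0, 10)
        target = 0.8 * pm10 - 1.5 * ws + random.gauss(0, 3)
        rows.append({"PM10": pm10, "WS": ws,
                     "noise": random.uniform(0, 1), "PM10_next": target})
    return rows

random.seed(0)
train, valid = make_rows(300), make_rows(100)
selected, best_rmse = forward_selection(
    train, valid, "PM10_next", ["WS", "noise", "PM10"])
print(selected, round(best_rmse, 2))
```

Backward elimination follows the same pattern in reverse (start with all features and drop the one whose removal helps most), while brute-force evaluates every subset, which is why it becomes expensive as the number of candidate parameters grows.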
Based on the results, the best feature selection method to predict the PM10,D+1 in Klang was the backward method with the parameters T, RH, SO2, NO2, O3, PM10, sinWD, cosWD, NOX and NO. For Shah Alam, the best feature selection method to predict PM10,D+1 was the forward method with the parameters WS, NOX and PM10.
The ANN model for PM10,D+1 was applied to the 2018 dataset. Based on the line charts in Figure 7 and Figure 8, the gaps between the observed and predicted lines are small. The RMSE value in Klang for PM10,D+1 was 20.728, while the AE value was 15.69. The line chart of observed and predicted values in Shah Alam also shows a small gap, with an RMSE value for PM10,D+1 of 10.004 and an AE value of 7.982. In conclusion, all of the prediction models in Klang and Shah Alam can be used for unseen data.
A few recommendations for improving the performance of air pollution modelling can be suggested to other researchers. This study used the cross-sectional method; for future research, we suggest using time series, since the time-series forecasting method is better at predicting extreme events than the cross-sectional method. Apart from the MLR and ANN models, other machine-learning methods can be applied to predictive modelling to forecast the concentration of PM10, such as long short-term memory (LSTM), gated recurrent units (GRU) and deep learning [36].
Other methods, aside from wrapper methods, can be applied to conduct feature selection for air pollution modelling, such as the filter method, embedded method and hybrid method. Hence, various approaches to predictive modelling and feature selection for air pollution modelling will be beneficial as they will produce better results. In addition, predictions for other particulate matter, such as PM2.5, should be made since the DOE began to include PM2.5 in calculating the API from 2018. Moreover, PM2.5 is more dangerous since its particles are smaller than those of PM10. Therefore, predicting PM2.5 may help to improve the performance of air pollution modelling. Lastly, this output can be used by the authorities as it will be helpful in reducing the impact of air pollutants.
For example, the DOE’s prediction of air pollutants can be used as an early warning to help in performing the relevant procedures. Hopefully, these recommendations will help improve air pollution modelling and help the authorities to pay early attention to the air pollutants in Malaysia. The limitation of this research is that the model can only be used when the sources and conditions of the characteristics of PM10 remain the same. Therefore, it may not be suitable for other locations. For instance, a sudden forest fire or storm in a selected area would affect the PM10 concentration.

Author Contributions

A.Z.U.-S. and W.N.S. designed the study concept and secured funding. A.Z.U.-S. is the project administrator. N.H.H., Z.Z. and A.Z.U.-S. performed the data analysis. N.H.H., Z.Z. and W.N.S. wrote the manuscript. A.Z.U.-S., W.N.S., N.M.N. and M.R.R.M.A.Z. reviewed and edited the manuscript. A.V.S., G.D. and P.V. performed data curation and validation of the research. All authors have read and agreed to the published version of the manuscript.

Funding

The research was funded by Ministry of Science, Technology & Innovation (MOSTI) under Technology Development Fund 1 (TDF04211363). Thank you to Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA for their support and also thanks to the Department of Environment Malaysia for providing air quality monitoring data. This publication was also supported by TUIASI from the University Scientific Research Fund (FCSU).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data for this project are confidential but may be obtained with Data Use Agreements with the Department of Environment (DOE), Ministry of Environment and Water of Malaysia.

Acknowledgments

The authors thank Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA for their support and also the Department of Environment Malaysia for providing air quality monitoring data.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Department of Environment, Malaysia (DOE); Info Umum Kualiti Udara Kronologi Episod Jerebu di Malaysia. Malaysia Environmental Quality Report; Department of Environment Ministry of Energy, Science, Technology, Environment & Climate Change, Malaysia: Kuala Lumpur, Malaysia, 2014.
  2. Department of Environment, Malaysia (DOE); Info Umum Kualiti Udara Kronologi Episod Jerebu di Malaysia. Malaysia Environmental Quality Report; Department of Environment Ministry of Energy, Science, Technology, Environment & Climate Change, Malaysia: Kuala Lumpur, Malaysia, 2018.
  3. Shaziayani, W.N.; Ul-Saufie, A.Z.; Ahmat, H.; Al-Jumeily, D. Coupling of quantile regression into boosted regression trees (BRT) technique in forecasting emission model of PM10 concentration. Air Qual. Atmos. Health 2021, 14, 1647–1663.
  4. Mohamad, N.S.; Deni, S.M.; Ul-Saufie, A.Z. Application of the First Order of Markov Chain Model in Describing the PM10 Occurrences in Shah Alam and Jerantut, Malaysia. Pertanika J. Sci. Technol. 2018, 26, 367–378.
  5. Du, S.; Li, T.; Yang, Y.; Horng, S.J. Deep Air Quality Forecasting Using Hybrid Deep Learning Framework. IEEE Trans. Knowl. Data Eng. 2021, 33, 2412–2424.
  6. Yan, R.; Liao, J.; Yang, J.; Sun, W.; Nong, M.; Li, F. Multi-hour and multi-site air quality index forecasting in Beijing using CNN, LSTM, CNN-LSTM, and spatiotemporal clustering. Expert Syst. Appl. 2020, 169, 114513.
  7. Zhou, H.; Han, S.; Liu, Y. A novel feature selection approach based on document frequency of segmented term frequency. IEEE Access 2018, 6, 53811–53821.
  8. Towards Data Science. An Introduction to Feature Selection. 2020. Available online: https://towardsdatascience.com/an-introduction-to-feature-selection-dd72535ecf2b (accessed on 2 February 2022).
  9. Sukatis, F.F.; Noor, N.M.; Zakaria, N.A.; Ul-Saufie, A.Z.; Suwardi, A. Estimation of missing values In Air Pollution Dataset by Using Various Imputation Methods. Int. J. Conserv. Sci. 2019, 10, 791–804.
  10. Shaziayani, W.N.; Harun, F.D.; Ul-Saufie, A.Z.; Samsudin, N.; Noor, N.M. Three-Days Ahead Prediction of Daily Maximum Concentrations of PM10 Using Decision Tree Approach. Int. J. Conserv. Sci. 2021, 12, 217–224.
  11. Zhou, Z.; Liu, H. Spectral feature selection for supervised and unsupervised learning. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, 20–24 June 2007; pp. 1151–1157.
  12. Ibrahim, N.; Hamid, H.A.; Rahman, S.; Fong, S. Feature selection methods: Case of filter and wrapper approaches for maximising classification accuracy. Pertanika J. Sci. Technol. 2018, 26, 329–340.
  13. Libasin, Z.; Suhailah, W.; Fauzi, W.M.; Ul-Saufie, A.Z.; Idris, N.A.; Mazeni, N.A. Evaluation of Single Missing Value Imputation Techniques for Incomplete Air Particulates Matter (PM10) Data in Malaysia. Pertanika J. Sci. Technol. 2021, 29, 3099–3112.
  14. Kukkonen, J.; Partanen, L.; Karppinen, A.; Ruuskanen, J.; Junninen, H.; Kolehmainen, M.; Niska, H.; Dorling, S.; Chatterton, T.; Foxall, R.; et al. Extensive evaluation of neural network models for the prediction of NO2 and PM10 concentrations, compared with a deterministic modelling system and measurements in central Helsinki. Atmos. Environ. 2003, 37, 4539–4550.
  15. Brownlee, J. How to Choose a Feature Selection Method For Machine Learning. Machine Learning Mastery. 2020. Available online: https://machinelearningmastery.com/feature-selection-with-real-and-categorical-data/ (accessed on 20 August 2020).
  16. Jain, S. Genetic Algorithm | Application of Genetic Algorithm. Analytics Vidhya. 2017. Available online: https://www.analyticsvidhya.com/blog/2017/07/introduction-to-genetic-algorithm/ (accessed on 15 June 2022).
  17. Shafie, A.S.; Masrom, S.; Ahmad, N. Improved Neural Network Backpropagation with Genetic Algorithm Based Parameter Tuning for Classification Problem; Research Report; Universiti Teknologi Mara: Alam, Malaysia, 2010.
  18. Kamruzzaman, J.; Aziz, S.M. A Note on Activation Function in Multilayer Feedforward Learning. In Proceedings of the 2002 International Joint Conference on Neural Networks, Honolulu, HI, USA, 12–17 May 2002; pp. 519–523.
  19. RapidMiner. RapidMiner Documentation. 2022. Available online: https://docs.rapidminer.com/latest/studio/operators/modeling/predictive/neural_nets/neural_net.html (accessed on 5 March 2022).
  20. Guo, C.; Liu, G.; Chen, C.H. Air Pollution Concentration Forecast Method Based on the Deep Ensemble Neural Network. Wirel. Commun. Mob. Comput. 2020, 2020, 8854649.
  21. Al-Rashed, A.; Al-Mutairi, N.; Boureslli, A. Prediction of air pollution in al-hmadi city using artificial neural network (Ann). J. Environ. Treat. Tech. 2020, 8, 1390–1399.
  22. Department of Environment, Malaysia. Malaysia Environmental Quality Report. 2015. Available online: https://www.doe.gov.my/ (accessed on 1 January 2022).
  23. Adielsson, S. Statistical and Neural Networks Analysis of Pesticide Losses to Surface Water in Small Agricultural Catchments in Sweden. Master’s Thesis, Sweden University, Uppsala, Sweden, 2005.
  24. Miao, Y.; Mulla, D.J.; Robert, P.C. Identifying important factors influencing corn yield and grain quality variability using artificial neural networks. Precis. Agric. 2006, 7, 117–135.
  25. Pastor, O. Unbased sensitivity analysis and pruning techniques in ANN for surface ozone modeling. Ecol. Model. 2005, 182, 149–158.
  26. Starett, S.K.; Najjar, Y.; Adams, S.G.; Hill, J. Modeling pesticide leaching from golf courses using artificial neural networks. Commun. Soil Sci. Plant Anal. 1998, 29, 3093–3106.
  27. Lek, S.; Delacoste, M.; Baran, P.; Dimopoulos, I.; Lauga, J.; Aulagnier, A. Application of neural networks to modeling nonlinear relationships in ecology. Ecol. Model. 1996, 90, 39–52.
  28. Lek, S.; Delacoste, M.; Baran, P.; Dimopoulos, I.; Lauga, J.; Aulagnier, S. Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: A case study with a Himalayan river bird. Ecol. Model. 1999, 120, 337–347.
  29. Ozesmi, L.S.; Ozesmi, U. An artificial neural network approach to spatial habitat modeling with interspecific interaction. Ecol. Model. 1999, 116, 15–31.
  30. Mansor, A.A.; Abdullah, S.; Dom, N.C.; Napi, N.N.L.M.; Ahmed, A.N.; Ismail, M.; Zulkifli, M.F.R. Three-Hour-Ahead of Multiple Linear Regression (MLR) Models for Particulate Matter (PM10) Forecasting. Int. J. Des. Nat. Ecodyn. 2021, 16, 53–59. Available online: http://iieta.org/journals/ijdne (accessed on 5 May 2022).
  31. Abdullah, S.; Ismail, M.; Ahmed, A.N.; Abdullah, A.M. Forecasting particulate matter concentration using linear and non-linear approaches for air quality decision support. Atmosphere 2019, 10, 667.
  32. Ceylan, Z.; Bulkan, S.E.R.O.L. Forecasting PM10 levels using ANN and MLR: A case study for Sakarya City. Glob. Nest J. 2018, 20, 281–290.
  33. Fong, S.Y.; Abdullah, S.; Ismail, M. Forecasting of Particulate Matter (PM10) concentration based on gaseous pollutants and meteorological factors for different monsoons of urban coastal area in Terengganu. J. Sustain. Sci. Manag. 2018, 13, 3–17.
  34. Comite, V.; Pozo-Antonio, J.S.; Cardell, C.; Rivas, T.; Randazzo, L.; La Russa, M.F.; Fermo, P. Environmental Impact Assessment on the Monza Cathedral (Italy): A Multi-Analytical Approach. Int. J. Conserv. Sci. 2020, 11 (SI1), 291–304.
  35. Cazacu, M.M.; Pelin, V.; Radinschi, I.; Sandu, I.; Ciocan, V.; Sandu, I.G.; Gurlui, S. Effects of Meteorological Factors on the Hydrophobization of Specific Calcareous Geomaterials From Repedea-Iasi Area, Under the Urban Ambiental Air Exposure. Int. J. Conserv. Sci. 2020, 11, 1019–1030.
  36. Wu, B.; Wang, L.; Zeng, Y.R. Interpretable wind speed prediction with multivariate time series and temporal fusion transformers. Energy 2022, 252, 123990.
Figure 1. Research work flow.
Figure 2. Process of the wrapper method in model development.
Figure 3. Average PM10 concentration by month and year for Klang.
Figure 4. Average PM10 concentration by month and year for Shah Alam.
Figure 5. Correlation between the parameters and PM10 for Klang.
Figure 6. Correlation between the parameters and PM10 for Shah Alam.
Figure 7. Klang line chart of the observed and predicted PM10,D+1 for model ANN-forward selection and predicted ANN using all variables.
Figure 8. Shah Alam line chart of the observed and predicted PM10,D+1 for model ANN-forward selection and predicted ANN using all variables.
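Table 1. Descriptive statistics for Klang and Shah Alam.
| Station | Parameter | Mean | Median | Standard Deviation | Skewness | Minimum | Maximum |
|---|---|---|---|---|---|---|---|
| Klang | WS (km/h) | 5.38 | 5.80 | 2.03 | 4.89 | 0.47 | 52.03 |
| Klang | WD (°) | 176.78 | 168.83 | 46.05 | 0.579 | 42.17 | 318.17 |
| Klang | T (°C) | 28.62 | 28.60 | 1.52 | −0.12 | 22.56 | 33.12 |
| Klang | RH (%) | 69.74 | 69.62 | 7.07 | 0.01 | 46.46 | 94.11 |
| Klang | NOX (ppm) | 0.037 | 0.035 | 0.029 | 25.60 | <0.01 | 1.14 |
| Klang | NO (ppm) | 0.016 | 0.014 | 0.028 | 31.6 | <0.01 | 1.15 |
| Klang | SO2 (ppm) | 0.004 | 0.004 | 0.002 | 2.25 | <0.01 | 0.03 |
| Klang | NO2 (ppm) | 0.02106 | 0.021 | 0.007 | 0.28 | 0.001 | 0.05 |
| Klang | O3 (ppm) | 0.019 | 0.018 | 0.007 | 0.78 | <0.01 | 0.06 |
| Klang | CO (ppm) | 1.029 | 0.967 | 0.412 | 2.38 | 0.15 | 6.21 |
| Klang | PM10 (μg/m3) | 62.75 | 56.16 | 32.52 | 4.62 | 17.18 | 551.54 |
| Shah Alam | WS (km/h) | 4.828 | 4.775 | 1.908 | 0.198 | 0.314 | 16.529 |
| Shah Alam | WD (°) | 157.15 | 153.71 | 52.76 | 0.50 | 9.75 | 337.17 |
| Shah Alam | T (°C) | 28.10 | 28.09 | 1.32 | 0.02 | 23.31 | 33.29 |
| Shah Alam | RH (%) | 77.35 | 77.46 | 6.11 | −0.05 | 54 | 94.63 |
| Shah Alam | NOX (ppm) | 0.04 | 0.04 | 0.02 | 0.48 | 0.004 | 0.12 |
| Shah Alam | NO (ppm) | 0.02 | 0.02 | 0.01 | 1.11 | <0.001 | 0.09 |
| Shah Alam | SO2 (ppm) | 0.003 | 0.003 | 0.002 | 1.74 | <0.001 | 0.02 |
| Shah Alam | NO2 (ppm) | 0.02 | 0.02 | 0.01 | 0.45 | 0.002 | 0.06 |
| Shah Alam | O3 (ppm) | 0.02 | 0.02 | 0.01 | 0.86 | <0.001 | 0.08 |
| Shah Alam | CO (ppm) | 0.76 | 0.73 | 0.30 | 1.18 | 0.09 | 3.69 |
| Shah Alam | PM10 (μg/m3) | 50.91 | 45.33 | 34.86 | 17.11 | 11.92 | 1332.81 |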
Table 2. Performance of the MLR models for Klang and Shah Alam for validation.
| Location | Method | RMSE | AE | RE | NAE | R2 |
|---|---|---|---|---|---|---|
| Klang | Forward | 14.224 | 10.292 | 17.97% | 0.596 | 0.709 |
| Klang | Backward | 14.186 | 10.447 | 18.17% | 0.605 | 0.713 |
| Klang | Stepwise | 14.224 | 10.292 | 17.97% | 0.596 | 0.709 |
| Klang | Brute-Force | 13.694 | 10.005 | 17.52% | 0.596 | 0.707 |
| Klang | Weight-Guided | 16.27 | 11.089 | 19.21% | 0.658 | 0.591 |
| Klang | Evolution | 14.367 | 10.701 | 18.84% | 0.633 | 0.665 |
| Shah Alam | Forward | 15.176 | 10.932 | 23.99% | 0.709 | 0.598 |
| Shah Alam | Backward | 15.23 | 11.346 | 24.77% | 0.736 | 0.502 |
| Shah Alam | Stepwise | 15.176 | 10.932 | 23.99% | 0.709 | 0.598 |
| Shah Alam | Brute-Force | 12.338 | 9.559 | 21.77% | 0.662 | 0.632 |
| Shah Alam | Weight-Guided | 13.417 | 10.371 | 25.92% | 0.722 | 0.596 |
| Shah Alam | Evolution | 14.530 | 11.411 | 26.84% | 0.782 | 0.531 |
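Table 3. Performance ranking of the MLR models for Klang and Shah Alam.
| Location | Method | RMSE | AE | RE | NAE | R2 | Total Rank |
|---|---|---|---|---|---|---|---|
| Klang | Forward | 3 | 2 | 3 | 1 | 2 | 11 |
| Klang | Backward | 2 | 3 | 4 | 2 | 1 | 12 |
| Klang | Stepwise | 3 | 2 | 2 | 1 | 2 | 10 |
| Klang | Brute-Force | 1 | 1 | 1 | 1 | 3 | 7 |
| Klang | Weight-Guided | 5 | 5 | 6 | 4 | 5 | 25 |
| Klang | Evolution | 4 | 4 | 5 | 3 | 4 | 20 |
| Shah Alam | Forward | 4 | 3 | 2 | 2 | 2 | 13 |
| Shah Alam | Backward | 5 | 4 | 3 | 4 | 5 | 21 |
| Shah Alam | Stepwise | 4 | 3 | 2 | 2 | 2 | 13 |
| Shah Alam | Brute-Force | 1 | 1 | 1 | 1 | 1 | 5 |
| Shah Alam | Weight-Guided | 2 | 2 | 4 | 3 | 3 | 14 |
| Shah Alam | Evolution | 3 | 5 | 5 | 5 | 4 | 22 |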
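The rank-sum comparison behind Table 3 can be sketched as follows, using the Klang MLR validation values from Table 2. Each indicator is ranked separately (lower is better for RMSE, AE, RE and NAE; higher is better for R2) and the ranks are summed, with the smallest total indicating the best method. The tie handling here (tied values share a rank) is an assumption, so the totals for tied rows may differ from the published ranks.

```python
def dense_rank(values, higher_is_better=False):
    """Rank 1 = best; tied values share a rank and the next
    distinct value receives the next rank."""
    ordered = sorted(set(values), reverse=higher_is_better)
    position = {v: i + 1 for i, v in enumerate(ordered)}
    return [position[v] for v in values]

# Klang MLR validation results from Table 2: (RMSE, AE, RE %, NAE, R2)
klang_mlr = {
    "Forward": (14.224, 10.292, 17.97, 0.596, 0.709),
    "Backward": (14.186, 10.447, 18.17, 0.605, 0.713),
    "Stepwise": (14.224, 10.292, 17.97, 0.596, 0.709),
    "Brute-Force": (13.694, 10.005, 17.52, 0.596, 0.707),
    "Weight-Guided": (16.27, 11.089, 19.21, 0.658, 0.591),
    "Evolution": (14.367, 10.701, 18.84, 0.633, 0.665),
}

methods = list(klang_mlr)
totals = dict.fromkeys(methods, 0)
for col in range(5):
    column = [klang_mlr[m][col] for m in methods]
    # Only R2 (the last column) is better when larger.
    ranks = dense_rank(column, higher_is_better=(col == 4))
    for method, rank in zip(methods, ranks):
        totals[method] += rank

best_method = min(totals, key=totals.get)
print(best_method, totals[best_method])
```

Under this tie rule, brute-force still obtains the lowest total for Klang, in agreement with the ranking reported in Table 3.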
Table 4. Selected features of MLR for Klang and Shah Alam.
Location Method WS T RH SO2 NO2 O3 CO SinWD CosWD NOX NO PM10
KlangForward // /
Backward/ ///// // /
Stepwise / /
Brute-Force // / /
Weight-Guided /
Evolution/ / / / /
Shah AlamForward/ / / / /
Backward/ / /
Stepwise / / ///////
Brute-Force/ / /
Weight-Guided //// / //
Evolution /
Table 5. Performance of the ANN models for Klang and Shah Alam for validation.
| Location | Method | RMSE | AE | RE | NAE | R2 |
|---|---|---|---|---|---|---|
| Klang | Forward | 15.517 | 11.067 | 19.57% | 0.588 | 0.701 |
| Klang | Backward | 15.085 | 10.272 | 17.06% | 0.574 | 0.742 |
| Klang | Stepwise | 15.517 | 11.067 | 19.57% | 0.588 | 0.701 |
| Klang | Brute-Force | 14.139 | 10.282 | 16.90% | 0.605 | 0.701 |
| Klang | Weight-Guided | 17.288 | 11.779 | 18.64% | 0.699 | 0.581 |
| Klang | Evolution | 14.228 | 10.335 | 17.92% | 0.563 | 0.732 |
| Shah Alam | Forward | 13.448 | 9.317 | 18.93% | 0.594 | 0.715 |
| Shah Alam | Backward | 14.071 | 9.595 | 20.19% | 0.647 | 0.596 |
| Shah Alam | Stepwise | 12.966 | 9.505 | 22.30% | 0.65 | 0.691 |
| Shah Alam | Brute-Force | 12.252 | 9.138 | 20.75% | 0.614 | 0.687 |
| Shah Alam | Weight-Guided | 12.463 | 8.840 | 18.72% | 0.615 | 0.615 |
| Shah Alam | Evolution | 14.003 | 10.005 | 21.83% | 0.645 | 0.657 |
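Table 6. Performance ranking of the ANN models for Klang and Shah Alam.
| Location | Method | RMSE | AE | RE | NAE | R2 | Total Rank |
|---|---|---|---|---|---|---|---|
| Klang | Forward | 4 | 4 | 4 | 3 | 3 | 18 |
| Klang | Backward | 3 | 1 | 2 | 2 | 1 | 9 |
| Klang | Stepwise | 4 | 4 | 4 | 3 | 3 | 18 |
| Klang | Brute-Force | 1 | 2 | 1 | 4 | 3 | 11 |
| Klang | Weight-Guided | 5 | 5 | 5 | 5 | 4 | 24 |
| Klang | Evolution | 2 | 3 | 3 | 1 | 2 | 11 |
| Shah Alam | Forward | 4 | 3 | 2 | 1 | 1 | 11 |
| Shah Alam | Backward | 6 | 5 | 3 | 5 | 6 | 25 |
| Shah Alam | Stepwise | 3 | 4 | 6 | 6 | 2 | 21 |
| Shah Alam | Brute-Force | 1 | 2 | 4 | 2 | 3 | 12 |
| Shah Alam | Weight-Guided | 2 | 1 | 1 | 3 | 5 | 12 |
| Shah Alam | Evolution | 5 | 6 | 5 | 4 | 4 | 24 |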
Table 7. Selected features of ANN for Klang and Shah Alam.
Location Method WS T RH SO2 NO2 O3 CO SinWD CosWD NOX NO PM10
Shah
Alam
Forward/ / /
Backward /// // //
Stepwise/
Brute-Force/// // / /
Weight-Guided /
KlangForward / / /
Backward ///// /////
Stepwise / / /
Brute-Force/ // ///
Weight-Guided /
Evolution / /// // /
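Table 8. Performance ranking of the best model.
| Station | Model | Method | RMSE | AE | RE | NAE | R2 | Total |
|---|---|---|---|---|---|---|---|---|
| Klang | MLR | Brute-Force | 1 | 1 | 2 | 2 | 2 | 8 |
| Klang | ANN | Backward | 2 | 2 | 1 | 1 | 1 | 7 |
| Shah Alam | MLR | Brute-Force | 1 | 2 | 2 | 2 | 2 | 9 |
| Shah Alam | ANN | Forward | 2 | 1 | 1 | 1 | 1 | 6 |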
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Ul-Saufie, A.Z.; Hamzan, N.H.; Zahari, Z.; Shaziayani, W.N.; Noor, N.M.; Zainol, M.R.R.M.A.; Sandu, A.V.; Deak, G.; Vizureanu, P. Improving Air Pollution Prediction Modelling Using Wrapper Feature Selection. Sustainability 2022, 14, 11403. https://doi.org/10.3390/su141811403
