Article

Towards On-Site Implementation of Multi-Step Air Pollutant Index Prediction in Malaysia Industrial Area: Comparing the NARX Neural Network and Support Vector Regression

Faculty of Engineering, Universiti Malaysia Sabah, Kota Kinabalu 88400, Sabah, Malaysia
*
Author to whom correspondence should be addressed.
Atmosphere 2022, 13(11), 1787; https://doi.org/10.3390/atmos13111787
Submission received: 30 September 2022 / Revised: 23 October 2022 / Accepted: 26 October 2022 / Published: 29 October 2022

Abstract

Malaysia has experienced public health issues and economic losses due to air pollution. As the air pollution problem keeps growing, studies on air quality prediction are also advancing. Air quality prediction can help reduce the damaging impact of air pollution on public health and economic activities. This study develops and evaluates the Nonlinear Autoregressive Exogenous (NARX) Neural Network and Support Vector Regression (SVR) for multi-step prediction of Malaysia’s Air Pollutant Index (API), focusing on industrial areas. The performance of NARX and SVR was evaluated on four crucial aspects of on-site implementation: input pre-processing, parameter selection, practical predictability limit, and robustness. Results show that the two predictors perform comparably, with the SVR slightly outperforming the NARX. The RMSE and R2 values for the SVR are 0.71 and 0.99 in one-step-ahead prediction, gradually changing to 6.43 and 0.68 in 24-step-ahead prediction. Both predictors can also perform multi-step prediction using the actual (non-normalized) data and are therefore simpler to implement on-site. Removing several insignificant parameters did not affect the prediction performance, indicating that a uniform model can be used at all air quality monitoring stations in Malaysia’s industrial areas. Moreover, SVR is more resilient to outliers and more stable. Based on the trends exhibited by the Malaysia API data, a yearly update is sufficient for SVR due to its strength and stability. In conclusion, this study proposes that the SVR predictor could be implemented at air quality monitoring stations to provide API prediction information at least nine steps in advance.

1. Introduction

Air pollution is a global issue that threatens the public health and economic activities of the worldwide population [1,2,3]. Without exception, Malaysia has experienced public health issues and economic losses due to air pollution problems [4,5]. Research by Tajudin et al. [6] reported that two air pollutants, namely Nitrogen Dioxide (NO2) and Ozone (O3), have an immediate effect on hospital admissions related to cardiovascular disease in Kuala Lumpur. Meanwhile, Ab Manan et al. [7] stated that the haze episode in 2013 cost Malaysians approximately MYR 410 million, accumulated from the medical expenses and income opportunity losses due to medical leave. Thus, the air pollution problem must be appropriately addressed to minimize its health effects. One solution is to predict air quality in advance. Knowing the air quality in advance can help the local administration issue early warning alerts to the residents so they can plan their activities accordingly.
Malaysia uses the Air Pollutant Index (API) to determine air quality. Malaysia, through APIMS (Air Pollutant Index Management System), has yet to develop a mechanism to predict API values in advance. There are, however, several apps that provide a forecasted air quality index (AQI) for Malaysian cities; one such app is Plume Labs: Air Quality Apps. This app uses real-time data from the Malaysia Department of Environment (MDOE) to predict future AQI, but its accuracy is questionable. A brief comparison between the actual AQI for the Kuala Lumpur region provided by IQAir and the AQI values predicted by Plume Labs for 24 h (from 1 a.m., 28 August 2022 to 12 a.m., 29 August 2022) is plotted in Figure 1. The plots disagree, with large differences and an R2 value of −0.2300. The low R2 value indicates that the prediction made by Plume Labs has an accuracy issue.
Researchers around the world have proposed many air quality prediction methods [8,9,10,11,12]. Among them, a technique based on the Nonlinear Autoregressive Exogenous (NARX) Neural Network was found superior in many publications. A study by Gündoğdu [13] established that NARX outperforms Multilayer Perceptron (MLP) in the one-step-ahead prediction of Particulate Matter 10 (PM10) and Sulphur Dioxide (SO2) concentrations. The RMSE values for NARX prediction of PM10 and SO2 concentrations were 0.0191 and 0.0070, respectively, while MLP produced values of 0.0489 and 0.1121. Concurrently, NARX prediction of PM10 and SO2 produced R2 values of 0.9773 and 0.9984, while MLP produced values of 0.8530 and 0.6048. In another study, a popular machine learning algorithm called the Support Vector Machine (SVM) was used to predict the monthly average PM10 concentration seven months in advance [14]. The prediction performance was compared to MLP, Autoregressive Integrated Moving-Average (ARIMA), and Vector Autoregressive Moving-Average (VARMA). The results showed that SVM performs better than the other methods in one-step ahead and multi-step ahead predictions. The one-step-ahead prediction performances of SVM, ARIMA, MLP, and VARMA measured by RMSE were 2.061, 2.283, 3.432, and 3.451, respectively. Meanwhile, for multi-step ahead prediction, the RMSE of SVM was 1.990, followed by ARIMA (2.453), VARMA (3.121), and MLP (3.408).
A study employed NARX and SVM to predict the Air Quality Index (AQI) and concluded that NARX was better than SVM in one-step-ahead prediction [15]. The NARX gave an R2 value of 0.9701, in contrast with SVM, which gave 0.8891. Another study compared the one-step-ahead prediction performance of NARX and SVM, amongst other methods, to predict PM2.5 concentrations [16]. They concluded that NARX has better prediction performance than SVM, with R2 and RMSE values of 0.99 and 0.72, respectively, while SVM gave 0.70 and 5.75.
Despite the superiority of NARX over SVM reported in the latter two publications, Kumar et al. [17] showed that SVM outperformed NARX in hourly wind speed prediction. The prediction performance measured by Mean Squared Error (MSE) was 52.32 for SVM and 56.43 for NARX. Leong et al. [18] also achieved excellent API prediction using the SVM model. The research was conducted using the air quality data from 2009 to 2014 collected at eight monitoring stations in northern Malaysia. Prediction performance was measured by the R2 value, and the SVM method achieved an R2 of 0.9843 for one-step-ahead prediction. The superiority of NARX over other methods motivates this research to evaluate its performance in predicting the API in Malaysia’s industrial areas. Since the SVM method was also shown to have excellent prediction performance using the Malaysia API, it will be evaluated and compared to NARX.
At present, scholars are more interested in proposing new methods to predict air quality [19,20,21,22]. Often, studies use the one-step-ahead prediction performance to evaluate the superiority of the proposed methods. We believe the evaluation should not stop at only comparing the prediction accuracy but rather extend it as if the proposed methods will be implemented on-site. Issues that might affect the prediction performance from the perspective of actual on-site implementation, such as input normalization, input parameters, practical predictability limit, and robustness, should be evaluated.
This paper addresses these four on-site implementation issues by comparing the performance of two established predictors, the NARX and SVM for regression (SVR). A careful analysis was designed and performed for each issue, providing valuable insight to researchers proposing new prediction methods. Apart from that, the outcomes of this study will make suggestions on how a multi-step-ahead API predictor for Malaysia API monitoring stations in industrial areas should be developed.

2. Materials and Methods

2.1. Study Area

Industrial activity is one of the major sources of air pollution [23,24]. Approximately 85% of air pollution in Malaysia comes from power plant emissions [25]. Accordingly, this research focuses on air quality in three renowned industrial areas in Malaysia: TTDI Jaya, Larkin, and Pasir Gudang (Figure 2).
These industrial areas are located near or surrounded by residential areas with a total population of more than 1.2 million. TTDI Jaya is in the Shah Alam district of Selangor. It is situated near Saujana Indah and the Hicom-Glenmarie industrial park, among many other industrial areas. Food, cosmetics, and machinery are among the products manufactured in this industrial area. Larkin and Pasir Gudang are in Johor Bahru, in the south of peninsular Malaysia. The Larkin industrial area houses factories for plastic and metal fabrication, food products, glass manufacturing, electronic components, and mechanical machines. Most of the companies operating in the Pasir Gudang industrial area are heavy industries. These include shipbuilding, palm oil storage and distribution, transportation and logistics, petrochemicals, and construction.

2.2. Data Pre-Analysis and Treatment

The air quality data collected in 2018 and 2019 at these three industrial areas were provided by the Malaysia Department of Environment (MDOE). Each dataset contains hourly air quality parameters of Nitrogen Dioxide (NO2), Ozone (O3), Particulate Matter 2.5 (PM2.5), Particulate Matter 10 (PM10), Sulphur Dioxide (SO2), Carbon Monoxide (CO), and API. The hourly meteorological parameters, such as the ambient temperature (T), wind direction (WD), and wind speed (WS), were also provided in each dataset. A pre-analysis of the 2018 API parameter shows that the series does not exhibit seasonality for all three monitoring stations. The API values fluctuated randomly, mainly within the moderate level (50 to 100), with a maximum of 77 points and a minimum of 39 points. It can be concluded that the 2018 data represent the typical air quality in the three monitoring stations. Similar variations were observed in most parts of the 2019 data, except between September and November, when Malaysia was hit by a severe haze caused by the regional and transboundary haze from Indonesia. During the haze episode, the API reached an unhealthy level (101 to 200) and a very unhealthy level (201 to 300) for several weeks at the three monitoring stations.
Some missing values and outliers (less than 3.5%) were found in the raw air quality data provided by the MDOE. To develop an optimized predictor, the missing values and outliers were replaced by interpolated values using the Linear Interpolation Imputation method [26,27]. The method is explained by Equation (1), where f(x) is the interpolated value that replaces the missing value or outlier, and x is the point at which the interpolation is performed. Variables x0 and x1 are the nearest points with known values before and after the missing value, respectively.
$$f(x) = f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0}\,(x - x_0) \qquad (1)$$
The outliers were determined by comparing them with the median data. The values that are more than three Median Absolute Deviations (MADs) away from the median value were replaced [28]. The scaled MAD is defined by Equation (2) where xa is the average of the past values and xi is the past values for each time step in the dataset.
$$\text{Scaled MAD} = \frac{-1}{\sqrt{2}\,\operatorname{erfcinv}(3/2)} \times \operatorname{median}\left(\left|x_i - x_a\right|\right) \qquad (2)$$
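The treatment described by Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that missing readings are marked as `None`, that the deviation is measured from the median of the whole series, and that gaps at the edges of the series are left untouched. The function names and the toy `api` list are hypothetical.

```python
import statistics

# Numerical value of -1 / (sqrt(2) * erfcinv(3/2)) from Equation (2).
MAD_SCALE = 1.4826


def scaled_mad(values):
    """Scaled median absolute deviation of a list of known values."""
    center = statistics.median(values)
    return MAD_SCALE * statistics.median(abs(v - center) for v in values)


def flag_outliers(series, threshold=3.0):
    """Replace values more than `threshold` scaled MADs from the median with None."""
    known = [v for v in series if v is not None]
    center = statistics.median(known)
    limit = threshold * scaled_mad(known)
    return [None if v is None or abs(v - center) > limit else v for v in series]


def interpolate_linear(series):
    """Fill interior None gaps with Equation (1); edge gaps stay None."""
    filled = list(series)
    known_idx = [i for i, v in enumerate(series) if v is not None]
    for x0, x1 in zip(known_idx, known_idx[1:]):
        slope = (series[x1] - series[x0]) / (x1 - x0)
        for x in range(x0 + 1, x1):
            filled[x] = series[x0] + slope * (x - x0)
    return filled


# Illustrative hourly API readings: one missing value and one spike.
api = [52.0, 53.0, None, 55.0, 54.0, 180.0, 56.0, 55.0, 53.0]
cleaned = interpolate_linear(flag_outliers(api))
```

In this toy series, the spike of 180 is flagged and both gaps are filled from their neighbours.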
Table 1 presents the data range and the correlation between each air quality parameter to the API for the three monitoring stations. The PM10 and PM2.5 parameters show quite an obvious correlation with the API parameter compared to the other parameters in all three monitoring stations.

2.3. Multi-Step Ahead Predictor

Three common strategies can be adapted in machine learning to perform multi-step-ahead prediction: Recursive, direct, and multiple outputs. The recursive strategy is the simplest and requires a single model with a single output. In the recursive approach, the predicted output at (t) is fed back as input to predict the output at (t + 1). Then the predicted output at (t + 1) is fed back as input to predict the output at (t + 2). The process continues until the desired step is achieved. The direct strategy requires n models to predict the outputs at (t + 1) to (t + n). Each model has a single output and is trained to predict a specific number of steps ahead of the output. Hence, ten models will be developed if the system wants to predict one to ten steps ahead. In many studies, the direct strategy produced more accurate multi-step ahead predictions [29,30]. On the other hand, a single model with n outputs is utilized in the multiple-outputs strategy to predict the (t + 1) to (t + n) values.
This paper employed the direct strategy to obtain the multi-step ahead prediction. In this study, 24 optimized models were used to obtain the hourly 1- to 24-step-ahead predictions, equivalent to a day-ahead prediction.
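As a rough sketch of how the direct strategy organizes its training data, each horizon s gets its own input/target pairing, and a separate model would then be fitted on each pair. The helper name `make_direct_dataset` and the toy series are illustrative assumptions. The model fitting itself is omitted.

```python
def make_direct_dataset(features, target, step):
    """Pair the inputs observed at time t with the target at time t + step."""
    if step < 1:
        raise ValueError("step must be at least 1")
    inputs = features[:-step]   # rows that still have a target `step` hours later
    outputs = target[step:]     # the target shifted `step` hours into the future
    return inputs, outputs


# Toy data: 30 hourly rows with two made-up parameters per row.
features = [[float(t), float(t) * 2.0] for t in range(30)]
target = [float(t) for t in range(30)]  # stand-in for the hourly API

# One dataset per horizon, 1 to 24 steps ahead.
datasets = {s: make_direct_dataset(features, target, s) for s in range(1, 25)}
```

Each additional step removes one usable row, so the 24-step dataset here has only six pairs. With two years of hourly data that loss is negligible.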

2.3.1. The Nonlinear Autoregressive Exogenous (NARX) Neural Network Model

NARX is a dynamic neural network with recurrent input fed by the feedback connection encircling the network layers [31]. A two-layer feed-forward NARX network that consists of a hidden layer and an output layer was used in this research. The sigmoidal transfer function was used in the hidden layer, and the linear function was employed in the output layer. The NARX feedback connection was removed, making it a complete open-loop feed-forward network.
The inputs of the NARX model consist of the currently available air quality and meteorological parameters (NO2, O3, PM2.5, PM10, SO2, CO, API, T, WD, and WS), while the output is the predicted future API values. Two hidden neurons were used in the hidden layer, determined by analysis in a preliminary study [32]. The NARX model employed the Levenberg–Marquardt algorithm for training. A total of 24 NARX models were developed and trained to obtain 1- to 24-step ahead prediction. Each unit in the 24 models was built from the s-step predictor depicted in Figure 3.
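The forward pass of such an open-loop network, with ten inputs, two hidden neurons, and one linear output, can be sketched as follows. This is only an illustration. It assumes a hyperbolic tangent as the sigmoidal transfer function and that only the current values of the inputs are used. The weights shown are placeholders, not trained values.

```python
import math


def narx_open_loop_predict(inputs, w_hidden, b_hidden, w_out, b_out):
    """Two-layer feed-forward pass: tanh hidden layer, linear output layer."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Placeholder weights for 10 inputs and 2 hidden neurons (hypothetical values).
w_hidden = [[0.0] * 10, [0.0] * 10]
b_hidden = [0.0, 0.0]
w_out = [1.0, -1.0]
b_out = 60.0

# Ten current readings in the order NO2, O3, PM2.5, PM10, SO2, CO, API, T, WD, WS
# (made-up numbers for illustration only).
x = [0.01, 0.02, 18.0, 30.0, 0.001, 0.8, 55.0, 29.0, 180.0, 2.5]
prediction = narx_open_loop_predict(x, w_hidden, b_hidden, w_out, b_out)
```

In practice the weights would come from Levenberg–Marquardt training, with one such network per prediction step.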

2.3.2. The Support Vector Regression (SVR) Model

The Support Vector Machine (SVM) is a supervised machine learning approach widely used to solve classification problems [33]. The SVM can also be used to solve regression problems, in which continuous values are predicted, and is then usually referred to as Support Vector Regression (SVR). In SVR, a margin of tolerance known as epsilon is introduced, which is the error tolerated by the SVR [34]. As in the classification problem, a kernel function is applied in SVR to handle the dimensionality problem of nonlinear data. The well-tested kernel functions are Medium Gaussian, Coarse Gaussian, Fine Gaussian, Cubic, Quadratic, and Linear.
Figure 4 shows the SVR model used to perform the multi-step ahead prediction. The SVR inputs were fed with the currently available air quality and meteorological parameters, and the output was set to the s-step-ahead API value. The C and epsilon parameters were set to a default value during the training and testing stages. The default value of the C is set to the estimated value of the standard deviation using the interquartile range of the response variable y (the real API), while the default value of the epsilon is set to one-tenth of the C value. Twenty-four SVR models with the Linear kernel were employed using the direct approach to obtain the 24-step-ahead API prediction.
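The default parameter rule described above can be illustrated with a short sketch. It assumes the common convention of estimating the standard deviation as the interquartile range divided by 1.349, the ratio that holds for a normal distribution. The quartile method and the function name are also assumptions, so the values may differ slightly from those produced by the software used in the study.

```python
import statistics


def default_svr_parameters(y):
    """Return (C, epsilon) derived from the interquartile range of the response."""
    q1, _, q3 = statistics.quantiles(y, n=4, method="inclusive")
    iqr = q3 - q1
    c = iqr / 1.349       # IQR-based estimate of the standard deviation
    epsilon = c / 10.0    # one-tenth of C, as stated in the text
    return c, epsilon


# Illustrative API values only.
y = [float(v) for v in range(50, 71)]
c, epsilon = default_svr_parameters(y)
```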

2.4. Performance Indicator

RMSE and R2 were used to assess the prediction performance of the NARX and SVR models. RMSE measures the prediction error, i.e., the difference between the predicted and the actual API values. The R2 value represents the proportion of the total variation in the predicted API that can be explained by its linear association with the actual API. Equations (3) and (4) define the RMSE and R2, respectively.
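$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(P_t - T_t\right)^2} \qquad (3)$$
$$R^2 = \left(\frac{1}{N}\,\frac{\sum_{t=1}^{N}\left(P_t - \bar{P}\right)\left(T_t - \bar{T}\right)}{\sigma_P\,\sigma_T}\right)^2 \qquad (4)$$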
Based on the equations, $P_t$ is the predicted API while $\bar{P}$ is its mean, $T_t$ is the actual value of API while $\bar{T}$ is its mean, $N$ is the number of data points used in the measurement, $\sigma_P$ is the standard deviation of the predicted API, and $\sigma_T$ is the standard deviation of the actual value of API.
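For readers who want to reproduce the two indicators, a minimal sketch of Equations (3) and (4) is given below. It assumes population standard deviations (division by N), which is what makes the 1/N factor in Equation (4) yield the squared correlation coefficient. The sample numbers are illustrative only.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual values."""
    n = len(actual)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, actual)) / n)


def r_squared(predicted, actual):
    """Squared linear correlation between predicted and actual values."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_t = sum(actual) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, actual)) / n
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in actual) / n)
    return (cov / (std_p * std_t)) ** 2


actual = [50.0, 55.0, 60.0, 65.0]
predicted = [51.0, 54.0, 61.0, 64.0]
error = rmse(predicted, actual)
fit = r_squared(predicted, actual)
```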

3. Results and Discussion

This study embarked on the following research questions:
  • Is input normalization required?
  • Which input parameters are important, and how do they affect the prediction performance?
  • How far can reasonable prediction be performed?
  • Which model is more robust?
The analyses were performed using 175,200 data points (10 parameters × 2 years × 365 days × 24 h), divided into training and testing sets in a ratio of 80 to 20. A large training dataset reduces the risk of overfitting. In addition, during the model optimization process, the RMSE and R2 for the training data were compared with those for the testing data to avoid overfitting or underfitting the model. The RMSE and R2 values presented in the following subsections were obtained from the testing data.

3.1. Input Normalization

The air quality and meteorological parameters collected at the three monitoring stations have differing scales, which may affect the prediction performance [35]. Applying normalization is suggested when dealing with such data [22,36]. However, the normalization approach depends on the machine learning architecture and the specific application [37]. Input normalization, if required, introduces an additional computational burden, and its parameters must be estimated correctly [38,39], which is tricky in real applications. In addition, the predicted values must be converted back to the original scale for reporting.
Considering those, an analysis was conducted to verify the need for input normalization. Here, the prediction performance of both predictors using normalized and raw (non-normalized) data was observed. Z-score normalization, or standardization, was performed on the data [40]. The z-score is calculated using Equation (5), where $x$ is a data point in a feature with the mean $\bar{x}$ and standard deviation $S$.
$$z = \frac{x - \bar{x}}{S} \qquad (5)$$
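The extra bookkeeping that normalization brings on-site can be seen in a small sketch of Equation (5) and its inverse. The mean and standard deviation must be estimated once from training data, stored, and reused both to scale new inputs and to convert predictions back. The function names are hypothetical, and the sample standard deviation (division by N − 1) is an assumption.

```python
import statistics


def fit_zscore(values):
    """Estimate the mean and standard deviation from training data."""
    return statistics.mean(values), statistics.stdev(values)


def to_zscore(values, mean, std):
    """Apply Equation (5) with previously estimated statistics."""
    return [(v - mean) / std for v in values]


def from_zscore(z_values, mean, std):
    """Convert standardized values back to the original scale."""
    return [z * std + mean for z in z_values]


train = [50.0, 60.0, 70.0]  # illustrative API values
mean, std = fit_zscore(train)
z = to_zscore(train, mean, std)
restored = from_zscore(z, mean, std)
```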
The RMSE and R2 values obtained by both predictors in all three monitoring stations are tabulated in Table 2. As expected, the RMSE value for normalized data is much smaller due to data rescaling and is not an accurate performance indicator. However, the smaller RMSE values obtained by SVR indicate that it is a better predictor than NARX. Further, it can be observed that the R2 values scored by SVR and NARX are almost identical, implying that both predictors can predict aptly using raw data (non-normalized data). Performing input normalization/standardization seems unnecessary as it does not affect the prediction performance.

3.2. Input Parameters

We study the possibility that the API prediction can be performed using fewer parameters to reduce the computational burden. For this purpose, two prediction performance analyses were conducted using two different combinations of input parameters. The first analysis used all ten air quality and meteorological parameters, while the second analysis used selected parameters only. Parameter selection was performed using the Neighborhood Component Analysis (NCA). The NCA detects the relevant and irrelevant parameters in the data by learning the feature weights in an objective function that measures the training data regression loss [41]. The NCA results for the Pasir Gudang showed five relevant parameters: PM2.5, CO, WD, WS, and T. Three parameters, namely PM10, CO, and WD, were found to be relevant for the TTDI Jaya data. Meanwhile, all parameters were found to be relevant to the Larkin data. Figure 5 shows the NCA results for the three stations.
Table 3 lists the RMSE and R2 for one-step-ahead predictions of NARX and SVR for the Pasir Gudang and TTDI Jaya stations using all parameters and only the relevant parameters, respectively. Using the relevant parameters seems to reduce the prediction error for Pasir Gudang but not for TTDI Jaya, for both NARX and SVR models. The negligible difference in the results suggests that including all parameters, although unnecessary, will not hinder the prediction performance. This finding indicates that a universal predictor with a uniform structure can be built at every monitoring station in Malaysia without having to perform a preliminary analysis to obtain the relevant input parameters. A universal predictor with a uniform structure is preferred for easy installation at all stations.

3.3. Practical Predictability Limit

The multi-step prediction performance in R2 values for the NARX and SVR predictors is tabulated in Table 4 and plotted in Figure 6. This analysis was performed using all ten non-normalized air quality and meteorological parameters. From the plot, the prediction performance of both predictors decreases as the prediction steps progress, with the R2 values gradually falling from 0.99 in one-step-ahead prediction to 0.68 in 24-step-ahead prediction. From the R2 values, the NARX and SVR recorded a comparable performance for one- to six-step-ahead prediction. Beyond six-step-ahead prediction, NARX achieves higher R2 values in all three stations. The SVR shows a more stable prediction across all 24 steps, whereas NARX’s performance fluctuates. Moreover, SVR recorded smaller RMSE values than the NARX predictor for all 24-step-ahead predictions at all three monitoring stations (Table 5). This finding suggests that the SVR is the better predictor.
The real and predicted API values for different R2 values were plotted and observed to determine how far both predictors could reliably predict API. Findings from the three monitoring stations show that an R2 of at least 0.90 can be considered sensible. As shown in Figure 7 for TTDI Jaya as an example, the real and predicted API values are closer with an R2 of 0.90 or higher and deviate with an R2 below 0.90. From Table 4, by using an R2 of 0.90 as the lower limit, it can be derived that the NARX model can predict up to ten steps ahead while the SVR model can predict up to nine steps ahead.
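The rule for reading a predictability limit off a table of R2 values can be written as a short helper. The sketch below assumes the limit is the last step before R2 first drops below the threshold. The R2 values in the example are made up for illustration and are not taken from Table 4.

```python
def predictability_limit(r2_by_step, threshold=0.90):
    """Return the largest step reached before R2 first falls below the threshold."""
    limit = 0
    for step, r2 in enumerate(r2_by_step, start=1):
        if r2 < threshold:
            break
        limit = step
    return limit


# Illustrative R2 values for steps 1 to 7 (not the study's results).
example_r2 = [0.99, 0.97, 0.95, 0.93, 0.91, 0.89, 0.90]
limit = predictability_limit(example_r2)
```

Stopping at the first drop below the threshold means a later value that climbs back to 0.90, as can happen with a fluctuating predictor, does not extend the limit.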

3.4. Robustness

We also analyze the NARX and SVR predictors for their ability to perform reliable multi-step-ahead prediction for an extended duration to find which model requires frequent retraining and which is less susceptible to outliers. For this analysis, the NARX and SVR predictors were trained and validated using the 2018 data in the ratio of 80 to 20. The optimized predictors were tested using the 2019 data, and the RMSE values for one- to eight-step-ahead prediction, according to month, were computed. Results show that the SVR predictor produces more accurate and stable multi-step-ahead predictions than the NARX predictor in all three monitoring stations. SVR also seems more robust and makes better predictions during the haze episodes than NARX, hence is less susceptible to outliers. The same observation was also seen in all three monitoring stations.
Using Pasir Gudang data as an example (Figure 8), the RMSE values for one-step-ahead to eight-step-ahead predictions, calculated according to the respective month, are given in Table 6 and visualized in Figure 9. The results show that the SVR predictor recorded smaller RMSE values every month. Both predictors produced their worst performance for September; however, the SVR performed at least twice as well as the NARX model. The worst prediction performance was due to the irregular trend of API values in September, when the API values rapidly increased and reached unhealthy status due to a haze episode that occurred during that time. Neither predictor learned this trend during training, resulting in lower prediction accuracy. The Larkin and TTDI Jaya data also exhibited a similar trend.
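A month-by-month RMSE of the kind reported in Table 6 can be computed by grouping the squared errors by the calendar month of each timestamp. The sketch below is a minimal illustration with made-up hourly values. The function name and the data are hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def monthly_rmse(timestamps, predicted, actual):
    """Group squared errors by calendar month and return the RMSE per month."""
    squared = defaultdict(list)
    for ts, p, t in zip(timestamps, predicted, actual):
        squared[ts.month].append((p - t) ** 2)
    return {m: math.sqrt(sum(e) / len(e)) for m, e in sorted(squared.items())}


# Four illustrative hourly points straddling the end of August 2019.
start = datetime(2019, 8, 31, 22)
timestamps = [start + timedelta(hours=h) for h in range(4)]
predicted = [60.0, 62.0, 70.0, 75.0]
actual = [61.0, 61.0, 74.0, 72.0]
result = monthly_rmse(timestamps, predicted, actual)
```

Tracking this value over time is one simple way to notice when a deployed predictor has drifted far enough to need retraining.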

4. Conclusions

The present study developed two multi-step-ahead API predictors based on NARX and SVR using Malaysia air quality data collected at three renowned industrial areas. Both predictors were evaluated for their ability to perform multi-step-ahead API prediction using the air quality parameters NO2, O3, PM2.5, PM10, SO2, CO, and API and meteorological parameters T, WD, and WS. The analyses reveal that both predictors show comparable performance in multi-step API prediction, with the SVR slightly outperforming the NARX.
The SVR predictor can also perform multi-step prediction by using the actual (non-normalized) data, hence it is simpler to implement in actual applications. For uniformity, all air quality and meteorological parameters can be included as the predictor’s inputs, as removing some parameters did not affect prediction performance. This finding indicates that a uniform SVR predictor can be installed in all air quality monitoring stations in Malaysia’s industrial areas. Regarding robustness and the need for frequent retraining, SVR is also better than NARX as it shows more resilience towards outliers and is also stable. As Wang and Han [42] recommended, a predictor developed offline must be updated periodically to match the latest trends. However, based on the trends exhibited by the Malaysia API data, a yearly update is sufficient for SVR due to its resilience and stability. Based on the results, this study proposes that the SVR predictor could be applied practically to enhance MDOE service quality by providing API prediction information in advance.
Going forward, the SVR predictor should be robust to missing or false data so that the API prediction is reliable and uninterrupted. Thus, future research should focus on finding a supporting mechanism to provide continuous and valid data in case such a problem happens on-site. In addition, adaptive machine learning could be explored and adopted to deal with outliers.

Author Contributions

Conceptualization, R.M. and M.M.; methodology, R.M. and M.M.; software, R.M.; validation, R.M., M.M. and H.T.Y.; formal analysis, R.M. and M.M.; investigation, R.M.; resources, R.M.; data curation, R.M. and M.M.; writing—original draft preparation, R.M.; writing—review and editing, R.M., M.M. and H.T.Y.; visualization, R.M.; supervision, M.M. and H.T.Y.; project administration, M.M.; funding acquisition, M.M. and H.T.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Universiti Malaysia Sabah, grant number SBK0466-2021 and the Ministry of Higher Education, Fundamental Research Grant Scheme, FRGS/1/2020/TK0/UMS/02/2.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the Malaysia Department of Environment (MDOE) for the air quality data provided for this research.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Landrigan, P.J. Air Pollution and Health. Lancet Public Health 2017, 2, e4–e5.
  2. Shaddick, G.; Thomas, M.L.; Mudu, P.; Ruggeri, G.; Gumy, S. Half the World’s Population Are Exposed to Increasing Air Pollution. Npj Clim. Atmos. Sci. 2020, 3, 23.
  3. Taghizadeh-Hesary, F.; Taghizadeh-Hesary, F. The Impacts of Air Pollution on Health and Economy in Southeast Asia. Energies 2020, 13, 1812.
  4. Hanafi, N.H.; Hassim, M.H.; Noor, Z.Z.; Ten, J.Y.; Aris, N.M.; Jalil, A.A. Economic Losses Due to Health Hazards Caused by Haze Event in Johor Bahru, Malaysia. In Proceedings of the 7th Conference on Emerging Energy and Process Technology, Johor Bahru, Malaysia, 27–28 November 2018. E3S Web Conf. 2019, 90, 01009.
  5. Usmani, R.S.A.; Saeed, A.; Abdullahi, A.M.; Pillai, T.R.; Jhanjhi, N.Z.; Hashem, I.A.T. Air Pollution and Its Health Impacts in Malaysia: A Review. Air Qual. Atmos. Health 2020, 13, 1093–1118.
  6. Tajudin, M.A.B.A.; Khan, M.F.; Mahiyuddin, W.R.W.; Hod, R.; Latif, M.T.; Hamid, A.H.; Rahman, S.A.; Sahani, M. Risk of Concentrations of Major Air Pollutants on the Prevalence of Cardiovascular and Respiratory Diseases in Urbanized Area of Kuala Lumpur, Malaysia. Ecotoxicol. Environ. Saf. 2019, 171, 290–300.
  7. Ab Manan, N.; Abdul Manaf, M.R.; Hod, R. The Malaysia Haze and Its Health Economic Impact: A Literature Review. Malays. J. Public Health Med. 2018, 18, 38–45.
  8. Shaban, K.B.; Kadri, A.; Rezk, E. Urban Air Pollution Monitoring System with Forecasting Models. IEEE Sens. J. 2016, 16, 2598–2606.
  9. Lin, K.; Jing, L.; Wang, M.; Qiu, M.; Ji, Z. A Novel Long-Term Air Quality Forecasting Algorithm Based on KNN and NARX. In Proceedings of the ICCSE 2017—12th International Conference on Computer Science and Education, Houston, TX, USA, 22–25 August 2017; pp. 343–348.
  10. Mohebbi, M.R.; Jashni, A.K.; Jashni, K.; Dehghani, M.; Hadad, K. Short-Term Prediction of Carbon Monoxide Concentration Using Artificial Neural Network (NARX) without Traffic Data: Case Study: Shiraz City. Iran. J. Sci. Technol. Trans. Civ. Eng. 2018, 43, 533–540.
  11. Kang, G.K.; Gao, J.Z.; Chiao, S.; Lu, S.; Xie, G. Air Quality Prediction: Big Data and Machine Learning Approaches. Int. J. Environ. Sci. Dev. 2018, 9, 8–16.
  12. Zhou, Y.; Chang, F.J.; Chang, L.C.; Kao, I.F.; Wang, Y.S.; Kang, C.C. Multi-Output Support Vector Machine for Regional Multi-Step-Ahead PM2.5 Forecasting. Sci. Total Environ. 2019, 651, 230–240.
  13. Gündoğdu, S. Comparison of Static MLP and Dynamic NARX Neural Networks for Forecasting of Atmospheric PM10 and SO2 Concentrations in an Industrial Site of Turkey. Environ. Forensics 2020, 21, 363–374.
  14. García Nieto, P.J.; Sánchez Lasheras, F.; García-Gonzalo, E.; de Cos Juez, F.J. PM10 Concentration Forecasting in the Metropolitan Area of Oviedo (Northern Spain) Using Models Based on SVM, MLP, VARMA and ARIMA: A Case Study. Sci. Total Environ. 2018, 621, 753–761.
  15. Wang, L.; Bai, Y. Research on Prediction of Air Quality Index Based on NARX and SVM. Appl. Mech. Mater. 2014, 602–605, 3580–3584.
  16. Delavar, M.; Gholami, A.; Shiran, G.; Rashidi, Y.; Nakhaeizadeh, G.; Fedra, K.; Hatefi Afshar, S. A Novel Method for Improving Air Pollution Prediction Based on Machine Learning Approaches: A Case Study Applied to the Capital City of Tehran. ISPRS Int. J. Geo-Inf. 2019, 8, 99.
  17. Kumar, V.; Pal, Y.; Tripathi, M.M. SVM Tuned NARX Method for Wind Speed Power Prediction in Electricity Generation. In Proceedings of the 8th IEEE Power India International Conference (PIICON 2018), Kurukshetra, India, 10–12 December 2018; pp. 1–6.
  18. Leong, W.C.; Kelani, R.O.; Ahmad, Z. Prediction of Air Pollution Index (API) Using Support Vector Machine (SVM). J. Environ. Chem. Eng. 2019, 8, 103208.
  19. Jiang, X.; Wei, P.; Luo, Y.; Li, Y. Air Pollutant Concentration Prediction Based on a CEEMDAN-FE-BiLSTM Model. Atmosphere 2021, 12, 1452.
  20. Muthukumar, P.; Nagrecha, K.; Comer, D.; Calvert, C.F.; Amini, N.; Holm, J.; Pourhomayoun, M. PM2.5 Air Pollution Prediction through Deep Learning Using Multisource Meteorological, Wildfire, and Heat Data. Atmosphere 2022, 13, 822.
  21. He, Z.; Guo, Q.; Wang, Z.; Li, X. Prediction of Monthly PM2.5 Concentration in Liaocheng in China Employing Artificial Neural Network. Atmosphere 2022, 13, 1221.
  22. Wei, F.; Zhu, R.; Jerry Chun, W.L. An air quality prediction model based on improved Vanilla LSTM with multichannel input and multiroute output. Expert Syst. Appl. 2022, 211, 118422.
  23. Raffee, A.F.; Rahmat, S.N.; Hamid, H.A.; Jaffar, M.I. The Behavior of Particulate Matter (PM10) Concentrations at Industrial Sites in Malaysia. Int. J. Integr. Eng. 2019, 11, 214–222.
  24. Azid, A.; Juahir, H.; Toriman, M.E.; Endut, A.; Kamarudin, M.K.A.; Rahman, M.N.A.; Hasnam, C.N.C.; Saudi, A.S.M.; Yunus, K. Source Apportionment of Air Pollution: A Case Study in Malaysia. J. Teknol. 2015, 72, 83–88.
  25. Sentian, J.; Herman, F.; Yih, C.Y.; Hian Wui, J.C. Long-Term Air Pollution Trend Analysis in Malaysia. Int. J. Environ. Impacts Manag. Mitig. Recover. 2019, 2, 309–324.
  26. Mohamed Noor, N.; Al Bakri Abdullah, M.M.; Yahaya, A.S.; Ramli, N.A. Filling Missing Data Using Interpolation Methods: Study on the Effect of Fitting Distribution. Key Eng. Mater. 2014, 594–595, 889–895.
  27. Mohamed Noor, N.; Al Bakri Abdullah, M.M.; Yahaya, A.S.; Ramli, N.A. Comparison of Linear Interpolation Method and Mean Method to Replace the Missing Values in Environmental Data Set. Mater. Sci. Forum 2015, 803, 278–281.
  28. Fitriyah, H.; Budi, A.S. Outlier Detection in Object Counting Based on Hue and Distance Transform Using Median Absolute Deviation (MAD). In Proceedings of the 2019 4th International Conference on Sustainable Information Engineering and Technology (SIET 2019), Lombok, Indonesia, 28–30 September 2019; pp. 217–222. [Google Scholar] [CrossRef]
  29. Mamat, M.; Samad, S.A. Comparison of Iterative and Direct Approaches for Multi-Steps Ahead Time Series Forecasting Using Adaptive Hybrid-RBF Neural Network. In Proceedings of the IEEE Region 10 Annual International Conference, Fukuoka, Japan, 21–24 November 2010; pp. 2332–2337. [Google Scholar] [CrossRef]
  30. López Pouso, Ó.; Jumaniyazov, N. Direct versus iterative methods for forward-backward diffusion equations. Numerical comparisons. SeMA 2021, 78, 271–286. [Google Scholar] [CrossRef]
  31. Boussaada, Z.; Curea, O.; Remaci, A.; Camblong, H.; Bellaaj, N.M. A Nonlinear Autoregressive Exogenous (NARX) Neural Network Model for the Prediction of the Daily Direct Solar Radiation. Energies 2018, 11, 620. [Google Scholar] [CrossRef]
  32. Mustakim, R.; Mamat, M. The Nonlinear Autoregressive Exogenous Neural Network Performance in Predicting Malaysia Air Pollutant Index. Trans. Sci. Technol. 2021, 8, 305–310. [Google Scholar]
  33. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297.
  34. Drucker, H.; Burges, C.J.C.; Kaufman, L.; Smola, A.; Vapnik, V. Support Vector Regression Machines. Adv. Neural Inf. Process. Syst. 1997, 1, 155–161.
  35. Falocchi, M.; Zardi, D.; Giovannini, L. Meteorological Normalization of NO2 Concentrations in the Province of Bolzano (Italian Alps). Atmos. Environ. 2021, 246, 118048.
  36. Bengio, Y.; Goodfellow, I.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
  37. Platt, J.A.; Penny, S.G.; Smith, T.A.; Chen, T.-C.; Abarbanel, H.D.I. A Systematic Exploration of Reservoir Computing for Forecasting Complex Spatiotemporal Dynamics. Neural Netw. 2022, 153, 530–552.
  38. Passalis, N.; Tefas, A.; Kanniainen, J.; Gabbouj, M.; Iosifidis, A. Deep Adaptive Input Normalization for Time Series Forecasting. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 3760–3765.
  39. Passalis, N.; Kanniainen, J.; Gabbouj, M.; Iosifidis, A.; Tefas, A. Forecasting Financial Time Series Using Robust Deep Adaptive Input Normalization. Signal Process. Syst. 2021, 93, 1235–1251.
  40. Gupta, M.; Wadhvani, R.; Rasool, A. Real-Time Change-Point Detection: A Deep Neural Network-Based Adaptive Approach for Detecting Changes in Multivariate Time Series Data. Expert Syst. Appl. 2022, 209, 118260.
  41. Djerioui, M.; Brik, Y.; Ladjal, M.; Attallah, B. Neighborhood Component Analysis and Support Vector Machines for Heart Disease Prediction. Ing. Syst. d’Inform. 2019, 24, 591–595.
  42. Wang, Y.; Han, L. Adaptive Time Series Prediction and Recommendation. Inf. Process. Manag. 2021, 58, 102494.
Figure 1. Predicted and actual AQI values for the Kuala Lumpur region for 24 h.
Figure 2. The location of the industrial areas.
Figure 3. The NARX model for s-multi-step predictions.
Figure 4. The SVR model for s-multi-step predictions.
Figure 5. The parameter weights for each monitoring station based on the NCA for the parameter selection process.
Figure 6. Multi-step prediction performance of NARX and SVR.
Figure 7. The actual and predicted API for different R2 values.
Figure 8. API values for Pasir Gudang in the years 2018 and 2019.
Figure 9. The performance of NARX and SVR, calculated monthly for the 2019 data.
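Table 1. Data range and correlation between the parameters and API.
2018
Station | Statistic | NO2 | PM10 | PM2.5 | SO2 | CO | O3 | WD | WS | T | API
TTDI Jaya | Correlation | 0.185 | 0.549 | 0.552 | 0.050 | 0.192 | 0.111 | −0.045 | −0.052 | 0.154 | 1.000
TTDI Jaya | Min | 0.000 | 1.009 | 0.089 | 0.000 | 0.104 | 0.000 | 0.000 | 0.000 | 20.217 | 27.525
TTDI Jaya | Max | 0.070 | 168.490 | 153.800 | 0.030 | 3.510 | 0.140 | 359.870 | 7.920 | 36.220 | 154.000
Larkin | Correlation | 0.304 | 0.586 | 0.601 | 0.158 | 0.180 | 0.115 | 0.026 | −0.021 | 0.060 | 1.000
Larkin | Min | 0.000 | 0.718 | 0.120 | 0.000 | 0.050 | 0.000 | 0.000 | 0.000 | 21.423 | 12.456
Larkin | Max | 0.070 | 172.660 | 163.550 | 0.020 | 2.760 | 0.150 | 359.970 | 22.790 | 35.390 | 91.760
Pasir Gudang | Correlation | 0.323 | 0.579 | 0.576 | 0.319 | 0.189 | 0.093 | 0.084 | −0.075 | 0.224 | 1.000
Pasir Gudang | Min | 0.000 | 1.340 | 0.112 | 0.000 | 0.283 | 0.000 | 0.000 | 0.000 | 22.505 | 16.000
Pasir Gudang | Max | 0.077 | 199.950 | 181.450 | 0.020 | 4.230 | 0.120 | 359.890 | 5.580 | 35.920 | 101.000
2019
Station | Statistic | NO2 | PM10 | PM2.5 | SO2 | CO | O3 | WD | WS | T | API
TTDI Jaya | Correlation | 0.084 | 0.511 | 0.533 | −0.010 | 0.249 | 0.036 | −0.018 | 0.026 | 0.068 | 1.000
TTDI Jaya | Min | 0.000 | 2.315 | 0.046 | 0.000 | 0.044 | 0.000 | 0.000 | 0.000 | 21.282 | 33.000
TTDI Jaya | Max | 0.060 | 264.640 | 252.310 | 0.020 | 3.200 | 0.160 | 359.930 | 5.950 | 36.270 | 221.000
Larkin | Correlation | 0.354 | 0.674 | 0.686 | 0.167 | 0.280 | 0.148 | −0.080 | −0.187 | 0.136 | 1.000
Larkin | Min | 0.000 | 3.728 | 0.090 | 0.000 | 0.052 | 0.000 | 0.000 | 0.000 | 21.843 | 26.000
Larkin | Max | 0.070 | 294.900 | 264.720 | 0.020 | 3.250 | 0.120 | 359.950 | 4.750 | 35.280 | 171.000
Pasir Gudang | Correlation | 0.399 | 0.679 | 0.695 | 0.458 | 0.261 | 0.137 | −0.037 | −0.017 | 0.186 | 1.000
Pasir Gudang | Min | 0.000 | 1.1940 | 0.0850 | 0.000 | 0.077 | 0.000 | 0.000 | 0.000 | 23.897 | 18.000
Pasir Gudang | Max | 0.070 | 173.90 | 161.630 | 0.010 | 2.900 | 0.110 | 359.940 | 6.250 | 37.550 | 143.00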
Table 2. The RMSE and R2 values using non-normalized and normalized data.
Monitoring Station | NARX Non-Normalized RMSE | NARX Non-Normalized R2 | NARX Normalized RMSE | NARX Normalized R2 | SVR Non-Normalized RMSE | SVR Non-Normalized R2 | SVR Normalized RMSE | SVR Normalized R2
Pasir Gudang | 1.1800 | 0.9922 | 0.1282 | 0.9923 | 0.7106 | 0.9960 | 0.0787 | 0.9959
Larkin | 1.2322 | 0.9883 | 0.1565 | 0.9886 | 0.7135 | 0.9948 | 0.0913 | 0.9948
TTDI Jaya | 1.5797 | 0.9877 | 0.1842 | 0.9888 | 0.8914 | 0.9938 | 0.1025 | 0.9938
Table 3. The RMSE and R2 for all and relevant parameters.
Predictor | Monitoring Station | All Parameters RMSE | All Parameters R2 | Relevant Parameters RMSE | Relevant Parameters R2
NARX | Pasir Gudang | 1.1800 | 0.9922 | 1.1926 | 0.9922
NARX | TTDI Jaya | 1.5797 | 0.9877 | 1.3186 | 0.9898
SVR | Pasir Gudang | 0.7106 | 0.9960 | 0.7237 | 0.9959
SVR | TTDI Jaya | 0.8914 | 0.9938 | 0.8864 | 0.9939
Table 4. NARX and SVR multi-step-ahead prediction performance (in R2 values).
Step Ahead | Pasir Gudang NARX | Pasir Gudang SVR | Larkin NARX | Larkin SVR | TTDI Jaya NARX | TTDI Jaya SVR
1 | 0.9923 | 0.9959 | 0.9886 | 0.9948 | 0.9888 | 0.9938
2 | 0.9878 | 0.9909 | 0.9815 | 0.9883 | 0.9829 | 0.9866
3 | 0.9808 | 0.9841 | 0.9737 | 0.9794 | 0.9645 | 0.9771
4 | 0.9738 | 0.9758 | 0.9664 | 0.9689 | 0.9661 | 0.9676
5 | 0.9656 | 0.9671 | 0.9556 | 0.9570 | 0.9560 | 0.9564
6 | 0.9578 | 0.9573 | 0.9469 | 0.9443 | 0.9433 | 0.9446
7 | 0.9508 | 0.9474 | 0.9383 | 0.9308 | 0.9376 | 0.9323
8 | 0.9444 | 0.9366 | 0.9248 | 0.9171 | 0.9301 | 0.9196
9 | 0.9341 | 0.9260 | 0.9184 | 0.9024 | 0.9214 | 0.9067
10 | 0.9236 | 0.9151 | 0.9085 | 0.8877 | 0.9125 | 0.8934
11 | 0.9132 | 0.9042 | 0.8945 | 0.8731 | 0.9016 | 0.8789
12 | 0.9052 | 0.8924 | 0.8878 | 0.8577 | 0.8922 | 0.8664
13 | 0.8919 | 0.8812 | 0.8595 | 0.8439 | 0.8811 | 0.8526
14 | 0.8899 | 0.8699 | 0.8689 | 0.8278 | 0.8690 | 0.8407
15 | 0.8848 | 0.8586 | 0.8556 | 0.8141 | 0.8658 | 0.8269
16 | 0.8707 | 0.8474 | 0.8407 | 0.7994 | 0.8646 | 0.8142
17 | 0.8630 | 0.8371 | 0.8106 | 0.7831 | 0.8549 | 0.8002
18 | 0.8471 | 0.8262 | 0.8215 | 0.7707 | 0.8356 | 0.7887
19 | 0.8097 | 0.8159 | 0.8090 | 0.7562 | 0.7989 | 0.7757
20 | 0.8266 | 0.8050 | 0.7969 | 0.7429 | 0.8068 | 0.7630
21 | 0.8194 | 0.7938 | 0.7585 | 0.7306 | 0.7908 | 0.7529
22 | 0.7968 | 0.7835 | 0.7276 | 0.7164 | 0.7766 | 0.7393
23 | 0.7940 | 0.7721 | 0.6857 | 0.7030 | 0.7537 | 0.7258
24 | 0.7745 | 0.7600 | 0.7332 | 0.6881 | 0.7381 | 0.7120
Table 5. NARX and SVR multi-step-ahead prediction performance (in RMSE values).
Step Ahead | Pasir Gudang NARX | Pasir Gudang SVR | Larkin NARX | Larkin SVR | TTDI Jaya NARX | TTDI Jaya SVR
1 | 1.1800 | 0.7106 | 1.2322 | 0.7135 | 1.5797 | 0.8914
2 | 1.5155 | 1.0731 | 1.5780 | 1.0718 | 2.2651 | 1.3052
3 | 2.0879 | 1.4193 | 1.8970 | 1.4207 | 2.3282 | 1.6941
4 | 2.2622 | 1.7528 | 2.2183 | 1.7512 | 2.6239 | 2.0248
5 | 2.6014 | 2.0468 | 2.5332 | 2.0363 | 2.8704 | 2.3275
6 | 2.8877 | 2.3319 | 2.7644 | 2.3160 | 2.9995 | 2.6360
7 | 3.3092 | 2.5902 | 2.9673 | 2.5807 | 3.5465 | 2.9113
8 | 3.4751 | 2.8384 | 3.1963 | 2.8220 | 3.4047 | 3.1732
9 | 3.9305 | 3.0722 | 3.3331 | 3.0566 | 3.7174 | 3.4352
10 | 4.4989 | 3.2913 | 3.6986 | 3.2731 | 3.7511 | 3.6636
11 | 4.7080 | 3.5036 | 3.8684 | 3.4839 | 4.0356 | 3.9097
12 | 4.8628 | 3.7186 | 4.0434 | 3.6825 | 4.2135 | 4.1243
13 | 5.0391 | 3.9023 | 4.3225 | 3.8701 | 4.5509 | 4.3245
14 | 5.3038 | 4.0881 | 4.6690 | 4.0409 | 4.4811 | 4.5119
15 | 5.6529 | 4.2832 | 4.9659 | 4.2230 | 4.7691 | 4.7013
16 | 5.9413 | 4.4577 | 5.1250 | 4.3928 | 4.7334 | 4.8971
17 | 5.7786 | 4.6249 | 5.5615 | 4.5542 | 5.3125 | 5.0703
18 | 6.2582 | 4.8255 | 5.5849 | 4.7254 | 5.1081 | 5.2186
19 | 6.5032 | 4.9927 | 5.5537 | 4.8777 | 5.4991 | 5.4078
20 | 7.1241 | 5.1766 | 6.2972 | 5.0167 | 5.6499 | 5.5586
21 | 7.0589 | 5.3722 | 6.0056 | 5.2104 | 5.8457 | 5.7227
22 | 7.0047 | 5.5242 | 6.6550 | 5.3780 | 6.4397 | 5.9267
23 | 7.5341 | 5.7317 | 7.5295 | 5.5276 | 6.2488 | 6.1325
24 | 8.0917 | 5.9580 | 7.2495 | 5.7433 | 6.5816 | 6.4285
Table 6. RMSE values for one- to eight-step-ahead predictions, according to month.
Month | Predictor | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | Step 6 | Step 7 | Step 8
JAN | NARX | 1.5012 | 2.0192 | 2.5765 | 3.0516 | 3.6440 | 3.9949 | 4.3751 | 4.863
JAN | SVR | 1.0083 | 1.5452 | 2.0231 | 2.4793 | 2.9155 | 3.3522 | 3.7866 | 4.1557
FEB | NARX | 1.6690 | 2.1319 | 2.5442 | 2.9640 | 3.3068 | 3.5686 | 3.9424 | 4.1698
FEB | SVR | 1.0428 | 1.6669 | 2.1590 | 2.5811 | 2.9831 | 3.3427 | 3.6541 | 3.9496
MAR | NARX | 1.0146 | 1.2100 | 1.4121 | 1.5875 | 1.7096 | 1.8502 | 2.1070 | 2.1575
MAR | SVR | 0.5847 | 0.8097 | 1.0146 | 1.2127 | 1.4194 | 1.5663 | 1.6991 | 1.8356
APR | NARX | 1.6789 | 1.9057 | 2.0765 | 2.2994 | 2.4587 | 2.6509 | 2.8165 | 2.9065
APR | SVR | 0.6792 | 1.0120 | 1.3578 | 1.6724 | 2.0073 | 2.2806 | 2.5263 | 2.7356
MAY | NARX | 1.1885 | 1.4758 | 1.8475 | 2.1461 | 2.4324 | 2.5533 | 2.9494 | 3.1601
MAY | SVR | 0.7183 | 1.0866 | 1.4484 | 1.7961 | 2.1204 | 2.4282 | 2.7230 | 2.9844
JUNE | NARX | 1.2823 | 1.6012 | 1.8989 | 2.1201 | 2.3598 | 2.6435 | 2.7535 | 2.9114
JUNE | SVR | 0.7864 | 1.1608 | 1.5175 | 1.8294 | 2.0734 | 2.3118 | 2.5293 | 2.7184
JULY | NARX | 0.8256 | 0.9408 | 1.0235 | 1.1289 | 1.1987 | 1.3069 | 1.4257 | 1.5252
JULY | SVR | 0.4587 | 0.5814 | 0.7260 | 0.8476 | 0.9831 | 1.0990 | 1.2257 | 1.3482
AUG | NARX | 0.9711 | 1.0320 | 1.1474 | 1.2873 | 1.4461 | 1.5430 | 1.6986 | 1.8528
AUG | SVR | 0.4687 | 0.6090 | 0.7696 | 0.9259 | 1.0837 | 1.2512 | 1.4106 | 1.5599
SEPT | NARX | 5.7433 | 9.0577 | 7.8669 | 10.487 | 11.1451 | 13.6487 | 14.762 | 15.1999
SEPT | SVR | 1.0099 | 1.9108 | 2.8232 | 3.7294 | 4.6175 | 5.4374 | 6.2017 | 6.9276
OCT | NARX | 2.2683 | 2.8434 | 3.3073 | 3.3148 | 3.5509 | 3.6516 | 3.7713 | 3.9953
OCT | SVR | 1.3716 | 2.3373 | 3.0473 | 3.4808 | 3.7054 | 3.8332 | 3.9658 | 4.1094
NOV | NARX | 2.0890 | 2.2234 | 3.280 | 3.4581 | 3.8042 | 3.6561 | 4.9604 | 4.9131
NOV | SVR | 0.7519 | 1.2351 | 1.7034 | 2.1482 | 2.5528 | 2.9345 | 3.2891 | 3.6206
DEC | NARX | 1.8162 | 4.4456 | 4.2650 | 3.7252 | 4.7170 | 5.2585 | 6.6709 | 6.5395
DEC | SVR | 1.1349 | 1.8912 | 2.5899 | 3.2491 | 3.8709 | 4.4628 | 5.0284 | 5.6076
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Mustakim, R.; Mamat, M.; Yew, H.T. Towards On-Site Implementation of Multi-Step Air Pollutant Index Prediction in Malaysia Industrial Area: Comparing the NARX Neural Network and Support Vector Regression. Atmosphere 2022, 13, 1787. https://doi.org/10.3390/atmos13111787
