The Role of GARCH Effect on the Prediction of Air Pollution

Abstract: Air pollution prediction is an important issue for regulators and practitioners in a sustainable era. Air pollution, especially PM 2.5 resulting from industrialization, has fostered a wave of global weather migration and jeopardized human health in the past three decades. Taiwan has evolved as a highly developed economy and has a severe PM 2.5 pollution problem. Thus, the control of PM 2.5 is a critical issue for regulators, practitioners and academics. More recently, GA-SVM, an artificial-intelligence-based approach, has become a preferred prediction model, attributed to the advances in computer technology. However, hourly observation of PM 2.5 concentration tends to present the GARCH effect. The objective of this study is to explore whether the integration of GA-SVM with the GARCH model can build a more accurate air pollution prediction model. The study adopts central Taiwan, the region with the worst level of PM 2.5, as the source of observations. The empirical implementation of this study took a two-step approach: first, we examined the potential existence of the GARCH effect on the observed PM 2.5 data. Second, we built a GA-SVM model integrated with the GARCH framework to predict the 8 h PM 2.5 concentration of the sample region. The empirical results indicate that the prediction performance of our proposed alternative model outperformed the traditional SVM and GA-SVM models in terms of both MAPE and RMSE. The findings in this study provide evidence to support our expectation that adopting the SVM-based approach model for PM 2.5 prediction is appropriate, and that prediction performance can be improved by integrating the GARCH model. Moreover, consistent with our prior expectation, the evidence further supports that taking the GARCH effect into account in the GA-SVM model significantly improves the accuracy of prediction. To the knowledge of the authors, this study is the first to attempt to integrate the GARCH effect into the GA-SVM model in the prediction of PM 2.5. In summary, with regard to the development of sustainability for both regulators and practitioners, our results strongly encourage them to take the GARCH effect into consideration in air pollution prediction if a regression-based model is to be adopted. Furthermore, this study may shed light on the application of the GARCH model and SVM models in the air pollution prediction literature.


Introduction
Air pollution has long been an important concern for all countries. In particular, the level of air pollution is often closely related to the industrialization of each country. Although industrialization brings economic growth and prosperity to society, it is often accompanied by factors that degrade the ecological environment, such as waste and air pollution. For a long time, developed countries, in response to citizens' pursuit of a cleaner environment and better quality of life, have had to deal with the air pollution problems associated with high levels of industrialization. In recent years, as a result of climate change and the rapid economic development of emerging economies, air chronic respiratory disease increased by nearly 25%. The seriousness of Taiwan's air pollution problem did not come to the surface until the Changhua Guoguang Petrochemical Development Project Evaluation in 2010 [6]. With the government's promotion of a non-nuclear policy while the supply of green energy is still insufficient, the demand for coal-fired power has continued to increase in recent years. The government is still planning to build new coal-fired thermal power plants even though thermal power generation already accounts for more than 80% of Taiwan's annual power generation. Therefore, in this study we show that, in order to protect citizens' health, accurately predicting the level of PM 2.5 and analyzing the sources of air pollution are extremely important for the government when formulating air pollution policies and controlling target levels.
In terms of forecasting methodology, the Generalized Autoregressive Conditional Heteroscedasticity (GARCH) model is the most widely used method for dealing with the autoregressive characteristics of observed time series data. With recent progress in computer programming and the development of big data analysis, machine-learning-based approaches are now preferred in industry and academia alongside traditional statistical forecasting methods. Among them, neural network systems and support vector machine (SVM) models are the most commonly used. Recently, the application of these methods has evolved towards integrating models. For example, to improve prediction accuracy, neural networks are combined with wavelet analysis [7], while for SVM the combination with a genetic algorithm (GA-SVM) is preferred [8].
Regarding the improvement of prediction accuracy, in addition to considering the integration of methods, the attributes of the observation data are also very important. The sample data examined in this study is the hourly air pollution index from the EPA monitoring stations, which may be affected by factors such as terrain and atmospheric conditions. The air pollution from the previous hour may not yet have dissipated, so its deferred effect on the air pollution index observed in the next period may lead to autoregression. Conventional prediction models have been built on the basis of regression analysis. In the classical regression model, residual autoregression and heteroscedasticity should not exist; their presence violates the underlying assumptions and may produce bias [9,10]. In addition, GA-SVM is a machine-learning-based AI approach. Its prediction accuracy depends not only on the quality of the data but also on the availability of relevant (correlated) input variables, so supplying more related variables during training should help produce better predictions. Our intention is to examine whether variables derived from the GARCH model can improve the accuracy of the prediction model for air pollution.
Accordingly, the purpose of this study is to use key observation data from local monitoring stations to analyze whether hourly PM 2.5 presents an autoregressive conditional heteroscedasticity (ARCH) effect, whether incorporating the GARCH effect into the GA-SVM model can improve the performance of an air pollution prediction model, and which factors affect PM 2.5. In the first stage of analysis, the examination starts with an ADF test for stationarity in our time series data, followed by an LM test and an ARCH test to investigate the existence of autoregression and the ARCH effect in our dataset. We then estimate a GARCH(1,1) model to confirm the existence of the GARCH effect and, in the second stage of analysis, integrate the GARCH effect into the GA-SVM model with PM 2.5 as the prediction variable [11]. Empirical results indicate that the prediction performance of our proposed alternative model outperformed traditional SVM and GA-SVM models in terms of both MAPE and RMSE as accuracy measures.
Consistent with previous SVM literature, which suggests a trend toward integrating various approaches into the SVM model [8], our empirical results support our expectation that adopting an SVM-based model for PM 2.5 prediction is appropriate and that prediction performance can be improved by integrating models, for example by incorporating the GARCH effect into a GA-SVM-based approach. Moreover, consistent with our prior expectation, the evidence further supports that taking the GARCH effect into account in a GA-SVM model clearly improves the accuracy of prediction. To the knowledge of the authors, this study is the first to integrate the GARCH effect into the GA-SVM model in the prediction of PM 2.5. In summary, with regard to the development of sustainability for both regulators and practitioners, our results strongly encourage them to take the GARCH effect into consideration in air pollution prediction if a regression-based model is to be adopted. Furthermore, this study may shed light on the application of the GARCH model, as well as machine learning methods, in the air pollution prediction literature.

Literature Review
After the announcement of the Kyoto Protocol, the prediction and management of air pollution became a common concern among industry, government and academics around the world. Scholars have put a lot of effort into empirical research and have obtained significant results. Nevertheless, the literature has not reached a consistent conclusion about the best predictive model. Here, we review key studies from the literature relevant to this study.
Zickus et al. employed daily average of PM 10 concentrations monitored in the Terro region of Helsinki, Finland from 1996 to 1999 as a sample and variables such as wind speed, wind direction, air pressure, humidity, precipitation, temperature, dew point temperature, and terrain to predict daily PM 10 concentration in 1999 with a three-year training period from 1996 to 1998 [12]. Empirical methods include logistic regression, decision tree, multiple adaptive regression splines (MARS) and artificial neural networks (ANN). Results show that logistic regression, multiple adaptive regression, and neural networks perform more consistently, while decision trees perform significantly worse. Dudot et al. employed a neural network combined with a neural classifier to predict hourly maximum ozone concentrations in central France [13]. The neural model is based on the MLP structure, and the sample data is collected from the French air quality agency LIG'AIR, which has 15 ground monitoring stations, and this study only focuses on three stations in Orleans. The daily maximum data for hourly ozone mean concentrations from 1999 to 2003 were adopted. Since the ozone observation peaks in summer, the authors only used data from April to September. Nonetheless, the model developed can be used to make valid forecasts throughout the year. Results showed that the use of neural networks for ozone peaks produced better predictions, with a 92% concordance index, MAE = RMSE = 15 µg/m 3 , MBE = 5 µg/m 3 , as compared to the European threshold for hourly ozone of 180 µg/m 3 . In order to improve the accuracy of prediction, the authors use a neural classifier with a sigmoid function in the output layer. The output range of the network was [0,1], which can be interpreted as the probability of exceeding the standard. Comparing this model with logistic regression shows that the prediction accuracy index using the neural classifier is 78%, compared to 65% to 72% for the classical MLP. 
Voukantsis [18]. Wind speed, wind direction, solar radiation, temperature and relative humidity were used as variables and the adjacent pollution sources were used as references. Results support improved prediction accuracy based on the values of the correlation coefficient and RMSE. The correlation coefficient between observed and predicted PM 2.5 concentrations increased from 0.77 to 0.79, and PM 10 concentrations increased from 0.63 to 0.69. The RMSE index values of PM 2.5 and PM 10 were reduced from 5.00 to 4.74 and from 6.77 to 6.34, respectively. Kristiani et al. implemented short-term prediction of PM 2.5 in Taiwan using the long short-term memory (LSTM) deep learning method [19]. Results indicate that LSTM had the lowest RMSE value at 1.9, as compared to other models such as CNN at 3.5, Bi-LSTM at 2.5, Bi-GRU at 2.7 and RNN at 2.4.
Zheng et al. applied neural networks and linear regression as spatial and temporal prediction models and combined with regression trees [20]. The study included monitoring data from 43 cities in China from 1 May 2014 to 30 April 2015 and combined temporal predictors, spatial predictors, prediction aggregators and deformation predictors to forecast air quality at Beijing, Shanghai, Tianjin, and Guangzhou for the following 48 h. Results indicate that the model can achieve an accuracy of 0.75 in the first 6 h and 0.6 in the next 7 to 12 h. Even though forecast accuracy in Beijing was the worst among the four cities, it was still better than the weather forecasting model (WFM) adopted by the Beijing Environmental Protection Monitoring Center. Feng et al. proposed a hybrid model that combines air quality trajectory analysis and wavelet transformation with a neural network (ANN) to predict PM 2.5 concentrations beyond two days, and its accuracy was observed [7].
The sample data was collected from 13 air quality monitoring stations in Tianjin and Hebei, China from 1 September 2013 to 31 October 2014. Wind speed and wind direction were set as parameters affecting air quality. The prediction results show that this hybrid model can effectively improve prediction accuracy of PM 2.5 , and its RMSE value can be reduced by as much as 40% on average. In particular, the days with high PM 2.5 concentration can almost be predicted by wavelet decomposition, and the detection rate (DR) can reach 90% on average for the alert threshold set by the hybrid model. Wang et al. established an urban air quality prediction system based on the weather research forecast and chemistry (WRF-Chem) model and a regional haze weather forecast system based on the Regional Atmospheric Environment Modeling System (RegAEMS) and applied to Shanghai and Nanjing in the Yangtze River Delta region of China [21]. The study conducted a one-year forecast in Shanghai from May 2009 to April 2010 and a one-month test in Nanjing in October 2007. Results show that WRF-Chem performs well in the prediction of SO 2 , NO 2 , and PM 10 , with the prediction accuracy of API index in Shanghai and Nanjing of 50-83% and 80%, respectively. RegAEMS performed well in haze weather forecasting in terms of RH, PM 2.5 and visibility. The accuracy rates of Shanghai and Nanjing were 77% and 58%, respectively. The authors developed new classification criteria by taking relative humidity, PM 2.5 and visibility as key parameters. Saide et al., applied WRF-Chem model combined with a two-kilometer grid to build a forecasting system to predict PM 2.5 concentration for the next one to three days [22]. The test period was from April to August in 2014 and the sample included hourly PM 2.5 observations at nine cities in Chile and the United States: Santiago, Rancagua, Curico, Talca, Chillan, Los Angeles, Temuco, Valdivia and Osorno. 
Empirical results show that the prediction accuracy ranged from 50 to 70%, while the optimal initialization was 61 to 76%.
Delavar et al. established an air pollution prediction model to predict PM 10 and PM 2.5 concentrations in Tehran [8]. The day of the week, month, topography, meteorology and pollution rates of two neighboring areas were adopted as input parameters for the machine learning methods examined, namely SVR (support vector regression), NARX (nonlinear autoregressive exogenous), ANN and GWR (geographically weighted regression). Cross validation was applied to the results to identify the best method for modeling air pollution predictions. Empirical results show that SVR, NARX, ANN and GWR can reduce the RMSE of PM 10 by 53%, 47%, 47% and 94%, and that of PM 2.5 by 58%, 57%, 61% and 94%, respectively. The best prediction method was NARX with external input; using the proposed prediction model, its RMSE value reached 1.79. In addition, using a genetic algorithm (GA), the authors found that variables such as day of the week, month, topography, wind direction, maximum temperature and pollution rates in two neighboring areas were the most effective parameters for predicting air pollution. Hu et al. used the hourly CO concentration values of four stations, namely, Liverpool, Chullora, Rozelle and Prospect, in Sydney, Australia from May 2009 to May 2016 as samples [23]. Using SVR as the method, CO concentration values were predicted and compared with the prediction results of ANN. Empirical results show that when MAE is used as the evaluation index, SVR and ANN score 0.314 and 0.435, respectively; when RMSE is used, they score 0.414 and 0.677, respectively. In summary, the prediction results of SVR are more accurate than those of ANN.
Davis and Speckman adopted the generalized additive model (GAM) method to establish an air quality prediction system to estimate the next-day maximum and the average ozone concentration over an 8 h period (10 a.m. to 5 p.m.) in Houston [24]. The study collected ozone data from 10 stations in the Houston area from 1983 to 1991, as well as meteorological data at international airports. Data from April to October from 1983 to 1987 and from 1989 to 1990 were used as the training period, and 1988 and 1991 as the forecasting period. Empirical results indicate that wind direction, opaque cloud cover factor, the previous day's maximum ozone concentration, current day's maximum temperature and morning mixing depth were all very important variables in the model. In addition, the 8 h prediction results of the average ozone concentration at each station showed that the RMSE ranged from 13.2 to 16.3 ppb (R 2 is ranged from 0.66 to 0.73); the prediction results of the maximum average ozone concentration indicated an error range from 18.5 to 22.0 ppb (R 2 is ranged from 0.61 to 0.68). Siwek et al. took data collected in southern Warsaw from 2005 to 2007 as a sample to predict the PM 10 concentration [25]. Three machine learning networks were adopted: Multilayer Perceptron (MLP), Radial Basis Function (RBF), and SVM. Other models proposed include wavelet-transformed MLP, RBF, and SVM models and a model integrating Blind Source Separation (BSS) and another neural network structure. Empirical results showed that the MAE values of MLP, RBF and SVM models were 6.47, 6.99 and 7.07 µg/m 3 , respectively, and the MAPE values were 26.43, 28.49 and 27.05%, respectively. After wavelet transformation, the MAE values of the MLP, RBF and SVM models were 4.37, 5.76 and 4.93 µg/m 3 , respectively, and the MAPE values were 18.04, 23.43 and 20.93%, respectively. 
The MAE values of the models integrated by BSS and SVM were 3.89 and 4.03 µg/m 3 , respectively, and the MAPE values were 15.78 and 15.96%, respectively. Results indicate that the accuracy of the prediction was improved. The SVM-integrated model performed best, improving on the wavelet-transformed SVM model by over 12% and on the plain RBF model, the worst performer, by over 44%.
Sotomayor-Olmedo et al. took monthly air quality monitoring data, including O 3 , NO 2 and PM 10 , from Mexico City as a sample and applied SVM to predict the air pollution quality of each month in 2009 [26]. Parameters were adjusted through three kernel functions and the performance of prediction results were compared. Empirical results showed that in the prediction of O 3 and NO 2 , the SVM model applying the Gaussian kernel function had higher accuracy. The empirical results also indicated that the prediction accuracy of the three kernel functions was lower in the last couple months of the year, especially in December. As for the prediction of PM 10 , the Gaussian kernel function mode performed better with a large number of SVMs and the polynomial and spline kernel function modes were relatively accurate with a small number of SVMs.
Song et al. explored a more accurate model in the prediction of power usage load spikes. The study proposed an FKM-ASVM-GARCH ECM model, which integrates GARCH and SVM models, as an alternative model to be compared with traditional FKM-ASVM model which does not include GARCH-modified errors [27]. The study adopted China's electricity supply as an observation sample, using the daily load capacity from June to July 2014 as the training period of SVM and the daily load capacity in August as the test period. Results indicate that MAPE, the evaluation criterion for prediction accuracy, was reduced from 1.72 in traditional method without GARCH to 0.74 in the alternative GARCH model. Therefore, it was suggested that the FKM-ASVM-GARCH ECM is superior to the FKM-ASVM. Integration of the GARCH model can indeed improve the accuracy of the SVM prediction model. Ishak et al. applied SVR and random forest (RF) models to establish a prediction model for the daily maximum ozone concentration at three monitoring stations in Tunisia, namely, Gabes, Ghazela and Manouba [28]. Using the station data of the National Environmental Protection Agency (ANPE) from 20 June 2014 to 30 September 2014 as the observation sample, 36 explanatory variables, including daily maximum ozone concentration (maxO 3 ) and other pollutant concentrations (SO 2 , NO 2 , NO and PM 10 ), were adopted to explain daily maximum ozone concentration. The experimental results showed that prediction performance of the RF model was better than that of the support vector regression model. The RMSE values of the RF model at the Gabes, Ghazela and Manouba stations were 2.26, 4.16, and 6.71, respectively; the MAE values were 1.85, 3.18 and 5.29 and the MAPE values were 4.08, 3.51 and 8.63, respectively. Lin et al. 
adopted three machine learning methods, namely decision tree regression (DTR), gradient boosted tree regression (GBTR) and SVR, to predict PM 2.5 concentration in the next hour at 67 locations in Taiwan through a big data platform, with RMSE and MAE as the accuracy evaluation criteria [29]. Results showed that the RMSE values of the DTR, GBTR and SVR methods were 8.52, 5.17 and 4.68, respectively, and the MAE values were 6.25, 3.63 and 3.46, respectively. A preliminary conclusion is that SVR is the better prediction model. Altogether, with the recent evolution of quantitative methods, the methodology of air pollution prediction in the literature has evolved from traditional logistic regression analysis to machine-learning-based approaches. The most widely used methods include artificial neural network analysis (ANN) and SVM. The methodology of ANN has evolved from a traditional single-layer input and output to the multi-layer recurrent neural network (RNN). However, the RNN method needs a huge volume of (big) data to improve accuracy, which is a challenge for empirical studies with constrained data collection.
Another development trend is the use of hybrid prediction models that combine a base method with other techniques, such as wavelet analysis, in pursuit of higher prediction accuracy. SVM methodology is likewise moving towards combination with other algorithms, such as the genetic algorithm (GA), which yields the GA-SVM model. There is still a lack of consensus about which methodology provides the best prediction accuracy. Finally, the choice of air pollution predictors is also inconclusive in the literature. The intention of this study is to examine predictors and alternative prediction models that integrate the GARCH effect into the GA-SVM model, in order to improve prediction accuracy for air pollution.

Dataset
Data for our empirical study was retrieved from the Environmental Protection Agency (EPA) database in Taiwan. The study adopts the central Taiwan region, which has the worst PM 2.5 density, as the observed sample. The choice of predictive variables refers to previous literature related in Section 2.1. However, due to the availability of data, nine variables were adopted for our examination: fine particles (PM 2.5 ), carbon monoxide (CO), nitric oxide (NO), nitrogen dioxide (NO 2 ), nitrogen oxide (NO x ), ozone (O 3 ), suspended particulate matter (PM 10 ), sulfur dioxide (SO 2 ), wind direction (WindDirection), and wind speed (WindSpeed).
Sampling for this study includes hourly observation data from five air pollution monitoring stations, including FongYen, SaLu, DaLi, ChungMin and SeaTun stations, in central Taiwan. The observation period for our study was from 20 October 2020 to 16 December 2020. The dataset was split into two samples, training data and testing (holdout) data, to evaluate the prediction performance. We held the testing data as the out-of-sample test and used the holdout test to predict hourly PM 2.5 concentration in next 8 h in January 2021. In Table 1, we present basic features of the data, including the locations of five monitoring stations, duration and frequency of data and observation numbers in each station. There are three major reasons for choosing the research period from 20 October 2020 to 16 December 2020. First of all, seasonal characteristics are distinct among the four seasons in Taiwan. Especially, the effects of subtropical island weather, temperature and continental air mass on air quality are unique in winter season. The most serious air pollution problems occur around wintertime. Thus, we adopted observation data from this period for examination considering the seasonal effect. Second, in this dataset with over 1200 hourly observations in each station, from a statistical point of view, the mean behavior of the sample is closer to observation data, which makes it more representative for splitting hourly observation data into training data and testing (holdout) data. Third, the data was retrieved from the EPA database in Taiwan. There is redundant overlap in later period observations and clean data is not easy to retrieve, which is why we adopted sample data from the 20 October to 16 December period.
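The training/holdout split described above can be illustrated with a minimal sketch. It assumes a purely chronological split, which is the usual choice for time series; the function name `chronological_split`, the 80/20 ratio and the toy readings are hypothetical and are not values reported in this study.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series into training and holdout parts without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Toy hourly PM2.5 readings (made-up values, for illustration only).
hourly_pm25 = [12.0, 15.0, 18.0, 22.0, 30.0, 28.0, 25.0, 21.0, 19.0, 17.0]
train, holdout = chronological_split(hourly_pm25, train_fraction=0.8)
```

Keeping the holdout strictly later in time than the training data avoids leaking future observations into the model, which matters when consecutive hours are correlated.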

Methodology
In this study, we attempt to apply machine learning methods to estimate the level of PM 2.5 based on hourly observation data. Due to the influence of terrain, temperature, humidity, wind direction and so on, the PM 2.5 in the previous period may not have completely dissipated and will affect the PM 2.5 concentration in the next period. The phenomena of autoregression and heteroscedasticity, which are common in time series data, might therefore exist in the dataset, and it is necessary to check the time series features before prediction. Where autoregression and heteroscedasticity exist, we apply the Generalized Autoregressive Conditional Heteroscedasticity (GARCH) model to capture the time series characteristics, and then examine various cross-model integration methods, including integrating the GARCH effect in the GA-SVM model, to establish the method that provides the highest prediction accuracy.
A two-stage approach is performed in this study. In the first stage, as shown in the upper part of Figure 1, we start with an Augmented Dickey-Fuller (ADF) unit root test for stationarity, followed by the GARCH effect diagnosis, including the LM test and the ARCH test, and end with GARCH model estimation.

Unit Root Test
Granger and Newbold suggested that, if an analysis is carried out with non-stationary time series data, the results will be biased and lead to spurious regression [30]. In such cases, the series should be differenced until it is stationary before further analysis. For the stationarity test, we follow Engle and Granger and adopt the Augmented Dickey-Fuller (ADF) test to check whether the PM 2.5 concentration in the dataset is a stationary time series [31]. Depending on whether an intercept and a trend are included, the three variants of the ADF test model are:

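1. Model with neither intercept nor trend:

$$\Delta y_t = \gamma y_{t-1} + \sum_{i=1}^{p} \beta_i \Delta y_{t-i} + \varepsilon_t$$

2. Model with intercept but without trend:

$$\Delta y_t = \alpha_0 + \gamma y_{t-1} + \sum_{i=1}^{p} \beta_i \Delta y_{t-i} + \varepsilon_t$$

3. Model with both intercept and trend:

$$\Delta y_t = \alpha_0 + \alpha_2 t + \gamma y_{t-1} + \sum_{i=1}^{p} \beta_i \Delta y_{t-i} + \varepsilon_t$$

where $y_t$ is the time series under test; $\alpha_0$ is the intercept; $t$ is the time trend; $\gamma$, $\beta_i$ and $\alpha_2$ are parameters to be estimated; $p$ is the optimal lag order; the residual $\varepsilon_t \sim \mathrm{iid}(0, \sigma^2)$ is white noise; and the null hypothesis is $H_0: \gamma = 0$. If the test statistic is significant and the null hypothesis is rejected, the time series does not have a unit root and is stationary.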
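As a rough illustration of the test statistic, the sketch below computes the t-ratio on γ for the simplest case only: the model with neither intercept nor trend and with no lagged difference terms (p = 0), i.e., the plain Dickey-Fuller regression rather than the full ADF specification. The function name, the simulated AR(1) series and its coefficient of 0.5 are assumptions made for the example; −1.95 is the commonly tabulated approximate 5% critical value for this model variant.

```python
import math
import random


def dickey_fuller_stat(y):
    """Return (gamma_hat, t_stat) for dy_t = gamma * y_{t-1} + e_t (no intercept, no trend, p = 0)."""
    lagged = y[:-1]
    diffs = [y[t] - y[t - 1] for t in range(1, len(y))]
    sxx = sum(v * v for v in lagged)
    gamma_hat = sum(v * d for v, d in zip(lagged, diffs)) / sxx
    residuals = [d - gamma_hat * v for v, d in zip(lagged, diffs)]
    sigma2 = sum(e * e for e in residuals) / (len(diffs) - 1)  # one estimated parameter
    t_stat = gamma_hat / math.sqrt(sigma2 / sxx)
    return gamma_hat, t_stat


# Simulated stationary AR(1) series, y_t = 0.5 * y_{t-1} + noise (illustrative data only).
rng = random.Random(0)
y = [0.0]
for _ in range(500):
    y.append(0.5 * y[-1] + rng.gauss(0.0, 1.0))

gamma_hat, t_stat = dickey_fuller_stat(y)
# Approximate 5% Dickey-Fuller critical value for the model without intercept or trend.
reject_unit_root = t_stat < -1.95
```

Note that the statistic is compared against Dickey-Fuller critical values rather than the usual Student t table, because its distribution under the unit root null is non-standard.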

ARCH Test
When the conditional variance of the regression residuals is not constant, inference based on the estimated coefficients is unreliable. Therefore, in traditional quantitative empirical analysis, testing whether the model exhibits heteroscedasticity has become a main step in model diagnosis. Before fitting a GARCH-type model, it is necessary to check whether the sample time series data is heteroscedastic, that is, whether there is an ARCH effect, as the basis for deciding whether an ARCH model can be fitted. For the test method, we applied the Lagrange multiplier (LM) test proposed by Engle to test whether the ARCH effect was present [9]. The testing steps were as follows.
(1) Run an OLS regression to estimate the appropriate mean equation, $y_t = x_t \alpha + \varepsilon_t$, where $\hat{\alpha}$ is the regression coefficient estimated by OLS. Compute the residual $\hat{\varepsilon}_t = y_t - x_t \hat{\alpha}$ and save the squared residual $\hat{\varepsilon}_t^2$ as another time series.
(2) Regress the squared residual $\hat{\varepsilon}_t^2$ on an intercept and its $q$ lagged terms, and calculate the coefficient of determination, $R^2$, of this regression. The estimation function is as follows:

$$\hat{\varepsilon}_t^2 = \alpha_0 + \alpha_1 \hat{\varepsilon}_{t-1}^2 + \alpha_2 \hat{\varepsilon}_{t-2}^2 + \cdots + \alpha_q \hat{\varepsilon}_{t-q}^2 + v_t$$

where $v_t$ is the error term of this auxiliary regression.
(3) Multiply the coefficient of determination, $R^2$, by the number of samples, $T$, to calculate the LM test statistic, $LM = T \times R^2 \sim \chi^2(q)$, which asymptotically follows the chi-square distribution with $q$ degrees of freedom. If the resulting LM test statistic significantly rejects the null hypothesis $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_q = 0$, the time series data inspected has an ARCH effect, and an ARCH or GARCH model should be further fitted.
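Steps (2) and (3) can be sketched for the special case q = 1, where the auxiliary regression has a single regressor and R² is simply the squared correlation between the current and lagged squared residuals. The function name `arch_lm_q1` and the toy residuals are hypothetical; a general q would need a multiple regression, which is omitted here for brevity.

```python
def arch_lm_q1(residuals):
    """LM = T * R^2 from regressing e_t^2 on an intercept and e_{t-1}^2 (q = 1)."""
    sq = [e * e for e in residuals]
    x, y = sq[:-1], sq[1:]          # lagged and current squared residuals
    n = len(y)                      # observations used in the auxiliary regression
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r2 = (sxy * sxy) / (sxx * syy)  # with one regressor, R^2 is the squared correlation
    return n * r2


# Toy residuals whose squares grow linearly (1, 2, ..., 10), so each squared
# residual is perfectly predicted by its own lag and R^2 is 1.
toy_residuals = [k ** 0.5 for k in range(1, 11)]
lm_stat = arch_lm_q1(toy_residuals)
# 3.841 is the 5% critical value of the chi-square distribution with 1 degree of freedom.
has_arch_effect = lm_stat > 3.841
```

In this toy case the statistic equals the number of usable observations (9), far above the critical value, so the null of no ARCH effect would be rejected.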

GARCH Model
If the ARCH effect exists in the hourly PM 2.5 concentration data, a conditional heteroscedasticity model is appropriate. Econometricians have proposed correction methods to address the heteroscedasticity of time series data. Among them, the models of Engle [9] and Bollerslev [10] are the most popular.
Engle considered the conditional variance to change over time and included it in the autoregressive conditional heteroscedasticity (ARCH) model, allowing the conditional variance to be a function of the squared residuals of previous periods [9]. Thus, previous volatility affects subsequent volatility, which is in line with the phenomenon of volatility clustering. The model specification of ARCH(q) can be described as follows:

$$y_t = a x_t + \varepsilon_t, \qquad \varepsilon_t \mid \Omega_{t-1} \sim N(0, h_t), \qquad h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2$$

where $\alpha_0 > 0$, $\alpha_i > 0$, $i = 1, 2, \ldots, q$; $y_t$ is the time series data; $a x_t$ is the conditional mean of $y_t$; $\Omega_{t-1}$ is the information collected up to period $t-1$; and $h_t$ is the conditional variance of $y_t$.

Bollerslev further added the lagged conditional variance to the ARCH model, so that ARCH conforms to the traditional ARMA process; this is called the generalized autoregressive conditional heteroscedasticity (GARCH) model [10]. The conditional variance is affected not only by the squared residuals of previous periods, but also by the conditional variance of previous periods. The GARCH(p,q) model is stated as follows:

$$y_t = a x_t + \varepsilon_t, \qquad \varepsilon_t \mid \Omega_{t-1} \sim N(0, h_t), \qquad h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j h_{t-j}$$

where $\alpha_0 > 0$, $\alpha_i > 0$, $i = 1, 2, \ldots, q$; $\beta_j > 0$, $j = 1, 2, \ldots, p$; $y_t$ is the time series data; $a x_t$ is the conditional mean of $y_t$; $\Omega_{t-1}$ is the information collected up to period $t-1$; and $h_t$ is the conditional variance of $y_t$. We follow the GARCH(1,1) specification since it is the most popular specification in the prior literature [10,11,32,33]. After the GARCH model was estimated, we also applied the ARCH-LM test to check the model fit and make sure the GARCH(1,1) specification was adequate.
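To make the GARCH(1,1) recursion concrete, the sketch below computes the conditional variance path from a given residual series. It only illustrates the variance equation; the parameter values (0.1, 0.2, 0.7), the starting variance and the residuals are made-up numbers, not estimates from this study, and in practice the parameters would be obtained by maximum likelihood.

```python
def garch11_variance(residuals, alpha0, alpha1, beta1, h0):
    """Return [h_0, h_1, ..., h_T] with h_t = alpha0 + alpha1 * e_{t-1}^2 + beta1 * h_{t-1}."""
    h = [h0]
    for e_prev in residuals:
        h.append(alpha0 + alpha1 * e_prev ** 2 + beta1 * h[-1])
    return h


# Hypothetical residuals e_0, e_1, e_2 and hypothetical parameters.
toy_residuals = [1.0, -2.0, 0.5]
variance_path = garch11_variance(toy_residuals, alpha0=0.1, alpha1=0.2, beta1=0.7, h0=1.0)
# h_1 = 0.1 + 0.2 * 1.00 + 0.7 * 1.0 = 1.00
# h_2 = 0.1 + 0.2 * 4.00 + 0.7 * 1.0 = 1.60
# h_3 = 0.1 + 0.2 * 0.25 + 0.7 * 1.6 = 1.27
```

The large shock at the second step (a residual of −2.0) raises the next conditional variance, and the effect decays only gradually through the β term, which is the volatility clustering the model is meant to capture.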
The ARCH-LM test uses the same auxiliary regression as step (2) of the ARCH test, in which the squared residuals are regressed on an intercept and their $q$ lagged terms.

Since the SVM model is a regression-based model, the prediction result will be biased when autoregression exists in the dataset. If the GARCH effect is confirmed in the first stage, then in the second stage of analysis we integrate the GARCH effect into the GA-SVM model with PM 2.5 as the prediction variable. We add the PM 2.5 of the previous period ($PM2.5_{t-1}$), the conditional variance of the previous period ($h_{t-1}$) and the squared residual of the previous period ($\varepsilon_{t-1}^2$) from the GARCH model estimation into the GA-SVM model to establish an alternative PM 2.5 prediction model, and compare its prediction accuracy with that of the traditional GA-SVM model, which does not take the GARCH effect into consideration.
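The feature augmentation described above might look like the following sketch. It assumes that the other predictors are aligned with the target hour and that the three GARCH-derived terms are simply appended to each input row; the function name `add_garch_features`, the single base predictor and all numbers are hypothetical.

```python
def add_garch_features(pm25, cond_var, residuals, base_features):
    """Append PM2.5_{t-1}, h_{t-1} and e_{t-1}^2 to each row of base predictors (t >= 1)."""
    rows, targets = [], []
    for t in range(1, len(pm25)):
        garch_terms = [pm25[t - 1], cond_var[t - 1], residuals[t - 1] ** 2]
        rows.append(list(base_features[t]) + garch_terms)
        targets.append(pm25[t])
    return rows, targets


# Toy inputs: three hours of PM2.5, fitted conditional variances and residuals,
# plus one base predictor per hour (for instance a CO reading).
pm25 = [10.0, 12.0, 15.0]
cond_var = [1.0, 1.2, 1.1]
residuals = [0.5, -1.0, 2.0]
base_features = [[0.3], [0.4], [0.5]]

X, y_target = add_garch_features(pm25, cond_var, residuals, base_features)
```

The first observation is dropped because it has no previous period, so a series of length T yields T − 1 training rows.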

GA-SVM Model
The Genetic Algorithm (GA) was proposed by John Holland and draws mainly on Darwin's theory of evolution to simulate natural selection in the biological world [34]. Its "survival of the fittest" elimination mechanism is widely applied to optimization problems, data search, artificial intelligence and machine learning [35]. In the financial field, many scholars have also applied the genetic algorithm to topics such as trading systems, stock or portfolio selection, bankruptcy prediction, credit evaluation and budget allocation [36].
The genetic algorithm mainly operates through three processes: reproduction, crossover and mutation. During the reproduction process, an initial population is randomly generated, and each individual is coded in binary and substituted into a fitness function. Then, based on the obtained fitness values, individuals with high fitness are selected and copied to the mating pool. Two individuals are selected randomly from the mating pool for mating each time, and the algorithm decides whether the resulting offspring should undergo further mutation. The process of reproduction, crossover and mutation is repeated until the fittest population is produced. A processing flow chart of the genetic algorithm is shown in Figure 2 [36].

SVM is a learning method widely applied in classification-related topics. It was proposed in 1995 by Vladimir Naumovich Vapnik and the AT&T laboratory team [37]. SVM is a machine learning system developed from the Structural Risk Minimization (SRM) principle in statistical learning theory. The main concept of SVM is to use a separating hyperplane to divide data into two or more classes and thereby address the classification problem in data mining.
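The reproduction, crossover and mutation loop described above can be sketched as a small binary-coded genetic algorithm. This is a minimal illustration, not the configuration used in the study: binary tournament selection, the rates, the population size and the toy fitness function are all assumptions, and the names `decode`, `run_ga` and `toy_fitness` are hypothetical. In a GA-SVM setting the fitness function would instead score an SVM trained with the decoded hyperparameters.

```python
import random


def decode(bits, low, high):
    """Map a list of 0/1 bits to a real value in [low, high]."""
    as_int = int("".join(map(str, bits)), 2)
    return low + (high - low) * as_int / (2 ** len(bits) - 1)


def run_ga(fitness, n_bits=16, pop_size=30, generations=60,
           crossover_rate=0.8, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Initial population: random binary-coded individuals.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Reproduction: binary tournaments fill the mating pool with fitter individuals.
        pool = [max(rng.sample(pop, 2), key=fitness) for _ in range(pop_size)]
        next_pop = []
        while len(next_pop) < pop_size:
            p1, p2 = rng.sample(pool, 2)
            c1, c2 = p1[:], p2[:]
            # Crossover: swap tails at a random cut point.
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            # Mutation: flip each bit with a small probability.
            for child in (c1, c2):
                for i in range(n_bits):
                    if rng.random() < mutation_rate:
                        child[i] ^= 1
                next_pop.append(child)
        pop = next_pop[:pop_size]
        best = max(pop + [best], key=fitness)  # keep the best individual seen so far
    return best


def toy_fitness(bits):
    """Toy objective with its maximum at x = 3 on the interval [0, 10]."""
    x = decode(bits, 0.0, 10.0)
    return -(x - 3.0) ** 2


best_bits = run_ga(toy_fitness)
print(decode(best_bits, 0.0, 10.0))
```

Tournament selection is used here because it works with negative fitness values; a roulette-wheel scheme would need the fitness to be rescaled to positive numbers first.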

Evaluation Indicators for Prediction Models
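Four indicators are generally adopted to evaluate the performance of prediction models: mean absolute percentage error (MAPE), root mean squared error (RMSE), mean absolute error (MAE) and correlation coefficient (CC) [38]. Specifications of the four indicators, based on Witten, Frank and Hall [38], are presented in Table 2. MAPE is the most commonly used criterion for prediction performance evaluation [36]. We further added RMSE to reinforce our evaluation and present the results in Section 3.3.

In Table 2, $p_i$ denotes the predicted value and $a_i$ the actual value of test instance $i$, $i = 1, \dots, n$:

Mean absolute percentage error (MAPE): $\frac{1}{n}\left(\frac{|p_1 - a_1|}{a_1} + \cdots + \frac{|p_n - a_n|}{a_n}\right)$

Root mean squared error (RMSE): $\sqrt{\frac{(p_1 - a_1)^2 + \cdots + (p_n - a_n)^2}{n}}$

Mean absolute error (MAE): $\frac{|p_1 - a_1| + \cdots + |p_n - a_n|}{n}$

Correlation coefficient (CC): $\frac{S_{PA}}{\sqrt{S_P S_A}}$, where $S_{PA} = \frac{\sum_i (p_i - \bar{p})(a_i - \bar{a})}{n - 1}$, $S_P = \frac{\sum_i (p_i - \bar{p})^2}{n - 1}$ and $S_A = \frac{\sum_i (a_i - \bar{a})^2}{n - 1}$

Here, $\bar{a}$ is the mean of the actual values over the test data and $\bar{p}$ is the mean of the predicted values.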
Lewis stated that MAPE is often applied to evaluate the predictive ability of a model [39]. Because each error is divided by the actual value, MAPE is a relative measure, so the comparison basis is not distorted by the magnitude of the values. The value of (1 − MAPE) represents the accuracy of prediction; thus, the lower the MAPE value, the better the predictive ability. RMSE measures the degree of deviation between the predicted value and the actual value, expressed in the units of the predicted variable. The closer the RMSE is to 0, the better the predictive ability.
In Table 3, we present the interpretation of MAPE values based on Lewis [39]. For example, if the MAPE value is less than 10%, the prediction performance will be classified as "highly accurate forecasting". When the MAPE value is above 50%, the prediction performance will be classified as "inaccurate forecasting".
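The two indicators used in this study and the MAPE interpretation can be sketched as follows. This is a minimal illustration with made-up numbers. The text above gives only the "below 10%" and "above 50%" bands; the two intermediate bands and their labels follow the commonly quoted form of Lewis's scale and are an assumption here, as are the function names.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    ratios = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def rmse(actual, predicted):
    """Root mean squared error, in the units of the predicted variable."""
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def lewis_category(mape_percent):
    """Interpret a MAPE value (in percent) on Lewis's scale."""
    if mape_percent < 10:
        return "highly accurate forecasting"
    if mape_percent < 20:
        return "good forecasting"          # assumed intermediate band
    if mape_percent <= 50:
        return "reasonable forecasting"    # assumed intermediate band
    return "inaccurate forecasting"


# Illustrative values only (not observations from the study).
actual = [20.0, 25.0, 40.0]
predicted = [22.0, 24.0, 36.0]
m = mape(actual, predicted)
print(m, 100.0 - m, rmse(actual, predicted), lewis_category(m))
```

Reporting both measures is useful because MAPE is scale-free while RMSE penalizes large individual errors more heavily.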

GARCH Effect Diagnosis
In Table 4, we present the results of the GARCH effect diagnosis for each of five air pollution monitoring stations in Taichung, Taiwan: FongYen, SaLu, DaLi, ChungMin and SeaTun, which are labeled in the table as Stations 1, 2, 3, 4 and 5, respectively. Before the diagnosis, we ran an ADF unit root test for stationarity, as suggested by Engle and Granger [31], and found that the test statistics at all five stations are significant, with a chi-square probability of 0.0000. The results reject the null hypothesis of non-stationarity at the 0.01 significance level; thus, the PM 2.5 time series at each station is stationary and can be used for further estimation.
As shown in Table 4, at all five monitoring stations the F-statistics of the OLS model are statistically significant at the 0.01 level and the adjusted R 2 values are at least 0.7773. The coefficient estimates and the t-statistics of each variable at each station are presented in the table. The coefficient estimates of the PM 2.5 concentration in the previous period (PM 2.5 (t−1) ) at Stations 1 to 5 are 0.5408, 0.5828, 0.4945, 0.5563 and 0.5543, respectively; all are statistically significant at the 0.01 level. Similar results are found for the coefficient estimates of PM 10 and sulfur dioxide (SO 2 ); both are statistically significant at the 0.01 level at all five stations. The coefficient estimates of ozone (O 3 ) were not significant at any of the five stations.

Note (Table 4): The ADF test statistics at all five stations are significant, with a chi-square probability of 0.0000; thus, the data at each station do not have a unit root and form a stationary time series. Stations 1, 2, 3, 4 and 5 represent the monitoring stations of FongYen, SaLu, DaLi, ChungMin and SeaTun, respectively. Variables examined include the fine-particle observation in the previous period (PM 2.5 (t−1) ), carbon monoxide (CO), nitric oxide (NO), nitrogen dioxide (NO 2 ), nitrogen oxide (NO x ), ozone (O 3 ), suspended particulate matter (PM 10 ), sulfur dioxide (SO 2 ), wind direction (WindDirection) and wind speed (WindSpeed). t-statistic values are presented below each coefficient estimate. ***, ** and * indicate statistical significance at 1%, 5% and 10%, respectively.
As we expected, autocorrelation in the PM 2.5 data does exist, and the result of the LM test reconfirms such a phenomenon. As shown in the lower part of

GARCH Estimation
In Table 5, we present the GARCH estimation and model specification for each of the five monitoring stations. We follow the GARCH (1,1) specification since it is the most widely used specification in prior literature [29,30,31,32]. As shown in the upper section of Table 4, for all five stations, the coefficient estimates of both PM 2.5 (t−1) , the PM 2.5 observation in previous period and PM 10

We also found that the coefficient estimates of nitric oxide (NO), nitrogen dioxide (NO 2 ) and nitrogen oxide (NO x ) were statistically significant at the 0.01, 0.05 and 0.05 levels, respectively, only at Station 3 (DaLi), and not significant at the other four stations. Conversely, the coefficient estimates of wind speed (WindSpeed) were significant at Stations 1, 2, 4 and 5 but not at Station 3 (DaLi). A similar result is found for the coefficient estimates of carbon monoxide (CO), which were significant at Stations 1, 2, 4 and 5, all at the 0.01 level, but not at Station 3 (DaLi).
In the lower section of Table 5, we present the variance estimates of the GARCH model, including the squared term of the residual in the previous period (ε 2 (t−1) ) and the conditional heteroscedasticity in the previous period (h t−1 ), at the five stations. The empirical results support our expectation that the PM 2.5 observation in the previous period (PM 2.5 (t−1) ), the conditional heteroscedasticity (h t−1 ) and the squared residual of the GARCH model (ε 2 (t−1) ) are appropriate prediction variables to incorporate into a GA-SVM model in our second-stage procedure to establish an alternative PM 2.5 prediction model. After the GARCH model was estimated, to make sure the GARCH (1,1) model specification was at an optimal level, we also performed an ARCH-LM test to check the model fit. Test statistics at all five monitoring stations except Station 3 (DaLi) were significant, rejecting the null hypothesis at the 0.01 level, indicating that the GARCH model fit is optimized.

Evaluations of the Prediction Models
In the second stage of our analysis, we integrate the GARCH effect into the GA-SVM model by adding the PM 2.5 observation in the previous period (PM 2.5 (t−1) ), the conditional variance in the previous period (h t−1 ) and the squared residual of the GARCH model (ε 2 (t−1) ) to establish an alternative PM 2.5 prediction model (GA-SVM-GARCH). We then compare the prediction performance of two traditional approaches, the SVM and the GA-SVM models, with that of our proposed alternative model. The observation period for our study was from 20 October 2020 to 16 December 2020. The dataset was split into two sample sets, training data and testing (holdout) data, to evaluate the prediction performance. We held the testing data out of sample and used this holdout test to predict the hourly PM 2.5 concentration over the next 8 h in January 2021.
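The construction of the three GARCH-related inputs and the holdout split can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the toy numbers are not observations from the study, the other pollutant and wind predictors are omitted, and the split is assumed to be chronological with the most recent rows held out.

```python
def build_lagged_rows(pm25, cond_var, residuals):
    """Pair each target pm25[t] with the features observed at t - 1:
    previous PM2.5, previous conditional variance and previous squared residual."""
    if not (len(pm25) == len(cond_var) == len(residuals)):
        raise ValueError("all input series must have the same length")
    rows = []
    for t in range(1, len(pm25)):
        features = [pm25[t - 1], cond_var[t - 1], residuals[t - 1] ** 2]
        rows.append((features, pm25[t]))
    return rows


def chronological_split(rows, n_test):
    """Hold out the last n_test rows; no shuffling, so no future data leaks into training."""
    if not 0 < n_test < len(rows):
        raise ValueError("n_test must be between 1 and len(rows) - 1")
    return rows[:-n_test], rows[-n_test:]


# Toy hourly series (illustrative values only).
pm25 = [10.0, 12.0, 11.0, 15.0, 14.0]
cond_var = [1.0, 0.9, 1.1, 1.3, 1.2]      # h_t from a fitted GARCH(1,1) model
residuals = [0.5, -1.0, 2.0, -0.5, 0.1]   # eps_t from the same model

rows = build_lagged_rows(pm25, cond_var, residuals)
train_rows, test_rows = chronological_split(rows, n_test=2)
print(len(train_rows), len(test_rows), test_rows[0])
```

Each row uses only information available one period earlier, which keeps the features consistent with a forecasting setting.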
In Table 6, we present the MAPE and RMSE values of the SVM, GA-SVM and GA-SVM-GARCH models. MAPE is the most commonly used criterion for prediction performance evaluation [35]. The prediction accuracy was calculated by subtracting the MAPE value from one. For example, the MAPE values of the SVM, GA-SVM and GA-SVM-GARCH models at Station 2 (SaLu) are 35.94%, 33.14% and 0.68%, respectively, indicating prediction accuracies of 64.06%, 66.86% and 99.32%, respectively. The results indicate a significant increase in prediction accuracy with the GA-SVM-GARCH model, of over 30 percentage points compared to the traditional SVM and GA-SVM models. We further added RMSE as the second evaluation indicator to reinforce our evaluation. The closer the RMSE value is to 0, the better the predictive ability of the model. For example, at Station 2 (SaLu), the RMSE values of the SVM, GA-SVM and GA-SVM-GARCH models were 4.7724, 4.4938 and 0.0950, respectively, further supporting the better prediction performance of the GA-SVM-GARCH model.
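The accuracy figures quoted for Station 2 follow directly from the MAPE values, as the short check below illustrates. Only the three MAPE values come from the text; the variable names are hypothetical.

```python
# MAPE values (in percent) reported for Station 2 (SaLu).
station2_mape = {"SVM": 35.94, "GA-SVM": 33.14, "GA-SVM-GARCH": 0.68}

# Prediction accuracy is 100% minus MAPE.
accuracy = {model: round(100.0 - value, 2) for model, value in station2_mape.items()}

# Accuracy gain of the proposed model, in percentage points.
gain_vs_svm = round(accuracy["GA-SVM-GARCH"] - accuracy["SVM"], 2)
gain_vs_ga_svm = round(accuracy["GA-SVM-GARCH"] - accuracy["GA-SVM"], 2)

print(accuracy, gain_vs_svm, gain_vs_ga_svm)
```

The gains are differences between two percentages, so they are expressed in percentage points rather than as relative changes.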
In terms of performance comparison, Song et al. adopted China's electricity supply as an observation sample and found that the MAPE was reduced from 1.72% in the FKM-ASVM model without GARCH to 0.74% in the FKM-ASVM-GARCH ECM model with GARCH [27]. In our proposed alternative model, the MAPE value at Station 4 fell from 26.89% in the SVM model and 26.64% in the GA-SVM model to 0.14% in the GA-SVM-GARCH model. The results indicate that integrating the GARCH model can indeed improve the accuracy of SVM prediction models and that our proposed GA-SVM-GARCH model provides the highest accuracy. In Figures 3 and 4, we present graphical comparisons of the MAPE and RMSE of the three SVM-based models at the five monitoring stations. As shown in Figure 3, the MAPE values range from 0.14% to 0.68% in the GA-SVM-GARCH model, compared to 10.42% to 33.14% in the GA-SVM model and 10.33% to 35.94% in the SVM model. In Figure 4, the RMSE values range from 0.0225 to 0.0950 in the GA-SVM-GARCH model, compared to 3.1561 to 7.3327 in the GA-SVM model and 2.9855 to 7.2913 in the SVM model.
As shown in Table 6 and Figures 3 and 4, the MAPE and RMSE values of the GA-SVM-GARCH model are the lowest among the three models. Overall, our proposed alternative model outperformed the traditional SVM and GA-SVM models in terms of both MAPE and RMSE. When we integrated the GARCH effect into the GA-SVM model, the improvement in prediction accuracy exceeded our expectations.

Conclusions
Air pollution, especially that of PM 2.5 , resulting from industrialization has fostered a wave of global weather migration and jeopardized human health in the past three decades. The prediction and control of air pollution, especially PM 2.5 , has been a critical issue for regulators, practitioners and academics in Taiwan. Much research has been carried out searching for better prediction models, yet there is no consensus on which is the most accurate approach. In this study we conducted a two-stage analysis and explored whether, by integrating the GA-SVM model with the GARCH effect, we can construct a more accurate air pollution prediction model. The study adopted the region with the worst PM 2.5 density, central Taiwan, as the sample source.
In the first stage of analysis, we started with an ADF test for stationarity in our time series data, followed by an LM test and an ARCH test to investigate whether autocorrelation and the ARCH effect existed in our dataset. We then estimated a GARCH (1,1) model to confirm the existence of a GARCH effect and, in the second stage of analysis, integrated the GARCH effect into the GA-SVM model with PM 2.5 as the variable to be predicted. Empirical results indicate that our proposed alternative model outperformed the traditional SVM and GA-SVM models in terms of both MAPE and RMSE as the accuracy indicators.
Consistent with previous SVM literature [7,23,26,27], which shows a trend of integrating various approaches into the SVM model, our empirical results support our expectation that an SVM-based approach is appropriate for PM 2.5 prediction and that prediction performance can be improved by integrating models, for example by incorporating the GARCH effect into the GA-SVM-based approach. Moreover, consistent with our prior expectation, the evidence further supports that taking the GARCH effect into account in the GA-SVM model clearly improves the accuracy of prediction. To the knowledge of the authors, this study is the first to attempt to integrate the GARCH effect into the GA-SVM model in the prediction of PM 2.5 . It is possible to compare the empirical results of this study with findings in the recent PM 2.5 literature, although different methods were adopted. Comparable studies include those on the Taichung region by Chen et al. [2] and Kristiani et al. [19], a cross-country study by Kusuma et al. [3] and a sub-district study in China by Long et al. [5].
In summary, this study has implications for sustainability management by both government and industry. When a regression-based approach is adopted, ignoring possible autoregressive characteristics in a time series dataset tends to result in lower prediction efficiency and accuracy. Furthermore, if variance clustering exists in a time series dataset, the choice of prediction model should account for this phenomenon. We therefore recommend taking the GARCH effect into consideration in air pollution prediction to capture variance clustering and to improve prediction efficiency and accuracy. Moreover, this study may shed light on the application of the GARCH model, as well as machine learning methods, in the air pollution prediction literature.