Predicting Daily Air Pollution Index Based on Fuzzy Time Series Markov Chain Model

: Air pollution is a problem faced by most countries across the world. Prediction of air pollution is crucial in air quality research since it is related to public health effects. The symmetry concept of fuzzy data transformation from a single point (crisp) to a fuzzy number is essential for the forecasting model. Fuzzy time series (FTS) has been applied for predicting air pollution; however, it has a limitation caused by utilizing an arbitrary number of intervals. This study predicts the daily air pollution index using the FTS Markov chain (FTSMC) model based on a grid method with an optimal number of partitions, which can greatly improve the model accuracy for air pollution. The air pollution index (API) data, collected from Klang, Malaysia, is considered in the analysis. The model has been validated using three statistical criteria: the root mean square error (RMSE), the mean absolute percentage error (MAPE), and Theil's U statistic. The model has also been validated by comparison with some well-known statistical models. The results demonstrate that the proposed model outperformed the other models. Thus, the proposed model could be a better option for air pollution forecasting and can be useful for managing air quality.

modified a weighted FTS model for enrollment forecasting. They adapted the weighted model by incorporating the differences of the observed data across the midpoints of the intervals. Tsaur [28] proposed an FTS model based on a Markov chain, which obtains the forecast with the largest probability from the transition probability matrix. However, he used an arbitrary interval length for the universe of discourse, which makes the model sensitive to abnormal observations and outliers. Sadaei et al. [29] developed a refined exponentially weighted FTS for forecasting load data, which improved prediction precision. More specifically, the effective interval length has been investigated by several studies based on different methods. For example, Huarng [22,23] proposed two methods for determining the effective length of intervals, based on averages and on distributions. Yolcu et al. [24] extended Huarng's model [22] using constrained optimization to determine the effective length of intervals. Egrioglu et al. [30] proposed a new fuzzy time series method using single-variable constrained optimization to determine the interval length that gives the best forecasting accuracy. Chen et al. [31] proposed a new FTS forecasting model integrated with a granular computing approach and an entropy method for stock price data. Talarposhti et al. [32] proposed a hybrid approach using optimization techniques and an intelligence algorithm to determine the proper length of intervals for predicting the stock market. Cheng et al. [33] employed rough sets and utilized an adaptive expectation model to propose a new fuzzy time series for forecasting the closing price. Rahim et al. [34] developed a type-2 FTS model using the sliding window technique for determining the appropriate length of intervals. Bose et al. [35] proposed a new partitioning method combined with a rough-fuzzy method for developing the fuzzy time series model.
Apart from that, Zuo et al. [36] developed a combined topological optimization technique to solve the optimization problem in the product manufacturing process. Ning et al. [37] proposed a new method based on a chip formation model and an iterative gradient search method using the Kalman filter algorithm. This optimization method has been used to inversely identify the Johnson-Cook model constants of ultra-fine-grained titanium. Ning and Liang [38] introduced an improved inverse identification technique for Johnson-Cook model constants based on temperature and force data for predicting machining forces. The model was developed using an iterative gradient search method based on the Kalman filter algorithm. These types of optimization techniques can be adopted for improving forecasting models.
More specifically, the FTS models have been utilized for forecasting environmental problems such as air quality [9][10][11][12][13][14][15][16][17]. They are suitable for predicting air pollution since the time series of air pollution may include uncertain data and may not satisfy some of the statistical assumptions. Nevertheless, utilizing FTS models in the field of air pollution is still very rare. For example, Cheng et al. [17] introduced a trend-weighted FTS model to predict daily O3 concentrations. Dincer and Akkuş [12] predicted SO2 concentrations based on a robust FTS model, which provided good forecasting results. Koo et al. [39] made a comparison study using FTS and other statistical models for predicting air pollution events. They concluded that the proposed model outperformed the other models. Wang et al. [40] proposed a hybrid FTS method with data pre-processing approaches for forecasting the main air pollutants. Yang et al. [41] proposed a forecasting system based on a combination of fuzzy theory and an advanced optimization algorithm for air pollution forecasting.
As previously mentioned, the FTS models have been utilized to solve forecasting problems in various domains. However, several FTS models have issues, such as using an arbitrary length of intervals for the universe of discourse, repeated fuzzy relationships, or not considering the weights of fuzzy logical relationships. According to the literature review above, some researchers proposed partition methods with complex computations, and some have not evaluated their models by comparison with other recent models. In particular, the FTS-based Markov chain model has a deficiency in determining the effective length of the interval, which is negatively affected by abnormal observations and outliers. Therefore, determining the optimal length of intervals and assigning proper weights is an interesting issue that needs to be addressed. This motivated us to investigate the optimal partition number of the universe of discourse. This study proposes the FTS Markov chain (FTSMC) model based on the grid method with the optimal number of partitions of the universe of discourse to provide significantly improved model accuracy for air pollution forecasting. The major contribution of this study is to propose an improved model with an appropriate partition number and to implement the model for forecasting APIs as a new forecasting model in air quality research.

Fuzzy Time-Series Definitions
The fundamental steps for designing fuzzy time series models are: define the universe of discourse U, divide U into an equal number of intervals, fuzzify the observations, define the fuzzy logic relationships, determine the forecasted values, and defuzzify. The main fuzzy time series definitions are listed below:

Definition 1. Let X(t) (t = 0, 1, 2, . . .), a subset of real numbers, be the universe of discourse on which fuzzy sets f j(t) are defined. Let F(t) be a collection of f 1(t), f 2(t), . . .; then F(t) is called a fuzzy time series defined on X(t) [17][18][19].

Definition 2.
Let R(t, t − 1) be the fuzzy logic relationship (FLR) between F(t − 1) and F(t), which can be denoted as F(t) = F(t − 1) ◦ R(t, t − 1), where ◦ is the composition operator. If R(t, t − 1) is independent of time t, then F(t) is called a time-invariant fuzzy time series; otherwise, it is called a time-variant fuzzy time series [18,19].

Definition 3. Suppose that F(t − 1) = A i and F(t) = A j. The relationship between the two consecutive observations F(t − 1) and F(t), referred to as the FLR, can be denoted as A i → A j, where A i and A j are the left-hand side and right-hand side of the FLR, respectively [17][18][19].

Study Area and Dataset
The air pollution index (API) data is classified based on the highest index value of five main air pollutants, namely, ozone (O 3 ), sulphur dioxide (SO 2 ), particulate matter (PM 10 ), carbon monoxide (CO), and nitrogen dioxide (NO 2 ), as shown in Figure 1 [42][43][44][45]. The API values are determined by the average indices for these five pollutant variables, and the maximum value among these sub-indices is selected as the API value [3]. In Malaysia, the air pollution index (API) has been adopted as a measure of air pollution conditions. The API is a simple number that ranges from 0 to ∞ to reflect the air quality levels that are related to health effects [3,45].
In this study, the daily API maxima values, gathered from an air monitoring station located in Klang, Malaysia, are considered in the analysis. The city of Klang is located nearly 32 km to the west of Kuala Lumpur and covers a land area of about 573 km 2 , as shown in Figure 2. The API dataset is divided into a training dataset, from 1 January 2012 to 31 December 2013, and a testing dataset, from 1 January 2014 to 31 December 2014. The values of API recorded at the selected monitoring station are provided by the Department of Environment of Malaysia. The total number of observations in this study is 1096. An API value of less than 100 denotes good air quality, while an API value greater than 100 indicates a higher degree of air pollution. The classification of states is made based on the breakpoints for API of 50, 100, 200, 300, and 300+, corresponding to good, moderate, unhealthy, very unhealthy, and hazardous states, respectively, as shown in Table 1 [3,45].
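The state classification described above can be sketched as a small lookup function; the breakpoints follow Table 1, while the function name and the inclusive boundary handling are illustrative assumptions:

```python
def api_state(api: float) -> str:
    """Map a daily API value to its air-quality state (Table 1 breakpoints)."""
    if api <= 50:
        return "good"
    elif api <= 100:
        return "moderate"
    elif api <= 200:
        return "unhealthy"
    elif api <= 300:
        return "very unhealthy"
    else:
        return "hazardous"
```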

Proposed Model
In this section, the simplified arithmetic operations proposed by Chen [21] and Tsaur [28] are used in the proposed algorithm (see Figure 3). The steps of the proposed model can be described as follows:

Step 1. Define the universe of discourse U from the available time-series data using the formula U = [D min − D 1 , D max + D 2 ], where D min and D max denote the minimum and the maximum value in the data, respectively, and D 1 and D 2 represent positive values.
Step 2. Partition U for the observed data using the grid partition method [21,29] based on different numbers of partitions: 5, 6, 7, 8, . . . , 50. To avoid redundancy, we present only 5, 10, 15, 20, 25, 30, 35, 40, and 45 partitions when determining the optimal partition number of the universe of discourse that can improve the model accuracy.
Step 3. Define the fuzzy sets A i on U using Equation (2).
Step 4. Fuzzify the observations into fuzzy numbers based on the maximum membership value.
Step 5. Construct the fuzzy logical relationships (FLRs) and establish fuzzy logical relationship groups (FLRGs) to build the frequency (count) matrix of fuzzy relations between observations.
Step 6. Generate the Markov weights (transition probability matrix) based on the frequencies of the FLRGs established in Step 5. The total number of states n equals the total number of fuzzy sets; thus, the matrix P is P n×n . The state transition probability P ij , from state A i to state A j , is the probability of observing y t+1 given y t , i.e., P ij = Pr(y t+1 = j | y t = i), which can be calculated as P ij = N ij / N i. , where N ij is the number of transitions from state A i to state A j , and N i. is the total number of transitions out of state A i . The transition probability matrix P satisfies P ij ≥ 0 and the row sums equal 1, i.e., the sum of P ij over j = 1, . . . , n is 1.
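Steps 5 and 6 above can be sketched as follows: one-step transitions between fuzzified state indices are counted, and each row of the count matrix is normalized to give the transition probability matrix P. This is a minimal sketch with illustrative variable names:

```python
import numpy as np

def transition_matrix(states, n):
    """Build P[i][j] = N_ij / N_i. from a sequence of fuzzified state indices 0..n-1."""
    counts = np.zeros((n, n))
    for i, j in zip(states[:-1], states[1:]):
        counts[i, j] += 1                      # N_ij: transitions A_i -> A_j
    row_sums = counts.sum(axis=1, keepdims=True)  # N_i.: totals per state
    # Rows of never-visited states stay zero instead of dividing by zero
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)
```

Each row of the returned matrix sums to 1 (or 0 for unvisited states), matching the constraint in Step 6.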
Step 7. Calculate the forecasted values. The following rules are considered in calculating the forecasts.
Rule 1. If the fuzzy logical relationship group of A i is one-to-one, in which there is only one transition for A i (i.e., A i → A k , with P ik = 1 and P ij = 0 for j ≠ k), then the forecast of F(t) is m k , the midpoint of u k , k = 1, 2, . . . , n, which can be calculated according to Equation (5).
Rule 2. If the fuzzy logical relationship group of A i is one-to-many, in which there is more than one transition for A i (i.e., A i → A 1 , A 2 , . . . , A n , i = 1, 2, . . . , n), then, if the state is A i for the actual value Y(t) at time t, the forecasted value F(t + 1) can be determined using Equation (6), where m 1 , m 2 , . . . , m n are the midpoints of u 1 , u 2 , . . . , u n and m i is replaced by Y(t) to use the latest information from the state A i at time t.
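Rules 1 and 2 can be expressed compactly: the forecast is the probability-weighted sum of interval midpoints, with the midpoint of the current state replaced by the actual value Y(t). This is a sketch under that reading of Equations (5)-(6); the one-to-one case of Rule 1 falls out as a special case since P ik = 1:

```python
import numpy as np

def forecast(P, midpoints, i, y_t):
    """Forecast F(t+1) given current state i and actual value y_t.

    P is the transition probability matrix, midpoints the interval midpoints.
    """
    m = np.asarray(midpoints, dtype=float).copy()
    m[i] = y_t                 # Rule 2: replace m_i by the observed Y(t)
    return float(P[i] @ m)     # weighted sum over all target states
```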
Step 8. Adjust the forecasted values by adding the differences of the actual values Y(t), which reduces the estimation error. The adjusted forecasted values can be written as in Equation (7).
Step 9. Validate the model.

Model Validation
The statistical criteria used to evaluate the models are MAPE, RMSE, and Theil's U statistic, which are defined in Equations (8)-(10), respectively, where Y i denotes the real data, F i the forecasted values, and N the total number of observations.
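A minimal sketch of the three criteria follows. Theil's U is assumed here to be the U1 form (RMSE divided by the sum of the root mean squares of the two series); the paper's exact variant in Equation (10) may differ:

```python
import numpy as np

def rmse(y, f):
    """Root mean square error."""
    y, f = np.asarray(y, float), np.asarray(f, float)
    return float(np.sqrt(np.mean((y - f) ** 2)))

def mape(y, f):
    """Mean absolute percentage error, in percent."""
    y, f = np.asarray(y, float), np.asarray(f, float)
    return float(100.0 * np.mean(np.abs((y - f) / y)))

def theil_u(y, f):
    """Theil's U statistic (U1 form, assumed)."""
    y, f = np.asarray(y, float), np.asarray(f, float)
    return rmse(y, f) / (np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(f ** 2)))
```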

The Implementation of the Algorithm
In this section, we provide the results of the proposed model using the daily API data, whose plots for the training data and testing data are given in Figure 4.

Step 2. Partition U based on different numbers of partitions from 5 to 50. However, to avoid redundancy, we present only the partitionings for 5, 10, 15, . . . , 30, as shown in Figure 5.
Step 3. Fuzzy sets are defined. Fuzzy sets A k are determined, using the membership function, based on the intervals u k that were formed using the grid method in the previous step. The fuzzy sets A k can then be written using Equation (2). Table 1 lists the fuzzy sets A i (i = 1, 2, . . . , n). A greater value of i indicates that the fuzzy set moves from the lowest to the highest fuzzy set of API values.
Step 4. Transform the API values into fuzzy numbers and find the fuzzy logic relationships (FLRs), as shown in Table 2.
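For triangular fuzzy sets centered on the interval midpoints, fuzzification by maximum membership (Step 4) amounts to assigning each observation the index of the nearest midpoint. The sketch below assumes that setup:

```python
import numpy as np

def fuzzify(values, midpoints):
    """Assign each observation the fuzzy set with maximum membership.

    For triangular sets on a uniform grid this is the nearest midpoint.
    """
    values = np.asarray(values, float)
    midpoints = np.asarray(midpoints, float)
    # Distance from every observation to every midpoint; pick the closest
    return np.abs(values[:, None] - midpoints[None, :]).argmin(axis=1)
```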
Step 5. The fuzzy logical relationships (FLRs) are determined, and the frequency (count) matrix of fuzzy relations between observations is constructed. In this step, the FLRs are grouped into fuzzy logic relationship groups (FLRGs).
It can be seen from Table 4 that thirteen groups of the FTS values are presented, each containing several FLRs. From Table 4, the transition frequency matrix, i.e., the frequency (count) matrix of fuzzy relations between observations, can be determined, which is a matrix N 30×30 .
Step 6. Assign the Markov weights based on the matrix of frequencies from Step 5 by using Equation (4), as shown in Table 4. Then, a transition process diagram can be established using the weights to visualize the Markov weighted matrix. Table 5 lists the number of transitions of the FTS and the Markov weight elements for each group. The Markov weights obtained using the grid partition method are used for establishing the transition probability matrix P 30×30 , which is used for calculating the forecasted values in the next step. For instance, for the FLRG A 8 → A 6 , A 8 , we have N 86 = 2 and N 88 = 2. Thus, P 86 = 1/2 and P 88 = 1/2, and P 8j = 0 otherwise.
Step 7. Calculate the forecasted values by using Equation (5) or (6) based on the Markov weights. For example, the forecast value for the day (2012/1/2) is calculated by using Equation (6).

Step 8. The forecasted values are adjusted by using Equation (7). For example, the forecast value of 56.66 obtained in Step 7 is adjusted accordingly.

Model Evaluation
In this section, fitting the optimal number of partitions of the universe of discourse is presented. In addition, to validate the proposed model, a comparison of the proposed model with some existing models is provided.

Fitting the Optimal Partition Number of the Universe of Discourse
In this section, the appropriate number of partitions has been investigated using numbers from 5 to 50 (see Table A2 in Appendix A). However, to avoid redundancy, we present only the numbers 5, 10, 15, . . . , 45. It can be seen from Table 6 and Figure 6 that the best number of partitions for the API data is 30 intervals, for which the proposed model produced the smallest values of MAPE, RMSE, and Theil's U. This implies that the proposed model provides the best forecasting accuracy using this number of partitions, as compared to the other partition numbers, in terms of both the training and testing datasets.
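Selecting the optimal partition number can be sketched as a grid search: fit the model for each candidate number of partitions and keep the one with the smallest error. The helper `fit_and_score` below is a hypothetical callable standing in for fitting the FTSMC model and returning an error criterion such as RMSE:

```python
def best_partition_number(data, fit_and_score, candidates=range(5, 51)):
    """Return the partition count in `candidates` with the smallest error score.

    `fit_and_score(data, n)` is assumed to fit the model with n partitions
    and return a scalar error (e.g., RMSE on the training data).
    """
    scores = {n: fit_and_score(data, n) for n in candidates}
    return min(scores, key=scores.get)
```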

Model's Validation
For validating the proposed model, the training and testing datasets of the APIs are used to evaluate the model performance and compare it with some well-known existing models. In particular, we compare the FTSMC model based on the optimal number of partitions against the conventional FTS models proposed by Song and Chissom [18], Chen [21], Cheng [47], and Severiano et al. [48], namely FTS [18], CFTS [21], TWFTS [47], and HOFTS [48], respectively. It can be seen from Table 7 and Figure 8 that the performance of the proposed model on the training dataset is very good: it achieved the smallest values of RMSE, MAPE, and the U statistic compared to the other forecasting models. Thus, the proposed model outperformed the other forecasting models, indicating that it is a powerful model for predicting air pollution occurrences. In addition, it can be seen from Table 8 and Figure 9 that the proposed model also outperformed the other FTS models on the testing dataset. This implies that the model can produce better forecasting accuracy for air pollution and can be applied to various kinds of time series. More specifically, Figure 7 shows that the proposed model using the best number of partitions provides greatly improved prediction accuracy for the air pollution index compared with the other partition numbers. This indicates that the proposed model produces accurate prediction results for the air pollution index.

Figure 8. Comparison of the proposed model using the training dataset with some FTS models proposed by Song and Chissom [18], Chen [21], Cheng [47], and Severiano et al. [48].

Figure 9. Comparison of the proposed model using the testing dataset with some FTS models proposed by Song and Chissom [18], Chen [21], Cheng [47], and Severiano et al. [48].
In addition, a comparison of the proposed model with some well-known existing time series models is presented for further validation of our model. The time series models are ARMA [1], ARIMA [5,49], exponential smoothing [49], SARIMA [50], autoregressive conditional heteroskedasticity (ARCH) [51], GARCH [51], Markov chain [3], and fuzzy-ARIMA [52]. The evaluation of the models has been done based on the Akaike information criterion (AIC) [53] and the Bayesian information criterion (BIC) [54] using Equations (11) and (12), respectively, which are common goodness-of-fit criteria for selecting the best time series models.

AIC = 2k − 2 ln(L) (11)
BIC = k ln(r) − 2 ln(L) (12)

where r is the number of observations, k is the number of parameters used in the model, and L is the maximum value of the likelihood function of the model, which can be represented by the mean square error (MSE). It can be seen from Table 9 that the proposed model produced the smallest values of AIC and BIC compared to the other models. This indicates that the proposed model outperformed the other models; thus, it is an adequate model, and it can provide an accurate forecast of air pollution. Furthermore, air pollution forecasting is based on daily API concentrations [3,39,45]. According to the time series lag test, as shown in Figure 10, we can effectively improve the performance of the fuzzy time-series Markov chain forecasting model. Based on different testing periods, the time lags of the API time series are not the same.

Figure 10. Training data lag test.
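For least-squares fits, a common working form of the two criteria replaces the log-likelihood with a function of the MSE: AIC = r ln(MSE) + 2k and BIC = r ln(MSE) + k ln(r). The sketch below uses that assumption; the paper's exact substitution of the MSE for L may differ:

```python
import math

def aic_bic(mse, r, k):
    """AIC and BIC for a least-squares model.

    mse: mean square error of the fit
    r:   number of observations
    k:   number of model parameters
    """
    aic = r * math.log(mse) + 2 * k
    bic = r * math.log(mse) + k * math.log(r)
    return aic, bic
```

Lower values of both criteria indicate a better trade-off between fit and model complexity, which is how Table 9 ranks the competing models.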

Conclusions
This study proposed the FTSMC model based on the optimal partition number for forecasting air pollution in Malaysia using daily API data gathered from Klang over a period of three years. In this study, the Markov weights of the fuzzy logical relationships (FLRs) in the FLRGs have been calculated based on the Markov transition probabilities. The grid partition method has been used to determine the optimal partition number of U. The evaluation of the proposed model has then been performed using different numbers of partitions, chosen in order to avoid an arbitrary choice of intervals. This is considered the first study to properly determine the number of partitions in the FTSMC model. In the proposed forecasting method, fitting the optimal number of partitions improved the forecasting accuracy. In forecasting the daily API data, the proposed model produced higher prediction accuracy compared to several FTS models. This indicates that the model can be used for forecasting air pollution data, in addition to various other time-series data. For future studies, the proposed model could be applied to provide accurate results for the air pollution sub-index variables, such as PM2.5, PM10, O3, SO2, NO2, and CO, including weather factors such as wind speed and temperature, to provide a comprehensive examination of the air pollution problem. In addition, the proposed model can be further developed by utilizing optimization methods such as the Kalman filter, topology methods, and Bayesian methods, which are recommended for future works to provide an accurate forecasting model for air pollution. Additionally, it can be extended by combining the model with clustering and machine learning techniques in order to improve its forecasting accuracy.
Acknowledgments: The authors are grateful to Universiti Teknologi PETRONAS for providing financial support and good facilities. Additionally, they are thankful to the Department of Environment Malaysia for providing the air pollution data.

Conflicts of Interest:
The authors declare no conflict of interest. Table A1. List of symbols and abbreviations in the study.

A i : Fuzzy set
U : Universe of discourse
D min : The minimum value in the universe of discourse U
D max : The maximum value in the universe of discourse U
D 1 : Positive value
D 2 : Positive value
f A i : Membership function of fuzzy set A i
u i : Linguistic intervals
F(t) : Fuzzy time series at time t
FLR : Fuzzy logical relationship
FLRGs : Fuzzy logical relationship groups
m k : Midpoints of the linguistic intervals u k
P ij : Transition probability
N ij : Number of transitions
N i. : Total number of transitions
P : Transition probability matrix
Y(t) : Actual value