1. Introduction
The air pollution problem has long been the centre of discussions all over the world. This issue is particularly alarming for the urban areas of developed and developing countries [
1,
2,
3]. In Malaysia, air pollution is generally determined from five major types of pollutants, namely, carbon monoxide (CO), suspended particulate matter (PM
10), sulphur dioxide (SO
2), ozone (O
3), and nitrogen dioxide (NO
2) [
4,
5]. These five pollutant variables are observed simultaneously and monitored constantly to provide relevant information about air quality [
6]. As these pollutants are simultaneously being observed and monitored, the investigation into the relationship and correlation among these five pollutant variables is expected to contribute to a comprehensive understanding of the behaviour of air pollution events at any particular area. For example, Marković et al. [
7] have investigated the behaviour of CO, O
3, NO
2, SO
2, and PM
10 in the Belgrade urban area during the autumnal period of 2005. They reported the existence of a positive correlation between the PM
10 SO
2, NO
2, and CO. On the contrary, the O
3 variable was found to indicate a negative correlation with SO
2, NO
2, PM
10 and CO. With the basis of the correlation measured, they contended that SO
2, NO
2, CO, and PM
10 could originate from similar sources. Rich et al. [
8] found a high association between myocardial infarction disease and particulate matter concentration in the presence of other pollutants, such as O
3, CO, SO
2, and NO
2. The analysis by Xie et al. [
9] for 31 Chinese cities showed the following: the correlation between particulate matter and NO
2 and SO
2 is either high or moderate, the correlation between particulate matter and CO is diverse, and the correlation between particulate matter and O
3 is either weak or uncorrelated. These results varied spatio-temporally across all the cities.
Generally, the dependency between two or more pollutant variables is measured by the Pearson, Spearman, or Kendall correlation methods. These common methods are widely popular because of their computational simplicity and easily interpretable results. However, despite their simplicity, the Pearson correlation method can only measure the strength of the linear dependence between the underlying random variables. In fact, the Pearson correlation method fails to provide an accurate measurement if any of the variables involved do not follow a normal distribution [
10,
11]. Unfortunately, the distribution of air pollutant data rarely follows a normal distribution. Instead, they often follow fat tailed distributions, such as the extreme value distribution, especially during periods of extreme pollution events [
12,
13,
14]. However, the empirical measure of the Spearman or Kendall correlation method provides rank correlation measures which do not consider the properties of marginal distributions and time-varying properties among the random variables.
Apart from the common methods, several studies have considered a long-memory approach to investigate the temporal relationship and dependency behaviour of air pollutant data [
15]. For example, Lee [
16] showed the existence of scale-invariant behaviour in the air pollution concentration time series based on the concept of multifractal characteristics and long-term memory. Weng et al. [
17] found that the time series of the ozone in Southern Taiwan indicates a long and persistent memory process involving nonlinearity and fractal time series based on R/S analyses. In a similar vein, Liu et al. [
18] have shown the existence of persistence in the time-scaling behaviour for three pollution indices (SO
2, PM
10, NO
2) in Shanghai, China, using the method of detrended fluctuation analysis (DFA) and the multifractal approach. However, these methods are bound by some limitations. Particularly, they fail to integrate the behaviour of asymmetric co-movements and contagion effects that exist in the dynamic behaviour among variables. The existence of this behaviour could affect the results of the analysis [
19]. Thus, to overcome this problem, the method of dynamic conditional correlation (DCC) proposed by Engle [
20] and Engle and Colacito [
21] seems to be a good alternative. However, the data indicate the existence of extreme values and long tail properties; hence, an asymmetrical measure of a tail correlation may lead to biased estimates in DCC models [
22,
23].
As the probabilistic behaviour of extreme events can be effectively described using extreme value models, such as the generalized Pareto distribution (GPD), this study proposes a combination of a GPD model with copulas to establish a model that is able to describe the dependence structure between the PM10 variable and a set of four major pollutant variables, namely, CO, NO2, SO2, and O3. This study adopts the copula approach because it is flexible to use with various types of marginal distributions without being constrained by normality limitations. A copula model can easily provide accurate information on the joint distribution of several pollution variables. The Gaussian copula was used in this study to describe the behaviour of the dependence structure among the pollutant variables. To investigate the tail dependence among the pollutant variables, this work employed the symmetrized Joe–Clayton (SJC) copulas. The generalized autoregressive conditional heteroskedasticity (GARCH) model was also combined with the GPD model to improve the distribution modelling. This combination method integrates all the properties of fat tail behaviour, asymmetric co-movements, stochastic volatility, as well as the contagion effects that exist on the dynamic behaviour among variables.
7. Results and Discussion
A preliminary statistical analysis was carried out prior to the detailed discussion of the analysis results.
Figure 2 shows that the volatility of PM
10 is higher than those of the other pollutant variables.
Table 1 presents the descriptive statistics of all pollutant variables. The mean and median for PM
10 were found to be higher than those of the other pollutant variables. Hence, this pollutant exerts the greatest influence on the status of air quality. In addition, the standard deviation of PM
10 is the highest among all variables, thus implying that PM
10 presents more volatile behaviour than the other pollutant variables. NO
2 shows the lowest standard deviation and thus exhibits the most stable dynamic behaviour. The PM
10 pollutant also shows the highest of kurtosis value, which indicates that it appears most frequently in unhealthy pollution events. All the pollutant variables were found to have a positive skewness. A high positive skewness means that the pollutants exhibit long tailed behaviour to the right of the distribution data. The Jarque–Bera test results confirm that all the pollutant variables are not normally distributed. In particular, high Jarque–Bera test statistics are found for PM
10 and SO
2. The results in
Table 1 indicate that all the pollutant variables exhibit a nonlinear phenomenon and extreme behaviour. Thus, a model that functions under the normality assumption is not appropriate to use in representing these types of data.
Table 2 shows the results of the correlation among all the pollutant variables. The linear correlation values for the PM
10 pollutant in relation to the other pollutants range from −0.077 to 0.684. The PM
10 pollutant-related pairs are strongly correlated with CO, followed by O
3, NO
2, and SO
2. The NO
2 pollutant is found to have a moderate correlation with the CO pollutant. For the other pairs of pollutant variables, they present a positive low correlation. However, the measurement of linear correlation may generate misleading results because it assumes that the pairs of data share linearity properties. Moreover, linear correlation measures generally work well for pairs of data that satisfy a normality assumption. For most air pollutant data, the existence of skewness cannot be neglected, as presented in the results in
Table 1. A linear correlation measure also indicates a constant correlation behaviour over time for each pollutant variable. However, one of the most important characteristics of air pollution data is that they always fluctuate with volatility behaviour over time. Particularly during unhealthy air pollution events, some of the pollution variables can increase significantly and thereby correspond to extreme values. These behaviours are clearly depicted in
Figure 2. Thus, we believe that a linear correlation measure may not be a reliable tool for describing the relationship among pollutant variables. As mentioned previously, we address these issues by proposing a time-varying copula with a combination of the GPD and GARCH models.
In the analysis, the parameters for the tails of each pollutant variable are estimated on the basis of the GPD model.
Table 3 shows the estimated parameters describing the upper and lower tail behaviour of the data of each pollutant variable. The sign of the shape parameter shows how fast the tail decreases. A negative value indicates that the tail is finite, whereas a positive value indicates that the tail decreases as a polynomial. The higher the absolute value of the shape parameter is, the heavier the tail distribution will be. As shown in
Table 3, for the lower tails, all pollutant variables are found to have a negative estimate of parameter
ξ. This result indicates that all pollution variables exhibit finite short tail behaviour, with the PM
10 pollutant exhibiting the lowest value of
ξ (−0.0626). For the upper tails, the CO, NO
2, and SO
2 pollutants exhibit short tail behaviour. The PM
10 pollutant is found to have the highest positive value
ξ = 0.1142, followed by the SO
2 pollutant with
ξ = 0.0179. These values indicate that the PM
10 and SO
2 pollutants demonstrate heavy upper tail behaviour. These results indicate that all pollutant variables present the same pattern for a low API at a particular time. In the case of extreme pollution events, the PM
10 variable is most likely to have the highest API value among all pollution variables at a particular time. Hence, the PM
10 pollutant is highly volatile, particularly at the upside of its distribution.
Figure 3 shows a comparison of the empirical and GPD plots for the cumulative distribution function (CDF) of the exceedance of the residuals in the upper tail of each pollutant variable. The fitted GPD closely follows the exceedance of the residuals, although only 10% of the standardized residuals were employed. Thus, the GPD model is a suitable choice for the upper tail data of each pollutant. This result is also valid for the lower tails. On the basis of the GPD model, the overall CDF of the semi-parametric models is obtained (
Figure 4). In particular, the lower and upper tails are determined by the fitted GPD model while the interior part is estimated by the GARCH model.
The results shown in
Table 3 and
Figure 3 and
Figure 4 demonstrate that our approach can adequately model the marginal distributions of the time series data of the air pollutant variables in Malaysia. However, the estimated marginal distribution alone is not able to describe the dependence structure that exists among the pollutant variables. Thus, a copula needs to be adopted in the model. As mentioned previously, the Gaussian and SJC copulas are employed in this study. Given the available properties on each copula model, the Gaussian copula is useful in assessing the overall dependence structure of pollutant variables, as described in Equation (10). For the SJC copula model, it is useful in exploring the dependence behaviour of the lower and upper tails, as described in Equations (14) and (15).
Table 4 shows the results of the parameter estimates for the Gaussian and SJC copulas. The most important parameters for the constant copula are determined by δ for the Gaussian copula and by the
and
parameters for the SJC copula. For the time-varying copulas, the important parameters for the Gaussian copula are ω, α, and β; and those for the SJC copula are
,
,
,
,
and
. As shown in
Table 3, the PM
10 variable is likely to have the highest API value among all pollution variables at a particular time. Thus, investigating the dynamic dependency of PM1
0 on other pollutant variables could yield informative results. In this regard, the parameters of
,
, and
are useful in describing the magnitude of dependence between PM
10 and other pollutant variables. The adjustment in the dependence measure is captured by the parameters of
,
and
. Moreover, the parameters
,
and
are useful to represent the degree of the persistence of the dependence.
The best fitting copula model is determined by goodness-of-fit measures, namely, Akaike’s information criterion (AIC) and Bayesian information criterion (BIC). The lowest value determined by the AIC or BIC indicates a well fitted model (
Table 4). Two important points can be derived from the information provided in
Table 4 and
Figure 5,
Figure 6,
Figure 7 and
Figure 8. First, the overall dependence (δ) between PM
10 and the other pollutants can be ranked in decreasing order as PM
10–CO > PM
10–SO
2 > PM
10–NO
2 > PM
10–O
3. Second, the rank in decreasing order for the upper tail (
) and lower tail (
) dependence between PM
10 and the other pollutants is found to be same as that of the overall dependence, that is, PM
10–CO > PM
10–SO
2 > PM
10–NO
2 > PM
10–O
3. This ranking implies that the dynamic fluctuations between PM
10 and CO over time have the strongest overall dependency and tail dependency; they are followed by the relationship of PM
10 with SO
2 and by PM
10 with NO
2. The weakest overall dependency and tail dependency are those between PM
10 and O
3. Thus, we can conclude that CO has the greatest influence on the behaviour of PM
10 among all the pollutants in a normal air quality, good air quality (lower tail), or bad air quality (upper tail). The results in
Table 4 also show that the value of
is larger than that of
for all possible pairs. Hence, the upper tail dependency between PM
10 and the other pollutants is stronger than the lower tail dependency.
To further investigate the dynamic dependence structure among the pollutant variables, we present in
Figure 5,
Figure 6,
Figure 7 and
Figure 8 the time paths for the overall, lower, and upper tail dependencies based on the Gaussian and SJC time-varying copulas. In each figure, the red dashed line denotes the dependence parameter for the constant copula while the solid blue line denotes the dynamic parameter values under the time-varying copula. As mentioned previously, the time variations of the dependency measures between the pollutant variables are described by the parameters of
,
,
,
,
and
. Then, as shown in
Table 4, the results of the time-varying Gaussian copula model reveal that most of the time paths are close to a white noise series as the values of the variation coefficient α are relatively higher than those of the persistence coefficient
β, except for the PM
10–NO
2 relationship. For the time-varying SJC copula, all the persistence coefficients
β are larger than the coefficients α for all pairs. This result gives some insights into the changes of the dependence structure for the upper and lower tails over a time period.
For the PM
10–CO pair (
Figure 5), the parameters realized for the overall time-varying (
) dependence are found to be in the mean of 0.6 and range of 0.5–0.7. For a lower tail dependence, the time-varying parameters (
) fluctuate around the mean of 0.35 and range from 0.2 to 0.4. For an upper tail dependence, the time-varying parameters (
) fluctuate around the mean of 0.42 and range from 0.39 to 0.42. The volatility properties of the time-varying parameters of the upper tail dependence are more stable than those of the overall and lower tail dependencies. They fluctuate around the mean of 0.6 and range from 0.5 to 0.7. Thus, the overall time-varying dependence for PM
10–CO is quite high. For the lower tail and upper tail, their time-varying dependencies are relatively moderate. The dependence path of the time-varying copula for PM
10–NO
2 is shown in
Figure 6. The overall time-varying properties of the dependence parameters (
) are found to be in the range of 0 to 0.5 with a mean of 0.35. Meanwhile, the time-varying lower dependence (
) fluctuates between 0 and 0.3 with a mean of 0.13. The time-varying upper dependence (
) fluctuates between 0.02 and 0.25 with a mean of 0.12. The fluctuations of the lower and upper dependency behaviour are more volatile than those of the overall time-varying dependence parameters (
). Thus, we can conclude that the time-varying overall, lower tail, and upper tail dependencies for PM
10–NO
2 are low.
As mentioned previously,
Figure 7 shows the dependence path of the time-varying copula for the PM
10–O
3 pair. The overall dependence parameters (
) are found to be more volatile than those of the time-varying lower dependence (
) and upper dependence (
). The values of the parameter fluctuate around the mean of 0.12 within a range of 0.05–0.19. The parameters for the time-varying lower dependence (
) and upper dependence (
) fluctuate around a very low mean and very low range, except for some early points of the lower tail dependency before the year 2004. However, in general, these values are considerably lower than those for the PM
10–CO and PM
10–NO
2 pairs. Thus, we can conclude that the behaviour of the time-varying overall, lower tail, and upper tail dependencies for PM
10–O
3 are low. The dependency behaviour of the PM10–NO2 pair is shown in
Figure 8. The values of the time-varying overall dependence parameter (
), lower dependence (
), and upper dependence (
) indicate similar volatilities in the range of 0–0.5. The mean of the overall dependence parameter is about 0.36, which is higher than the mean of the lower and upper dependence parameters. These values indicate the low magnitudes of the overall, lower tail, and upper tail dependencies for the PM
10–NO
2 pair. The results shown in
Figure 5,
Figure 6,
Figure 7 and
Figure 8 also show that the measure based on the constant copula (red dashed line) is not sufficient to describe the dependency behaviour among the pollutant variables over time. Nevertheless, the constant measure of dependency is found to provide a good approximation of the means of the fluctuations of the plots for overall dependence, lower tail dependence, or upper tail dependence (except for the lower and upper tails of PM
10–O
3 pair).