# Analysis of Particulate Matter (PM10) Behavior in the Caribbean Area Using a Coupled SARIMA-GARCH Model

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Source of Experimental Data

#### 2.2. Data Processing

#### 2.3. Statistical Criterion

#### 2.4. SARIMA Model

#### 2.5. GARCH Model

- $\mathcal{F}_{t-1}$ denotes the filtration, i.e., all the information on the process $\zeta_t$ available up to time $t-1$;
- $Z_t \sim \text{i.i.d. } WN(0,1)$;
- for any time $t$, $\sigma_t^2$ denotes the conditional variance, expressed as a function of the squared residuals $\zeta_t^2$, which is the quantity we are trying to simulate.
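
The definitions above can be illustrated with a minimal simulation sketch. It assumes the standard GARCH(1,1) recursion $\sigma_t^2=\omega+a_1\zeta_{t-1}^2+b_1\sigma_{t-1}^2$ with $\zeta_t=\sigma_t Z_t$ and Gaussian $Z_t$. The function name `simulate_garch11` and the parameter values are hypothetical and chosen for illustration only; they are not the fitted values.

```python
import random

def simulate_garch11(omega, a1, b1, n, seed=0):
    """Simulate zeta_t = sigma_t * Z_t under an assumed GARCH(1,1) recursion."""
    if omega <= 0 or a1 < 0 or b1 < 0 or a1 + b1 >= 1:
        raise ValueError("need omega > 0, a1 >= 0, b1 >= 0 and a1 + b1 < 1")
    rng = random.Random(seed)
    var = omega / (1.0 - a1 - b1)  # start from the unconditional variance
    zeta, sigma2 = [], []
    for _ in range(n):
        shock = var ** 0.5 * rng.gauss(0.0, 1.0)  # Z_t assumed standard normal
        sigma2.append(var)
        zeta.append(shock)
        # conditional variance of the next step, known from information up to t
        var = omega + a1 * shock ** 2 + b1 * var
    return zeta, sigma2

# Illustrative (hypothetical) parameters
zeta, sigma2 = simulate_garch11(omega=0.1, a1=0.15, b1=0.80, n=500)
```

Each `sigma2[t]` depends only on values with index below `t`, which is the role played by $\mathcal{F}_{t-1}$ in the definition.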

#### 2.6. Indexes of Forecast Evaluation

## 3. Results and Discussion

#### 3.1. PM10 Descriptive Statistics

^{−16}. The coefficient of determination $R^2$ in the first case was 37.19%, meaning that the model accounts for 37.19% of the variation in the dependent variable. Spearman's correlation coefficient $\rho$ was 0.5084 (50.84%), which indicates a positive relationship of moderate strength between the PM10GPE and PM10PR variables. In the second case, $R^2 = 38.94\%$ and $\rho = 50.84\%$. These values again suggest that a large-scale event occurring in GPE is also felt in PR.

#### 3.2. Chronogram and Decomposition of PM10 Data

**b**) of it can be seen as a sum of:

- a trend, which is the average behavior of the two data series, i.e., their long-term evolution. This trend is characterized by linear increases or decreases at irregular intervals. Each time series behaves monotonically from one year to the next;
- a seasonal component (cycle), which corresponds to the cases where the PM10 phenomenon repeats at regular or periodic intervals. Here, the period is intra-year: strong pollution peaks are observed in the middle of each year;
- a random component, also called noise or residual. It corresponds to low-intensity fluctuations of a stochastic nature and belongs to the disturbing elements [50].
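
The three components listed above can be separated with a classical additive decomposition. The sketch below is a minimal pure-Python version under stated assumptions: a centered moving average with an odd window estimates the trend, and per-position means of the detrended series estimate the seasonal cycle. The helper names and the toy period of 5 are hypothetical; the PM10 series have an annual period.

```python
def moving_average(x, window):
    """Centered moving average (odd window); None where the window does not fit."""
    if window % 2 == 0:
        raise ValueError("this sketch expects an odd window")
    half = window // 2
    out = [None] * len(x)
    for i in range(half, len(x) - half):
        out[i] = sum(x[i - half:i + half + 1]) / window
    return out

def decompose_additive(x, period, window):
    """Split x into trend + seasonal + residual (additive model)."""
    trend = moving_average(x, window)
    buckets = [[] for _ in range(period)]
    for i, (xi, ti) in enumerate(zip(x, trend)):
        if ti is not None:
            buckets[i % period].append(xi - ti)  # detrended value by cycle position
    if any(len(b) == 0 for b in buckets):
        raise ValueError("series too short for this period and window")
    raw = [sum(b) / len(b) for b in buckets]
    center = sum(raw) / period
    pattern = [s - center for s in raw]  # seasonal effects sum to zero
    seasonal = [pattern[i % period] for i in range(len(x))]
    residual = [None if t is None else xi - t - s
                for xi, t, s in zip(x, trend, seasonal)]
    return trend, seasonal, residual

# Toy series: linear trend plus a zero-mean cycle of length 5, no noise
cycle = [2.0, -1.0, 0.0, -2.0, 1.0]
series = [10.0 + 0.5 * i + cycle[i % 5] for i in range(30)]
trend, seasonal, residual = decompose_additive(series, period=5, window=5)
```

On this noise-free toy series the residual is numerically zero wherever the trend is defined, because the window spans exactly one full cycle.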

#### 3.3. Analysis of Seasonal Effects

#### 3.4. Selection of the PM10 Model

- The expression of the stochastic process $\left({X}_{t}\right)$ describing PM10 in GPE and following a $SARIMA(1,0,3)(0,1,0)\left[365\right]$ is written:$$(1-0.8275B)(1-{B}^{365}){X}_{t}=(1-0.2186B-0.2338{B}^{2}-0.0931{B}^{3}){\zeta}_{t}$$Equivalently,$${X}_{t}=0.8275{X}_{t-1}+{X}_{t-365}-0.8275{X}_{t-366}-0.2186{\zeta}_{t-1}-0.2338{\zeta}_{t-2}-0.0931{\zeta}_{t-3}+{\zeta}_{t}$$
- The background sources of the PM10 at PR can be described by the stochastic process $\left({Y}_{t}\right)$ following a $SARIMA(0,0,5)(0,1,0)\left[365\right]$. The mathematical expression of ${Y}_{t}$ is then written:$$(1-{B}^{365}){Y}_{t}=(1+0.5947B+0.2878{B}^{2}+0.1457{B}^{3}+0.1112{B}^{4}+0.0959{B}^{5}){\epsilon}_{t}$$Equivalently,$${Y}_{t}={Y}_{t-365}+0.5947{\epsilon}_{t-1}+0.2878{\epsilon}_{t-2}+0.1457{\epsilon}_{t-3}+0.1112{\epsilon}_{t-4}+0.0959{\epsilon}_{t-5}+{\epsilon}_{t}$$
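
The two expanded recursions above translate directly into a one-step-ahead conditional mean, obtained by setting the current innovation to its expected value of zero. The sketch below is a minimal illustration: `sarima_one_step` is a hypothetical helper, the histories are toy values, and the series is assumed to be on the scale on which the model was fitted.

```python
def sarima_one_step(x_hist, e_hist, ar, ma, s=365):
    """One-step-ahead conditional mean for SARIMA(p,0,q)(0,1,0)[s].

    x_hist: past observations, most recent last (x_hist[-k] is X_{t-k}).
    e_hist: past innovations, most recent last.
    ar, ma: coefficients with the sign convention
            (1 - sum(ar_i B^i))(1 - B^s) X_t = (1 + sum(ma_j B^j)) e_t.
    """
    if len(x_hist) < s + len(ar) or len(e_hist) < len(ma):
        raise ValueError("not enough history")
    pred = x_hist[-s]  # seasonal difference term X_{t-s}
    for i, phi in enumerate(ar, start=1):
        pred += phi * (x_hist[-i] - x_hist[-i - s])
    for j, theta in enumerate(ma, start=1):
        pred += theta * e_hist[-j]
    return pred

# Coefficients as written in the two equations above
GPE_AR, GPE_MA = [0.8275], [-0.2186, -0.2338, -0.0931]
PR_AR, PR_MA = [], [0.5947, 0.2878, 0.1457, 0.1112, 0.0959]

# Toy histories (hypothetical values): flat series, one unit shock at t-1
pred_gpe = sarima_one_step([2.0] * 366, [0.0, 0.0, 1.0], GPE_AR, GPE_MA)
pred_pr = sarima_one_step([3.0] * 365, [0.0, 0.0, 0.0, 0.0, 1.0], PR_AR, PR_MA)
```

With a flat history the autoregressive terms cancel, so the prediction is the value one season earlier plus the effect of the most recent shock.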

#### 3.5. Stationarity of Each PM10 Model

#### 3.6. Dynamics of Conditional Heteroskedasticity of the Residuals from the SARIMA Model

#### 3.7. Forecasting of the PM10 Model

## 4. Conclusions

## Abbreviations

Abbreviation | Definition |
---|---|
ACF | Autocorrelation Function |
AIC | Akaike Information Criterion |
$AIC_c$ | Akaike Information Criterion corrected |
ARIMA | Autoregressive Integrated Moving Average |
GARCH | Generalized Autoregressive Conditional Heteroskedasticity |
MAPE | Mean Absolute Percentage Error |
MASE | Mean Absolute Scaled Error |
PM10GPE | Particulate Matter (diameter 10 $\mu$m or less) in Guadeloupe |
PM10PR | Particulate Matter (diameter 10 $\mu$m or less) at Puerto Rico |
p-value (Pr(>\|t\|)) | Critical probability to reject a null hypothesis |
SARFIMA | Seasonal Autoregressive Fractionally Integrated Moving Average |
SARIMA | Seasonal Autoregressive Integrated Moving Average |
Std. Error | Standard Error |
t-test | t-value of Student’s test |
$U_1, U_2$ | Theil’s U statistic presented in its two forms |
VARMA | Vector Autoregressive Moving Average |
WN | White Noise |

## Appendix A

#### Appendix A.1. A Visual Look at the Symmetry and Normality of PM10 Data in Guadeloupe and Puerto Rico

**Figure A1.** Histogram and Normal Q-Q Plot curve of PM10 from 2006 to 2010 in (**a**) GPE and (**b**) PR before and after the Box–Cox transformation.

#### Appendix A.2. Results of Some Statistical Tests with

#### Appendix A.3. PM10 Values Measured and Modeled by the SARIMA Model

#### Appendix A.4. Forecast of the Coupled SARIMA-GARCH Model of PM10 Data in GPE and PR

**Figure A4.** Forecasts on the horizon $h=365$ of the SARIMA-GARCH model of PM10 data in (**a**) GPE and (**b**) PR.

**Figure 1.** Overview of the Caribbean area with the location of the Guadeloupe archipelago (16.25° N, −61.58° E; GPE in orange) and Puerto Rico (18.23° N, −66.50° E; PR in yellow).

**Figure 4.** Scatterplot and polynomial regression curve in the relationship where (**a**) (resp. (**b**)) PM10PR (resp. PM10GPE) is a function of PM10GPE (resp. PM10PR).

**Figure 5.** Daily evolution of PM10 concentrations at (**a**) GPE and (**b**) PR from 2006 to 2010. The red horizontal dashed line shows the Air Quality Guideline for 24 h mean PM10 concentrations at 50 $\mu$g/m$^{3}$ [66]. The black curve represents the smoothed moving average series calculated by taking $p=24$.

**Figure 7.** Autocorrelation function (ACF) of PM10 data at (**a**) GPE and (**b**) PR from 2006 to 2010 before the Box–Cox transformation and the seasonal differencing. The gray vertical lines at $v=365n$, $n\in \{1,2,3,4,5\}$, mark the lags that are multiples of 365.

**Figure 8.** Chronogram of PM10 data at (**a**) GPE and (**b**) PR from 2006 to 2010 after the Box–Cox transformation and the seasonal differencing.

**Figure 11.** Chronogram of PM10 SARIMA model residuals in (**a**) GPE and (**b**) PR from 2006 to 2010. Selected red and black portions denote locations with high and low residual error variability, respectively.

**Figure 12.** Chronogram of squared residuals of the SARIMA model of PM10 in (**a**) GPE and (**b**) PR from 2006 to 2010.

**Figure 14.** Shape of the distribution of PM10 model residuals at (**a**) GPE and (**b**) PR from 2006 to 2010. The red curve is the residual error density, while the green curve is the normal distribution.

**Figure 15.** Collinearity between variance of residuals from the SARIMA model and computed variance of the GARCH model in (**a**) GPE and (**b**) PR.

**Figure 16.** Variance of residuals from the SARIMA model against computed variance by the GARCH model in (**a**) GPE and (**b**) PR.

**Figure 17.** Forecasts of PM10 data series in (**a**) GPE and (**b**) PR after the Box–Cox transformation and seasonal differencing. The light gray band represents the 80% confidence interval. The red curves contained in this region are the forecasts beyond 2010.

**Figure 19.** Actual (blue curve) and predicted (red curve) values using the coupled SARIMA-GARCH model for transformed PM10 data in (**a**) GPE and (**b**) PR.

Statistics | PM10GPE before correction (n = 1789) | PM10PR before correction (n = 1747) | PM10GPE after correction (n = 1826) | PM10PR after correction (n = 1826) |
---|---|---|---|---|
Minimum | $4.00$ | $7.00$ | $4.00$ | $7.00$ |
First quartile | $17.00$ | $17.00$ | $17.00$ | $17.00$ |
Median | $21.00$ | $21.00$ | $21.00$ | $21.00$ |
Mean | $26.59$ | $25.54$ | $26.62$ | $25.65$ |
Third quartile | $30.00$ | $27.00$ | $30.00$ | $28.00$ |
Maximum | $164.00$ | $197.00$ | $164.00$ | $197.00$ |
Missing data | $37.00$ | $79.00$ | − | − |
Variance | $271.46$ | $263.52$ | $266.94$ | $258.13$ |
Standard deviation | $16.48$ | $16.23$ | $16.34$ | $16.07$ |
Coefficient of variation | $0.62$ | $0.64$ | $0.61$ | $0.63$ |
Skewness | $2.69$ | $3.61$ | $2.69$ | $3.57$ |
Kurtosis | $10.81$ | $19.78$ | $10.97$ | $19.67$ |

Regression | Coefficients | Estimate | Std. Error | t Value | Pr(>\|t\|) |
---|---|---|---|---|---|
1 | Intercept | $7.2143$ | $0.9453$ | $7.63$ | $3.71 \times 10^{-14}$ |
1 | PM10GPE | $0.7502$ | $0.0490$ | $15.32$ | $<2.00 \times 10^{-16}$ |
1 | $I(PM10GPE^{2})$ | $-0.0016$ | $0.0005$ | $-3.38$ | $7.36 \times 10^{-4}$ |
2 | Intercept | $5.0056$ | $0.9183$ | $5.45$ | $5.69 \times 10^{-8}$ |
2 | PM10PR | $0.9537$ | $0.0461$ | $20.71$ | $<2.00 \times 10^{-16}$ |
2 | $I(PM10PR^{2})$ | $-0.0031$ | $0.0004$ | $-7.99$ | $2.29 \times 10^{-15}$ |
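
The estimates above define two quadratic regression curves. As a minimal sketch, they can be evaluated as follows; the function names are hypothetical, and the coefficients are the rounded estimates from the table, so outputs are approximate.

```python
def predict_pr_from_gpe(gpe):
    """Regression 1: PM10PR as a quadratic function of PM10GPE."""
    return 7.2143 + 0.7502 * gpe - 0.0016 * gpe ** 2

def predict_gpe_from_pr(pr):
    """Regression 2: PM10GPE as a quadratic function of PM10PR."""
    return 5.0056 + 0.9537 * pr - 0.0031 * pr ** 2

def turning_point(linear, quadratic):
    """Abscissa where a concave quadratic a + b*x + c*x^2 stops increasing."""
    return -linear / (2.0 * quadratic)

# Evaluate both curves at the 24 h guideline level of 50 (same units as the data)
pr_at_50 = predict_pr_from_gpe(50.0)
gpe_at_50 = predict_gpe_from_pr(50.0)
peak_reg1 = turning_point(0.7502, -0.0016)
peak_reg2 = turning_point(0.9537, -0.0031)
```

The negative quadratic terms make both curves concave, so `turning_point` gives the predictor value beyond which each fitted curve would start to decrease.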

PM10GPE model | AIC | PM10PR model | AIC |
---|---|---|---|
$SARIMA(3,0,1)(0,1,0)\left[365\right]$ | $-2964.79$ | $SARIMA(1,0,1)(0,1,0)\left[365\right]$ | $-3112.59$ |
$SARIMA(2,0,1)(0,1,0)\left[365\right]$ | $-2961.21$ | $SARIMA(2,0,1)(0,1,0)\left[365\right]$ | $-3108.94$ |
$SARIMA(4,0,2)(0,1,0)\left[365\right]$ | $-2961.61$ | $SARIMA(0,0,1)(0,1,0)\left[365\right]$ | $-3005.81$ |
$SARIMA(2,0,0)(0,1,0)\left[365\right]$ | $-2960.96$ | $SARIMA(1,0,2)(0,1,0)\left[365\right]$ | $-3109.79$ |
$SARIMA(1,0,3)(0,1,0)\left[365\right]$ | $-2965.30$ | $SARIMA(0,0,5)(0,1,0)\left[365\right]$ | $-3114.69$ |
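
Selecting a model from the AIC values above amounts to taking the minimum in each column. The short sketch below makes that rule explicit and also reports the gap to the runner-up; the dictionary layout and function name are assumptions made for illustration.

```python
AIC_GPE = {
    "SARIMA(3,0,1)(0,1,0)[365]": -2964.79,
    "SARIMA(2,0,1)(0,1,0)[365]": -2961.21,
    "SARIMA(4,0,2)(0,1,0)[365]": -2961.61,
    "SARIMA(2,0,0)(0,1,0)[365]": -2960.96,
    "SARIMA(1,0,3)(0,1,0)[365]": -2965.30,
}
AIC_PR = {
    "SARIMA(1,0,1)(0,1,0)[365]": -3112.59,
    "SARIMA(2,0,1)(0,1,0)[365]": -3108.94,
    "SARIMA(0,0,1)(0,1,0)[365]": -3005.81,
    "SARIMA(1,0,2)(0,1,0)[365]": -3109.79,
    "SARIMA(0,0,5)(0,1,0)[365]": -3114.69,
}

def select_by_aic(aic_by_model):
    """Return (best model, AIC gap between the runner-up and the best)."""
    ranked = sorted(aic_by_model.items(), key=lambda item: item[1])
    (best, best_aic), (_, second_aic) = ranked[0], ranked[1]
    return best, second_aic - best_aic

best_gpe, gap_gpe = select_by_aic(AIC_GPE)
best_pr, gap_pr = select_by_aic(AIC_PR)
```

For GPE the gap to the runner-up is small (about 0.5 AIC units), so the ranking of the two leading candidates is close.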

**Table 4.** Estimation and significance of PM10 model parameters in GPE and PR; C.I.: Confidence Interval.

Model | Parameters | Estimate | Std. Error | t-Test | p-Value | C.I. 2.5% | C.I. 97.5% |
---|---|---|---|---|---|---|---|
$SARIMA(1,0,3)(0,1,0)\left[365\right]$ | $AR1$ | $0.8275$ | $0.0933$ | $8.87$ | $0.000000$ | $0.6446$ | $1.0104$ |
$SARIMA(1,0,3)(0,1,0)\left[365\right]$ | $MA1$ | $-0.2186$ | $0.0989$ | $-2.21$ | $0.027081$ | $-0.4124$ | $-0.0248$ |
$SARIMA(1,0,3)(0,1,0)\left[365\right]$ | $MA2$ | $-0.2338$ | $0.0698$ | $-3.35$ | $0.000811$ | $-0.0970$ | $-0.0970$ |
$SARIMA(1,0,3)(0,1,0)\left[365\right]$ | $MA3$ | $-0.0931$ | $0.0463$ | $-2.01$ | $0.044123$ | $-0.0024$ | $-0.0024$ |
$SARIMA(0,0,5)(0,1,0)\left[365\right]$ | $MA1$ | $0.5947$ | $0.0261$ | $22.75$ | $0.000000$ | $0.5435$ | $0.6460$ |
$SARIMA(0,0,5)(0,1,0)\left[365\right]$ | $MA2$ | $0.2878$ | $0.0304$ | $9.46$ | $0.000000$ | $0.2282$ | $0.3474$ |
$SARIMA(0,0,5)(0,1,0)\left[365\right]$ | $MA3$ | $0.1457$ | $0.0313$ | $4.66$ | $0.000003$ | $0.0844$ | $0.2069$ |
$SARIMA(0,0,5)(0,1,0)\left[365\right]$ | $MA4$ | $0.1112$ | $0.0295$ | $3.77$ | $0.000161$ | $0.0534$ | $0.1689$ |
$SARIMA(0,0,5)(0,1,0)\left[365\right]$ | $MA5$ | $0.0959$ | $0.0259$ | $3.70$ | $0.000216$ | $0.0451$ | $0.1468$ |
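
Most of the interval bounds in Table 4 agree with a normal-approximation (Wald) interval, estimate ± 1.96 × standard error. The sketch below reproduces that calculation under this assumption, which the table does not state explicitly; `wald_ci` is a hypothetical helper. For the MA2 and MA3 rows of the GPE model, where the table shows the same value in both bound columns, the same formula gives lower bounds of roughly −0.37 and −0.18.

```python
from statistics import NormalDist

def wald_ci(estimate, std_error, level=0.95):
    """Normal-approximation confidence interval: estimate +/- z * std_error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95
    return estimate - z * std_error, estimate + z * std_error

# Rounded estimates and standard errors taken from Table 4 (GPE model)
ar1_ci = wald_ci(0.8275, 0.0933)
ma1_ci = wald_ci(-0.2186, 0.0989)
ma2_ci = wald_ci(-0.2338, 0.0698)
ma3_ci = wald_ci(-0.0931, 0.0463)
```

Because the inputs are rounded to four decimals, recomputed bounds can differ from the tabulated ones in the last digit.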

**Table 5.** Normality, autocorrelation, and heteroskedasticity tests of residuals from the SARIMA model.

Hypothesis Tests | PM10GPE Statistics | PM10GPE Df | PM10GPE p-Value | PM10PR Statistics | PM10PR Df | PM10PR p-Value |
---|---|---|---|---|---|---|
Shapiro–Wilk | $0.97$ | NA | $<2.2\times {10}^{-16}$ | $0.97$ | NA | $<2.2\times {10}^{-16}$ |
Jarque–Bera | $302.24$ | 2 | $<2.2\times {10}^{-16}$ | $175.77$ | 2 | $<2.2\times {10}^{-16}$ |
ARCH LM-test | $175.46$ | 20 | $<2.2\times {10}^{-16}$ | $114.48$ | 20 | $<3.0\times {10}^{-15}$ |
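
For readers who want to see how one of the statistics above is formed, here is a minimal sketch of the Jarque–Bera statistic, $JB=\frac{n}{6}\left(S^{2}+\frac{(K-3)^{2}}{4}\right)$, where $S$ is the sample skewness and $K$ the sample kurtosis. Under normality it is asymptotically chi-squared with 2 degrees of freedom, matching the Df column. The function name and the toy data are hypothetical; the residual series themselves are not reproduced here.

```python
import math

def jarque_bera(x):
    """Return (JB statistic, asymptotic p-value) using moment estimators."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of a chi-squared variable with 2 degrees of freedom
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

jb_toy, p_toy = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
```

Large values of the statistic, such as those reported in Table 5, correspond to p-values close to zero and lead to rejecting normality of the residuals.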

PM10GPE model | AIC | PM10PR model | AIC |
---|---|---|---|
$GARCH(1,0)$ | $-2.3038$ | $GARCH(1,0)$ | $-2.4006$ |
$GARCH(1,1)$ | $-3.3754$ | $GARCH(1,1)$ | $-3.8294$ |
$GARCH(2,0)$ | $-2.3381$ | $GARCH(2,0)$ | $-1.1025$ |
$GARCH(2,1)$ | $-3.3716$ | $GARCH(2,1)$ | $-3.8223$ |
$GARCH(2,2)$ | $-3.3714$ | $GARCH(2,2)$ | $-3.8229$ |
$GARCH(3,0)$ | $-2.7294$ | $GARCH(3,0)$ | $-3.0561$ |
$GARCH(3,1)$ | $-3.3710$ | $GARCH(3,1)$ | $-3.8239$ |
$GARCH(3,2)$ | $-3.3698$ | $GARCH(3,2)$ | $-3.8255$ |

Parameters | PM10GPE Estimate | PM10GPE Std. Error | PM10GPE t Value | PM10GPE Pr(>\|t\|) | PM10PR Estimate | PM10PR Std. Error | PM10PR t Value | PM10PR Pr(>\|t\|) |
---|---|---|---|---|---|---|---|---|
$\omega $ | $8.652\times {10}^{-7}$ | $0.000000$ | $2.18$ | $0.029506$ | $8.567\times {10}^{-8}$ | $0.000001$ | $0.06$ | $0.94943$ |
${a}_{1}$ | $1.622\times {10}^{-1}$ | $0.009482$ | $17.10$ | $0.000000$ | $1.951\times {10}^{-1}$ | $0.009912$ | $19.68$ | $0.00000$ |
${b}_{1}$ | $8.368\times {10}^{-1}$ | $0.008301$ | $100.81$ | $0.000000$ | $8.039\times {10}^{-1}$ | $0.008671$ | $92.71$ | $0.00000$ |
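
The estimates above imply a volatility persistence $a_1+b_1$ very close to 1 for both sites. The sketch below computes it along with two standard derived quantities, assuming the usual GARCH(1,1) formulas: the unconditional variance $\omega/(1-a_1-b_1)$ and the half-life of a variance shock $\ln(0.5)/\ln(a_1+b_1)$. The function names are hypothetical. Because the estimates are rounded and $1-a_1-b_1$ is tiny, the derived values are only indicative.

```python
import math

# (omega, a1, b1) as reported in the table above
GARCH_PARAMS = {
    "PM10GPE": (8.652e-7, 0.1622, 0.8368),
    "PM10PR": (8.567e-8, 0.1951, 0.8039),
}

def persistence(a1, b1):
    """Sum a1 + b1; values below 1 give a finite unconditional variance."""
    return a1 + b1

def unconditional_variance(omega, a1, b1):
    """Long-run variance omega / (1 - a1 - b1) of a stationary GARCH(1,1)."""
    p = persistence(a1, b1)
    if p >= 1:
        raise ValueError("unconditional variance is not finite")
    return omega / (1.0 - p)

def half_life(a1, b1):
    """Number of steps for a variance shock to decay by half."""
    return math.log(0.5) / math.log(persistence(a1, b1))

summary = {
    site: (persistence(a1, b1),
           unconditional_variance(omega, a1, b1),
           half_life(a1, b1))
    for site, (omega, a1, b1) in GARCH_PARAMS.items()
}
```

With these rounded inputs both sites give a persistence of about 0.999. For PR, the $\omega$ estimate is not statistically significant in the table, so its long-run variance is especially uncertain.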

Models | PM10GPE $n$ (data points) | PM10GPE MAPE (%) | PM10GPE MASE | PM10GPE $U_1$ | PM10GPE $U_2$ | PM10PR $n$ (data points) | PM10PR MAPE (%) | PM10PR MASE | PM10PR $U_1$ | PM10PR $U_2$ |
---|---|---|---|---|---|---|---|---|---|---|
SARIMA | 350 | $3.743$ | $0.025$ | $0.083$ | $0.167$ | 350 | $2.312$ | $0.009$ | $0.034$ | $0.070$ |
GARCH | 365 | $134.238$ | $0.775$ | $0.556$ | $0.882$ | 365 | $141.817$ | $0.773$ | $0.551$ | $0.884$ |
Coupled SARIMA-GARCH | 350 | $15.127$ | $0.069$ | $0.045$ | $0.091$ | 337 | $2.396$ | $0.008$ | $0.034$ | $0.069$ |
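
The four indexes in the table above can be computed as in the following sketch. It uses commonly cited definitions, which are assumptions here and may differ in detail from the forms used for the reported values: MASE is scaled by the one-step naive error on the supplied actual series, and $U_1$ and $U_2$ follow one frequent convention for Theil's two forms. The toy data are hypothetical.

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def mase(actual, forecast):
    """MAE scaled by the MAE of the one-step naive forecast on `actual`."""
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive = sum(abs(actual[i] - actual[i - 1])
                for i in range(1, len(actual))) / (len(actual) - 1)
    return mae / naive

def theil_u1(actual, forecast):
    """Theil's U1: RMSE over the sum of the root mean squares; lies in [0, 1]."""
    n = len(actual)
    rmse = math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / n)
    rms_f = math.sqrt(sum(f * f for f in forecast) / n)
    rms_a = math.sqrt(sum(a * a for a in actual) / n)
    return rmse / (rms_f + rms_a)

def theil_u2(actual, forecast):
    """Theil's U2 (one common form): root squared error over root squared actuals."""
    num = math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)))
    return num / math.sqrt(sum(a * a for a in actual))

actual = [1.0, 2.0, 3.0, 4.0]
forecast = [1.1, 1.9, 3.2, 3.8]
scores = (mape(actual, forecast), mase(actual, forecast),
          theil_u1(actual, forecast), theil_u2(actual, forecast))
```

All four indexes are zero for a perfect forecast, so lower values in the table indicate better agreement between predicted and observed series.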

