#### 2.3. Statistical Analysis

We proposed a two-stage Bayesian hierarchical spatio-temporal model (Model 3) to capture the spatial and temporal dynamics. Two competing models (Model 1 and Model 2) served as benchmarks against which the performance of Model 3 was compared. The mathematical expressions for Model 1 and Model 2 are as follows:

**Model 1:** $\mathrm{log}\left({\theta}_{it}\right)={\beta}_{0}+{\beta}_{1}{X}_{it}+{S}_{1}\left({Z}_{it}\right)+{S}_{2}\left({W}_{it}\right)+{S}_{3}\left({Q}_{it}\right)+\delta {D}_{i}$,

**Model 2:** $\mathrm{log}\left({\theta}_{it}\right)={\beta}_{0}+{\beta}_{1}{X}_{it}+{S}_{1}\left({Z}_{it}\right)+{S}_{2}\left({W}_{it}\right)+{S}_{3}\left({Q}_{it}\right)+\delta {D}_{i}+{u}_{i}+{v}_{i}+{k}_{t}+{l}_{t}+{\varphi}_{it}$.

The observed mortality count for area $i$ and month $t$, ${y}_{it}$, was assumed to follow a Poisson distribution with mean ${\theta}_{it}{N}_{it}$, where ${\theta}_{it}$ is the relative risk and ${N}_{it}$ is the elderly population. The constant term ${\beta}_{0}$ is the intercept of the log relative risk, common to all areas and months. The covariates ${X}_{it}$, ${Z}_{it}$, ${W}_{it}$, and ${Q}_{it}$ are the monthly averages of the air pollutant concentration (${\mathrm{PM}}_{10}$ or ${\mathrm{PM}}_{2.5}$), temperature, humidity, and wind speed, respectively. The socio-economic covariate ${D}_{i}$ is the deprivation index. Each smoothing function ${S}_{j}\left(\cdot\right)$, $j=1,2,3$, is a natural cubic spline that captures the nonlinear effect of a meteorological variable on mortality. The degrees of freedom of the natural cubic splines were 3, 2, and 3 for temperature, humidity, and wind speed, respectively. The parameters ${\beta}_{1}$ and $\delta$ are the regression coefficients of ${X}_{it}$ and ${D}_{i}$, respectively. The random effects ${u}_{i}$ and ${v}_{i}$ are the spatially uncorrelated and correlated terms, respectively, and ${k}_{t}$ and ${l}_{t}$ are the temporally uncorrelated and correlated terms, respectively. Lastly, the random component ${\varphi}_{it}$ is the space-time interaction term.
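As a minimal sketch of the likelihood described above, the following Python snippet evaluates the relative risk, the Poisson mean ${\theta}_{it}{N}_{it}$, and the log-likelihood contribution of one observed count. The function names and all numeric values are hypothetical illustrations, and the three spline terms are collapsed into one precomputed number rather than built from a spline basis.

```python
import math

def relative_risk(beta0, beta1, x, spline_sum, delta, d):
    """theta_it = exp(beta0 + beta1*X_it + [S1 + S2 + S3] + delta*D_i).

    spline_sum stands in for S1(Z_it) + S2(W_it) + S3(Q_it); building the
    natural cubic spline basis is outside the scope of this sketch.
    """
    return math.exp(beta0 + beta1 * x + spline_sum + delta * d)

def poisson_log_pmf(y, mean):
    """log P(Y = y) for Y ~ Poisson(mean)."""
    return y * math.log(mean) - mean - math.lgamma(y + 1)

# Hypothetical values for a single area-month (not taken from the study data).
theta = relative_risk(beta0=-6.0, beta1=0.002, x=45.0,
                      spline_sum=0.1, delta=0.05, d=1.2)
population = 20000                     # N_it, elderly population (assumed)
expected_deaths = theta * population   # Poisson mean theta_it * N_it
loglik = poisson_log_pmf(y=60, mean=expected_deaths)
```

Multiplying the relative risk by the population is equivalent to including $\mathrm{log}\left({N}_{it}\right)$ as an offset in the log-linear predictor.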

To deal with the spatial confounding bias problem [17] in Model 2, a two-stage model [22] was considered. In the first stage, the Poisson regression model with only covariates (Model 1) was used. Using this model, we acquired the estimated relative risk $\widehat{{\theta}_{it}}$, and the continuous-type residuals $\widehat{{r}_{it}}$ were obtained from

To capture the extra spatial and temporal variations in the residuals, we considered the following model:

In the second stage, our model was expressed as follows, using $\widehat{{S}_{it}},$ the estimated ${S}_{it}$:

Regression coefficient estimation was performed at this stage.

In the Bayesian framework, we used non-informative prior distributions for the parameters. For the intercept ${\beta}_{0}$ and the air pollutant coefficient ${\beta}_{1}$, we assumed a normal distribution with mean 0 and variance 1,000,000, which is a fairly flat prior. For the random effects, the spatially and temporally uncorrelated terms had independent and identical normal distributions with zero mean and hyperparameters ${\sigma}_{u}$ and ${\sigma}_{k}$, respectively. The spatially correlated random term ${v}_{i}$ had a conditional autoregressive (CAR) [23] prior, and the temporally correlated random term ${l}_{t}$ had a first-order autoregressive, AR(1), prior. For the interaction term ${\varphi}_{it}$, we considered four of the interaction types proposed in [24]; the type with a different spatial trend for each time unit showed the best performance. Uniform distributions with lower bound 0 and upper bound 10 were specified for all hyperparameters.
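To make the two correlated priors more concrete, the sketch below shows one common form of each. It assumes the intrinsic CAR specification, in which ${v}_{i}$ given its neighbours is normal with mean equal to the neighbours' average and variance shrinking with the number of neighbours; the text does not state which CAR variant was used. The names `sigma_v`, `sigma_l`, `rho`, and the toy adjacency map are hypothetical.

```python
import random

def car_conditional(v, neighbors, i, sigma_v):
    """Conditional mean and sd of v_i given its neighbours.

    Assumes an intrinsic CAR prior:
    v_i | v_-i ~ Normal(mean of neighbouring v_j, sigma_v^2 / n_i),
    where n_i is the number of neighbours of area i.
    """
    nbrs = neighbors[i]
    mean = sum(v[j] for j in nbrs) / len(nbrs)
    sd = sigma_v / len(nbrs) ** 0.5
    return mean, sd

def simulate_ar1(n_months, rho, sigma_l, seed=0):
    """Draw l_1..l_T with l_t = rho * l_(t-1) + Normal(0, sigma_l^2) noise.

    The first value is drawn from Normal(0, sigma_l^2) for simplicity,
    not from the stationary distribution.
    """
    rng = random.Random(seed)
    l = [rng.gauss(0.0, sigma_l)]
    for _ in range(n_months - 1):
        l.append(rho * l[-1] + rng.gauss(0.0, sigma_l))
    return l

# Toy map with three areas: area 0 borders areas 1 and 2.
neighbors = {0: [1, 2], 1: [0], 2: [0]}
mean0, sd0 = car_conditional([0.0, 1.0, 3.0], neighbors, i=0, sigma_v=2.0)
temporal_effects = simulate_ar1(n_months=36, rho=0.5, sigma_l=1.0)
```

The 36 months in the usage example correspond to a three-year fitting window; the values of `rho` and the standard deviations are arbitrary.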

The three models described above were fitted to the 2012–2014 data. Based on the fitted models, we forecasted death counts for 2015 for all administrative regions in South Korea, following the forecasting scheme used in [25].

Bayesian analyses were carried out using the WinBUGS statistical package [26]. Two parallel Markov chain Monte Carlo (MCMC) chains were run with different initial values. To assess convergence, we used trace plots, autocorrelation plots, and the Gelman–Rubin statistic [27]. After burn-in, we generated 2500 samples for each chain with a thinning interval of 50, resulting in a total of 5000 posterior samples. Including the burn-in period, it took around 30 hours to obtain the 5000 posterior samples on a computer with an Intel Xeon Gold 5118 2.3 GHz CPU and 32 GB of RAM. The open-source software R [28] was used to produce the figures in this paper.
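For illustration, the Gelman–Rubin statistic for a single parameter can be computed from equal-length chains as sketched below. This is the basic potential scale reduction factor, not necessarily the exact variant reported by WinBUGS, and the function name is hypothetical. The arithmetic at the end assumes that a thinning interval of 50 means keeping every 50th post-burn-in draw.

```python
from statistics import mean, variance

def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one parameter."""
    n = len(chains[0])                        # draws per chain
    chain_means = [mean(c) for c in chains]
    w = mean(variance(c) for c in chains)     # within-chain variance W
    b = n * variance(chain_means)             # between-chain variance B
    var_hat = (n - 1) / n * w + b / n         # pooled variance estimate
    return (var_hat / w) ** 0.5

# Sampling arithmetic implied by the text (thinning interpretation assumed).
kept_per_chain, thin, n_chains = 2500, 50, 2
iterations_per_chain = kept_per_chain * thin  # post-burn-in iterations per chain
total_kept = kept_per_chain * n_chains        # posterior samples retained
```

Values close to 1 suggest that the chains have mixed; chains stuck in different regions give values well above 1.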

We compared the performance of Models 1–3 to identify the best-performing model; the results are shown in Table 2. The deviance information criterion (DIC) and the mean squared prediction error (MSPE) were used to evaluate model fit in the Bayesian framework and prediction performance, respectively. The mathematical expression for MSPE is as follows:

$\mathrm{MSPE}=\frac{1}{n}\sum_{i}\sum_{t}{\left({y}_{it}-\widehat{{y}_{it}}\right)}^{2}$,

where ${y}_{it}$ is the observed value, $\widehat{{y}_{it}}$ is the predicted value, and $n$ is the number of area-month pairs being predicted. DIC is defined as follows:

$\mathrm{DIC}=\overline{D\left(\alpha \right)}+pD$,

where $\overline{D\left(\alpha \right)}$ is the posterior mean of the deviance $D\left(\alpha \right)=-2\mathrm{log}f\left(y|\alpha \right)$, and $pD$ is $\overline{D\left(\alpha \right)}-D\left(\widehat{\alpha}\right)$, where $\widehat{\alpha}$ is the posterior mean of the parameter $\alpha $.
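The two criteria can be sketched in a few lines of Python. This is an illustrative simplification, not the WinBUGS computation: the plug-in deviance is evaluated at the posterior mean of the Poisson means rather than at the posterior mean of the underlying parameters, and all function names and inputs are hypothetical.

```python
import math

def mspe(observed, predicted):
    """Mean of squared differences between observed and predicted counts."""
    return sum((y - yhat) ** 2 for y, yhat in zip(observed, predicted)) / len(observed)

def poisson_deviance(y, means):
    """D = -2 log f(y | means) for independent Poisson counts."""
    loglik = sum(yi * math.log(mu) - mu - math.lgamma(yi + 1)
                 for yi, mu in zip(y, means))
    return -2.0 * loglik

def dic(y, draws):
    """Return (DIC, pD) from posterior draws of the Poisson means.

    draws: list of posterior draws, each a list of means aligned with y.
    Assumption: the plug-in estimate is the posterior mean of the means.
    """
    n_draws = len(draws)
    d_bar = sum(poisson_deviance(y, d) for d in draws) / n_draws   # mean deviance
    plug_in = [sum(d[k] for d in draws) / n_draws for k in range(len(y))]
    p_d = d_bar - poisson_deviance(y, plug_in)                     # effective parameters
    return d_bar + p_d, p_d

# Toy example with two observations and two posterior draws.
toy_dic, toy_pd = dic([3, 5], [[2.0, 4.0], [4.0, 6.0]])
toy_mspe = mspe([1, 2, 3], [1, 2, 5])
```

Lower values of both criteria indicate a better model; DIC penalizes complexity through $pD$, while MSPE measures only predictive accuracy.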

To investigate the impact of the degrees of freedom in the spline functions of the meteorological variables, we performed a sensitivity analysis by varying the degrees of freedom from 2 to 12. The resulting models differed very little in terms of the regression coefficient estimates for air pollution and model performance.