Space–Time Relationship between Short-Term Exposure to Fine and Coarse Particles and Mortality in a Nationwide Analysis of Korea: A Bayesian Hierarchical Spatio-Temporal Model

Previous studies have shown an association between mortality and ambient air pollution in South Korea. However, these studies may have been subject to bias, as they lacked adjustment for spatio-temporal structures. This paper addresses this research gap by examining the association between air pollution and cause-specific mortality in South Korea between 2012 and 2015 using a two-stage Bayesian spatio-temporal model. We used 2012–2014 mortality and air pollution data for parameter estimation (i.e., model fitting) and 2015 data for model validation. Our results suggest that the relative risks of total, cardiovascular, and respiratory mortality were 1.028, 1.047, and 1.045, respectively, with every 10-µg/m3 increase in monthly PM2.5 (fine particulate matter) exposure. These findings warrant protection of populations who experience elevated ambient air pollution exposure to mitigate mortality burden in South Korea.


Introduction
Air pollution has been viewed as a threat to human health since the onset of rapid industrialization. It is widely known that air pollutants, including particulate matter with an aerodynamic diameter less than 10 µm (PM 10 ) or 2.5 µm (PM 2.5 ; fine particulate matter), is linked to total, cardiovascular, and respiratory mortality [1]. Since air pollution is particularly detrimental to vulnerable people, such as senior citizens and infants, understanding how ambient air pollution affects health is very important for making relevant policy [2,3]. Several studies have been conducted regarding the association of PM 10 and health in South Korea. One study [3] showed that post-neonatal infants are most susceptible to PM 10 in terms of mortality, especially respiratory mortality. Post-neonatal mortality increased by 14.2% for each 42.9-µg/m 3 rise in PM 10 . Another study [4] showed that with a 43.12-µg/m 3 increase in PM 10 concentration, daily non-accidental, respiratory, cardiovascular, and cerebrovascular mortality increased by 3.7%, 13.9%, 4.4%, and 6.3%, respectively.
Numerous epidemiological studies have investigated the associations of PM 10 and PM 2.5 with mortality worldwide [5][6][7][8][9]. One study [5] concluded that PM 10 concentration is associated with Int. J. Environ. Res. Public Health 2019, 16, 2111 2 of 11 total, cardiovascular, and respiratory daily mortality in 16 cities in China using a Poisson regression model. Another study [6] showed how air pollutants affect human health in 25 European cities across 12 countries. They determined that a decrease in PM 2.5 level could have led to a gain in life expectancy, postponing many deaths in the city. A European study [7] determined that the daily total mortality rose as PM 10 and black smoke concentrations increased with a Poisson regression model. The study [8] investigated the short-term positive effects of PM 10 and PM 2.5 on all-cause, cardiorespiratory, and other-cause mortality in the United States using a Bayesian hierarchical model. Another study in the United States [9] showed that increases in PM 10 and PM 2.5 concentrations led to increased numbers of post-neonatal infant deaths in 96 cities with populations greater than 250,000. Most of the domestic and foreign studies mentioned above focused on the effects of air pollutants on health outcomes in specific locations. However, to establish national-level air pollution control policy, a study should consider the entire spatial domain of a country.
Spatio-temporal modeling of the association between air pollution and health has recently received attention in environmental epidemiological studies [10,11]; however, there are some challenges in this domain. First, there is frequently a misalignment problem, due to the different dataset sources. This is because air pollution monitoring data are collected at monitoring stations, while health data are collected at an aggregated areal level. There are several approaches to overcome this problem. For example, after spatial modeling of air pollution data, spatial kriging can be conducted, and areal-level air pollution estimates can be produced [12][13][14][15]. Alternatively, one study [16] proposed a Markov Cube Kriging method to improve computational efficiency. In this paper, we applied a commonly used numerical interpolation method to obtain areal-level air pollution data. Second, there is a spatial confounding bias because space-time varying air pollution covariates and space-time random effects are simultaneously included in space-time health modeling [17]. For example, air pollution may be related to space-time random effects, which would bias the estimate on the effects of air pollution. In this paper, to better estimate air pollution effects without the influence of spatial confounding bias, we adopted the two-stage model proposed in [15].
Until now, most of the epidemiologic studies of air pollution conducted in South Korea have focused on how temporally varying air pollutant concentrations affected mortality in individual cities [18,19], but such studies should be conducted over entire areas of Korea to establish national-level air pollution control policy for health improvement. No studies have examined the relationship between temporally varying air pollutants and mortality throughout South Korea. Additionally, there have been almost no studies on the link between mortality and PM 2.5 in South Korea [18]. Herein, we investigated the associations of PM 10 and PM 2.5 with mortality throughout South Korea from 2012 to 2015. A two-stage Bayesian spatio-temporal hierarchical model was employed to better estimate the effects of air pollution on mortality outcomes, as well as to better predict the mortality associated with ambient air pollution.

Study Domain
This study was conducted over the entire area of South Korea and accounted for the period from 2012 to 2015. There are 250 administrative districts in South Korea ( Figure 1) and, because some administrative districts changed during the study period, we conducted analyses based on the administrative areas as of 2012. We

Data Description
We obtained monthly mortality data for 250 administrative districts using the Micro Data Integrated Service (MDIS) from Statistics Korea [20]. We used three types of mortality data: total mortality (all deaths except ICD codes V01-Y98), cardiovascular mortality (I00-I99), and respiratory mortality (J00-J99).
We acquired monthly average concentrations of PM10 and PM2.5 and meteorological data (temperature, humidity, and wind speed) for 250 districts for the years 2012-2015. The district-level air pollution data were produced based on Pun's interpolation method [21], a data-assimilation method that combines data from monitoring stations and numerical model outputs.
The regional deprivation index (RDI) was also added to the covariates to control for socioeconomic status effects. Because there was little temporal variation in the regional deprivation index, we only used data from 2010. Higher index values indicated that the area was more economically deprived. Because deaths occur more often in areas with relatively large elderly populations, we used the number of people over the age of 65 as an offset. Table 1 shows the summary statistics for the 36 months of data (2012-2014) and 250 administrative districts.

Data Description
We obtained monthly mortality data for 250 administrative districts using the Micro Data Integrated Service (MDIS) from Statistics Korea [20]. We used three types of mortality data: total mortality (all deaths except ICD codes V01-Y98), cardiovascular mortality (I00-I99), and respiratory mortality (J00-J99).
We acquired monthly average concentrations of PM 10 and PM 2.5 and meteorological data (temperature, humidity, and wind speed) for 250 districts for the years 2012-2015. The district-level air pollution data were produced based on Pun's interpolation method [21], a data-assimilation method that combines data from monitoring stations and numerical model outputs.
The regional deprivation index (RDI) was also added to the covariates to control for socio-economic status effects. Because there was little temporal variation in the regional deprivation index, we only used data from 2010. Higher index values indicated that the area was more economically deprived. Because deaths occur more often in areas with relatively large elderly populations, we used the number of people over the age of 65 as an offset. Table 1 shows the summary statistics for the 36 months of data (2012-2014) and 250 administrative districts.

Statistical Analysis
We proposed a two-stage Bayesian hierarchical spatio-temporal model (Model 3) to capture the spatial and temporal dynamics. Two competing models (Model 1, Model 2) were used to compare performances of the proposed model, Model 3. Mathematical expressions for Model 1 and Model 2 are as follows: The observed mortality for area i and month t, y it , followed a Poisson distribution with mean θ it N it , where θ it is the relative risk, and N it is the elderly population. The constant term β 0 indicates the intercept of the log of the relative risk that was common to all areas and months. Covariates X it , Z it , W it , and Q it are the monthly average air pollutant concentrations of PM 10 or PM 2.5 , temperature, humidity, and wind speed, respectively. The socio-economic covariate D i indicates the deprivation index. The smoothing function S i (·), i = 1, 2, 3 denotes a natural cubic spline function to explain the nonlinear effects of meteorological variables on mortality. The degrees of freedom of the natural cubic spline were 3, 2, and 3 for temperature, humidity, and wind speed, respectively. The parameters β 1 and δ denote the regression coefficients X it and D i , respectively. The random effects u i and v i are spatially uncorrelated and correlated terms, respectively, and k t and l t are temporally uncorrelated and correlated terms. Lastly, the random component φ it is the space-time interaction term.
To deal with the spatial confounding bias problem [17] in Model 2, a two-stage model [22] was considered. In the first stage, the Poisson regression model with only covariates (Model 1) was used. Using this model, we acquired the estimated relative riskθ it , and the continuous-type residualsr it were obtained fromr To capture the extra spatial and temporal variations in the residuals, we considered the following model:r In the second stage, our model was expressed as follows, usingŜ it , the estimated S it : Regression coefficient estimation was performed at this stage.
In the Bayesian framework, we used non-informative prior distributions for the parameters. For the intercept β 0 and air pollutant coefficient β 1 , we assumed normal distribution with zero 0 and variance 1,000,000, which is a fairly flat prior. For random effects, spatially and temporally uncorrelated terms had independent and identical normal distributions with zero mean hyperparameters σ u and σ k . The spatially correlated random term v i had a conditional autoregressive (CAR) [23] prior, and the temporally correlated random term l t had an autoregressive (AR)(1) prior. For the interaction term φ it , we considered four of the types proposed in [24]. The interaction term with different spatial trends for each time unit showed the best performance. Uniform distributions with lower bound 0 and upper bound 10 were specified for all hyperparameters.
The three models described above were fitted for the 2012-2014 data. Based on the results from the fitted models, we forecasted death counts for 2015 for all administrative regions in South Korea. We followed the forecasting scheme used in [25].
Bayesian analyses were carried out using the WinBUGS statistical package [26] Two parallel Monte Carlo Markov Chains (MCMC) were used with different initial values. To assess sample convergence, we utilized trace plots, auto-correlation plots, and the Gelman-Rubin statistic [27]. After burn-in, we generated 2500 samples for each chain with thin 50, resulting in a total of 5000 posterior samples. Including the burn-in period, it took around 30 hours to obtain 5000 posterior samples with a CPU Intel Xeon gold 5118 2.3 GHz and RAM 32 GB computer. The open source software R [28] was used to produce the figures in this paper.
We compared the performance of Models 1-3 to identify the model with the best performance, which is shown in Table 2. For comparison, the deviance information criterion (DIC) and mean squared prediction error (MSPE) were used to evaluate model fitness in the Bayesian framework and prediction performance, respectively. The mathematical expression for MSPE is as follows: where y it is the observed value, andŷ it is the predicted value. DIC is defined as follows: where D(α) is the posterior mean of deviance D(α) = −2 log f (y α), and pD is D(α) − D(α), whereα is the posterior mean of the parameter α. To investigate the impact of the degrees of freedom in the spline functions of meteorological variables, we performed a sensitivity analysis by changing degrees of freedom from 2 to 12. The models differed very little in terms of regression coefficient estimates of air pollution and model performance.

Results
To show the spatial distributions of air pollutants and mortality in South Korea over the period of 2012-2014, we used the average values of air pollutant concentrations for 36 months and the total death counts for each region over the 36-month period, which are presented in Figure 2. Each map is partitioned into four colors based on quantile. As shown in Figure 2, the total number of deaths from 2012 to 2014 was highest in Nowon-gu, in Seoul, at 7030, and lowest in Gyeryong-si, in Chungcheongnam-do, at 394. Moran's I statistic was applied to measure the spatial dependencies of mortality and air pollution data at the administrative region level. The null hypothesis of Moran's I statistic is that there is no spatial correlation among the data. As the adjacency matrix from Moran's I statistic, various neighborhood structures including binary, row-standardized, and globally standardized techniques were considered. Binary neighborhood structure used a matrix with a value of 1 if it was adjacent, or 0 otherwise. Row-standardized neighborhood structure was used to divide the binary matrix by the sum of the rows. Globally standardized neighborhood structure was used to divide the binary matrix by the mean of the sum of the rows. Moran's I statistics using row-standardized neighborhood structure of the total, cardiovascular, and respiratory mortality data were 0.27, 0.25, and 0.14, respectively. The results of Moran's I test for all neighborhood structures showed that all mortality data had statistically significant spatial dependency, with a p-value less than 0.05. Similarly, Moran's I test indicated spatial variations in PM 10 and PM 2.5 with p-values less than 0.05. standardized neighborhood structure of the total, cardiovascular, and respiratory mortality data were 0.27, 0.25, and 0.14, respectively. The results of Moran's I test for all neighborhood structures showed that all mortality data had statistically significant spatial dependency, with a p-value less than 0.05. Similarly, Moran's I test indicated spatial variations in PM and PM . with p-values less than 0.05. Model performance for total mortality and PM is shown in Table 2. Smaller MSPE and DIC values indicate a better model. While the difference in MSPE between Model 2 and Model 3 was not large, Model 3 yielded a smaller DIC than the other models. Therefore, Model 3 was chosen as the best model. Based on these results, we fitted only Model 3 for cardiovascular and respiratory deaths. Table 3 gives the posterior summaries of the estimated coefficients of the explanatory variables. The relative risks of air pollutants are based on a 10-/ increase. The estimated relative risks (RR) of PM and deprivation index for total mortality were 1.011 and 1.001, respectively. Since 95% credible intervals of all variables did not include 1, all estimated coefficients except PM for respiratory mortality were statistically significant. When the relative risk was larger than 1, we inferred that the number of deaths increases as the corresponding variable increases. Figure 3 shows the calibration plots of the observed and estimated numbers of deaths for 2012-2014 for the best model, Model 3. The points are close to the line y = x (red line in Figure 3), indicating that the estimation matched the observations well. Figure 4 shows the calibration plots for the observed and Model performance for total mortality and PM 10 is shown in Table 2. Smaller MSPE and DIC values indicate a better model. While the difference in MSPE between Model 2 and Model 3 was not large, Model 3 yielded a smaller DIC than the other models. Therefore, Model 3 was chosen as the best model. Based on these results, we fitted only Model 3 for cardiovascular and respiratory deaths. Table 3 gives the posterior summaries of the estimated coefficients of the explanatory variables. The relative risks of air pollutants are based on a 10-µg/m 3 increase. The estimated relative risks (RR) of PM 10 and deprivation index for total mortality were 1.011 and 1.001, respectively. Since 95% credible intervals of all variables did not include 1, all estimated coefficients except PM 10 for respiratory mortality were statistically significant. When the relative risk was larger than 1, we inferred that the number of deaths increases as the corresponding variable increases. Figure 3 shows the calibration plots of the observed and estimated numbers of deaths for 2012-2014 for the best model, Model 3. The points are close to the line y = x (red line in Figure 3), indicating that the estimation matched the observations well. Figure 4 shows the calibration plots for the observed and forecasted death counts in 2015 using Model 3 with the 2012-2014 data. It shows that the forecasted numbers of deaths were very similar to the observed numbers of deaths. To see the forecasting performance in detail, we calculated the Pearson residuals of the observed and forecasted values, which are shown in Figure S1 in the supplementary material. forecasted death counts in 2015 using Model 3 with the 2012-2014 data. It shows that the forecasted numbers of deaths were very similar to the observed numbers of deaths. To see the forecasting performance in detail, we calculated the Pearson residuals of the observed and forecasted values, which are shown in Figure S1 in the supplementary material.

Discussion
This study focuses on verifying the association between air pollution and human health over the entire area of South Korea for the years 2012-2015. To adjust for other effects on mortality, we also used meteorological factors and regional deprivation index as covariates.
After controlling for the effects of weather variables, the deprivation index was positively associated with different types of mortality. A previous study [19] found that socioeconomic status, as represented by marital status, education level, and occupation, was linked to different causes of death. Because the deprivation index is a comprehensive index encompassing all types of social or economic statuses for small geographic areas, areas with higher values for this index represent areas

Discussion
This study focuses on verifying the association between air pollution and human health over the entire area of South Korea for the years 2012-2015. To adjust for other effects on mortality, we also used meteorological factors and regional deprivation index as covariates.
After controlling for the effects of weather variables, the deprivation index was positively associated with different types of mortality. A previous study [19] found that socioeconomic status, as represented by marital status, education level, and occupation, was linked to different causes of death. Because the deprivation index is a comprehensive index encompassing all types of social or economic statuses for small geographic areas, areas with higher values for this index represent areas with greater deprivation. Deprived areas generally have higher death rates; therefore, our findings are consistent with those of [19].
After controlling for other factors, increases in the concentrations of PM 10 and PM 2.5 were associated with increases in various types of mortality. Other than the non-significant association of PM 10 with respiratory death, these findings are consistent with those from several previous studies [29][30][31]. One previous study [29] determined that the daily PM 10 concentration was positively associated with daily mortality. One of their findings was that PM 10 was positively associated with all-cause, cardiovascular, and respiratory mortalities after adjusting for weather variables [29]. Although there are very few existing studies addressing the relationship between PM 2.5 and death in South Korea, the effects of PM 2.5 on mortality can be inferred from previous studies on PM 10 and human health.
One of our study's strengths is that it covered the entire area of South Korea from 2012 to 2015. Although some studies have considered different geographic areas in South Korea, most did not evaluate spatial and temporal dynamics. For instance, [18] focused on the seven metropolitan cities in South Korea, but analyzed each geographic area independently. In contrast, our study area encompassed all 250 South Korean administrative areas, and we also considered spatial and temporal variations. We utilized spatial and temporal random effects to capture the spatial and temporal dependency structures not captured by the covariates. As far as we are concerned, this study is the first attempt to illustrate the link between air pollutants and human health using a Bayesian two-stage spatio-temporal model.
Despite the many strengths of this study, there are several limitations. First, the regression coefficient of particulate matter is the same for all administrative areas in our proposed model. However, since the effect of air pollution on death might vary across regions, region-specific regression coefficients may be more suitable. This will be one of the future directions of our research into air pollution and human health. Second, stratification of age and sex might lead to a more detailed understanding of air pollution and health. Several previous studies have conducted age-and sex-specific analyses and obtained meaningful findings [5]. Lastly, the results of this paper may be biased due to the possibility of exposure misclassification resulting from the aggregated nature of the data. For example, we assigned air pollution averages to all cases in each district for a given month, which may not truly capture the monthly average exposure for the 30 days prior to the day of mortality. Ideally, individual level data with precise location and date of death are needed to assess daily time-lagged effects of air pollution exposure [32][33][34]. Future research should be geared to address the above limitations.

Conclusions
In conclusion, this study contributes to an understanding of the relationship between air pollution and human health using a Bayesian two-stage spatio-temporal model. Our findings show that air pollution positively affects total, cardiovascular, and respiratory mortalities. Additionally, this study indicates the need for considering spatio-temporal dynamics in epidemiological studies.