Comparison of Frequentist and Bayesian Generalized Additive Models for Assessing the Association between Daily Exposure to Fine Particles and Respiratory Mortality: A Simulation Study

Objective: To compare the performance of frequentist and Bayesian generalized additive models (GAMs), in terms of accuracy and precision, for assessing the association between daily exposure to fine particles and respiratory mortality using simulated data based on a real time-series study. Methods: We examined the estimates from a fully Bayesian GAM using simulated data based on a genuine time-series study of particles with an aerodynamic diameter of 2.5 μm or less (PM2.5) and respiratory deaths conducted in Shanghai, China. The simulation was performed by multiplying the observed daily deaths by a random error. The priors for the Bayesian analysis were estimated from the real-world time-series data. We also examined the sensitivity of the Bayesian GAM to the choice of priors and to the true parameter. Results: The frequentist GAM and the Bayesian GAM showed similar means and variances of the estimates of the parameters of interest. However, the estimates from the Bayesian GAM fluctuated relatively more, which to some extent reflects the uncertainty inherent in Bayesian estimation. Conclusions: Although computationally intensive, Bayesian GAM would be a better choice for avoiding potentially over-confident inferences. With increasing computing power and the statistical packages now available, fully Bayesian methods for decision making may become more widely applied in the future.


Introduction
With accelerated urban development and modernization, air pollution is worsening and its impact on human health has become a main research topic in developing countries [1]. Air pollutants include gaseous pollutants and particulate matter (PM). Studies have shown that PMs with an aerodynamic diameter of 2.5 µm or less (referred to as PM2.5 or fine particles) have a greater impact on human health [2,3]. The impact of both long-term (chronic) and short-term (acute) exposure to PMs has been widely studied, with the former most often estimated from cohort studies [4] and the latter from ecological time-series studies [5]. However, cohort studies are expensive and time-consuming to implement due to the long follow-up period required for collecting individual-level pollution and disease data [6]. This has led to the use of spatiotemporal ecological study designs in this field [7,8], which take advantage of routinely collected data [5] such as data from the air quality monitoring (AQM) stations in Beijing and the Causes of Death Registry (CDR) of the Chinese Centers for Disease Control and Prevention [9]. Although causal inference remains a challenge in time-series studies, owing to the difficulty of selecting a regression model that fits the data well, such studies contribute to and independently corroborate the body of evidence provided by cohort studies [10].
Semiparametric Poisson regression has been widely used for time-series analyses of air pollution and health. It uses daily mortality or morbidity counts as the outcome, linear terms to measure the percentage increase in the outcome associated with elevations in air pollution levels, and smooth functions of time and weather variables to adjust for time-varying confounders [11]. Generalized linear models (GLMs) with parametric splines (e.g., natural cubic splines) [12] or generalized additive models (GAMs) with nonparametric splines (e.g., smoothing splines or locally weighted smoothers (LOESS)) [13] are used to estimate effects associated with exposure to air pollution while accounting for smooth fluctuations in the outcome that confound the estimated effects of pollution. GAM and GLM can be applied in similar situations, but they serve different analytic purposes. GLM emphasizes estimation and inference for the parameters of the model, while GAM focuses on exploring the data non-parametrically. GAM is more suitable for exploring the data and visualizing the relationship between the dependent variable and the independent variables [14].
The basic form of GAM applied in air pollution and health studies may be expressed as Equation (1) [11]:

log(E(Y_t)) = β_0 + β_l X_(t−l) + S(t) + S(weather) + φ·DOW_t (1)

where Y_t is the count of daily mortality or morbidity, β_0 denotes the intercept, t indicates the calendar day, X_t is the daily concentration of the studied air pollutant (PM2.5 in our study), l is the lag time of the pollution exposure (generally restricted to one to seven days for acute effects), and S(·) denotes a smooth function of a covariate (calendar day or meteorological variables such as temperature and humidity). The smooth functions are usually constructed using LOESS, smoothing splines, or natural cubic splines. φ is the vector of regression coefficients associated with the vector DOW_t (indicating the seven days of the week) for the t-th day. β_l is the parameter of interest describing the change in the logarithm of the average mortality count per unit change in X_(t−l); it is generally reported as the percentage increase in mortality for every 10 units, one standard deviation (SD), or one interquartile range (IQR) of increase in the ambient concentration of the studied air pollutant at lag l.
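The structure of Equation (1) can be made concrete with a small numerical sketch. The following is an illustration (not the authors' code): it simulates Poisson daily counts driven by an invented PM2.5 series and a seasonal trend, then fits the log-linear model by iteratively reweighted least squares (IRLS), using simple parametric terms as a crude stand-in for the smooth S(t). All data and variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 365
t = np.arange(n)

# Invented data: a seasonal trend, PM2.5 exposure, and Poisson daily deaths
season = 0.2 * np.sin(2 * np.pi * t / 365)
pm25 = 5 + rng.gamma(4.0, 1.0, n)          # PM2.5 in units of 10 ug/m3
true_beta = 0.005                          # log-rate increase per unit exposure
y = rng.poisson(np.exp(3.4 + true_beta * pm25 + season))

# Design matrix: intercept, linear PM2.5 term, and a crude parametric
# stand-in (polynomial + annual harmonics) for the smooth term S(t)
X = np.column_stack([np.ones(n), pm25,
                     t / n, (t / n) ** 2,
                     np.sin(2 * np.pi * t / 365), np.cos(2 * np.pi * t / 365)])

# Iteratively reweighted least squares for the Poisson log-link model
beta = np.zeros(X.shape[1])
beta[0] = np.log(y.mean())                 # start near the right scale
for _ in range(50):
    eta = X @ beta
    mu_hat = np.exp(eta)
    z = eta + (y - mu_hat) / mu_hat        # working response
    WX = mu_hat[:, None] * X               # Poisson working weights W = mu
    beta_new = np.linalg.solve(X.T @ WX, WX.T @ z)
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new

print(beta[1])   # estimate of the pollutant coefficient
```

In a real analysis the smooth terms would be penalized splines with data-driven smoothness, but the IRLS core is the same fitting engine that backfitting and penalized GAM software build on.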
The model is called semiparametric because it assumes a linear relation with X_(t−l) and unknown functional relations with the time and weather variables. The conventional algorithm for fitting GAM (hereinafter, frequentist GAM) is the backfitting algorithm [15], and a corresponding robust estimation method has also been developed [16]. A disadvantage of backfitting is that it is difficult to integrate with estimation of the degree of smoothness of the model terms, so in practice the user must set these, or select among a modest set of pre-defined smoothing levels. Although the degree of smoothness can be estimated as part of model fitting using generalized cross-validation, or by restricted maximum likelihood when the smooth components are represented using smoothing splines, this carries a computational cost [17]. Alternative approaches, such as fully Bayesian methods, have thus been developed in recent years [18].
Although there are some applications [19][20][21][22][23] of Bayesian GAM analyses in recent years, few of them compared the performance of frequentist and Bayesian GAMs in terms of accuracy and precision. In our study, we examined the estimates from a fully Bayesian GAM using simulated data based on underlying 'true' parameters from a genuine time-series study on PM 2.5 and respiratory deaths conducted in Shanghai, China.

Real World Data
The data of our study were from the Swedish-Chinese joint project Assessing Variability in the Short-term Impact of Air Pollution and Extreme Weather Conditions on Non-accidental Deaths in Shanghai during 2012-2014. The study was approved by the Ethical Review Committee of the Shanghai Municipal Center for Disease Control and Prevention (SCDC), China (approval number: SCDC2016-08).
Briefly, daily average PM2.5 concentrations between 1 January 2012 and 31 December 2014 were obtained from one fixed air quality monitor of the Shanghai Meteorological Bureau and one monitor from the U.S. Consulate General in Shanghai, China. Daily mortality data during the corresponding time period for all the permanent residents in Shanghai were obtained from the Causes of Death Registry of the SCDC. The causes of death were coded according to the International Classification of Diseases Codes, version 10 (ICD-10). Deaths for respiratory diseases (ICD-10 codes J00-J99) were retrieved. Citywide daily meteorological data including temperature, relative humidity, barometric pressure, wind speed, precipitation, and sunshine time were retrieved from the Shanghai Meteorological Bureau.
The distribution of the observed daily respiratory mortality and the theoretical quasi-Poisson distribution with the same mean and an overdispersion index of 1.3 are shown in Figure 1. The observed data showed only minor deviations from the theoretical distribution; therefore, a Poisson regression model is suitable for our data. To illustrate the relationship between respiratory mortality and PM2.5 pollution and meteorological variables, we used the daily respiratory mortality and the corresponding average PM2.5 concentration and average temperature at lag 1 as an example (Figure 2). In general, daily respiratory mortality was positively related to PM2.5 concentration, and the relationship was approximately linear. A similar relationship was found for PM2.5 concentrations at other lags. However, an apparent reversed sigmoid relationship was found between daily respiratory mortality and daily average temperature. Nonlinear relationships were also found for the other meteorological variables. Therefore, a GAM with linear components for PM2.5 concentrations and nonlinear components for weather conditions and time trend is suitable for our study.
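The informal check behind Figure 1 can be reproduced for any count series by comparing the empirical variance of the counts with their mean, since a quasi-Poisson model with overdispersion index φ implies Var(Y) ≈ φ·E(Y). A minimal sketch with invented counts (a gamma-Poisson mixture, which is mildly overdispersed by construction):

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented daily death counts with mild overdispersion:
# a Poisson mixture over gamma-distributed rates has variance > mean
rates = rng.gamma(shape=100, scale=0.32, size=1095)   # mean rate ~32
y = rng.poisson(rates)

mean = y.mean()
dispersion = y.var(ddof=1) / mean   # ~1 for pure Poisson, >1 if overdispersed
print(mean, dispersion)
```

With these invented parameters the dispersion index comes out near 1.3, the value reported for the real Shanghai series.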

Fully Bayesian Generalized Additive Model
A fully Bayesian approach for modeling and inference within GAM requires prior assumptions for the unknown functions S(t). Several alternatives have recently been proposed for specifying smoothness priors for continuous covariates or time trends, such as random walk priors or, more generally, autoregressive priors [18,24], Bayesian P(enalized)-splines [25], and Bayesian smoothing splines [26]. Bayesian P-splines were used in our study because they allow simultaneous estimation of the smooth functions and the smoothing parameters. Moreover, they can easily be extended to more complex formulations [25]. The method assumes that an unknown smooth function S_j(x_j) of a covariate x_j can be approximated by a polynomial spline of degree ν defined on a set of equally spaced knots within the domain of x_j. Such a spline can be written as a linear combination of M_j (= d + ν) B-spline basis functions B_m, i.e.,

S_j(x_j) = ∑_{m=1}^{M_j} β_j,m B_m(x_j),

where β_j = (β_j,1, · · · , β_j,M_j)' is the vector of unknown regression coefficients. The functions B_m are positive only over an area spanned by ν + 2 knots, which is essential for the construction of the smoothness penalty for P-splines. The estimation of S_j(x_j) is thus reduced to the estimation of the vector of unknown regression coefficients β_j from the data. For the number of knots, a moderately large number of equally spaced knots (around 10 knots per time interval), as suggested by Eilers and Marx [27], is flexible enough to capture the variability of the data. In the Bayesian approach, penalized splines are obtained by replacing the difference penalties with their stochastic analogues, i.e., first- or second-order random walk priors for the regression coefficients. The first-order random walk prior for equidistant knots is given by [28]:

β_j,m = β_j,m−1 + u_j,m,

and the second-order random walk by:

β_j,m = 2β_j,m−1 − β_j,m−2 + u_j,m,

with Gaussian errors u_j,m ∼ N(0, τ_j²) and diffuse priors p(β_j,1) ∝ const, or p(β_j,1) ∝ const and p(β_j,2) ∝ const, for the initial values, respectively.
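The ingredients above can be sketched numerically: a B-spline design matrix on equally spaced knots, and the second-order random-walk prior, whose stochastic form corresponds to a penalty matrix K = D'D built from second differences of the coefficients. The following is an illustrative construction using SciPy (not the authors' implementation; knot counts and degree are chosen arbitrarily):

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_design(x, n_knots=10, degree=3):
    """B-spline basis matrix on equally spaced knots over the range of x."""
    inner = np.linspace(x.min(), x.max(), n_knots)
    step = inner[1] - inner[0]
    # pad boundary knots so every basis function is fully defined on the domain
    knots = np.concatenate([inner[0] - step * np.arange(degree, 0, -1),
                            inner,
                            inner[-1] + step * np.arange(1, degree + 1)])
    n_basis = len(knots) - degree - 1
    return np.column_stack([
        BSpline(knots, np.eye(n_basis)[m], degree)(x)
        for m in range(n_basis)
    ])

x = np.linspace(0.0, 1.0, 200)
B = bspline_design(x)            # 200 x 12 design matrix

# Second-order random walk prior <-> second-difference penalty K = D'D
n_coef = B.shape[1]
D = np.diff(np.eye(n_coef), n=2, axis=0)   # (n_coef - 2) x n_coef differences
K = D.T @ D                                 # rank-deficient precision of the prior

print(B.shape, K.shape)
```

The rank deficiency of K (rank n_coef − 2) mirrors the diffuse priors on the two initial coefficients of the second-order random walk.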
When we define the vector of function evaluations S_j as the matrix product of a design matrix ψ_j and the vector of unknown parameters β_j, i.e.,

S_j = ψ_j β_j,

then the predictor in Equation (1) can be written as

η = Xβ + ∑_j ψ_j β_j,

where Xβ collects the parametric (linear) terms. With this parameterization of the model, the posterior for fully Bayesian inference is given by:

p(β, β_1, . . . , β_p, τ_1², . . . , τ_p² | data) ∝ L(data; β, β_1, . . . , β_p) × ∏_j p(β_j | τ_j²) p(τ_j²), (7)

where L denotes the likelihood, which is the product of the individual likelihood contributions.
In the fully Bayesian approach, parameter estimates are obtained by drawing random samples from the posterior Equation (7) via Markov Chain Monte Carlo (MCMC) simulation techniques. More details about the fully Bayesian inference can be found in Fahrmeir and Lang [18] and Brezger and Lang [29].
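As a toy stand-in for the MCMC machinery of [18,29] (not their samplers), the following draws from the posterior of a single log-rate parameter in an iid Poisson model with a vague normal prior, using random-walk Metropolis with burn-in; all numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: Poisson counts with true log-rate 3.45 (~31.5 deaths/day)
y = rng.poisson(np.exp(3.45), size=300)

def log_post(b):
    # Poisson log-likelihood (constant log y! term omitted) + N(0, 10^2) prior
    return np.sum(y * b - np.exp(b)) - b**2 / (2 * 10**2)

draws = []
b = 3.0                              # deliberately poor starting value
for i in range(3000):
    prop = b + 0.01 * rng.standard_normal()
    if np.log(rng.uniform()) < log_post(prop) - log_post(b):
        b = prop                     # accept the proposal
    if i >= 1000:                    # discard 1000 burn-in iterations
        draws.append(b)

draws = np.asarray(draws)
print(draws.mean(), draws.std())     # posterior mean and SD of the log-rate
```

The posterior mean and SD summarize the retained draws, in the same way the paper summarizes 2000 post-burn-in iterations per chain.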

Estimation of True Parameters
Although we focused on the effect of PM2.5 concentrations at lag 1 day, to exclude the effects of other lag days the 'true' parameters were estimated using a frequentist distributed lag GAM instead of the basic model described above. The distributed lag model associates health outcomes on a given day with PM2.5 concentrations on several earlier days by replacing β_l X_(t−l) in (1) with:

∑_l η_l X_(t−l), with θ = ∑_l η_l,

where θ measures the cumulative effect of PM2.5 over the lag days, and η_l measures the contribution of the lagged exposure X_(t−l) to θ [30,31]. To reduce the computational work, we introduced only one smooth term, the seasonal trend, into the model. The nonlinear effects of the meteorological variables were captured using a categorical variable for six synoptic weather types (SWTs) [32]. The final model may be expressed as:

log(E(Y_t)) = β_0 + ∑_{l=1}^{7} β_l · lagl_t + S(t) + φ·W_t, (9)

where β_l (l = 1, . . . , 7) approximates the percentage increase in deaths associated with the corresponding explanatory variable lagl_t, the daily PM2.5 concentration at lag l (1-7 days). The frequentist S(t) was realized using cubic B-splines with 18 knots in total (6 knots per year). W_t denotes the vector of the six SWTs, i.e., hot dry, warm humid, cold dry, moderately dry, moderately humid, and cold humid weather types. The SWTs are defined from 15 routinely monitored meteorological variables through principal component analysis and the K-means clustering method; the details of the methods were published elsewhere [33]. In total, 35,135 respiratory deaths occurred during the study period between 1 January 2012 and 31 December 2014 in Shanghai. Mean and median daily deaths were 32 and 30, respectively.
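Constructing the lag 1-7 covariates for a distributed lag model is a simple shifting operation. A hypothetical sketch (array values invented; the first rows are undefined because earlier exposures are unobserved):

```python
import numpy as np

def lag_matrix(x, max_lag=7):
    """Column l-1 holds x shifted by l days (np.nan where undefined)."""
    n = len(x)
    L = np.full((n, max_lag), np.nan)
    for l in range(1, max_lag + 1):
        L[l:, l - 1] = x[:n - l]
    return L

pm25 = np.array([10.0, 12.0, 15.0, 11.0, 9.0, 14.0, 13.0, 16.0, 12.0])
L = lag_matrix(pm25)

# Row t, column l-1 is the PM2.5 concentration l days before day t
print(L[3])   # lags 1..7 for day index 3
```

In the distributed lag fit, the columns of this matrix enter the model together, and the cumulative effect θ is the sum of the lag-specific contributions.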
The descriptive statistics of daily respiratory deaths, ambient PM 2.5 concentrations, and major meteorological variables are shown in Table 1.
The parametric coefficients of the covariates derived from the frequentist GAM are shown in Table 2.
In our study, we focused on the acute effect of PM2.5 concentrations on respiratory mortality, i.e., β_1 for lag 1 day in Equation (9). It is 0.0049 and statistically significant (corresponding to 5.37 excess deaths per day, or 1960 excess deaths per year). The standard error (SE) of β_1 is 0.0017. Because the unit of PM2.5 concentration is 10 µg/m³ in the analysis, β_1 can be interpreted as follows: each 10 µg/m³ increase in PM2.5 concentration at lag 1 day is associated with an approximately 0.49% increase in daily respiratory deaths. The result is consistent with the findings of two meta-analyses [34,35] and a large observational study performed in China [36], in which the corresponding percentage increases in respiratory mortality ranged from 0.29% to 0.75%.
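The interpretation of β_1 follows directly from the log link: a 10 µg/m³ increase multiplies the expected count by exp(β_1), so the percentage increase is 100·(exp(β_1) − 1). A quick check of the arithmetic:

```python
import math

beta1 = 0.0049                 # per 10 ug/m3 increase in lag-1 PM2.5
se = 0.0017                    # reported standard error

pct_increase = 100 * (math.exp(beta1) - 1)
ci = (beta1 - 1.96 * se, beta1 + 1.96 * se)   # approximate 95% CI on beta1

print(round(pct_increase, 3), ci)
```

For small coefficients, 100·(exp(β) − 1) ≈ 100·β, which is why β_1 = 0.0049 is read as roughly a 0.49% increase.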

Simulation
To assess the estimation of the fully Bayesian GAM approach under realistic conditions, we conducted a simulation study. In our simulation, the estimates in Table 2 derived from the real-world data were used as 'true' parameters. We used the predicted daily deaths Ŷ_t from Equation (9) as the mean daily deaths, then simulated daily deaths Y*_t by multiplying Ŷ_t by a random error e^ε:

Y*_t = Ŷ_t · e^ε, ε ∼ N(0, σ*²), (10)

where ε follows a distribution from the exponential family; we applied the normal distribution here. This simulation framework ensures that the simulated mortalities exhibit the same concurvity as the observed ones. By changing σ* we may introduce different amounts of 'noise' into mortality to simulate the effects of unobserved confounders. In our simulation, σ* was varied by multiplying σ, the SD of the logarithmic daily deaths, by a factor γ, i.e., σ* = γσ. By selecting different random seeds, we may generate different time-series using the random number generator in any statistical software. Figure 3 shows nine simulated time-series of daily respiratory deaths for γ = 0.1, 0.2, . . . , 0.9.
We can see that when γ is equal to 0.4 or 0.5 the simulated data are closest to the real-world data. Figure 4 shows the difference between the simulated data and the real-world data. Almost all the differences were within ±2 SD of the real data when γ was between 0.1 and 0.9. The larger the γ, the larger the difference, meaning that more 'noise' is included in the simulated data. When γ is equal to 0.5, the difference between the SDs of the simulated data and the real-world data (10.97 vs. 11.25) is the smallest. Therefore, we used the simulated data with γ ≥ 0.5 in the following analyses to include relatively larger uncertainty in our estimation.
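The simulation step can be sketched directly (all names and numbers invented): multiply the fitted daily means by a lognormal error whose log-scale SD is γ times the SD of the log daily deaths.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented stand-in for the fitted daily means from the final model
mu_hat = 30 + 5 * np.sin(2 * np.pi * np.arange(1095) / 365)

sigma = np.std(np.log(mu_hat))       # SD of the log daily deaths (stand-in)

def simulate(mu_hat, gamma, rng):
    """Y*_t = mu_hat_t * exp(eps), eps ~ N(0, (gamma * sigma)^2)."""
    eps = rng.normal(0.0, gamma * sigma, size=len(mu_hat))
    return mu_hat * np.exp(eps)

y_sim = simulate(mu_hat, gamma=0.5, rng=rng)
print(y_sim.mean(), y_sim.std())
```

Because the same fitted means Ŷ_t are reused for every replicate, the simulated series inherit the concurvity structure of the observed data; only the multiplicative noise level changes with γ.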

Comparison of the Results from Frequentist GAM and Bayesian GAM
First, we set γ = 0.5, 0.6, . . . , 1.0 to generate six sets of simulated respiratory mortality data, where each set included 2000 time-series. Then we ran the frequentist GAMs using the simulated daily mortality as the dependent variable. We set the degrees of freedom (df) for S(t) from 1 to 20 per year in our models. For each df, we ran the frequentist GAM using 100 simulated time-series. In total, 12,000 β_1s (from 6 γs × 20 dfs × 100 time-series) were derived. Distributions of the estimated β_1s against the true β_1 are shown using a violin plot combined with a box plot in Figure 5. In general, although the variance of the β_1s tends to increase as the df for S(t) increases, the decrease in bias is far more dramatic, i.e., the greater the df, the closer the mean of the β_1s approaches the true β_1. The bias in the estimates is serious only for df ≤ 4 per year. Using sufficient df to represent the smoothness of the nonparametric nonlinear trend leads to an asymptotically unbiased estimate of β_1 [37]. The apparent decrease in the bias of the β_1s with increasing df is explained by Daniels et al. [38], Rice [39], and Speckman [40]. As we expected, the increased noise in the simulated data results in a larger variance of the β_1s (Figure 5 and Table 3), which is approximately 2γ × SE(β_1), or 2γ × 0.0018.

The Bayesian GAMs were fitted with the software described in [41]. We used 1000 burn-in iterations and 2000 iterations after burn-in for the Markov chains. Although 10,000 iterations after burn-in are usually suggested, we did not find noticeable differences in the estimates between 10,000 iterations and 2000 iterations; the latter, however, reduced the computation time significantly. For comparison purposes, we also used dfs ranging from 1 to 20 per year for the Bayesian P-splines to control the smoothness. By default we used non-informative uniform priors for the coefficients, and the derived estimates are quite similar to those from the frequentist GAMs. Although the Bayesian β_1s appear to fluctuate more around the true β_1 (Figure 6), their SDs are comparable to those of their frequentist counterparts (Table 4).

Sensitivity of Bayesian GAM to Choice of Prior Mean and Variance
One of the most important (though also controversial) features of Bayesian methods is that they integrate prior knowledge and observed data in their inference: the posterior is a compromise between the prior and the likelihood. In the second simulation, we investigated the impact of informative priors, rather than the non-informative uniform prior, on the posterior β_1. We simulated the time-series of daily respiratory mortality with a fixed σ* = 0.5σ and true β_1 = 0.0049. In total, 12,000 time-series were generated. For the Bayesian GAM analyses, we used a normal prior for β_1, with the prior mean µ(β_1) varied from 0.001 to 0.020 in steps of 0.001, and the prior variance V(β_1) set equal to γβ_1, where γ = 0.5, 0.6, . . . , 1.0. For each combination of µ(β_1) and V(β_1), we performed 100 Bayesian analyses. To obtain unbiased estimates with less computation, we set the df for the splines to eight per year. The distributions of the Bayesian estimates (β_1s) are shown in Figure 7. The mean of the β_1s fluctuates closely around the true β_1 for the different µ(β_1). There is no noticeable difference among the means of the β_1s derived from the different V(β_1) (Figure 7 and Table 5). The SD of the β_1s is not sensitive to V(β_1) (Table 5).
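The statement that the posterior is a compromise between prior and likelihood has a closed form in the conjugate normal case, which also explains why, with three years of daily data, informative priors barely move the estimate: the data precision swamps the prior precision. A sketch with invented numbers (this is the textbook normal-normal update, not the paper's GAM posterior):

```python
def posterior_normal(prior_mean, prior_var, xbar, like_var, n):
    """Posterior mean/variance for a normal mean with known variance:
    a precision-weighted average of the prior mean and the sample mean."""
    post_prec = 1.0 / prior_var + n / like_var
    post_mean = (prior_mean / prior_var + n * xbar / like_var) / post_prec
    return post_mean, 1.0 / post_prec

# A deliberately 'wrong' prior mean, but n = 1095 daily observations
post_mean, post_var = posterior_normal(
    prior_mean=0.020, prior_var=0.005**2,
    xbar=0.0049, like_var=0.06**2, n=1095)

print(post_mean, post_var)   # pulled most of the way to the data value
```

The posterior mean lands between the prior mean (0.020) and the data value (0.0049), far closer to the latter, and the posterior variance is smaller than either source alone.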

Sensitivity of Bayesian GAM to True Parameter
In our third simulation study, we artificially set σ* = 0.5σ and varied the 'true' β_1 from 0.001 to 0.020 to generate 20 sets of simulated daily respiratory deaths, while keeping the other coefficients in Equation (9) unchanged.
We can see that the mean of the estimated β_1s is sensitive only to the underlying true β_1 and is almost unaffected by the prior µ(β_1). Because the coefficients are small, the differences between the means and SDs of the estimated β_1s appear only in the fifth or sixth decimal digit. The bivariate linear regression analysis between the true β_1s and the Bayesian β_1s in Table 6 shows that all the coefficients are almost equal to 1 and all the R values are larger than 0.99, indicating the high precision and accuracy of the Bayesian estimates.

Discussion
In the present study, we evaluated the performance of the frequentist and fully Bayesian GAM approaches in a time-series study on the relationship between daily exposure to PM2.5 and respiratory mortality. According to our estimates, each 10 µg/m³ increase in PM2.5 concentration at lag 1 day is associated with an approximately 0.49% increase in daily respiratory deaths in Shanghai between 2012 and 2014, which is consistent with the results of other studies conducted in China [9,36]. Using the estimated effect as the true parameter, we compared the frequentist GAM and the Bayesian GAM by simulation. Both show similar mean estimates of the parameters of interest. However, the estimates from the frequentist GAM showed relatively less fluctuation (Figures 5 and 6), which to some extent reflects the over-confident inferences embedded in this method. Regarding the accuracy and precision of the estimates, both methods gave mean estimates close to the true parameter with comparable confidence intervals, which suggests that Bayesian GAM can be an ideal alternative to the conventional frequentist GAM. Our simulation study also indicated that when the underlying parameter was fixed at its true value, informative normal priors had no noticeable influence on the Bayesian estimate (Figure 7), which was sensitive only to the underlying true parameter (Figure 8). The reason might be the large amount of data we have, so that the posterior is dominated by the data rather than the prior.
As a flexible extension of GLM introduced by Hastie and Tibshirani [42], GAM can estimate both linear trends from parametric components and nonlinear trends from general nonparametric components during fitting. It has been widely used in time-series studies of air pollution and health effects, controlling for daily variations in meteorological conditions and seasonal trends. The original GAM fitting method estimated the smooth components of the model using non-parametric smoothers, such as smoothing splines or LOESS, via the backfitting algorithm [42]. By iteratively smoothing partial residuals, backfitting provides a general modular approach for estimating the S_j terms that can accommodate a wide variety of smoothing methods. The computational cost of the full spline method has recently been addressed by using Markov random fields to find sparse representations of the smooths, which can be viewed as an empirical Bayesian method [43]. An alternative approach, with particular advantages in high-dimensional settings, is boosting, which typically requires bootstrapping for uncertainty quantification [44,45].
Although frequentist GAM provides a rich family of models that have been widely applied, a crucial problem with GAM, in terms of analytical tractability, is the choice of the number and positions of the knots. A small number of knots may provide insufficient flexibility to capture the variability of the data, while a large number may lead to overfitting. The P-splines approach makes a more parsimonious parameterization possible, which is of particular advantage in a Bayesian framework where inference is based on MCMC techniques [25]. By taking into account the complete likelihood surface, rather than plugging in the maximum likelihood estimate of the covariance structure, the approach provides the posterior distributions of the quantities of interest, supporting statements such as 'the true parameter has a probability of 0.95 of falling in the 95% credible interval (CI)', which is more interpretable. Unlike frequentist methods, which provide one estimate for each model parameter, Bayesian methods may provide, for each parameter, a sample of thousands of MCMC draws from the simulated posterior distribution of the parameter. The reported posterior mean and posterior distribution are the corresponding summaries of the marginal posterior distribution of the parameter.
Due to recent developments in MCMC algorithms, software, and hardware, we can now use MCMC methods to analyze complex models that would have been impossible only a few decades ago [46]. However, Bayesian GAM is available only in a few R packages, such as gammSlice [47], R2BayesX [48], and spikeSlabGAM [49], which limits its application across scientific fields. In gammSlice, Pham et al. used the slice sampling method [50] for GAM fitting and inference within a Bayesian framework [47]. R2BayesX supports models similar to those in gammSlice. Scheipl used spike-and-slab type prior distributions on the spline coefficients [49]. In particular, Klein et al. proposed a general class of Bayesian GAMs for count data within the GAM framework for location, scale, and shape, where semiparametric predictors can be specified for several parameters of a count-data distribution [21].
There are some limitations to our study. First, we did not impose any structure on the relationship of the coefficients of the lagged PM2.5 concentrations with each other. However, multicollinearity among the lagged independent variables often arises, leading to high variance of the coefficient estimates. We plan to address this problem by constraining the β_l to be a simple function of the lag number using a structured finite distributed lag model in the future [51]. This problem can also be addressed by a time-scale model, which estimates the association between smooth daily variations in pollution and health outcomes by replacing β_l X_(t−l) in Equation (1) with:

∑_{k=1}^{K} β_k W_kt, (11)

where W_1t, . . . , W_kt, . . . , W_Kt is a set of predictors obtained by applying a wavelet or Fourier analysis to X_t to satisfy orthogonality, such that ∑_{k=1}^{K} W_kt = X_t [52-54]. The parameter β_k measures the logarithmic relative rate of the health outcome for increasing air pollution at time scale k. Time scales of interest include short-term variations within several days and long-term variations within one to two months, because any effects beyond two months are believed to be dominated by seasonal confounding [54]. Other tools such as auto-regressive moving average models or vector auto-regressive models could also be appropriate [55]. Second, our simulation framework did not address the issue of measurement error in the covariates. Because such error can in some situations attenuate the estimated effects, it may be useful in the future to employ a more elaborate simulation framework that incorporates measurement error. Third, our model did not yet include interactions between PM2.5 and the other covariates. The concentration-response relationship between PM2.5 and respiratory deaths was assumed to be approximately linear, but this might not hold at very low or very high PM2.5 concentrations.
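The time-scale decomposition in Equation (11) requires predictors W_kt that are orthogonal and sum to X_t; splitting a discrete Fourier transform into disjoint frequency bands satisfies both properties by construction. A hypothetical sketch (band edges and the exposure series are invented):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1096
x = 50 + 10 * np.sin(2 * np.pi * np.arange(n) / 365) + rng.normal(0, 5, n)

def fourier_bands(x, edges):
    """Split x into components by DFT frequency band; components sum to x."""
    n = len(x)
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n)          # cycles per day (daily sampling)
    comps = []
    for lo, hi in edges:
        mask = (freqs >= lo) & (freqs < hi)
        comps.append(np.fft.irfft(np.where(mask, X, 0), n=n))
    return np.array(comps)

# Bands: slower than 2 months, 1 week to 2 months, faster than 1 week
edges = [(0.0, 1 / 60), (1 / 60, 1 / 7), (1 / 7, 0.51)]
W = fourier_bands(x, edges)

print(np.allclose(W.sum(axis=0), x))    # components reconstruct x exactly
```

Because the bands are disjoint and cover the whole spectrum, the components are mutually orthogonal and add back to the original exposure series, exactly the structure the time-scale model needs.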
These limitations can be addressed using a varying coefficient model (VCM), in which the linear term β_l X_(t−l) is replaced by S(X_(t−l))·z_l. Estimation of the VCM poses no further difficulties, since only the β_l in Equation (7) has to be redefined as S(X_(t−l)) multiplied by z_l. Furthermore, we acknowledge that our study is based on simulated data, and that the difference in the results between the frequentist and Bayesian methods is small and might be biased towards the Bayesian approach because of the predefined parameters. Therefore, the evidence for advocating the Bayesian approach over the frequentist one is not yet strong, and a confirmation study using real-world data is needed to address these issues in the future.

Conclusions
Our simulation study indicates that fully Bayesian GAM may generate estimates as accurate and precise as those of conventional frequentist GAM, while revealing potential uncertainty that frequentist GAM cannot detect. Although computationally intensive, Bayesian GAM would be a better choice for avoiding the over-confident inferences potentially produced by a frequentist GAM. With increasing computing power and the statistical packages now available, fully Bayesian methods may see wider application in decision-making processes.