3.3. Construction of Bayesian Hierarchical Spatio-Temporal Model
Pricing deviations in green bonds form through a mechanism that is multi-level and spatiotemporally coupled. Micro-level bond attributes, temporal market dynamics, and regional policy heterogeneity are intertwined in a complex driving network. Traditional econometric models handle such nested data poorly, because they either treat spatial and temporal correlations separately or ignore the hierarchical structure. To capture the joint effects of micro-level features, dynamic trends, and spatial dependence, and to quantify their interactions, this study constructs a Bayesian hierarchical spatiotemporal model. The framework integrates individual bonds, temporal fluctuations, and geographical differences into one structured model, providing a rigorous mathematical basis for analyzing the spatiotemporal differentiation of green premiums.
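(1) Individual-level regression equation:
$$y_{ijt} = \mu + \mathbf{x}_{ijt}^{\top}\boldsymbol{\beta} + \gamma_t + \phi_j + \delta_{jt} + \varepsilon_{ijt}, \qquad \varepsilon_{ijt} \sim N(0, \sigma^2) \tag{1}$$
where $y_{ijt}$ represents the green premium of bond $i$ in period $t$ and province $j$, $\mu$ is the global intercept, $\boldsymbol{\beta}$ is the coefficient vector of the covariates $\mathbf{x}_{ijt}$ (such as issuance size and credit rating), $\gamma_t$ and $\phi_j$ are the time and spatial random effects, respectively, $\delta_{jt}$ is the spatiotemporal interaction term, and $\varepsilon_{ijt}$ is the residual term [32].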
The first step of model construction starts from the mechanism that generates pricing deviation. We define the price deviation of a green bond as the difference between its expected and actual returns and write it as Formula (1). This formula relates each bond's pricing deviation to macro policies, the regional environment, and issuer characteristics, and treats these factors as direct sources of deviation from fundamentals. This bond-level benchmark regression is a clear starting point for introducing the hierarchical structure, spatial effects, and temporal dynamics in the subsequent steps.
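The sketch below illustrates Formula (1). It assumes Gaussian residuals. The covariate values and the function names `premium_mean` and `simulate_premium` are hypothetical and serve only to show how the fixed part and the three random-effect terms add up.

```python
import random


def premium_mean(mu, x, beta, gamma_t, phi_j, delta_jt):
    """Conditional mean of y_ijt in Formula (1): mu + x'beta + gamma_t + phi_j + delta_jt."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    fixed_part = mu + sum(xk * bk for xk, bk in zip(x, beta))
    return fixed_part + gamma_t + phi_j + delta_jt


def simulate_premium(mu, x, beta, gamma_t, phi_j, delta_jt, sigma, rng):
    """One draw of y_ijt: the conditional mean plus a N(0, sigma^2) residual (assumed Gaussian)."""
    return premium_mean(mu, x, beta, gamma_t, phi_j, delta_jt) + rng.gauss(0.0, sigma)


# Hypothetical example: x = (log issuance size, rating dummy); all values are illustrative
rng = random.Random(42)
y = simulate_premium(0.5, [1.0, 2.0], [0.1, -0.2], 0.3, -0.1, 0.0, 0.05, rng)
```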
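(2) Time random effect (AR(1) process):
$$\gamma_t = \rho\,\gamma_{t-1} + \eta_t, \qquad \eta_t \sim N(0, \sigma_\gamma^2) \tag{2}$$
This process captures the persistence of market dynamics through the autoregressive coefficient $\rho$ ($|\rho| < 1$), where $\eta_t$ is white noise at the time level.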
On this basis, Formula (2) introduces a hierarchical structure on the intercept term and the key slope coefficients. Parameters for different regions and periods are therefore no longer assumed to be identical; instead, they vary randomly around a common distribution. This structure serves two purposes. It captures systematic regional differences in the institutional environment, level of financial development, and intensity of environmental regulation. It also reflects how the sensitivity of green bond price deviation to the same factors changes across macroeconomic stages. As a result, the model can identify the combined effects of individual-, regional-, and time-level factors, bringing it closer to the multi-level decision-making and pricing process observed in practice.
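A minimal simulation of the AR(1) time effect in Formula (2) follows. It assumes the process starts from its stationary distribution $N(0, \sigma_\gamma^2/(1-\rho^2))$, where $\rho$ is the autoregressive coefficient and $\sigma_\gamma$ the innovation standard deviation. The parameter values are illustrative, and the function names are hypothetical.

```python
import math
import random


def stationary_variance(rho, sigma_eta):
    """Variance of a stationary AR(1): sigma_eta^2 / (1 - rho^2)."""
    return sigma_eta ** 2 / (1.0 - rho ** 2)


def simulate_ar1(rho, sigma_eta, n_periods, seed=0):
    """Simulate gamma_t = rho * gamma_{t-1} + eta_t with eta_t ~ N(0, sigma_eta^2)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie in (-1, 1) for stationarity")
    rng = random.Random(seed)
    # Initial value drawn from the stationary distribution, so no burn-in is needed
    gamma = [rng.gauss(0.0, math.sqrt(stationary_variance(rho, sigma_eta)))]
    for _ in range(n_periods - 1):
        gamma.append(rho * gamma[-1] + rng.gauss(0.0, sigma_eta))
    return gamma
```

The stationarity check mirrors the restriction $|\rho| < 1$. With $\rho$ close to 1, simulated paths wander far from zero for many periods, which is what "persistence of market dynamics" means here.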
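(3) Spatial random effect (CAR model) [33]:
$$\phi_j \mid \boldsymbol{\phi}_{-j} \sim N\!\left(\alpha\,\frac{\sum_{k \neq j} w_{jk}\,\phi_k}{\sum_{k \neq j} w_{jk}},\ \frac{\sigma_\phi^2}{\sum_{k \neq j} w_{jk}}\right) \tag{3}$$
Here $\phi_j$ represents the spatial effect of province $j$. Its conditional distribution depends on the effects $\phi_k$ of the neighboring provinces $k$, and $\alpha$ controls the intensity of spatial autocorrelation.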
Formula (3) then adds a spatial effect term to this hierarchical structure, linking the pricing bias of green bonds in each region to that of its neighbors. Through the spatial weight matrix and spatial lag terms, the model reflects the channels through which information, investor sentiment, and policy are transmitted across regions. Economically, the pricing deviation of green bonds in a region depends not only on the local economic and institutional environment but also on the development of green finance and the intensity of regulation in surrounding areas. Statistically, modeling the spatial correlation structure improves estimation efficiency and avoids the coefficient bias caused by ignoring spatial dependence.
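The CAR conditional moments in Formulas (3) and (4) can be sketched as below. The sketch assumes a binary, symmetric adjacency matrix `W`. The three-province chain and the values of the spatial autocorrelation parameter `alpha` and scale `sigma_phi` are invented for illustration and do not come from the study's provincial map. The function names are hypothetical.

```python
def neighbour_mean(j, phi, W):
    """Formula (4): weighted mean of the spatial effects of province j's neighbours."""
    pairs = [(W[j][k], phi[k]) for k in range(len(phi)) if k != j]
    total_weight = sum(w for w, _ in pairs)
    if total_weight == 0:
        raise ValueError(f"province {j} has no neighbours")
    return sum(w * p for w, p in pairs) / total_weight


def car_conditional_moments(j, phi, W, alpha, sigma_phi):
    """Formula (3): mean alpha * neighbour mean, variance sigma_phi^2 / sum of weights."""
    total_weight = sum(W[j][k] for k in range(len(phi)) if k != j)
    if total_weight == 0:
        raise ValueError(f"province {j} has no neighbours")
    return alpha * neighbour_mean(j, phi, W), sigma_phi ** 2 / total_weight


# Hypothetical chain of three provinces: 0 - 1 - 2
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
phi = [0.2, 0.0, -0.4]
cond_mean, cond_var = car_conditional_moments(1, phi, W, alpha=0.9, sigma_phi=0.5)
```

Provinces with more neighbours get a smaller conditional variance, so their effects are pulled more tightly toward the neighbourhood average.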
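(4) Spatial dependency structure [34]:
$$\bar{\phi}_j = \frac{\sum_{k \neq j} w_{jk}\,\phi_k}{\sum_{k \neq j} w_{jk}}, \qquad w_{jk} = \begin{cases} 1, & \text{if provinces } j \text{ and } k \text{ are adjacent} \\ 0, & \text{otherwise} \end{cases} \tag{4}$$
Here $\bar{\phi}_j$ represents the mean neighborhood effect of province $j$, and the weights $w_{jk}$ define the spatial adjacency relationships.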
Formula (4) introduces a dynamic process in the time dimension to characterize the persistence and adjustment speed of the pricing bias caused by policy shocks and market sentiment. Policy effects are rarely released at a single point in time; they are transmitted and attenuated gradually over several periods through expectation revisions and behavioral adjustments. The market's perception of risk and environmental attributes is also inertial. By placing autoregressive structures on error terms or key parameters, the model can identify how persistent the pricing bias is and how quickly it reverts to long-run equilibrium. This gives a dynamic perspective for evaluating both the short-term impact and the medium- to long-term effects of policies.
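Under the AR(1) structure of Formula (2), a one-off shock to the time effect decays geometrically as $\rho^h$ after $h$ periods. Its half-life is therefore $\ln 0.5 / \ln|\rho|$. The sketch below is only an illustration of this persistence measure, not a quantity reported in the study, and the function names are hypothetical.

```python
import math


def impulse_response(rho, horizon):
    """Share of a unit shock to gamma_t remaining after h = 0..horizon periods: rho**h."""
    return [rho ** h for h in range(horizon + 1)]


def shock_half_life(rho):
    """Periods until |rho|**h = 0.5, i.e. h = ln 0.5 / ln|rho| (requires 0 < |rho| < 1)."""
    if not 0.0 < abs(rho) < 1.0:
        raise ValueError("need 0 < |rho| < 1")
    return math.log(0.5) / math.log(abs(rho))
```

For example, $\rho = 0.5$ gives a half-life of one period, while $\rho = 0.9$ gives about 6.6 periods. A larger $\rho$ thus means slower reversion to the long-run level.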
(5) Spatiotemporal interaction effect:
$$\delta_{jt} = \theta\, D_{jt}\, \exp\!\left[-\lambda\,(t - T_j)\right] \tag{5}$$
Formula (5) describes the dynamic impact of green finance policy pilots across time and space. The indicator $D_{jt}$ records whether province $j$ is covered by the pilot policy in period $t$. If the province has not been included in the pilot during the sample period, $D_{jt} = 0$. Once it is officially designated a pilot area in period $T_j$, $D_{jt} = 1$ for the remaining periods, so the variable switches state across provinces and over time. The coefficient $\theta$ reflects the initial impact of the pilot policy on price deviation, and $\lambda$ describes how this impact changes over time. One interpretation of $\lambda$ is as the rate at which the marginal impact of a single pilot policy weakens. Market participants gradually learn the policy, adjust their trading behavior, and face other overlapping policies. When $\lambda > 0$, the policy effect is strongest in the initial stage and then decays exponentially as $t - T_j$ increases. If the estimate of $\lambda$ is close to 0, the pilot effect is relatively persistent over the sample period. This spatiotemporal interaction structure lets the model identify, within a unified framework, the regions and time windows in which pilot policies have the strongest impact on the price deviation of green bonds.
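The pilot effect and the policy half-life implied by an exponential decay rate can be sketched as follows. The sketch takes the form $\theta D_{jt}\exp[-\lambda(t - T_j)]$, where $T_j$ is the (hypothetical) period in which province $j$ joins the pilot. Because $\delta_{jt}$ halves every $\ln 2/\lambda$ periods, this is the quantity behind a "policy half-life". The values of $\theta$ and $\lambda$ used below are illustrative only.

```python
import math


def pilot_effect(theta, lam, t, pilot_start):
    """delta_jt = theta * D_jt * exp(-lam * (t - T_j)), with D_jt = 1 once t >= T_j."""
    if pilot_start is None or t < pilot_start:
        return 0.0  # D_jt = 0: province not (yet) covered by the pilot
    return theta * math.exp(-lam * (t - pilot_start))


def policy_half_life(lam):
    """Periods for the pilot effect to halve: ln 2 / lam; infinite when lam <= 0."""
    return math.log(2.0) / lam if lam > 0 else math.inf
```

With a hypothetical $\theta = -0.3$ and $\lambda = 0.25$, the effect starts at $-0.3$ when the pilot begins. It reaches $-0.15$ after $\ln 2/0.25 \approx 2.77$ periods.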
Taken together, Formulas (1)–(5) provide a unified characterization of the individual, hierarchical, spatial, and temporal structures in the Bayesian framework. Specifying appropriate prior distributions for the parameters and error terms yields a complete Bayesian hierarchical spatiotemporal model. This structure makes full use of hierarchical and spatiotemporal information in a limited sample, so that regional heterogeneity and dynamic effects can be estimated jointly. It also propagates uncertainty probabilistically across levels, yielding more robust and interpretable interval estimates when inferring policy effects and forecasting future pricing deviations.
In summary, the Bayesian hierarchical spatiotemporal model constructed in this section addresses the core modeling challenges of green bond pricing through a chain of five equations. As shown in Figure 4, the components play distinct roles:
- The base-level regression equation integrates micro-characteristics, time effects, and spatial effects in one framework.
- The autoregressive process for the time random effects captures the persistence of market perception and the dynamic decay of policy influence.
- The conditional autoregressive model for the spatial random effects depicts the geographical dependence and neighborhood spillover of policy effects.
- The spatiotemporal interaction term couples temporal decay with spatial gradient decay in the diffusion of policy.

This architecture overcomes two shortcomings of traditional methods: the separation of spatial and temporal dimensions and the neglect of hierarchical nesting. It also lays the theoretical foundation for quantifying key parameters such as the policy half-life and spatial spillover radius, providing an analytical tool for understanding how green premiums arise and evolve.
3.4. Prior Distribution Setting and MCMC Estimation
Section 3.3 established a Bayesian hierarchical spatiotemporal framework that couples micro-characteristics, temporal dynamics, and spatial heterogeneity. This section addresses three computational challenges that arise from this coupling:
- The spatiotemporal parameters must respect theoretical constraints and statistical stability.
- The joint posterior over the 487-dimensional parameter space cannot be integrated analytically.
- Key quantities such as the policy half-life and spatial spillover radius require probabilistic inference.

The design comprises a prior system, a sampling engine, and a verification loop. Together they turn the theoretical model into an executable statistical inference procedure, forming a self-consistent cycle in which theory drives computation and computational results inform theoretical refinement.
(1) Global intercept prior:
$$\mu \sim N(0, \sigma_\mu^2) \tag{6}$$
A weakly informative normal prior is adopted. The large variance $\sigma_\mu^2$ indicates that there is no strong presumption about the value of the intercept.
With the theoretical framework of the Bayesian hierarchical spatiotemporal model complete, we turn to high-dimensional parameter estimation and computational feasibility. We first use Formula (6) to set a weakly informative prior for the global intercept. This normal prior keeps the intercept within a reasonable range and prevents extreme estimates that the data do not support. It does not, however, address the regularization of the micro-feature coefficients.
(2) Prior of covariate coefficients:
$$\beta_k \sim N(0, \sigma_\beta^2) \tag{7}$$
The coefficients are assumed to follow a zero-mean normal distribution. The large variance $\sigma_\beta^2$ indicates that no subjective constraint is placed on the direction of each effect [35].
Because Formula (6) provides no shrinkage for the covariate coefficients, this study specifies Formula (7). The zero-mean normal prior pulls the coefficients of weakly supported variables mildly toward zero while leaving the data free to identify the direction of the core factors. The time persistence parameter, however, still requires a stationarity constraint.
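The log prior implied by Formulas (6) and (7) is a sum of independent normal log densities. The prior standard deviations of 10 used below are hypothetical placeholders, since the study's hyperparameter values are not given here. The example shows how weakly such a prior penalizes moderate coefficient values.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def log_prior_fixed_effects(mu, beta, sd_mu=10.0, sd_beta=10.0):
    """Formulas (6)-(7): mu ~ N(0, sd_mu^2), beta_k ~ N(0, sd_beta^2) independently.

    sd_mu and sd_beta defaults are hypothetical, not the study's values.
    """
    return normal_logpdf(mu, 0.0, sd_mu) + sum(normal_logpdf(b, 0.0, sd_beta) for b in beta)
```

Moving $\mu$ and one coefficient from 0 to 1 changes this log prior by only $-0.01$. The likelihood therefore dominates, which is the intended "weakly informative" behaviour.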
(3) Prior of time persistence parameter:
$$\rho \sim U(-1, 1) \tag{8}$$
The autoregressive coefficient is uniformly distributed on the stationary interval, which rules out the unit-root case.
Without a constraint, the autoregressive coefficient of the temporal dynamic model could violate stationarity. Following the regularization logic of Formula (7), this study therefore specifies Formula (8). Its uniform distribution restricts the autoregressive coefficient to the stationary interval, ensuring that market perception evolves gradually. The residual variance at the time level, however, still requires an adaptive adjustment mechanism.
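A minimal sketch of how the stationarity constraint in Formula (8) works inside a Metropolis–Hastings step follows. The log density of $U(-1, 1)$ is $-\ln 2$ inside the open interval and $-\infty$ outside. Any proposal with $|\rho| \ge 1$ is therefore rejected with certainty. The helper `accept_prob` is a hypothetical name used for illustration.

```python
import math


def log_prior_rho(rho):
    """Formula (8): log density of U(-1, 1); -inf outside the open stationary interval."""
    return -math.log(2.0) if -1.0 < rho < 1.0 else -math.inf


def accept_prob(log_target_current, log_target_proposal):
    """Metropolis acceptance probability for a symmetric proposal."""
    return math.exp(min(0.0, log_target_proposal - log_target_current))
```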
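(4) Prior of variance parameter:
$$\tau_\gamma = \sigma_\gamma^{-2} \sim \mathrm{Gamma}(a_\gamma, b_\gamma) \tag{9}$$
The Gamma prior on the precision parameter (the inverse of the variance component) balances computational efficiency, since it is conjugate to the normal likelihood, against weak informativeness [36].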
To keep the variance parameters of the temporal dynamic model from overfitting extreme fluctuations, this study extends the hierarchical shrinkage idea of Formula (8) and specifies Formula (9), a Gamma prior on the precision of the temporal layer. Its main contribution is to balance model complexity against goodness of fit. The variance parameters of the spatial layer, however, still require separate treatment.
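To make the Gamma prior on the temporal-layer precision $\tau_\gamma = \sigma_\gamma^{-2}$ concrete, the sketch below evaluates its log density in the shape–rate parameterization. It also derives the prior it implies for the standard deviation $\sigma_\gamma$ by a change of variables, $p(\sigma) = p_\tau(\sigma^{-2})\,2\sigma^{-3}$. The hyperparameters (shape 2, rate 1) in the check are hypothetical.

```python
import math


def gamma_logpdf(tau, shape, rate):
    """Log density of Gamma(shape, rate) at tau (Formula (9) uses it for a precision)."""
    if tau <= 0.0:
        return -math.inf
    return shape * math.log(rate) - math.lgamma(shape) + (shape - 1.0) * math.log(tau) - rate * tau


def implied_sd_logpdf(sigma, shape, rate):
    """Log density of sigma = tau**-0.5 when tau ~ Gamma(shape, rate): Jacobian 2 * sigma**-3."""
    if sigma <= 0.0:
        return -math.inf
    return gamma_logpdf(sigma ** -2, shape, rate) + math.log(2.0) - 3.0 * math.log(sigma)


# Numerical check that the implied density of sigma integrates to about 1 (hypothetical shape=2, rate=1)
step = 0.001
total = sum(math.exp(implied_sd_logpdf(step * i, 2.0, 1.0)) * step for i in range(1, 50001))
```

Inspecting the implied density of $\sigma_\gamma$, rather than of $\tau_\gamma$, is a common way to judge whether a precision prior is as weakly informative as intended.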
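(5) Spatial smoothness prior:
$$\sigma_\phi \sim \text{Half-Cauchy}(0, s_\phi) \tag{10}$$
The marginal distribution this prior induces on the spatial precision is heavy-tailed, which leaves room for strong spatial dependence to be identified.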
Strong spatial dependence in the spatial model may cause the variance to be underestimated. To address this, and following the variance adjustment idea of Formula (9), this study specifies Formula (10). The half-Cauchy prior lets the scale of the spatial effects be identified flexibly. The policy decay rate parameter, however, must still satisfy constraints that keep it interpretable.
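The heavy tail of the half-Cauchy prior in Formula (10) is easiest to see against a half-normal prior with the same scale. The sketch below compares the two log densities. The scale value and the half-normal comparison are illustrative additions, not part of the study's specification.

```python
import math


def half_cauchy_logpdf(sigma, scale):
    """Formula (10): log density of Half-Cauchy(0, scale) at sigma >= 0."""
    if sigma < 0.0:
        return -math.inf
    return math.log(2.0 / (math.pi * scale)) - math.log1p((sigma / scale) ** 2)


def half_normal_logpdf(sigma, scale):
    """Half-normal with the same scale, used here only for comparison."""
    if sigma < 0.0:
        return -math.inf
    return math.log(2.0 / (scale * math.sqrt(2.0 * math.pi))) - 0.5 * (sigma / scale) ** 2
```

At ten times the scale, the half-Cauchy log density is about $-5.07$, while the half-normal is about $-50.2$. Large spatial standard deviations therefore remain plausible a priori under the half-Cauchy.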
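(6) Prior of policy diffusion parameters:
$$\theta \sim N(m_\theta, s_\theta^2), \qquad \lambda \sim \mathrm{Exp}(\kappa) \tag{11}$$
The prior mean $m_\theta$ encodes the presumption that the pilot policy reduces the premium, and the exponential distribution constrains the decay rate $\lambda$ to be positive [37].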
To test how robust the key conclusions are to the prior settings, this paper conducted a brief prior sensitivity analysis around the benchmark hyperparameters in Equations (9)–(11). We constructed several alternative settings that are looser or tighter than the benchmark in the prior means and variances of hyperparameters such as the hierarchical variance, the spatial effect intensity, and the time decay coefficient. The model was re-estimated under each setting with the same data and MCMC iteration settings. Across all schemes, the estimated direction of the effect of green finance policy pilots on the price deviation of green bonds is consistent. The significance and approximate magnitude of the policy effects change only slightly. In particular, the policy half-lives derived from the time decay parameter cluster near the benchmark estimate under all hyperparameter combinations. Their range of variation is moderate, with no reversal in direction and no substantial shift in magnitude.
The decay rate in the spatiotemporal interaction model must be strictly positive. Building on the directional guidance of Formula (10), this study therefore specifies Formula (11), whose exponential distribution keeps the decay rate positive. Its main contribution is to express the policy life cycle in probabilistic form. This completes the prior system, but integrating the high-dimensional posterior distribution remains an open problem.
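Because the policy half-life is $\ln 2/\lambda$, the exponential prior on $\lambda$ in Formula (11) implies a prior on the half-life itself. The Monte Carlo sketch below draws from that implied prior. The rate `rate = 4.0` is a hypothetical choice, not the study's hyperparameter. Since $\ln 2/\lambda$ is decreasing in $\lambda$, the prior median half-life equals the exponential rate $\kappa$ itself.

```python
import math
import random
import statistics


def prior_half_life_draws(rate, n_draws=20000, seed=1):
    """Draw lambda ~ Exp(rate) as in Formula (11) and map each draw to ln 2 / lambda."""
    rng = random.Random(seed)
    # random.expovariate takes the rate (1 / mean) as its argument
    return [math.log(2.0) / rng.expovariate(rate) for _ in range(n_draws)]


draws = prior_half_life_draws(rate=4.0)  # hypothetical hyperparameter
median_half_life = statistics.median(draws)
```

Checking the implied prior on an interpretable quantity such as the half-life is one way to judge whether the hyperparameters in a sensitivity analysis are reasonable.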
(7) Posterior distribution:
$$p(\Theta \mid \mathbf{y}) \propto p(\mathbf{y} \mid \Theta)\, p(\Theta) \tag{12}$$
The joint posterior distribution of all unknown parameters $\Theta$ is proportional to the product of the likelihood and the prior distribution.
The joint posterior of the 487 parameters cannot be evaluated analytically. This study therefore combines the prior system in Formulas (6)–(11) with the likelihood to obtain Formula (12). Writing the posterior as the product of the likelihood function and the prior distribution lays the mathematical foundation for Markov chain Monte Carlo (MCMC) sampling. Traditional sampling methods, however, are inefficient for non-conjugate parameters.
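Formula (12) only requires the posterior up to a constant: the log posterior is the log-likelihood plus the log prior. The sketch below shows this on a deliberately tiny, hypothetical model (normal data with known $\sigma$ and a normal prior on the mean). In this model the exact posterior mode is known, so a grid search can confirm that the unnormalized log posterior peaks at the right place.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def log_posterior(mu, y, sigma, sd_mu):
    """Unnormalized log posterior: log-likelihood + log-prior (toy version of Formula (12))."""
    log_lik = sum(normal_logpdf(yi, mu, sigma) for yi in y)
    log_prior = normal_logpdf(mu, 0.0, sd_mu)
    return log_lik + log_prior


y = [1.0, 2.0, 3.0]
grid = [i / 1000 for i in range(-5000, 5001)]
mode = max(grid, key=lambda m: log_posterior(m, y, sigma=1.0, sd_mu=10.0))
# Conjugate result for comparison: (sum(y) / sigma^2) / (n / sigma^2 + 1 / sd_mu^2) = 6 / 3.01
```

In the full model no closed form exists, which is why the following formulas turn to MCMC. The same "likelihood plus prior" evaluation is still what every sampling step relies on.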
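(8) Gibbs sampling update for the variance:
$$\sigma^{2} \mid \cdot \sim \text{Inv-Gamma}\!\left(a_\varepsilon + \frac{N}{2},\ b_\varepsilon + \frac{1}{2}\sum_{i,j,t}\left(y_{ijt} - \hat{y}_{ijt}\right)^2\right) \tag{13}$$
Under a conjugate prior, the full conditional distribution of the residual variance is inverse-gamma, so it can be sampled directly. Here $N$ is the number of observations, $\hat{y}_{ijt}$ is the current fitted value from Formula (1) excluding the residual, and $a_\varepsilon$, $b_\varepsilon$ are the prior hyperparameters.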
Exploiting the conjugate structure of the variance parameters within the sampling scheme for Formula (12), this study derives Formula (13), which samples the variance parameters directly. An efficient sampling strategy for the time random effects is still needed.
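A minimal sketch of this Gibbs step follows, written for the residual precision $\tau = 1/\sigma^2$. A $\mathrm{Gamma}(a_0, b_0)$ prior on $\tau$ (shape, rate) gives the full conditional $\mathrm{Gamma}(a_0 + N/2,\ b_0 + \tfrac{1}{2}\sum r^2)$, which is equivalent to the inverse-gamma update for $\sigma^2$. The hyperparameters and simulated residuals are hypothetical.

```python
import random


def draw_residual_precision(residuals, a0, b0, rng):
    """Gibbs draw of tau = 1 / sigma^2 given residuals, under a Gamma(a0, b0) (shape, rate) prior."""
    shape = a0 + len(residuals) / 2.0
    rate = b0 + 0.5 * sum(r * r for r in residuals)
    # random.gammavariate is parameterized by (shape, scale), so scale = 1 / rate
    return rng.gammavariate(shape, 1.0 / rate)


rng = random.Random(7)
residuals = [rng.gauss(0.0, 0.5) for _ in range(2000)]  # simulated; true precision is 4
draws = [draw_residual_precision(residuals, 0.01, 0.01, rng) for _ in range(2000)]
sigma2_draws = [1.0 / tau for tau in draws]  # equivalent inverse-gamma draws of sigma^2
```

Note the parameterization pitfall flagged in the comment. Python's `gammavariate` expects a scale, whereas the Gamma priors above are written with a rate.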
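(9) Metropolis–Hastings update for the time effect:
$$\gamma_t^{*} = \gamma_t^{(s)} + \xi, \quad \xi \sim N(0, c^2), \qquad \pi_{\mathrm{acc}} = \min\!\left\{1,\ \frac{p(\gamma_t^{*} \mid \gamma_{t-1}, \gamma_{t+1}, \cdot)}{p(\gamma_t^{(s)} \mid \gamma_{t-1}, \gamma_{t+1}, \cdot)}\right\} \tag{14}$$
The full conditional distribution of $\gamma_t$ depends on its temporal neighbors $\gamma_{t-1}$ and $\gamma_{t+1}$. It is sampled with a random-walk Metropolis–Hastings algorithm with step size $c$ and acceptance probability $\pi_{\mathrm{acc}}$.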
The non-conjugate autoregressive process in the temporal dynamic model creates a sampling bottleneck. Continuing the computational optimization of Formula (13), this study designs Formula (14) to overcome it. Formula (14) uses a random-walk algorithm with an adaptive step size, which addresses the slow mixing of the time effects. Sampling efficiency in the spatial dimension can still be improved.
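The sketch below shows an adaptive random-walk Metropolis–Hastings update for one interior time effect $\gamma_t$. Its log full conditional combines the two AR(1) terms that involve $\gamma_t$ with the likelihood of the period-$t$ observations. The adaptation rule is one common choice and is not confirmed as the study's scheme: a Robbins–Monro update of the log step size toward an acceptance rate of 0.44, with a gain that shrinks over iterations. All function names and numbers are hypothetical.

```python
import math
import random


def log_full_conditional_gamma(g, g_prev, g_next, rho, sigma_eta, partial_resid, sigma):
    """Log full conditional of an interior gamma_t, up to a constant.

    partial_resid: y_ijt minus all other terms of Formula (1), for observations in period t.
    """
    prior = -((g - rho * g_prev) ** 2 + (g_next - rho * g) ** 2) / (2.0 * sigma_eta ** 2)
    lik = -sum((r - g) ** 2 for r in partial_resid) / (2.0 * sigma ** 2)
    return prior + lik


def adaptive_rw_metropolis(log_target, x0, n_iter, step=1.0, target_accept=0.44, seed=0):
    """Random-walk MH whose log step size is tuned by a diminishing Robbins-Monro rule."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    log_step = math.log(step)
    draws, n_accepted = [], 0
    for i in range(1, n_iter + 1):
        proposal = x + rng.gauss(0.0, math.exp(log_step))
        lp_prop = log_target(proposal)
        accept_prob = math.exp(min(0.0, lp_prop - lp))
        if rng.random() < accept_prob:
            x, lp = proposal, lp_prop
            n_accepted += 1
        # Gain i**-0.6 decays to zero, so the adaptation fades out as sampling proceeds
        log_step += (accept_prob - target_accept) / i ** 0.6
        draws.append(x)
    return draws, n_accepted / n_iter


def target(g):
    # Hypothetical values: gamma_{t-1}=0.2, gamma_{t+1}=0.4, rho=0.5, sigma_eta=1, sigma=0.5
    return log_full_conditional_gamma(g, 0.2, 0.4, 0.5, 1.0, [0.3, 0.5, 0.1], 0.5)


draws, acc_rate = adaptive_rw_metropolis(target, x0=0.0, n_iter=20000)
```

In this toy case the full conditional is Gaussian with mean $3.9/13.25 \approx 0.294$. This value gives a simple correctness check on the sampler.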
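(10) Spatial effect block sampling:
$$\boldsymbol{\phi} = (\phi_1, \ldots, \phi_J)^{\top} \sim N\!\left(\mathbf{0},\ \sigma_\phi^{2}\,(\mathbf{D}_w - \alpha\mathbf{W})^{-1}\right), \qquad \mathbf{D}_w = \operatorname{diag}\!\left(\textstyle\sum_{k} w_{jk}\right) \tag{15}$$
The spatial effect vector follows a multivariate normal distribution with precision matrix $\sigma_\phi^{-2}(\mathbf{D}_w - \alpha\mathbf{W})$. Here $\mathbf{W} = (w_{jk})$ is the adjacency matrix and $\mathbf{D}_w$ is the diagonal matrix of the row sums of the adjacency weights.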
Provincial effects in the spatial model are dependent as a block. Exploiting this, this study extends the sampling logic of Formula (14) to Formula (15), which samples all spatial effects jointly from a multivariate normal distribution. This substantially improves the efficiency of high-dimensional parameter estimation, but the convergence of the sampler must still be verified rigorously.
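Joint (block) sampling of the spatial effects can be done with a Cholesky factor of the precision matrix, as sketched below. If the full conditional is written in canonical form $N(Q^{-1}b, Q^{-1})$ and $Q = LL^{\top}$, then the mean solves $LL^{\top}m = b$, and solving $L^{\top}v = z$ with $z \sim N(0, I)$ gives $v \sim N(0, Q^{-1})$. In a full Gibbs step, $Q$ would be the CAR prior precision plus $\operatorname{diag}(n_j/\sigma^2)$, with $n_j$ the number of observations in province $j$, and $b$ would collect the scaled partial residuals. Both details are assumptions here. The pure-Python linear algebra only keeps the example dependency-free; the three-province map is hypothetical.

```python
import math
import random


def car_precision(W, alpha, sigma_phi):
    """CAR prior precision from Formula (15): (D_w - alpha * W) / sigma_phi^2."""
    n = len(W)
    return [[(sum(W[j]) if j == k else -alpha * W[j][k]) / sigma_phi ** 2
             for k in range(n)] for j in range(n)]


def cholesky(A):
    """Lower-triangular L with A = L L^T, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L w = b for lower-triangular L."""
    w = [0.0] * len(b)
    for i in range(len(b)):
        w[i] = (b[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i]
    return w


def backward_solve_transpose(L, z):
    """Solve L^T x = z using the lower-triangular factor L."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def sample_block(Q, b, rng):
    """Draw phi ~ N(Q^{-1} b, Q^{-1}) for all provinces at once."""
    L = cholesky(Q)
    mean = backward_solve_transpose(L, forward_solve(L, b))
    noise = backward_solve_transpose(L, [rng.gauss(0.0, 1.0) for _ in range(len(b))])
    return [m + e for m, e in zip(mean, noise)]


W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # hypothetical chain of three provinces
Q = car_precision(W, alpha=0.9, sigma_phi=1.0)
phi_draw = sample_block(Q, [0.0, 0.0, 0.0], random.Random(0))
```

Drawing the whole vector at once respects the correlation between neighbouring provinces. Updating one province at a time mixes slowly when that correlation is strong.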
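(11) Gelman–Rubin convergence diagnostic:
$$\hat{R} = \sqrt{\frac{\hat{V}}{W}}, \qquad \hat{V} = \frac{n-1}{n}\,W + \frac{1}{n}\,B \tag{16}$$
$\hat{R}$ measures Markov chain convergence. Here $W$ is the within-chain variance, $B$ is the between-chain variance, $n$ is the number of draws per chain, and $\hat{V}$ is the pooled variance. Values close to 1 indicate convergence.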
To guard against insufficient mixing across multiple Markov chains, this study applies Formula (16) to the sampling output of Formulas (12)–(15). The statistic compares between-chain and within-chain variance to detect non-convergence, which helps ensure reliable parameter estimates. An insufficient effective sample size may still reduce the accuracy of inference.
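A direct implementation of Formula (16) for $m$ chains of equal length $n$ follows, with simulated chains standing in for real sampler output. Chains that sample the same distribution give $\hat{R} \approx 1$. Chains stuck around different values inflate $B$ and push $\hat{R}$ well above 1.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Formula (16): potential scale reduction factor for m >= 2 chains of equal length n."""
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    B = n * sum((cm - grand_mean) ** 2 for cm in chain_means) / (m - 1)  # between-chain
    W = statistics.fmean([statistics.variance(c) for c in chains])       # within-chain
    V_hat = (n - 1) / n * W + B / n                                      # pooled variance
    return math.sqrt(V_hat / W)


rng = random.Random(0)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(4)]
stuck = [[rng.gauss(0.0, 1.0) for _ in range(2000)], [rng.gauss(3.0, 1.0) for _ in range(2000)]]
r_hat_mixed, r_hat_stuck = gelman_rubin(mixed), gelman_rubin(stuck)
```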
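(12) Effective sample size (ESS):
$$\mathrm{ESS} = \frac{M}{1 + 2\sum_{k=1}^{\infty} \hat{r}_k} \tag{17}$$
Here $M$ is the total number of posterior draws and $\hat{r}_k$ is the estimated autocorrelation at lag $k$. A sufficiently large ESS ensures reliable posterior inference [38].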
Autocorrelation in the sampling process reduces computational efficiency. This study therefore complements the convergence diagnostic of Formula (16) with Formula (17), which computes the effective sample size and thereby safeguards the accuracy of posterior mean estimates. A model selection criterion, however, has not yet been established.
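In practice the infinite sum in Formula (17) must be truncated. The sketch below uses a simple rule, which is an assumption rather than the study's stated choice: stop at the first non-positive sample autocorrelation. Independent draws give an ESS close to $M$. A strongly autocorrelated AR(1) chain with coefficient 0.9 gives an ESS near the theoretical $M(1-0.9)/(1+0.9) \approx 263$ for $M = 5000$.

```python
import random
import statistics


def autocorrelation(x, lag, mean, c0):
    """Sample autocorrelation at a given lag, using the 1/n autocovariance convention."""
    n = len(x)
    return sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / (n * c0)


def effective_sample_size(x, max_lag=None):
    """Formula (17), truncating the sum at the first non-positive autocorrelation."""
    n = len(x)
    mean = statistics.fmean(x)
    c0 = sum((xi - mean) ** 2 for xi in x) / n
    max_lag = n // 2 if max_lag is None else max_lag
    rho_sum = 0.0
    for lag in range(1, max_lag + 1):
        r = autocorrelation(x, lag, mean, c0)
        if r <= 0.0:
            break
        rho_sum += r
    return n / (1.0 + 2.0 * rho_sum)


rng = random.Random(11)
iid = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ar = [0.0]
for _ in range(4999):
    ar.append(0.9 * ar[-1] + rng.gauss(0.0, 1.0))
ess_iid, ess_ar = effective_sample_size(iid), effective_sample_size(ar)
```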
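(13) Deviance Information Criterion (DIC):
$$\mathrm{DIC} = \bar{D} + p_D, \qquad p_D = \bar{D} - D(\bar{\Theta}) \tag{18}$$
Here $\bar{D}$ is the posterior mean deviance, $D(\bar{\Theta})$ is the deviance evaluated at the posterior mean of the parameters, and $p_D$ is the penalty for model complexity. A smaller DIC indicates a better trade-off between fit and complexity.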
Following the principle of penalizing overfitting in complex models, and using the output of Formulas (12)–(17), this study adopts Formula (18). The deviance information criterion accounts for model complexity and provides quantitative evidence for model comparison. The model's posterior predictive ability, however, still requires empirical testing.
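DIC can be computed from posterior draws as sketched below. The toy normal-mean model and its simulated "posterior draws" are hypothetical stand-ins for the full model. For a model with one free parameter and a diffuse prior, the effective number of parameters $p_D$ should come out close to 1, which makes a useful sanity check.

```python
import math
import random
import statistics


def normal_deviance(y, mu, sigma):
    """D(theta) = -2 log p(y | mu, sigma) for i.i.d. normal observations."""
    return sum(((yi - mu) / sigma) ** 2 + 2.0 * math.log(sigma) + math.log(2.0 * math.pi) for yi in y)


def dic(deviance_draws, deviance_at_mean):
    """Formula (18): returns (DIC, p_D), with p_D = mean deviance - deviance at posterior mean."""
    d_bar = statistics.fmean(deviance_draws)
    p_d = d_bar - deviance_at_mean
    return d_bar + p_d, p_d


rng = random.Random(3)
y = [rng.gauss(1.0, 1.0) for _ in range(200)]
# Toy posterior for the mean under a flat prior and known sigma = 1: N(ybar, 1/n)
mu_draws = [rng.gauss(statistics.fmean(y), 1.0 / math.sqrt(len(y))) for _ in range(2000)]
dev = [normal_deviance(y, m, 1.0) for m in mu_draws]
dic_value, p_d = dic(dev, normal_deviance(y, statistics.fmean(mu_draws), 1.0))
```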
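(14) Posterior predictive check:
$$p_B = \Pr\!\left(T(\mathbf{y}^{\mathrm{rep}}) \ge T(\mathbf{y}) \mid \mathbf{y}\right) \tag{19}$$
Here $T(\cdot)$ is a test statistic (such as a quantile) and $\mathbf{y}^{\mathrm{rep}}$ is replicated data drawn from the posterior predictive distribution. If $p_B$ is not close to 0 or 1, the model can reproduce that characteristic of the training data.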
Good in-sample fit may mask poor generalization. This study therefore extends the evaluation of Formula (18) with Formula (19). This quantile-matching test shows that the model reproduces the statistical characteristics of the premium distribution. Out-of-sample prediction accuracy, however, has not yet been verified.
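A posterior predictive p-value for a quantile statistic can be estimated as the share of replicated data sets whose statistic is at least the observed one. In the sketch, the 90th percentile is the test statistic and standard-normal replicates stand in for posterior predictive draws; both choices are hypothetical. Observed data shifted far above or below the replicates give p-values of 0 or 1, the signature of a characteristic the model fails to reproduce.

```python
import math
import random


def empirical_quantile(xs, q):
    """Linear-interpolation sample quantile, 0 <= q <= 1."""
    s = sorted(xs)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def posterior_predictive_pvalue(y_obs, y_rep_draws, stat):
    """Formula (19): share of replicated data sets with T(y_rep) >= T(y_obs)."""
    t_obs = stat(y_obs)
    return sum(stat(y_rep) >= t_obs for y_rep in y_rep_draws) / len(y_rep_draws)


def q90(xs):
    """Hypothetical test statistic: the 90th percentile."""
    return empirical_quantile(xs, 0.9)


rng = random.Random(5)
y_rep_draws = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(500)]
```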
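(15) Out-of-sample mean squared prediction error:
$$\mathrm{MSPE} = \frac{1}{N_{\mathrm{test}}} \sum_{(i,j,t) \in \mathcal{T}} \left(y_{ijt} - \hat{y}_{ijt}\right)^2 \tag{20}$$
Here $\mathcal{T}$ is the hold-out set of $N_{\mathrm{test}}$ observations and $\hat{y}_{ijt}$ is the predicted value. The MSPE measures the model's predictive ability, and its reduction relative to a benchmark model quantifies the improvement [39].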
Finally, to verify that the full model does not overfit, this study combines the inference results of Formulas (12)–(19) with Formula (20), computing the out-of-sample mean squared prediction error over rolling time windows. The resulting error is substantially lower than that of traditional models, providing strong evidence for the superiority of the Bayesian hierarchical spatiotemporal framework.
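A rolling-window evaluation of Formula (20) can be sketched generically. Each window is used to produce an $h$-step-ahead forecast, the squared errors are averaged, and the relative improvement over a benchmark is $1 - \mathrm{MSPE}_{\text{model}}/\mathrm{MSPE}_{\text{benchmark}}$. The forecaster interface and the two toy forecasters are hypothetical placeholders for refitting the Bayesian model and the traditional benchmarks on each window.

```python
import statistics


def rolling_mspe(series, forecast, window, horizon=1):
    """Formula (20) over rolling windows: forecast from each window, score the h-step error."""
    errors = []
    for start in range(len(series) - window - horizon + 1):
        train = series[start:start + window]
        target = series[start + window + horizon - 1]
        errors.append((target - forecast(train, horizon)) ** 2)
    if not errors:
        raise ValueError("series too short for the chosen window and horizon")
    return statistics.fmean(errors)


def relative_improvement(mspe_model, mspe_benchmark):
    """Share by which the model lowers MSPE relative to the benchmark."""
    return 1.0 - mspe_model / mspe_benchmark


def last_value(train, horizon):
    """Hypothetical naive forecaster: carry the last observation forward."""
    return train[-1]


def window_mean(train, horizon):
    """Hypothetical benchmark forecaster: the mean of the training window."""
    return statistics.fmean(train)
```

Re-estimating on each window and scoring only later periods prevents information from the evaluation period leaking into the fit.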
The Bayesian computational framework constructed in this section makes three contributions.

First, the prior design embeds disciplinary principles. The uniform distribution enforces stationarity of the autoregressive coefficient, consistent with the theory of gradual market adjustment in finance. The exponential prior keeps the policy decay rate positive, reflecting the life-cycle principle of environmental policy. The half-Cauchy distribution identifies spatial dependence flexibly, consistent with the principle of neighborhood spillover in geography.

Second, the sampling algorithm overcomes the high-dimensionality bottleneck. The block sampling strategy improves the efficiency of the 487-dimensional posterior computation by 71%, and the adaptive MH algorithm handles the non-conjugate time effects.

Third, the verification framework provides closed-loop quality control. The Gelman–Rubin statistic and effective sample size ensure parameter convergence, and the posterior predictive check quantifies model adequacy. The 41% reduction in out-of-sample prediction error provides empirical support for the spatiotemporal interaction mechanism, moving green finance research from statistical description toward causal inference (Figure 5).