2.4. Research Methods
We developed a progressive analytical framework integrating multi-scale identification, hierarchical modeling, and causal validation. First, the degree of heritage coupling was measured based on the processed data, and correlations for the initially screened variables were identified using a Bayesian model. Subsequently, three types of variables were re-screened based on significance levels and instrumental variables, enabling the identification of model-based causal effect estimates and heterogeneity under optimal grouping strategies (Figure 2).
(1) Spatial Coupling Degree Calculation Method
Coupling describes the dynamic relational mechanisms through which system components interact and co-evolve, and it has been assigned multiple meanings across disciplines, including correlation and interdependence. The term ‘coupling’ is used in heritage studies to refer to the cultural and social associations reflected through spatial co-location. Due to potential shared historical trajectories, social transformations and environmental constraints, spatial co-location constitutes a core condition for heritage coupling [
9]. For example, traditional villages and historic buildings often form cultural ensembles together with folk customs, festivals and ritual practices [
16]. Such symbiotic relationships between TCH and ICH represent a marker of cultural system integrity and have given rise to holistic conservation approaches that emphasize their interconnections [
45].
Drawing on system coupling theory [
46] and research on place attachment in cultural geography [
47], this study integrated spatial analysis with cultural interpretation. The term ‘coupling’ refers to the association strength at the level of spatial co-location, measured through operational indicators, that is, the systematic co-existence patterns in which TCH and ICH deviate from a random spatial distribution. Spatial coupling analysis does not equate ‘coupling’ with mere ‘proximity’, but instead conceptualizes spatial co-location as a spatial representation of cultural associations [
48]. A high degree of coupling indicates that the two have formed a closely interconnected cultural ecosystem whose spatial clustering patterns reflect functional complementarity and cultural symbiosis. This theoretical lens shifts the meaning of heritage from isolated elements to the relational networks formed among them [
49]. Accordingly, the measurable mathematical indicator captures statistical proximity and spatial evidence of interactive relationships between TCH and ICH.
This study used the modified co-location quotient (MCLQ) index to quantify the coupling intensity between TCH and ICH. Based on the CLQ theoretical framework, this method assesses the degree of association of point features by comparing the actual number of neighboring point pairs observed with the expected value under theoretical random distribution conditions [
50]. The CLQ method is often adapted in spatial co-location analysis to suit the characteristics of different datasets [
51]. A correction coefficient was introduced to adjust for the impact of geographical unit boundaries on the effective search area of heritage sites. Additionally, a geographical unit-based analytical framework was adopted, with differentiated search radii designed accordingly.
The number of neighboring point pairs satisfying the distance threshold conditions was first calculated:
Here, m and n denote the number of TCH and ICH points within a geographic unit, respectively; Djk represents the Euclidean distance between TCH point j and ICH point k; and I(⋅) is an indicator function that takes the value 1 when the distance condition is satisfied, and 0 otherwise.
The expected number of neighboring point pairs was estimated under the assumption that heritage points were randomly distributed within the unit, so that it is determined by the ratio of the search area to the unit area. A correction factor was applied to account for the reduction in the effective search area for heritage points located near the unit boundaries.
In the above equations, Ai represents the area of the geographical unit (km²); λ represents the boundary effect correction coefficient; and r represents the search radius.
Finally, standardized MCLQ values were obtained by comparison with the values expected under a random distribution.
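The three steps above can be written compactly. The block below is a sketch reconstructed from the symbol definitions in this subsection, not the original typeset equations. The names N_i^obs and N_i^exp, the threshold form D_jk ≤ r, and the placement of λ as a multiplier on the circular search area are assumptions.

```latex
\begin{align}
N_i^{\mathrm{obs}} &= \sum_{j=1}^{m}\sum_{k=1}^{n} I\left(D_{jk} \le r\right), \\
N_i^{\mathrm{exp}} &= m\,n\,\frac{\lambda\,\pi r^{2}}{A_i}, \\
\mathrm{MCLQ}_i &= \frac{N_i^{\mathrm{obs}}}{N_i^{\mathrm{exp}}}.
\end{align}
```

Under this form, a value of 1 corresponds to the number of TCH–ICH pairs expected if both point sets were placed at random within the unit.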
Geographical units were classified into four coupling types based on MCLQ values: strongly coupled areas (MCLQ > 2.0), characterized by high spatial clustering and strong co-distribution of both heritage types; weakly coupled areas (1.0 < MCLQ ≤ 2.0), exhibiting moderate spatial association; independently distributed areas (0.5 ≤ MCLQ ≤ 1.0), exhibiting relatively random distribution patterns; and repulsively distributed areas (MCLQ < 0.5), exhibiting spatial separation between the two heritage types.
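The calculation and the four-way classification can be illustrated with a minimal Python sketch. It assumes projected coordinates in kilometers and an expected pair count of m·n·λ·πr²/A, which is one plausible reading of the description above rather than the exact formula used in this study. The function names `mclq` and `coupling_type` are hypothetical.

```python
import math

def mclq(tch, ich, area_km2, radius_km, lam=1.0):
    """Ratio of observed to expected TCH-ICH pairs within one geographic unit.

    ASSUMPTION: expected pairs = m * n * lam * pi * r^2 / area, where lam is
    the boundary-effect correction coefficient (hypothetical functional form).
    """
    m, n = len(tch), len(ich)
    if m == 0 or n == 0:
        return float("nan")  # undefined when one heritage type is absent
    # Observed count: all cross-type pairs whose Euclidean distance is <= r
    observed = sum(
        1
        for (xj, yj) in tch
        for (xk, yk) in ich
        if math.hypot(xj - xk, yj - yk) <= radius_km
    )
    expected = m * n * lam * math.pi * radius_km ** 2 / area_km2
    return observed / expected

def coupling_type(value):
    """Map an MCLQ value to the four coupling types defined in the text."""
    if math.isnan(value):
        return "undefined"
    if value > 2.0:
        return "strongly coupled"
    if value > 1.0:
        return "weakly coupled"
    if value >= 0.5:
        return "independently distributed"
    return "repulsively distributed"
```

For example, `coupling_type(mclq(tch_points, ich_points, area_km2=1000, radius_km=20))` would label a single prefecture-level unit, where `tch_points` and `ich_points` are lists of (x, y) tuples.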
Considering China’s administrative system and geographical characteristics, the analysis units were defined at three geographical scales: prefecture-level scale, provincial scale, and cultural–geographical unit scale. The search radii were set at 20 km, 150 km, and 250 km, respectively. This multi-scale framework follows place formation theory [
52], recognizing that heritage associations are products of multi-level social processes [
53]. Smaller radii are suited for identifying community-level symbiotic relationships, while larger radii can capture association patterns within cultural networks. Different radii correspond to different natural or social scales, reflecting interactive processes at various levels [
54]. Historically, cultural practices within prefecture-level administrative units or settlements primarily unfolded within daily life circles and transportation networks [
55]. Based on this theory, the 20 km radius was set to correspond to local interaction ranges, capturing the functional symbiotic relationships between TCH and ICH. Provincial administrative divisions often follow natural geographical boundaries and have important cultural and institutional integration functions. The 150 km radius was set to correspond to the spatial scale of regional cultural networks, capturing co-distribution patterns [
56]. China’s cultural geography is divided into seven regions. These units share common historical origins and cultural concepts, forming cultural ecological zones that transcend administrative boundaries [
57]. The 250 km radius was set to capture spatial evidence of cultural diffusion and heritage evolution at the macro scale, identifying coupling patterns driven by geo-historical processes. This differentiated configuration considered the average spatial extent of geographical units and the distribution density of heritage sites at each scale, ensuring the effective identification of cross-scale spatial associations [
58].
From the perspective of conservation practice, a strong coupling state signals active functional linkages, whereas a weak coupling state may indicate breaks in cultural transmission. This observation is consistent with the phenomena of cultural continuity and functional discontinuity documented in studies of traditional villages [
57]. The coupling types described above are regarded as external manifestations of latent cultural interactions, but they do not directly reveal underlying social or cultural relations. Relying solely on this diagnostic indicator cannot quantify the social or environmental driving mechanisms, so we constructed interaction terms and implemented machine-learning models. This approach enabled us to empirically test the actual effects of factors and mechanism pathways, link spatial attributes to cultural or social explanatory mechanisms, and thereby strengthen the scientific validity of the coupling-system indicators.
(2) Hierarchical Bayesian Model
The core of HBM lies in nesting effects at both the overall level and group level, achieving information sharing and bias correction through hyperparameter constraints [
59]. Compared with traditional regression methods, HBM not only avoids underestimation of standard errors caused by ignoring intra-group correlations but also maintains robustness under conditions of unbalanced sample sizes across groups or sparse local data [
60]. This characteristic effectively integrates the regional heterogeneity and multi-level data structures of the three types of geographical units, thereby preventing the information loss that would result from single-level models.
The three geographical grouping patterns were used as grouping criteria, enabling separate modeling at each geographical level and revealing the similarities and differences between the overall and local effects [
61]. The sample was randomly divided, with 80% allocated to the training set and the remainder to the test set, thus balancing the requirements for model training and independent validation. The basic model form is as follows:
Group level: Data from each group are fitted directly, and the Markov Chain Monte Carlo (MCMC) method is used to estimate parameters for each group [62]. This method preserves inter-group differences through independent sampling. Group-specific parameters are nonetheless constrained by shrinkage: they are pulled toward the overall mean, which limits overfitting [63].
In Equation (6), yij represents the MCLQ value of the i-th observation in the j-th group; Xijk represents the k-th feature variable; βjk is the corresponding regression coefficient, revealing the variable effects within different groups; αj represents the group-specific intercept term; and σ² represents the variance parameter.
Overall level: Imposing weakly informative hyperpriors on group-level intercepts and slopes stabilizes estimates for small-sample groups while preserving between-group heterogeneity. When sample sizes are small or residual variance is large, posterior estimates become more sensitive to hyperpriors [
64]. Tighter variance hyperpriors strengthen partial pooling, draw small-sample parameters toward the overall mean, and suppress extreme estimates, thereby reducing overfitting. Looser variance hyperpriors widen uncertainty and more fully reveal cross-group differences. The mean hyperprior determines the direction and magnitude of shrinkage. Accordingly, a zero-centered, weakly informative specification with an appropriate scale helps avoid systematic bias in small-sample groups [
65]. The hyperprior on the residual term also indirectly modulates shrinkage in small groups by altering the effective noise level.
Here, μα and μβk are hyperparameters representing overall-level mean effects, and the corresponding variance parameters reflect the degree of effect variation across different geographical units.
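The two levels described above can be summarized in a compact form. The block below is a sketch consistent with the symbol definitions in this subsection. The Gaussian likelihood, the Gaussian group-level distributions, and the variance symbols σ_α² and σ_βk² are assumptions rather than the exact specification used in this study.

```latex
\begin{align}
y_{ij} &\sim \mathcal{N}\!\left(\alpha_j + \sum_{k} \beta_{jk} X_{ijk},\; \sigma^{2}\right), \\
\alpha_j &\sim \mathcal{N}\!\left(\mu_{\alpha},\; \sigma_{\alpha}^{2}\right), \qquad
\beta_{jk} \sim \mathcal{N}\!\left(\mu_{\beta_k},\; \sigma_{\beta_k}^{2}\right).
\end{align}
```

In the intercept-only case with known variances, the posterior mean of α_j is a precision-weighted average of the group mean and μ_α, with weight (n_j/σ²)/(n_j/σ² + 1/σ_α²) on the group mean. This makes explicit why groups with few observations or large residual variance are drawn more strongly toward the overall mean.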
The model captures overall patterns and inter-group variations through a multi-layered parameter structure, enabling cross-scale effect decomposition. Uncertainty quantification provides complete posterior probability distributions for the estimated parameters. MCMC sampling convergence was diagnosed using the Gelman–Rubin statistic (R̂) and Monte Carlo standard error (MCSE) [66]. For model evaluation, the coefficient of determination (R²) was employed to quantify the model’s fitting capability [67]. Root mean square error (RMSE) was used to measure the accuracy of the model’s predictions [68]. The Bayesian p-value (BPV) was introduced to test the consistency between the observed and generated data [69]. Together, these three metrics provided evaluation criteria for HBM performance from the perspectives of goodness of fit, prediction accuracy, and statistical validity. In addition, five-fold cross-validation was used to assess out-of-sample predictive stability. In each round, one of the five randomly partitioned subsets was designated as the test set and the remaining subsets as the training set, with fitting and prediction repeated until every subset had served as the test set. The RMSE and mean absolute error (MAE) on each test fold were then averaged to quantify generalization performance [70].
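The cross-validation loop can be illustrated with a short Python sketch. To keep the example self-contained, an ordinary least-squares fit stands in for the HBM. This substitution, the function name `kfold_rmse_mae`, and the fixed random seed are assumptions made for illustration only.

```python
import numpy as np

def kfold_rmse_mae(X, y, k=5, seed=0):
    """Average test-fold RMSE and MAE over k random folds.

    ASSUMPTION: a plain least-squares model replaces the HBM here; only the
    fold logic and the error metrics mirror the procedure in the text.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)  # k disjoint, nearly equal subsets
    rmse, mae = [], []
    for test in folds:
        train = np.setdiff1d(idx, test)  # all remaining observations
        A_train = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        A_test = np.column_stack([np.ones(len(test)), X[test]])
        err = y[test] - A_test @ coef
        rmse.append(np.sqrt(np.mean(err ** 2)))
        mae.append(np.mean(np.abs(err)))
    return float(np.mean(rmse)), float(np.mean(mae))
```

Because RMSE penalizes large errors more heavily than MAE, a wide gap between the two averaged values points to a few poorly predicted units rather than uniformly moderate error.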
To further assess whether the hierarchical structure of the HBM had adequately captured spatial dependence, multi-strategy spatial autocorrelation diagnostics were applied to the model residuals. Given the sensitivity to the specification of spatial weight matrices, three robustness schemes [71] were implemented: (a) queen contiguity matrices; (b) adaptive distance-band matrices with thresholds set at the 75th and 90th percentiles of the k-nearest-neighbor (KNN) distance distribution (with k = 8, 6, and 3 for the three geographical scales); and (c) KNN matrices chosen to balance statistical power in small samples against over-connection in large samples (with k = 6, 3, and 2 for the three scales). All weight matrices were row-standardized to ensure comparability across heterogeneous neighborhood structures [72]. For each specification, the global Moran’s I of the residuals was computed, and significance was assessed using a 999-permutation Monte Carlo test [71]. If the residuals exhibited no significant spatial autocorrelation across all strategies (p > 0.05), we concluded that the HBM had sufficiently captured spatial dependence, obviating the need for additional spatially structured terms.
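The residual diagnostic for the KNN scheme can be sketched as follows. The example builds a row-standardized KNN weight matrix, computes the global Moran’s I, and derives a pseudo p-value from 999 random permutations. The two-sided form of the test, the tie-handling in neighbor selection, and all function names are assumptions; the contiguity and distance-band schemes are not shown.

```python
import numpy as np

def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor spatial weight matrix."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a unit is never its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]
    W = np.zeros(d.shape)
    rows = np.repeat(np.arange(len(coords)), k)
    W[rows, nearest.ravel()] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def morans_i(values, W):
    """Global Moran's I of a variable under weight matrix W."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    return len(z) / W.sum() * (z @ W @ z) / (z @ z)

def moran_permutation_test(resid, W, n_perm=999, seed=0):
    """Observed Moran's I and a two-sided permutation pseudo p-value.

    ASSUMPTION: two-sided test around the null expectation -1/(n-1).
    """
    resid = np.asarray(resid, dtype=float)
    rng = np.random.default_rng(seed)
    observed = morans_i(resid, W)
    sims = np.array([morans_i(rng.permutation(resid), W) for _ in range(n_perm)])
    null_mean = -1.0 / (len(resid) - 1)
    extreme = np.sum(np.abs(sims - null_mean) >= abs(observed - null_mean))
    return observed, (extreme + 1) / (n_perm + 1)
```

With 999 permutations the smallest attainable pseudo p-value is 0.001, so the p > 0.05 criterion is well resolved at this number of draws.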
(3) Causal Effect Estimation and Heterogeneity Analysis
A hybrid causal inference framework based on double machine learning (DML) and ordinary least squares (OLS) was employed to investigate the causal mechanisms underlying cultural heritage coupling. Within this framework, the DML approach was used to control for high-dimensional confounders and reduce model specification bias [73], while the OLS approach was used to retain interpretability. This design was intended to combine flexible confounder adjustment with an interpretable baseline, so that causal effect estimates could be cross-checked between the two estimators. The basic mathematical structure is expressed as follows:
Equation (8) specifies the OLS regression model used as the baseline estimation. Yi denotes the MCLQ value of the i-th geographical unit; Ti represents the treatment variable; Xi is the vector of the control variables; β is the target causal effect parameter; γ is the coefficient vector for the control variables; α is the intercept term; and ϵi is the random error term.
Equations (9)–(12) present the core structure of the DML method [
74].
Here, θ denotes the target causal effect parameter (average treatment effect, ATE); the two nuisance functions are the estimated conditional expectations of the outcome and treatment variables given the covariates; ϵi and νi are the random error terms; the DML estimate of θ is computed from the residualized treatment variable, which removes the influence of the control variables, and the residualized outcome variable, which eliminates the effect of the control variables; and n is the sample size.
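For orientation, the OLS baseline and the partialling-out form of DML described above can be sketched as follows. The notation for the nuisance functions (ℓ̂ for the outcome, m̂ for the treatment), the tilde notation for residualized variables, and the hat on θ are assumptions following common usage in the DML literature, not necessarily the symbols used in Equations (8)–(12).

```latex
\begin{align}
Y_i &= \alpha + \beta T_i + X_i^{\top}\gamma + \epsilon_i, \\
Y_i &= \theta T_i + g(X_i) + \epsilon_i, \qquad T_i = m(X_i) + \nu_i, \\
\tilde{Y}_i &= Y_i - \hat{\ell}(X_i), \qquad \tilde{T}_i = T_i - \hat{m}(X_i), \\
\hat{\theta} &= \frac{\sum_{i=1}^{n} \tilde{T}_i\,\tilde{Y}_i}{\sum_{i=1}^{n} \tilde{T}_i^{2}}.
\end{align}
```

Here ℓ̂(X) estimates E[Y | X] and m̂(X) estimates E[T | X], so the final line is simply the OLS slope of the residualized outcome on the residualized treatment.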
The DML estimation required randomly splitting the sample into K folds to implement a cross-fitting strategy [75]. Using data from the other folds, the two nuisance functions were estimated separately based on the random forest algorithm and then used to predict the target fold. The number of trees was adaptively adjusted according to the sample size to balance estimation accuracy and overfitting risk. The maximum tree depth was constrained using a logarithmic function based on the sample size, and a feature subsampling strategy was employed to reduce variance. Residuals were then computed from these out-of-fold predictions, which limits the overfitting bias that arises when nuisance functions are fitted and evaluated on the same observations. Finally, ordinary least squares regression was performed on the residualized variables, and the causal effect estimates from each fold were weighted by sample size to obtain the final DML estimate.
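The cross-fitting procedure can be illustrated with a minimal Python sketch. To avoid external dependencies, a ridge regression on linear and squared covariates stands in for the random forest learner. This substitution, the function names, and the default K = 5 are assumptions; any learner with the same `fit and predict` signature could be passed in instead.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, alpha=1e-3):
    """Stand-in nuisance learner: ridge on [1, X, X^2]. NOT a random forest."""
    def feats(X):
        return np.column_stack([np.ones(len(X)), X, X ** 2])
    A, B = feats(X_train), feats(X_test)
    coef = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y_train)
    return B @ coef

def dml_partial_out(Y, T, X, k=5, learner=ridge_fit_predict, seed=0):
    """Cross-fitted partialling-out estimate of the treatment effect.

    For each fold, nuisance functions are fitted on the other folds, the
    held-out outcome and treatment are residualized, and the residualized
    outcome is regressed on the residualized treatment. Fold estimates are
    combined with weights proportional to fold size, as described in the text.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(Y)), k)
    theta_folds, sizes = [], []
    for test in folds:
        train = np.setdiff1d(np.arange(len(Y)), test)
        y_res = Y[test] - learner(X[train], Y[train], X[test])
        t_res = T[test] - learner(X[train], T[train], X[test])
        theta_folds.append((t_res @ y_res) / (t_res @ t_res))
        sizes.append(len(test))
    return float(np.average(theta_folds, weights=sizes))
```

Keeping the learner as an argument separates the cross-fitting logic from the choice of algorithm, so the tree count and depth rules described above would live entirely inside the learner.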
We employed Conley heteroscedasticity and autocorrelation consistent (HAC) standard errors to assess the robustness of our causal estimates to potential spatial autocorrelation [
76]. Given the asymptotic nature of Conley–HAC and the small sample sizes at the other two scales, the main analysis was conducted only for the prefecture-level sample. Bandwidths were chosen in a data-driven manner: we used the 25th and 50th percentiles of the nearest-neighbor distances between city centroids as the lower and main bandwidths and determined an adaptive bandwidth by requiring that ≥70% of nodes had at least one neighbor. Distances were computed as projected Euclidean distances, and spatial weights were constructed using the Bartlett (triangular) kernel [
77]. Variable selection focused on pre-specified main treatment variables and robustness-check variables to mitigate post-selection inference bias.
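The structure of a Conley-type variance estimator with a Bartlett kernel can be sketched for a plain OLS regression. The radial form of the kernel, the absence of a finite-sample correction, and the function names are assumptions; the bandwidth selection rules described above are not reproduced.

```python
import numpy as np

def bartlett_weights(coords, bandwidth):
    """Triangular kernel weights: 1 - d/h within the bandwidth, 0 beyond it."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    return np.clip(1.0 - d / bandwidth, 0.0, None)

def conley_se(X, y, coords, bandwidth):
    """OLS coefficients with spatial HAC (Conley-type) standard errors.

    ASSUMPTION: sandwich form (X'X)^-1 [sum_ij K_ij x_i x_j' e_i e_j] (X'X)^-1
    with a radial Bartlett kernel. In two dimensions this kernel may not
    guarantee a positive semi-definite estimate; the sketch does not guard
    against that case.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    K = bartlett_weights(coords, bandwidth)
    Xe = X * resid[:, None]          # score contributions x_i * e_i
    meat = Xe.T @ K @ Xe             # kernel-weighted cross-products
    V = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))
```

When the bandwidth is smaller than every pairwise distance, the kernel matrix reduces to the identity and the estimator collapses to heteroscedasticity-robust (HC0) standard errors, which offers a simple sanity check.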
For the best grouping identified through subsequent multi-group strategies, the significance and degree of heterogeneity were assessed using Cochran’s Q test and the I² statistic, respectively [78]. Under the null hypothesis of homogeneous effects, the Q statistic follows a chi-square distribution with k − 1 degrees of freedom, where k is the number of groups. Typically, I² > 75% indicates high heterogeneity, 25% < I² ≤ 75% indicates moderate heterogeneity, and I² ≤ 25% indicates low heterogeneity.
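As an illustration, the two heterogeneity statistics can be computed from group-level effect estimates and their standard errors using the standard inverse-variance definitions. The function names are hypothetical, and the p-value of Q (from the chi-square distribution with k − 1 degrees of freedom) is omitted to keep the sketch dependency-free.

```python
def cochran_q_i2(effects, std_errors):
    """Return (Q, degrees of freedom, I^2 in percent) for group-level effects."""
    weights = [1.0 / se ** 2 for se in std_errors]   # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2 = share of total variation attributable to heterogeneity, floored at 0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2

def heterogeneity_label(i2):
    """Classify I^2 (percent) using the thresholds stated in the text."""
    if i2 > 75:
        return "high"
    if i2 > 25:
        return "moderate"
    return "low"
```

For three groups with effects 0.1, 0.5, and 0.9 and a common standard error of 0.1, the pooled effect is 0.5, Q is 32 on 2 degrees of freedom, and I² is 93.75%, which falls in the high-heterogeneity band.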
To ensure reliable causal inference, differentiated robustness checks were constructed based on the theoretical importance of variables and the strength of causal evidence [
79].
Equation (13) tests for the influence of potential confounding variables on the main effect by comparing the estimated causal effects of the main treatment variable in the baseline model and in the extended model that incorporates robustness variables. If the change in effect is less than 20%, the results are considered robust. Equation (14) compares the estimation differences between the two methods. If Δnorm < 1.96, the results are considered statistically consistent across the two methods. SE_OLS and SE_DML represent the standard errors of the OLS and DML estimates, respectively.
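The two checks can be sketched in Python under assumed functional forms: a relative change |θ_ext − θ_base| / |θ_base| for the 20% criterion, and a normalized difference |θ_OLS − θ_DML| / sqrt(SE_OLS² + SE_DML²) for the 1.96 criterion. These forms are inferred from the thresholds and symbols named in the text and may differ from Equations (13) and (14). The function names are hypothetical.

```python
import math

def relative_change(theta_base, theta_ext):
    """ASSUMED form of the stability check: |ext - base| / |base|."""
    return abs(theta_ext - theta_base) / abs(theta_base)

def normalized_difference(theta_ols, se_ols, theta_dml, se_dml):
    """ASSUMED form of the method-consistency check.

    The denominator treats the two estimates as independent; estimates
    obtained from the same sample are typically correlated, so this is
    an approximation.
    """
    return abs(theta_ols - theta_dml) / math.sqrt(se_ols ** 2 + se_dml ** 2)

def is_robust(theta_base, theta_ext, threshold=0.20):
    """Robust if the effect changes by less than 20% (threshold from the text)."""
    return relative_change(theta_base, theta_ext) < threshold

def is_consistent(theta_ols, se_ols, theta_dml, se_dml, critical=1.96):
    """Consistent if the normalized difference is below 1.96 (from the text)."""
    return normalized_difference(theta_ols, se_ols, theta_dml, se_dml) < critical
```

For instance, a baseline effect of 0.50 that moves to 0.45 after adding confounders changes by 10% and passes the first check, whereas a move to 0.35 (a 30% change) does not.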
The primary treatment variables received the most extensive testing. Potential confounding variables were added sequentially to examine the stability of their coefficients, and consistency between the two methods was verified across the baseline and each extended model. For robustness variables, the focus was on assessing methodological consistency, supplemented by a simplified version of Equation (13) to test the effects of key control variables and avoid excessive testing. For exploratory variables, the emphasis was placed on evaluating the stability of the coefficients across different model specifications. Consistency checks between methods were only performed for significant variables in the baseline model. Furthermore, to clearly illustrate the complex structure of the constructed HBM and causal inference models, Figure 3 displays the hierarchical structure, prior distribution specifications, and core parameters.
If longitudinal data become available, the framework can be extended temporally. The HBM can be extended to a cross-classified spatiotemporal hierarchy, where time-varying random intercepts and key slopes with weak temporal-dependence priors are introduced within each geographic scale. Spatially structured components can also be combined with the temporal process through a separable space–time covariance [
80], preserving cross-scale analysis while providing spatiotemporal trajectories of key driver effects. Concurrently, the causal module can be extended via a two-way fixed-effects difference-in-differences design [
81]. By aligning units to the onset of treatment and incorporating lead and lag indicators, dynamic effect paths can be estimated in an event-study form, using cluster-robust standard errors to ensure robustness against serial correlation [
82].
(4) Optimal Grouping Strategy for Heterogeneity Analysis
We used a multidimensional evaluation framework to identify the optimal grouping strategy for heterogeneity. Sample units were grouped according to key variables, enabling causal effects to be analyzed comparatively within each group. Four grouping methods were applied: median split, tertile split, quartile extreme, and K-means clustering. The composite scoring function encompassed three aspects: statistical significance testing, effect size measurement, and the magnitude of intergroup differences.
To quantify the differences between the groups, Cohen’s d and partial eta squared were used as effect size indices for two-group and multiple-group comparisons, respectively [83]. The corresponding calculation formulas are as follows:
Cohen’s d effect size is typically classified as small (d ≈ 0.2), medium (d ≈ 0.5), or large (d ≈ 0.8). The coefficient of variation between groups (CVbetween) measures the relative dispersion among group means, with values greater than 0.4 indicating a large effect size.
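The three indices can be illustrated with a short Python sketch based on their standard textbook definitions: a pooled-standard-deviation Cohen’s d, eta squared as SS_between / (SS_between + SS_within), which coincides with partial eta squared in a one-way design, and a between-group coefficient of variation computed from group means. The use of the population standard deviation for CVbetween and the function names are assumptions.

```python
import math
from statistics import mean, pstdev, variance

def cohens_d(a, b):
    """Standardized mean difference between two groups (pooled SD)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

def partial_eta_squared_oneway(groups):
    """SS_between / (SS_between + SS_within) for a one-way grouping."""
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_between / (ss_between + ss_within)

def cv_between(groups):
    """Dispersion of group means relative to their average.

    ASSUMPTION: population SD of the group means divided by their mean.
    """
    means = [mean(g) for g in groups]
    return pstdev(means) / abs(mean(means))
```

As a check on scale, two groups [2, 4, 6] and [1, 3, 5] differ by one unit with a pooled standard deviation of 2, giving d = 0.5, a medium effect under the conventional benchmarks.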
Based on these parameters, the magnitude of intergroup differences was introduced to evaluate the performance disparities across different classification combinations. The optimal combination was required to reach statistical significance (p < 0.05) and to exhibit a large effect size [84]. Additionally, each group had to meet the sample size requirement for statistical inference (n ≥ 30), and the grouping results had to be consistent with geographical and social logic. The weighting scheme of the composite score emphasized the central role of effect size while assigning equal importance to statistical significance and the magnitude of group differences.
where p represents the significance level obtained from statistical testing; Effect Size represents the standardized effect size; Δμ represents the mean difference between groups; and max(Δμ) represents the maximum mean difference.