Abstract
Quantifying geographic variation is crucial for policy evaluation, yet researchers often rely on complex national surveys that were not designed for sub-national inference. This design-analysis mismatch creates two challenges when decomposing variance across domains such as states: informative sampling confounds substantive heterogeneity with design artifacts, and finite-sample variance inflation conflates sampling noise with signal. We introduce the Bayesian Hybrid Framework, which reconciles design-based and model-based inference through two components: Bayesian Pseudo-Likelihood for design consistency, and a hybrid generalized linear mixed model that simultaneously estimates substantive domain effects and nuisance design effects (strata and primary sampling units, PSUs). We also propose a Dual Estimand Framework that distinguishes between a Descriptive estimand (total observed variance) and a Policy estimand (substantive variance net of design), with explicit de-attenuation to correct finite-sample inflation. Simulations based on the 2019 National Survey of Early Care and Education demonstrate negligible bias and superior efficiency compared to standard alternatives. Applied to subsidy receipt among home-based child care providers, we find that the observed between-state variation (16.7%) falls to 5.4% after accounting for design artifacts and sampling noise. This roughly threefold reduction indicates that local factors, not state policies, drive most of the heterogeneity, and it highlights the necessity of our framework for rigorous geographic variance decomposition in complex surveys. An accompanying R package, bhfvar (version 0.3.0), implements the complete framework.