Mathematics
  • This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
  • Feature Paper
  • Article
  • Open Access

31 January 2026

Disentangling Signal from Noise: A Bayesian Hybrid Framework for Variance Decomposition in Complex Surveys with Post-Hoc Domains

1 Department of Educational Studies in Psychology, Research Methodology, and Counseling, College of Education, The University of Alabama, Tuscaloosa, AL 35487, USA
2 Department of Curriculum and Instruction, College of Education, The University of Alabama, Tuscaloosa, AL 35487, USA
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(3), 512; https://doi.org/10.3390/math14030512 (registering DOI)
This article belongs to the Section D1: Probability and Statistics

Abstract

Quantifying geographic variation is crucial for policy evaluation, yet researchers often rely on complex national surveys that were not designed for sub-national inference. This design-analysis mismatch creates two challenges when decomposing variance across domains such as states: informative sampling confounds substantive heterogeneity with design artifacts, and finite-sample variance inflation conflates sampling noise with signal. We introduce the Bayesian Hybrid Framework, which reconciles design-based and model-based inference by combining Bayesian Pseudo-Likelihood, for design consistency, with a hybrid generalized linear mixed model that simultaneously estimates substantive domain effects and nuisance design effects (strata and primary sampling units, PSUs). We propose a Dual Estimand Framework that distinguishes Descriptive estimands (total observed variance) from Policy estimands (substantive variance net of design), with explicit de-attenuation to correct finite-sample inflation. Simulations based on the 2019 National Survey of Early Care and Education demonstrate negligible bias and superior efficiency compared with standard alternatives. Applied to subsidy receipt among home-based child care providers, we find that the observed between-state variation (16.7%) shrinks to only 5.4% after accounting for design artifacts and sampling noise. This roughly threefold reduction reveals that local factors, not state policies, drive most of the heterogeneity, underscoring the need for our framework in rigorous geographic variance decomposition with complex surveys. An accompanying R package, bhfvar (version 0.3.0), implements the complete framework.
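The two quantities at the heart of the abstract, a between-domain variance expressed as a share of total variance and its de-attenuation for sampling noise, can be illustrated with a small Python sketch. This is a generic illustration under stated assumptions, not the bhfvar API and not the authors' estimator: the function names (`variance_share`, `deattenuated_variance`), every input value, and the choice to hold hypothetical strata and PSU variances fixed in the denominator are assumptions made for illustration only. The sketch assumes a logistic mixed model, so shares are computed on the latent logit scale with the conventional residual variance π²/3, and it assumes each state estimate comes with a known sampling variance.

```python
import math
import statistics

# Residual variance of the standard logistic distribution, used as the
# level-1 variance on the latent (logit) scale of a logistic mixed model.
LOGIT_RESIDUAL_VAR = math.pi ** 2 / 3


def variance_share(var_target, other_vars=()):
    """Share of total latent-scale variance due to one component.

    var_target: variance of the component of interest (e.g., states).
    other_vars: other random-effect variances kept in the denominator
                (hypothetical strata and PSU variances below).
    """
    total = var_target + sum(other_vars) + LOGIT_RESIDUAL_VAR
    return var_target / total


def deattenuated_variance(domain_estimates, sampling_variances):
    """Moment-based de-attenuation of a between-domain variance.

    With independent domain effects and sampling errors, the sample
    variance of the noisy domain estimates has expectation equal to the
    true between-domain variance plus the average sampling variance;
    subtract the latter and truncate at zero.
    """
    observed = statistics.variance(domain_estimates)
    noise = statistics.fmean(sampling_variances)
    return max(observed - noise, 0.0)


# Hypothetical inputs (not data from the paper): state-level logit
# estimates, their sampling variances (squared SEs), and design variances.
state_logits = [-1.2, -0.4, 0.3, -0.9, 0.8, -0.1]
state_sampling_vars = [0.10, 0.25, 0.15, 0.30, 0.20, 0.12]
design_vars = (0.4, 0.3)  # hypothetical strata and PSU variances

raw_var = statistics.variance(state_logits)
clean_var = deattenuated_variance(state_logits, state_sampling_vars)

# Descriptive-style share: all observed spread treated as state variation.
descriptive = variance_share(raw_var, design_vars)
# Policy-style share: sampling noise removed before taking the share.
policy = variance_share(clean_var, design_vars)

print(f"descriptive share = {descriptive:.3f}, policy share = {policy:.3f}")
```

In this toy example the gap between the two shares comes only from removing sampling noise; the sketch does not estimate design effects, which the framework obtains from the hybrid mixed model itself. Truncating at zero mirrors the usual moment estimator, whereas a fully Bayesian fit would instead carry posterior uncertainty in every variance component.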
