Article

Inference with Pólya-Gamma Augmentation for US Election Law

US Census Bureau, 4600 Silver Hill Rd., Washington, DC 20233, USA
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2025, 13(6), 945; https://doi.org/10.3390/math13060945
Submission received: 7 February 2025 / Revised: 6 March 2025 / Accepted: 11 March 2025 / Published: 13 March 2025
(This article belongs to the Special Issue Statistical Simulation and Computation: 3rd Edition)

Abstract
Pólya-gamma (PG) augmentation has proven to be highly effective for Bayesian MCMC simulation, particularly for models with binomial likelihoods. This data augmentation strategy offers two key advantages. First, the method circumvents the need for analytic approximations or Metropolis–Hastings algorithms, which leads to simpler and more computationally efficient posterior inference. Second, the approach can be successfully applied to several types of models, including nonlinear mixed-effects models for count data. The effectiveness of PG augmentation has led to its widespread adoption and implementation in statistical software packages, such as version 2.1 of the R package BayesLogit. This success has inspired us to apply this method to the implementation of Section 203 of the Voting Rights Act (VRA), a US law that requires certain jurisdictions to provide non-English voting materials for specific language minority groups (LMGs). In this paper, we show how PG augmentation can be used to fit a Bayesian model that estimates the prevalence of each LMG in each US voting jurisdiction, and that uses a variable selection technique called stochastic search variable selection. We demonstrate that this new model outperforms the previous model used for 2021 VRA data with respect to model diagnostic measures.

1. Introduction

1.1. A Motivating Example

Section 203(b) of the Voting Rights Act (VRA) is an election law that requires some US states or their political subdivisions to make voting materials available in languages other than English. The US Census Bureau determines whether a jurisdiction, e.g., county, is subject to such requirements based on the rates of limited English language proficiency and illiteracy within designated population subgroups, called language minority groups (LMGs), in that jurisdiction. For the purposes of VRA coverage determinations, limited English proficiency (LEP) is defined as speaking a language other than English at home, and speaking English less than “Very Well”. Illiteracy (ILL) is defined as having less than a 5th-grade education. A more detailed discussion of the legal and statistical criteria used to make VRA coverage determinations is available in Slud et al. [1].
To make these determinations, the Census Bureau must estimate the number of voting-age (18 or over) people (VOT), voting-age citizens (CIT), voting-age citizens who have limited English proficiency (LEP), and voting-age citizens with limited English proficiency who are illiterate (ILL) in each of these geographies. The Census Bureau initially used “direct” estimates based on data from the long-form of the Decennial Census (later replaced by the American Community Survey, or ACS) to approximate these quantities. The term “direct” indicates that an estimate does not use any statistical model or auxiliary information beyond the survey weights and observations within that area. However, since 2011, Section 203 determinations have been based on estimates from small-area models. These statistical models were updated in 2014 by Joyce et al. [2], who developed a Bayesian hierarchical model, and Slud et al. [3], who adopted a Dirichlet-multinomial model.
In the VRA 2021 determinations, model-based estimates were obtained using both frequentist and Bayesian methods [1]. The frequentist method used adaptive Gaussian quadrature [4] to fit random effects models for LMGs with a sample in less than 700 jurisdictions (counties and MCDs), or less than 3000 jurisdictions where most citizens did not have severe illiteracy. The Bayesian model was fitted using STAN [5], which only worked well for large LMGs. Models fit with STAN had relatively slow and unstable convergence in smaller LMGs even when using priors recommended for computational efficiency, such as the LKJ prior for the random-effects covariance distribution. In addition, both methods used covariates that were selected based on their performance in a few large LMGs to model all LMGs, rather than performing independent variable selection procedures in the models for smaller groups. More details on the VRA 2021 methods are described in Slud et al. [1].
Table 1 shows a dataset that contains hypothetical examples of jurisdiction-level estimates that VRA models might produce for an LMG. For this LMG, let $VOT_j$, $CIT_j$, $LEP_j$, and $ILL_j$ represent VOT, CIT, LEP, and ILL, respectively, in jurisdiction $j$. Then, in Table 1, CITprop, LEPprop, and ILLprop represent the direct estimates of the ratios $CIT_j/VOT_j$, $LEP_j/VOT_j$, and $ILL_j/VOT_j$, respectively. The model-based predictions of these ratios are represented by $\pi_{CIT}$, $\pi_{LEP}$, and $\pi_{ILL}$, respectively.
Note that the model-based estimates are similar to the direct estimates for jurisdiction 1, which has a large number of voting-age residents. However, the estimates differ in jurisdictions with fewer voting-age persons. This pattern occurs because direct estimates perform well in large samples but poorly when sample sizes are small. Model-based estimates use vectors of auxiliary covariates, denoted in Table 1 by $X$, to improve their performance in small samples.
The new model introduced in this article uses Pólya-gamma (PG) augmentation [6] to estimate Bayesian models for all LMGs and jurisdictions. This enables the new model to more easily implement stochastic search variable selection (SSVS) within each LMG to select covariates, which allows our new model to outperform the older models in terms of diagnostic metrics.

1.2. Utility of Pólya-Gamma Augmentation in Bayesian Methods

Pólya-gamma (PG) augmentation, as introduced by Polson et al. [6], is an efficient method for Bayesian inference in models with binomial likelihoods. The application of PG augmentation yields closed-form expressions for full conditionals of posterior distributions in sampling algorithms like Markov chain Monte Carlo (MCMC), which eliminates the need for analytic approximations, numerical integration, or Metropolis–Hastings algorithms. This allows the construction of simple and effective Gibbs samplers for models with binomial likelihoods, particularly logistic regression. The method has been successfully applied to various models, including logistic regression and nonlinear mixed-effects models [6]. In practice, the PG Gibbs sampler for Bayesian logistic regression involves two main steps in each iteration: sampling PG random variables and drawing from a multivariate normal distribution. This approach has been implemented in various statistical software packages, such as Version 2.1 of the R package BayesLogit [7].
The remainder of this paper is organized into the following sections. Section 2 reviews the 2021 VRA model, introduces PG augmentation, and explains a basic application in the context of the VRA data. Section 3 builds on this foundation to introduce the new Bayesian multinomial logit model, and describes a Gibbs sampling algorithm that makes use of PG augmentation to fit it. Section 4 applies the new model to the data used for the 2021 VRA determinations and shows its superior diagnostic results compared with the 2021 model results. Finally, Section 5 discusses the results.

2. The Pólya-Gamma Distribution for Binomial VRA Datasets

2.1. The 2021 VRA Model

The 2021 VRA model was constructed by modeling the counts mentioned in Section 1 using a multinomial distribution. Specifically, for each jurisdiction $j \in \{1, \ldots, 7859\}$, where $n_j$ is the number of voting-age people in jurisdiction $j$, we model
$$\left(M_{j,1}, M_{j,2}, M_{j,3}, M_{j,4}\right) \sim \mathrm{Multinom}\left(n_j;\; p_{j,1}, p_{j,2}, p_{j,3}, p_{j,4}\right),$$
where $M_{j,1}$, $M_{j,2}$, $M_{j,3}$, and $M_{j,4}$ are $VOT_j - CIT_j$, $CIT_j - LEP_j$, $LEP_j - ILL_j$, and $ILL_j$, respectively.
An equivalent formulation of this model treats the $M$ variables in Equation (1) as the composition of three conditionally binomial ($\mathrm{Binom}$) models:
$$\begin{aligned} M_{j,1} &\sim \mathrm{Binom}\left(n_j,\; p_{j,1}\right), \\ M_{j,2} \mid M_{j,1} &\sim \mathrm{Binom}\left(n_j - M_{j,1},\; \frac{p_{j,2}}{1 - p_{j,1}}\right), \\ M_{j,3} \mid M_{j,1}, M_{j,2} &\sim \mathrm{Binom}\left(n_j - M_{j,1} - M_{j,2},\; \frac{p_{j,3}}{1 - p_{j,1} - p_{j,2}}\right). \end{aligned}$$
The intuition behind the success probabilities in these distributions is as follows. In the multinomial model, $p_{j,1}$ represents the marginal probability that a voting-age person is not a citizen. Thus, if we ignore the other categories, $M_{j,1}$ can be modeled as a binomial count of voting-age non-citizens out of all $n_j$ voting-age persons. Similarly, the marginal probability that a person is a voting-age citizen and is not LEP is $p_{j,2}$ in the multinomial model, and $1 - p_{j,1}$ is the marginal probability that a voting-age person is a citizen. Thus, the probability of a voting-age person not having limited English proficiency conditional on being a citizen is $p_{j,2}/(1 - p_{j,1})$. By the same logic, the marginal probability that a voting-age person is a citizen and LEP but not ILL is $p_{j,3}$ in the multinomial model, and the marginal probability that a voting-age person is a citizen and is LEP is $1 - p_{j,1} - p_{j,2}$. Thus, the conditional probability of a voting-age person not being ILL given that they are a citizen and have LEP is $p_{j,3}/(1 - p_{j,1} - p_{j,2})$. $M_{j,4}$ is known given $M_{j,1}$, $M_{j,2}$, $M_{j,3}$, and $n_j$, and thus does not need to be modeled.
The complements of the “success” probabilities of the binomial distributions for $M_{j,1}$, $M_{j,2}$, and $M_{j,3}$ can be interpreted as estimates of the ratios $v_{j,1} = CIT_j/VOT_j$, $v_{j,2} = LEP_j/CIT_j$, and $v_{j,3} = ILL_j/LEP_j$ in jurisdiction $j$. With this notation, the probabilities in Equation (1) can be written for each jurisdiction $j$ as
$$\left(p_{j,1}, p_{j,2}, p_{j,3}, p_{j,4}\right) = \left(1 - v_{j,1},\;\; v_{j,1}\left(1 - v_{j,2}\right),\;\; v_{j,1} v_{j,2} \left(1 - v_{j,3}\right),\;\; v_{j,1} v_{j,2} v_{j,3}\right).$$
This means that we can model the multinomial success probabilities in (1) by modeling the binomial success probabilities in (2), and we can model the binomial success probabilities in (2) by modeling the $v_{j,k}$ ratios. To model the $v_{j,k}$ ratios, we assume that
$$v_{j,k} = \frac{\exp\left(\eta_{j,k}\right)}{1 + \exp\left(\eta_{j,k}\right)},$$
where
$$\eta_{j,k} = X_j^T \beta_k + u_{j,k},$$
and the $u_j = \left(u_{j,1}, u_{j,2}, u_{j,3}\right)$ are multivariate normal random variables with a zero mean and covariance parameter
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 \end{pmatrix}.$$
This specification is called the “multinomial logit normal” (MLN) model due to the multinomial outcome described in Equation (1), the logistic link function in Equation (3), and the multivariate normal random effects. When these parameters are estimated, we can use them to derive model-based estimates of $\pi_{CIT}$, $\pi_{LEP}$, and $\pi_{ILL}$ similar to the hypothetical examples in Table 1. Specifically, for jurisdiction $j$, we estimate $\pi_{CIT}$ as $v_{j,1}$, $\pi_{LEP}$ as $v_{j,1} \times v_{j,2}$, and $\pi_{ILL}$ as $v_{j,1} \times v_{j,2} \times v_{j,3}$. The difficulty of estimating these parameters stems from the nonlinearity of the logistic link function in Equation (3). Pólya-gamma augmentation makes the estimation process linear, resulting in closed-form solutions for the conditional posterior distributions of our parameters.
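The mapping from the modeled ratios to the multinomial cell probabilities in Equation (1) and to the reported prevalence estimates can be illustrated numerically (a minimal sketch with made-up values for $v_{j,1}$, $v_{j,2}$, $v_{j,3}$; the function names are ours, not from any VRA code base):

```python
import numpy as np

def cell_probs(v1, v2, v3):
    """Map the modeled ratios (CIT/VOT, LEP/CIT, ILL/LEP) to the four
    multinomial cell probabilities of Equation (1)."""
    p1 = 1.0 - v1                # voting-age non-citizen
    p2 = v1 * (1.0 - v2)         # citizen, not LEP
    p3 = v1 * v2 * (1.0 - v3)    # citizen, LEP, not ILL
    p4 = v1 * v2 * v3            # citizen, LEP, ILL
    return np.array([p1, p2, p3, p4])

def prevalence_estimates(v1, v2, v3):
    """Model-based estimates of pi_CIT, pi_LEP, and pi_ILL for a jurisdiction."""
    return v1, v1 * v2, v1 * v2 * v3

p = cell_probs(0.9, 0.2, 0.1)
assert np.isclose(p.sum(), 1.0)             # the four cells partition VOT
print(prevalence_estimates(0.9, 0.2, 0.1))  # approximately (0.9, 0.18, 0.018)
```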

2.2. The Pólya-Gamma (PG) Distribution

A PG random variable is defined as an infinite weighted sum of gamma random variables. Specifically, if $\omega \sim \mathrm{PG}(b, c)$, where $b > 0$ and $c \in \mathbb{R}$, then
$$\omega \overset{d}{=} \frac{1}{2\pi^2} \sum_{k=1}^{\infty} \frac{g_k}{\left(k - 0.5\right)^2 + c^2/\left(4\pi^2\right)},$$
where $\overset{d}{=}$ denotes equality in distribution, and $g_k \sim \mathrm{Gamma}(b, 1)$ are independent gamma random variables. Polson et al. [6] proved the following useful property of the PG variable $\omega$. Let $\psi$ denote a constant, e.g., a linear combination of covariates $X$ and their coefficients $\beta$. Then, the following equation holds for an exponential functional form:
$$\frac{\left(e^{\psi}\right)^a}{\left(1 + e^{\psi}\right)^b} = 2^{-b} e^{\kappa \psi} \int_0^{\infty} e^{-\omega \psi^2/2} \, p(\omega) \, d\omega,$$
where $\kappa = a - b/2$ and $p(\omega) = \mathrm{PG}\left(\omega \mid b, 0\right)$. The utility of the above equation is that for $j = 1, \ldots, N$, we can express the contribution of jurisdiction $j$ to the likelihood as
$$L_j(\beta) = \frac{\left(e^{\psi_j}\right)^{y_j}}{\left(1 + e^{\psi_j}\right)^{n_j}} = 2^{-n_j} e^{\kappa_j \psi_j} \int_0^{\infty} e^{-\omega_j \psi_j^2/2} \, p\left(\omega_j\right) \, d\omega_j,$$
where $\kappa_j = y_j - n_j/2$. Note that if we condition on $\Omega = \mathrm{diag}\left(\omega_1, \ldots, \omega_N\right)$, the kernel of the posterior distribution for $\beta$ looks similar to a Gaussian density function:
$$\begin{aligned} P\left(\beta \mid \Omega, y, n\right) &\propto P(\beta) \prod_{j=1}^{N} L_j\left(\beta \mid \omega_j\right) \\ &\propto P(\beta) \prod_{j=1}^{N} e^{\kappa_j \psi_j} e^{-\omega_j \psi_j^2/2} \\ &\propto P(\beta) \prod_{j=1}^{N} \exp\left\{-\frac{\omega_j}{2}\left(\psi_j - \frac{\kappa_j}{\omega_j}\right)^2\right\} \\ &= P(\beta) \exp\left\{-\frac{1}{2}\left(Z - X\beta\right)^T \Omega \left(Z - X\beta\right)\right\}, \end{aligned}$$
where $Z = \left(\kappa_1/\omega_1, \ldots, \kappa_N/\omega_N\right)^T$ and $\psi_j = X_j^T \beta$. Step (8) is justified because, conditional on $\omega_j$, the expectation in (6) is constant. Step (9) is derived by completing the square, and Step (10) then follows by algebra.
In summary, if our prior on β is Gaussian (quadratic in β ), then (7) has a closed-form solution. This enables us to construct a Gibbs sampler for β , by repeatedly sampling Ω | β and then sampling β | Ω .
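The infinite sum defining the PG distribution can be truncated to draw approximate PG variates (a rough sketch for intuition only; production implementations such as BayesLogit use exact samplers, and the truncation level here is an arbitrary choice of ours):

```python
import numpy as np

def rpg_truncated(b, c, trunc=200, rng=None):
    """Approximate draw from PG(b, c) via the truncated weighted sum of
    independent Gamma(b, 1) variables in the definition of Polson et al."""
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(1, trunc + 1)
    g = rng.gamma(shape=b, scale=1.0, size=trunc)
    return np.sum(g / ((k - 0.5) ** 2 + c ** 2 / (4 * np.pi ** 2))) / (2 * np.pi ** 2)

rng = np.random.default_rng(0)
draws = np.array([rpg_truncated(1.0, 0.0, rng=rng) for _ in range(20000)])
print(draws.mean())  # close to E[PG(1, 0)] = 1/4
```

The sample mean matches the known mean $b/4$ of $\mathrm{PG}(b, 0)$, which provides a quick sanity check on the truncation.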

2.3. A Simplified Application of Pólya-Gamma Augmentation

To illustrate the application of PG augmentation to the 2021 VRA model, consider jurisdiction $j$, for $j \in \{1, \ldots, N\}$, where $N$ denotes the total number of jurisdictions. Let $n_j$ denote the number of voting-age citizens, $v_j$ denote the probability that a voting-age citizen is LEP, and $y_j$ denote the number of voting-age citizens with LEP. For simplicity, we do not include random effects in the example for this section. Then, PG augmentation as introduced by Polson et al. [6] can be applied to write the probability of observing $y_j$ out of $n_j$ as
$$\left(1 - v_j\right)^{n_j - y_j} v_j^{y_j} = \frac{\left(\exp \psi_j\right)^{y_j}}{\left(1 + \exp \psi_j\right)^{n_j}} \propto N\!\left(\kappa_j/\omega_j \,\middle|\, X_j^T \beta, \; \omega_j^{-1}\right),$$
where $\kappa_j = y_j - n_j/2$, $\omega_j \sim \mathrm{PG}\left(n_j, \psi_j\right)$, and $\psi_j = X_j^T \beta$. Following Polson et al. [6], we let $Z = \left(\kappa_1/\omega_1, \ldots, \kappa_N/\omega_N\right)^T$, which allows us to write the $\beta$ posterior distribution as
$$P\left(\beta \mid y, \omega\right) \propto P(\beta) \prod_{j=1}^{N} L_j\left(\beta \mid \omega_j\right) \propto P(\beta) \exp\left\{-\frac{1}{2}\left(Z - X\beta\right)^T \Omega \left(Z - X\beta\right)\right\} = N\left(\beta \mid b, B\right) N\left(Z \mid X\beta, \Omega^{-1}\right) \propto N\left(\beta \mid m_\omega, V_\omega\right),$$
where $X = \left(X_1^T, \ldots, X_N^T\right)^T$, $\kappa = \left(\kappa_1, \ldots, \kappa_N\right)^T$, $\Omega = \mathrm{diag}\left(\omega_1, \ldots, \omega_N\right)$, $V_\omega = \left(X^T \Omega X + B^{-1}\right)^{-1}$, $m_\omega = V_\omega\left(X^T \kappa + B^{-1} b\right)$, and $N(b, B)$ is the normal prior for $\beta$. Thus, a simple Gibbs sampling algorithm to fit this model iterates:
$$\omega_j \mid \beta \sim \mathrm{PG}\left(n_j, X_j^T \beta\right), \qquad \beta \mid y, n, \omega \sim N\left(m_\omega, V_\omega\right).$$
This Gibbs sampling algorithm was implemented using the R package BayesLogit [7] and provides closed-form equations for the β posterior. This allows users to adopt methods developed for linear regressions, including the Bayesian model selection procedure [8]. Appendix A explains the derivation of the closed-form β posterior distribution.
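The two-step Gibbs sampler for logistic regression can be sketched end-to-end on simulated binomial data (a minimal Python sketch, not the production BayesLogit implementation; the PG draws use a truncated-sum approximation of the infinite-sum definition, and all data and settings are made up):

```python
import numpy as np

rng = np.random.default_rng(1)

def rpg_vec(b, c, trunc=200):
    """One truncated-sum PG(b_j, c_j) draw per observation (an approximation
    of the exact PG sampler used by packages such as BayesLogit)."""
    k = np.arange(1, trunc + 1)
    g = rng.gamma(np.repeat(b, trunc).reshape(-1, trunc), 1.0)
    denom = (k - 0.5) ** 2 + (c[:, None] ** 2) / (4 * np.pi ** 2)
    return (g / denom).sum(axis=1) / (2 * np.pi ** 2)

# simulated binomial data: y_j ~ Binom(n_j, logistic(X_j^T beta))
N, q = 200, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([-0.5, 1.0])
n = rng.integers(20, 60, size=N)
y = rng.binomial(n, 1.0 / (1.0 + np.exp(-X @ beta_true)))

b0, B0inv = np.zeros(q), np.eye(q)   # prior beta ~ N(b0, B0) with B0 = I
kappa = y - n / 2.0
beta, samples = np.zeros(q), []
for it in range(600):
    omega = rpg_vec(n, X @ beta)                           # omega_j | beta
    V = np.linalg.inv(X.T @ (omega[:, None] * X) + B0inv)  # V_omega
    m = V @ (X.T @ kappa + B0inv @ b0)                     # m_omega
    beta = rng.multivariate_normal(m, V)                   # beta | omega
    if it >= 100:
        samples.append(beta)
print(np.mean(samples, axis=0))  # posterior mean, close to beta_true
```

No Metropolis step is needed: each iteration is a PG draw followed by a multivariate normal draw, which is the computational advantage emphasized above.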

3. New MLN Model with PG Augmentation

3.1. Basic Setup

In this section, we introduce the model specification for our new Bayesian VRA model. This new model is distinguished by our adoption of stochastic search variable selection [8], which is the most important difference between it and previous VRA models. However, the new model retains the multinomial logit normal structure from the 2021 VRA model [1] described in Section 2.1. In particular, we can write this model as a composition of three conditional binomial distributions with success probabilities $v_1$, $v_2$, and $v_3$. Table 2 summarizes the relationships between the multinomial category proportions $p_1$, $p_2$, $p_3$, and $p_4$ and the modeled ratios $v_1$, $v_2$, and $v_3$ in this model. It also shows the contribution that each jurisdiction, American Indian Area (AIA), and/or Alaskan Native Regional Corporation (ANRC) makes to the log-likelihood when fitting this statistical model.
Note that the log-likelihood contribution for $v_k$ compares $M_k$ with $\sum_{c=k+1}^{4} M_c$ for $k = 1, 2, 3$.

3.2. Model Likelihood and Priors

The regression coefficients $\beta$, the random effects $u$, and the covariance matrix $\Sigma$ enter the model through the following likelihood contribution for the $j$th jurisdiction:
$$\begin{aligned} LH_j &= \left(1 - v_{j,1}\right)^{VOT_j - CIT_j} v_{j,1}^{CIT_j} \left(1 - v_{j,2}\right)^{CIT_j - LEP_j} v_{j,2}^{LEP_j} \left(1 - v_{j,3}\right)^{LEP_j - ILL_j} v_{j,3}^{ILL_j} \\ &= \left(1 - v_{j,1}\right)^{M_{j,1}} v_{j,1}^{\sum_{c=2}^{4} M_{j,c}} \left(1 - v_{j,2}\right)^{M_{j,2}} v_{j,2}^{\sum_{c=3}^{4} M_{j,c}} \left(1 - v_{j,3}\right)^{M_{j,3}} v_{j,3}^{M_{j,4}} \\ &= \prod_{c=1}^{3} \frac{\left(\exp \eta_{j,c}\right)^{\sum_{k=c+1}^{4} M_{j,k}}}{\left(1 + \exp \eta_{j,c}\right)^{\sum_{k=c}^{4} M_{j,k}}}. \end{aligned}$$
Using this notation, the likelihood for all $N$ jurisdictions multiplied by the prior distributions for the parameters can be expressed as
$$\prod_{j=1}^{N} LH_j \times \prod_{k=1}^{3} P\left(\beta_k\right) \times P(\Sigma) \times \prod_{j=1}^{N} \prod_{k=1}^{3} P\left(U_{j,k} = u_{j,k}\right).$$
The Bayesian model in Equation (12) treats the random effects variables u as parameters to be estimated. In contrast, frequentist inference averages out random effects as “missing data” in the optimization process. However, the widely used frequentist adaptive Gaussian quadrature method [4] for fitting random effects models treats the random effects as parameters in the process of averaging them out. Thus, both Bayesian and frequentist inferences estimate random effects parameters from this technical perspective.

3.3. Conditional Posterior Distributions for Regression Coefficients and PG Variables

For each outcome $k \in \{1, 2, 3\}$ and jurisdiction $j \in \{1, \ldots, 7859\}$, the regression coefficients $\beta_k$ and PG random variables $\omega_{j,k}$ are sampled similarly to what was done in Polson et al. [6]. Recall the key result from Section 2 that, conditionally given $\omega$, $\left(\exp \psi\right)^y / \left(1 + \exp \psi\right)^n \propto N\left(\omega^{-1}\left(y - n/2\right) \mid \psi, \omega^{-1}\right)$; that is, $\omega^{-1}\left(y - n/2\right)$ follows a normal distribution with mean $\psi$ and variance $\omega^{-1}$. Using this result, it can be shown that, conditional on $\omega_{j,k}$, the posterior distributions for the $\beta_k$ coefficients have the following closed-form representations:
  • $VOT_j - CIT_j$ vs. $CIT_j$ (i.e., $\sum_{k=1}^{4} M_{j,k} - \sum_{k=2}^{4} M_{j,k}$ vs. $\sum_{k=2}^{4} M_{j,k}$):
    $$\beta_1 \sim N\left(V_{\beta_1}^{-1} m_{\beta_1}, \; V_{\beta_1}^{-1}\right),$$
    conditional on $\omega_{j,1}$ being drawn from $\omega_{j,1} \sim \mathrm{PG}\left(\sum_{k=1}^{4} M_{j,k}, \; \eta_{j,1}\right) \overset{d}{=} \mathrm{PG}\left(VOT_j, \eta_{j,1}\right)$ for each $j$.
  • $CIT_j - LEP_j$ vs. $LEP_j$ (i.e., $\sum_{k=2}^{4} M_{j,k} - \sum_{k=3}^{4} M_{j,k}$ vs. $\sum_{k=3}^{4} M_{j,k}$):
    $$\beta_2 \sim N\left(V_{\beta_2}^{-1} m_{\beta_2}, \; V_{\beta_2}^{-1}\right),$$
    conditional on $\omega_{j,2}$ being drawn from $\omega_{j,2} \sim \mathrm{PG}\left(\sum_{k=2}^{4} M_{j,k}, \; \eta_{j,2}\right) \overset{d}{=} \mathrm{PG}\left(CIT_j, \eta_{j,2}\right)$ for each $j$.
  • $LEP_j - ILL_j$ vs. $ILL_j$ (i.e., $\sum_{k=3}^{4} M_{j,k} - M_{j,4}$ vs. $M_{j,4}$):
    $$\beta_3 \sim N\left(V_{\beta_3}^{-1} m_{\beta_3}, \; V_{\beta_3}^{-1}\right),$$
    conditional on $\omega_{j,3}$ being drawn from $\omega_{j,3} \sim \mathrm{PG}\left(\sum_{k=3}^{4} M_{j,k}, \; \eta_{j,3}\right) \overset{d}{=} \mathrm{PG}\left(LEP_j, \eta_{j,3}\right)$ for each $j$.
Here, for each outcome category $k \in \{1, 2, 3\}$, we define $\eta_{j,k} = X_j^T \beta_k + u_{j,k}$, $X = \left(X_1^T, \ldots, X_N^T\right)^T$, $\kappa_{j,k} = \sum_{c=k+1}^{4} M_{j,c} - \frac{1}{2} \sum_{c=k}^{4} M_{j,c}$, $\kappa_k = \left(\kappa_{1,k}, \ldots, \kappa_{N,k}\right)^T$, $\Omega_k = \mathrm{diag}\left(\omega_{1,k}, \ldots, \omega_{N,k}\right)$, $U_k = \left(u_{1,k}, \ldots, u_{N,k}\right)^T$, $V_{\beta_k} = X^T \Omega_k X + B^{-1}$, and $m_{\beta_k} = X^T \kappa_k - X^T \Omega_k U_k + B^{-1} b$, where $b$ and $B$ are from the normal $\beta_k$ prior $N(b, B)$. In general, the conditional $\beta_k$ posterior for outcome category $k$ is $P\left(\beta_k \mid X, \kappa_k, U_k, \Omega_k\right) = N\left(V_{\beta_k}^{-1} m_{\beta_k}, \; V_{\beta_k}^{-1}\right)$. This posterior distribution is derived in Appendix B.

3.4. Conditional Posterior Distribution for the Variable Selection Parameters

George and McCulloch [8] developed a Bayesian variable selection procedure known as stochastic search variable selection (SSVS) in the context of linear regression. Recall that the regression in Equation (11) assumes the $\beta$ prior distribution $N(b, B)$. To perform SSVS, we specify the prior distribution for $\beta$ as a mixture of two normal distributions: one with a smaller variance to signify a coefficient close to zero, and the other with a larger variance to signify a coefficient farther away from zero. Specifically, George and McCulloch [8] introduced the latent inclusion variable $\gamma_i$ (1 if included, 0 otherwise) for the normal mixture prior distribution for the $i$th coefficient $\beta_i$ ($i \in \{1, \ldots, q\}$):
$$\beta_i \sim \left(1 - \gamma_i\right) N\left(0, \tau_i^2\right) + \gamma_i \, N\left(0, c_i^2 \tau_i^2\right).$$
The interpretation of this equation is as follows. When $\gamma_i$ is 0, $\beta_i$ is assumed to follow $N(0, \tau_i^2)$, and when $\gamma_i$ is 1, $\beta_i$ is assumed to follow $N(0, c_i^2 \tau_i^2)$. $\tau_i$ is set to be small so that if $\gamma_i$ is 0, then $\beta_i$ will be close to 0. $c_i$ is set to be large ($c_i > 1$ always) so that if $\gamma_i$ is 1, then a non-zero estimate of $\beta_i$ will be included in the model. For specific choices of $\tau_i$ and $c_i$, George and McCulloch [8] recommended that users consider varying the settings to extract more information rather than treating any particular settings as rules that guarantee good results.
To implement the normal mixture prior in Equation (16), the prior covariance matrix $B$ from the prior distribution $N(b, B)$ is specified as $D_\gamma R D_\gamma$, where $\gamma = \left(\gamma_1, \ldots, \gamma_q\right)$, $R$ is the prior correlation matrix (an identity matrix), and
$$D_\gamma = \mathrm{diag}\left(a_1 \tau_1, \ldots, a_q \tau_q\right),$$
with $a_i = 1$ if $\gamma_i = 0$ and $a_i = c_i$ if $\gamma_i = 1$. That is, $a_i \tau_i = \left\{\left(1 - \gamma_i\right) + \gamma_i c_i\right\} \tau_i$ and $a_i^2 \tau_i^2 = \left\{\left(1 - \gamma_i\right) + \gamma_i c_i^2\right\} \tau_i^2$. Thus, $D_\gamma$ determines the scaling of the prior covariance matrix in such a way that Equation (16) is satisfied.
The conditional posterior distribution of $\gamma_i^{(m)}$ for the $i$th covariate at the $m$th MCMC sampling step is
$$P\left(\gamma_i^{(m)} = 1 \mid \beta^{(m-1)}, \gamma_{(i)}^{(m)}\right),$$
where $\gamma_{(i)}^{(m)} = \left(\gamma_1^{(m)}, \ldots, \gamma_{i-1}^{(m)}, \gamma_{i+1}^{(m-1)}, \ldots, \gamma_q^{(m-1)}\right)$, and $\beta^{(m-1)}$ is the coefficient vector for the $(m-1)$th MCMC sample with $m \in \{1, \ldots, 10^5\}$. Notice that the posterior for $\gamma_i^{(m)}$ does not depend on the rest of the parameters, including the outcomes. This yields a substantial simplification that reduces computational requirements and allows for faster convergence of the MCMC subsequence for all $\gamma_i$. Finally,
$$P\left(\gamma_i^{(m)} = 1 \mid \beta^{(m-1)}, \gamma_{(i)}^{(m)}\right) = \frac{a}{a + b},$$
where
$$a = P\left(\beta^{(m-1)} \mid \gamma_{(i)}^{(m)}, \gamma_i^{(m)} = 1\right) \times p_i$$
and
$$b = P\left(\beta^{(m-1)} \mid \gamma_{(i)}^{(m)}, \gamma_i^{(m)} = 0\right) \times \left(1 - p_i\right),$$
where $p_i$ was chosen to be 0.5, implying equal prior weights for inclusion and exclusion.
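A single SSVS sweep over the inclusion indicators can be sketched as follows (a hypothetical, self-contained illustration; the function and variable names are ours, and the spike and slab scales are arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

def update_gamma(beta, gamma, tau, c, p_incl=0.5, rng=None):
    """One SSVS sweep: draw each inclusion indicator gamma_i from its
    conditional posterior a/(a + b), where a and b evaluate the
    N(0, D_gamma R D_gamma) prior density at beta with gamma_i set to 1 or 0.
    R is the identity, as in the paper; all other settings are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    gamma = gamma.copy()
    for i in range(len(beta)):
        def prior_density(gi):
            g = gamma.copy()
            g[i] = gi
            scale = np.where(g == 1, c, 1.0) * tau   # a_i * tau_i
            return multivariate_normal.pdf(beta, mean=np.zeros(len(beta)),
                                           cov=np.diag(scale ** 2))
        a = prior_density(1) * p_incl
        b = prior_density(0) * (1.0 - p_incl)
        gamma[i] = int(rng.random() < a / (a + b))
    return gamma

rng = np.random.default_rng(2)
beta = np.array([2.0, 0.01])   # one clearly non-zero, one near-zero coefficient
gamma = np.array([1, 1])
tau = np.array([0.1, 0.1])     # spike scale tau_i
c = np.array([10.0, 10.0])     # slab multiplier c_i
print(update_gamma(beta, gamma, tau, c, rng=rng))
```

With these made-up settings, the large coefficient is retained with probability essentially 1, while the near-zero coefficient is excluded with high probability.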

3.5. Conditional Posterior Distribution for Random Effects Covariance Σ

Recall $U = \left(u_1, u_2, \ldots, u_N\right)$ and $u_j = \left(u_{j,1}, u_{j,2}, u_{j,3}\right)$. $\Sigma$ was defined in Equation (4). Assuming $U$ has a multivariate normal distribution $MVN\left(0, \Sigma\right)$, the conjugate $\Sigma$ posterior is
$$\mathrm{IW}\left(\sum_{j=1}^{N} u_j u_j^T + V_0, \; N + s_0\right),$$
where $V_0$ can be set to $I_{3 \times 3}$, and $s_0$ can be set to 3, which is the number of rows of $\Sigma$. These are typical choices for the inverse Wishart prior. $s_0$ can be chosen to be bigger than 3, but larger values of $s_0$ tend to attenuate the random effect covariance elements because the elements of the posterior mean $\left(\sum_{j=1}^{N} u_j u_j^T + V_0\right) / \left(N + s_0 - 3 - 1\right)$ will get closer to zero as $s_0$ increases. We keep $s_0$ at 3, and set the off-diagonal elements to zero. Slud et al. [1] called the multinomial logit model with this random effects covariance structure “MLN-D”. When the off-diagonal elements were not set to zero, Slud et al. [1] called the model “MLN-F”. Fitting MLN-D is generally more stable than fitting MLN-F, but this does not necessarily mean that one is better than the other in terms of diagnostic results.
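The conjugate update for $\Sigma$ can be sketched with synthetic random effects (a minimal sketch assuming SciPy's inverse-Wishart parameterization, whose mean is scale/(df − p − 1); all numbers are made up):

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(3)

# made-up random effects u_j for N jurisdictions and 3 outcome categories
N = 500
U = rng.multivariate_normal(np.zeros(3), np.diag([0.5, 1.0, 1.5]), size=N)

V0, s0 = np.eye(3), 3                 # inverse-Wishart prior settings
scale = U.T @ U + V0                  # sum_j u_j u_j^T + V0
Sigma = invwishart.rvs(df=N + s0, scale=scale, random_state=rng)
Sigma_D = np.diag(np.diag(Sigma))     # MLN-D: off-diagonal elements set to zero
print(np.round(np.diag(Sigma), 2))    # near (0.5, 1.0, 1.5) for this large N
```

The final line zeroes the off-diagonal elements, mirroring the MLN-D structure retained in this paper.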

3.6. Conditional Posterior Distribution for Random Effects Variables

Consider a Gaussian prior distribution $N\left(0, \Sigma\right)$ on $u_j = \left(u_{j,1}, u_{j,2}, u_{j,3}\right)$. Then, the conditional posterior $P\left(u_j \mid \Sigma\right)$ can be shown to be $N\left(V_{j,u}^{-1} m_{j,u}, \; V_{j,u}^{-1}\right)$, where $V_{j,u} = \Omega_j + \Sigma^{-1}$, $m_{j,u} = \Omega_j M_j^u$, $\Omega_j = \mathrm{diag}\left(\omega_{j,1}, \omega_{j,2}, \omega_{j,3}\right)$, $M_{j,k}^u = \widetilde{M}_{j,k} - X_j^T \beta_k$ with the pseudo-observation $\widetilde{M}_{j,k} = \omega_{j,k}^{-1}\left(\sum_{c=k+1}^{4} M_{j,c} - \frac{1}{2} \sum_{c=k}^{4} M_{j,c}\right)$, and $\eta_{j,k} = X_j^T \beta_k + u_{j,k}$.
Note that $m_{j,u} = \Omega_j M_j^u$ can be expressed as
$$\begin{pmatrix} \omega_{j,1} & 0 & 0 \\ 0 & \omega_{j,2} & 0 \\ 0 & 0 & \omega_{j,3} \end{pmatrix} \begin{pmatrix} \widetilde{M}_{j,1} - X_j^T \beta_1 \\ \widetilde{M}_{j,2} - X_j^T \beta_2 \\ \widetilde{M}_{j,3} - X_j^T \beta_3 \end{pmatrix} = \begin{pmatrix} \sum_{k=2}^{4} M_{j,k} - \frac{1}{2} \sum_{k=1}^{4} M_{j,k} - \omega_{j,1} X_j^T \beta_1 \\ \sum_{k=3}^{4} M_{j,k} - \frac{1}{2} \sum_{k=2}^{4} M_{j,k} - \omega_{j,2} X_j^T \beta_2 \\ M_{j,4} - \frac{1}{2} \sum_{k=3}^{4} M_{j,k} - \omega_{j,3} X_j^T \beta_3 \end{pmatrix}.$$
This is computationally helpful because evaluating $\widetilde{M}_{j,k}$ directly would require taking the inverse of $\omega_{j,k}$, which may be close to zero. For numerical stability, we cancel $\omega_{j,k}^{-1}$ by multiplying it by $\omega_{j,k}$, as on the right-hand side above, and set the minimum value of $\omega_{j,k}$ to $10^{-16}$.
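The random-effects draw with the stability adjustments described above can be sketched as follows (a hypothetical helper with made-up inputs; names and values are ours):

```python
import numpy as np

def draw_u_j(omega_j, kappa_j, Xb_j, Sigma_inv, rng):
    """Conditional draw of u_j = (u_{j,1}, u_{j,2}, u_{j,3}).
    kappa_j holds the count terms kappa_{j,k} and Xb_j the fixed-effect parts
    X_j^T beta_k. Forming m_{j,u} as kappa_j - omega_j * Xb_j cancels the
    omega^{-1} in the pseudo-observations instead of computing it explicitly,
    and omega is clamped at 1e-16 as described above."""
    omega_j = np.maximum(omega_j, 1e-16)
    V = np.diag(omega_j) + Sigma_inv          # V_{j,u}
    m = kappa_j - omega_j * Xb_j              # m_{j,u} = Omega_j M_j^u
    cov = np.linalg.inv(V)                    # V_{j,u}^{-1}
    return rng.multivariate_normal(cov @ m, cov)

rng = np.random.default_rng(4)
u_j = draw_u_j(omega_j=np.array([2.0, 0.5, 1e-30]),   # near-zero omega is clamped
               kappa_j=np.array([1.0, -0.5, 0.2]),
               Xb_j=np.array([0.1, 0.0, -0.2]),
               Sigma_inv=np.eye(3), rng=rng)
print(u_j.shape)  # (3,)
```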

3.7. Gibbs Sampling Algorithm

The following Gibbs sampling steps draw from the respective conditional posterior distributions of the aforementioned parameters—$\beta$, $\omega$, $\gamma$, $\Sigma$, and $u$—with each step conditioned on the most recent draws of all other parameters. The initial values were drawn from either a standard normal distribution or a uniform distribution.
  • $\omega_{j,k}$ (see Section 3.3): Draw PG random values $\omega_{j,k}$ for each jurisdiction $j \in \{1, \ldots, N\}$ and outcome category $k \in \{1, 2, 3\}$ as follows:
    $$\omega_{j,k} \sim \mathrm{PG}\left(\sum_{c=k}^{4} M_{j,c}, \; \eta_{j,k}\right),$$
    where $\eta_{j,k}$ is as defined in Section 3.3.
  • $\beta_k$ (see Section 3.3): Draw coefficients $\beta_k$ for $k \in \{1, 2, 3\}$ as follows:
    $$\beta_k \sim N\left(V_{\beta_k}^{-1} m_{\beta_k}, \; V_{\beta_k}^{-1}\right),$$
    where $V_{\beta_k}$ and $m_{\beta_k}$ are defined as in Section 3.3, and the prior covariance matrix $B$ for $\beta_k$ is defined as in Section 3.4.
  • $\gamma_i$ (see Section 3.4): For each covariate $i \in \{1, \ldots, q\}$, draw
    $$\gamma_i \sim P\left(\gamma_i = 1 \mid \beta, \gamma_{(i)}\right) = \frac{N\left(\beta \mid 0, D_\gamma R D_\gamma; \gamma_{(i)}, \gamma_i = 1\right) \times p_i}{\sum_{k \in \{0, 1\}} N\left(\beta \mid 0, D_\gamma R D_\gamma; \gamma_{(i)}, \gamma_i = k\right) p_i^k \left(1 - p_i\right)^{1-k}},$$
    where the matrices $D_\gamma$ and $R$, as well as the proportions $p_i$ for each $i$, are as defined in Section 3.4.
  • $\Sigma$ (see Section 3.5): Draw
    $$\Sigma \sim \mathrm{IW}\left(\sum_{j=1}^{N} u_j u_j^T + V_0, \; N + s_0\right),$$
    where $u_j$ is defined as in Section 3.5, $V_0$ is set to the $3 \times 3$ identity matrix $I_{3 \times 3}$, and $s_0$ is set to 3, the number of disjoint outcome categories being modeled.
  • $u_j$ (see Section 3.6): For each $j \in \{1, \ldots, N\}$, draw
    $$u_j = \left(u_{j,1}, u_{j,2}, u_{j,3}\right) \sim N\left(V_{j,u}^{-1} m_{j,u}, \; V_{j,u}^{-1}\right),$$
    where $V_{j,u}$ and $m_{j,u}$ are as defined in Section 3.6.

4. 2021 VRA Data Analysis Results

4.1. Data Description

This paper uses the ACS 2015–2019 5-year data, which was used for the most recent 2021 VRA determination. The ACS releases 1-year and 5-year data products. The 5-year products aggregate and re-weight data collected over a 5-year period, allowing increased precision of population estimates. The 5-year data are particularly useful for estimating features of small geographic areas or small domains in which 1-year estimates are too imprecise for release under Census Bureau statistical quality guidelines.
Data for the four disjoint categorical outcomes described in Equation (1) were obtained from the ACS 2015–2019 data, as were the following covariates:
  • Logit-transformed fraction of voting-age persons who are citizens;
  • Logit-transformed fraction of citizens that are limited English-proficient;
  • Proportion of voting-age persons who are non-Hispanic White in each geography;
  • Proportion of voting-age persons with no college education in each geography;
  • Average number of voting-age people per housing unit in each geography;
  • Average age among voting-age persons in any AIAN LMG in each geography;
  • Proportion of voting-age persons in poverty in each geography;
  • Proportion of voting-age persons speaking a language other than English at home in each jurisdiction;
  • Proportion of foreign-born voting-age persons in each jurisdiction;
  • Average years in US (as of 2019) of voting-age foreign-born persons in each jurisdiction.

4.2. Model Comparison

This section highlights the improvement in model diagnostic results made by the Bayesian model described in this paper over the diagnostic results for the model used to produce the 2021 VRA determinations. We estimate the new model for 73 Asian, Hispanic, and Native American LMGs in the 7859 jurisdictions with respondents in the 2015–2019 ACS.
Note that direct population estimates from the ACS are reliable when they are made within domains with large sample sizes. Slud et al. [1] evaluate the performance of models by examining the discrepancy between their estimates and direct ACS estimates when aggregated over larger domains. The model is judged to be performing well when the direct and model-based estimates agree in these large sample domains.
We follow Slud et al. [1] in using three metrics for model evaluation: Delta, PctRel$\Delta$, and Stdiz$\Delta$. Delta is the difference between model predictions $\tilde{N}$ and the ACS direct estimates $\hat{N}$. PctRel$\Delta$ is the percent relative difference $100 \cdot (\tilde{N} - \hat{N})/\hat{N}$. Stdiz$\Delta$ is the standardized relative difference $(\tilde{N} - \hat{N})/SE(\hat{N})$, where $SE(\hat{N})$ is the square root of the “successive difference replication” (SDR) estimate of the variance of the direct estimate $\hat{N}$ described in the Appendix of Slud et al. [1]. This is the standard error estimate generally used by the ACS, and for large domains, the standardized difference $(\hat{N} - N_{true})/SE(\hat{N})$ is distributed approximately as a standard normal. Because of this, Stdiz$\Delta$ roughly measures the departure from the null hypothesis that the $\tilde{N}$ prediction and the $\hat{N}$ estimate are the same. However, since model-based predictions are conditioned on the direct estimate, $\tilde{N}$ and $\hat{N}$ will be positively correlated, and Stdiz$\Delta$ will be smaller in absolute value than a $N(0, 1)$ deviate. We also follow Slud et al. [1] by evaluating these estimates in domains created by aggregating jurisdictions with voting-age sample sizes in the following pre-specified ranges: 1–4, 5–12, 13–25, 26–50, 51–200, and 201–$10^6$ (the release of the statistics in this paper has been approved by the Census Bureau’s Disclosure Review Board (DRB) with the approval number CBDRB-FY25-0130). Priors were chosen such that model-diagnostic results were optimized.
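For concreteness, the three diagnostics can be computed as follows (a small sketch with made-up numbers; this is not Census Bureau production code):

```python
def vra_metrics(n_model, n_direct, se_direct):
    """Delta, PctRelDelta, and StdizDelta for a domain, following the
    definitions above (made-up inputs for illustration)."""
    delta = n_model - n_direct
    pctrel = 100.0 * delta / n_direct
    stdiz = delta / se_direct
    return delta, pctrel, stdiz

print(vra_metrics(1040.0, 1000.0, 50.0))  # (40.0, 4.0, 0.8)
```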

4.2.1. Bangladeshi LMG Results

Table 3 compares the 2021 VRA model results for the Bangladeshi LMG published by Slud et al. [1] to the model results for the same LMG from our new Bayesian MLN model. In Table 3, the first three lines are results from the 2021 VRA model and the next three lines are results from the new model.
For the jurisdictions with more than 4 voting-age persons, i.e., bins [5, 12], [13, 25], …, [201, $10^6$], both the new Bayesian MLN model and the 2021 VRA model provided comparable Stdiz$\Delta$ values that were less than 1.96, the 97.5th percentile point of the standard normal distribution. The new Bayesian MLN model outperformed the 2021 VRA model for the jurisdictions with the number of voting-age persons ranging from 1 to 4; the old model resulted in a Stdiz$\Delta$ of 3.2, while the new model produced 1.2. The enhanced prediction accuracy in small areas can likely be attributed to the impact of the SSVS variable selection procedure in the new model.

4.2.2. Sri Lankan LMG Results

Table 4 shows similar results for the Sri Lankan LMG, which was also used as an example in Slud et al. [1]. As with the Bangladeshi LMG, the new Bayesian MLN model produced more accurate predictions for jurisdictions with a voting-age sample size ranging from 1 to 4.

4.2.3. Overall Results

Table 5 compares the results for the new Bayesian MLN and old 2021 VRA models across all LMGs. The sum of absolute Stdiz$\Delta$ values across all 73 LMGs was produced for each sample size bin. The 73 LMGs include 21 Asian LMGs, 51 American Indian and Alaska Native (AIAN) LMGs, and a single Hispanic LMG that includes all racial groups. Overall, the sum of the absolute Stdiz$\Delta$ values across all of these LMGs is smaller for the new model than it was for the 2021 VRA model.

4.2.4. Improvement Rates

For each sample size bin, the first row of Table 6 shows the number of LMGs with absolute Stdiz Δ statistics larger than 1.96 in the 2021 VRA models, and the second shows the number of LMGs with absolute Stdiz Δ statistics larger than 1.96 in the new Bayes model. The third row shows the number of LMGs with absolute Stdiz Δ values greater than 1.96 in at least one of the two models, and the fourth row shows the number of LMGs where both models produce absolute Stdiz Δ values greater than 1.96. The fifth row of Table 6 shows the number of LMGs where |Stdiz Δ | > 1.96 for both the old and new models, but the new model produces better absolute Stdiz Δ values than the old model. Finally, the sixth row shows the same data as the fifth row as a percentage of the number of LMGs where at least one model has |Stdiz Δ | > 1.96.
It should be noted that in most LMGs, the (0, 4] sample size bin contains more jurisdictions than the other bins, since many LMGs are sparsely sampled. In the first two rows of this table, we see that the number of LMGs with model estimates that are more than 1.96 standard deviations (SDs) away from the direct estimates in the (0, 4] sample size bin decreases by almost half in the new model compared to the old model. These counts match the old model in the other sample size bins, except in the (4, 12] bin, where the new model has one additional LMG that is “far” from the direct estimates. We also see that the new model performs better than the old model even when the model estimates are far from the direct estimates. In particular, the final row of Table 6 shows that in 85% of LMGs where the model estimates were more than 1.96 SDs away from the direct estimates, the new model performed better than the old model.
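The row logic of Table 6 can be sketched directly. The following is a minimal illustration with made-up Stdiz Δ values for six hypothetical LMGs in one sample size bin (not the paper's data); `improved` implements the fifth row's criterion.

```python
import numpy as np

# Illustrative (made-up) standardized deltas for six hypothetical LMGs
# in one sample size bin; one value per LMG for each model.
stdiz_old = np.array([3.2, 0.5, -2.4, 1.0, 2.1, -0.3])
stdiz_new = np.array([1.2, 0.4, -2.0, 1.1, 2.5, -0.2])

thr = 1.96
far_old = np.abs(stdiz_old) > thr              # row 1: "far" under the old model
far_new = np.abs(stdiz_new) > thr              # row 2: "far" under the new model
far_both = far_old & far_new                   # row 3: far under both models
far_any = far_old | far_new                    # row 4: far under at least one
improved = far_any & (np.abs(stdiz_new) < np.abs(stdiz_old))  # row 5
pct = 100 * improved.sum() / far_any.sum()     # row 6

print(far_old.sum(), far_new.sum(), far_both.sum(), far_any.sum(),
      improved.sum(), round(pct))              # 3 2 2 3 2 67
```

Note that the union count (row 4) always satisfies the identity union = old + new − intersection, which can be used to check the published table for internal consistency.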

4.2.5. Residual Plots

Figure 1 compares the performance of the old and new models. Each point represents a jurisdiction, across all 73 LMGs. The residual value here is the difference between the “Direct” and “Model” estimates. LEPprop denotes the proportion of voting-age citizens who have limited English proficiency (LEP/CIT), while ILLrat represents the illiteracy rate among citizens with limited English proficiency (ILL/LEP). If all the points in this residual plot were on the 45-degree line, there would be no difference between the old and new models. The closer the points are to the horizontal line at the center of the plot, the better the new model performs for those jurisdictions. The fact that both plots show several points below the 45-degree lines and above the central horizontal line indicates that the old model has larger residuals than the new model for those jurisdictions. This pattern is consistent across all sample size bins.

5. Discussion

In this paper, a new statistical methodology was proposed for the Bayesian multinomial logit model for US Voting Rights Act Section 203 determinations. Our methodological contribution involved adopting PG augmentation to linearize the nonlinear optimization problem, resulting in closed-form solutions for posterior distributions. This enhanced computational stability in the Gibbs sampling steps, which enabled us to fit one model for all jurisdictions. The former 2021 VRA determinations used different models for different jurisdictions—that is, if one model did not fit, then other models were tried for the same jurisdiction data.
Our method was developed based on the works of George and McCulloch [8] and Polson et al. [6]. George and McCulloch [8] did not suggest formal convergence criteria or diagnostics. Instead, they relied on empirical observations of convergence. Polson et al. [6] implemented the PG sampling algorithm in the R package BayesLogit. Note that while convergence diagnostics such as trace plots can indicate non-convergence, they cannot definitively prove convergence, especially given the substantial number of parameters in our analysis. The ultimate evidence of convergence was the improvement in the new model’s diagnostic measures over the 2021 VRA model.
Using SSVS requires the specification of the prior parameters c i and τ i . Setting τ i to 0.5 and c i to 10 produced desirable diagnostic results for all LMGs, as discussed in Section 4. In exchange for the additional effort required to identify suitable SSVS priors, the new PG-based Bayesian model demonstrates improved efficiency compared to the previous STAN-based Bayesian model developed for the 2021 VRA project. For example, the new model produced parameter estimates for the Bangladeshi LMG’s 517 jurisdictions in approximately 8 min. In contrast, the 2021 Bayesian VRA model for this LMG could not be estimated at all using STAN. The use of other Bayesian variable selection procedures, such as Bayesian LASSO [9], is a topic of interest for future research.
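As a concrete illustration of the SSVS Gibbs step with these priors, the sketch below computes the conditional inclusion probability for a single coefficient under a spike-and-slab normal mixture with τ = 0.5 and c = 10, following George and McCulloch [8]. The prior inclusion probability p = 0.5 and the function names are our own assumptions for illustration, not the paper's code.

```python
import numpy as np

def npdf(x, sd):
    # Normal density with mean 0 and standard deviation sd.
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def ssvs_inclusion_prob(beta_i, tau=0.5, c=10.0, p=0.5):
    """Gibbs update for the SSVS indicator gamma_i: posterior probability
    that beta_i came from the 'slab' N(0, (c*tau)^2) rather than the
    'spike' N(0, tau^2), with prior inclusion probability p."""
    slab = npdf(beta_i, c * tau)
    spike = npdf(beta_i, tau)
    return p * slab / (p * slab + (1.0 - p) * spike)

# A near-zero coefficient is attributed to the spike (covariate excluded),
# while a large one is attributed to the slab (covariate included).
print(ssvs_inclusion_prob(0.1) < 0.5, ssvs_inclusion_prob(3.0) > 0.5)  # True True
```

At β = 0 the density ratio spike/slab equals c, so with p = 0.5 the inclusion probability is exactly 1/(1 + c), which makes the role of c as a "spike sharpness" tuning parameter explicit.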
The estimates from the new model can be viewed as Bayesian model averages, because the posterior parameters were sampled and averaged over various models that had different sets of selected covariates. The overall results for the 2021 VRA data showed that the new model, which used stochastic search variable selection (SSVS) [8] separately in each LMG, outperformed the previous model, which used the same covariates in every LMG. The diagnostic data presented for the Bangladeshi and Sri Lankan LMGs are examples of how the new model yielded better results than the 2021 VRA models, especially in jurisdictions with small sample sizes of between one and four voting-age persons.
The nonlinearity of the logistic link function of the MLN model and the unseen latent random effects variables were computationally challenging issues. Both the frequentist adaptive Gaussian quadrature procedure and Bayesian inference can handle random effects, as was demonstrated in previous 2021 VRA determinations [1]. The frequentist model was fit by using adaptive Gaussian quadrature to optimize the likelihood function, which required the random effects variable to be averaged out. This approach makes it technically difficult to adopt variable selection procedures, since the entire score function would have to be modified for such an additional procedure. In contrast, Bayesian methods treat the random effects as just another parameter, which can be drawn in Gibbs sampling. STAN was used to sample the random effects and their covariance matrix for the 2021 VRA Bayesian model, but it did not converge for smaller LMGs. Our new Bayesian model used PG augmentation to obtain closed-form solutions for posterior distributions, which made implementing Bayesian stochastic search variable selection computationally feasible in more LMGs. This led to noticeable improvements in our overall diagnostic results.
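A minimal sketch of such a PG-augmented Gibbs sampler for a single binomial logit (no random effects) is given below. It is not the paper's production code: `rpg_approx` uses a truncated sum-of-gammas approximation to the PG(b, c) distribution rather than the exact sampler provided by BayesLogit, and all names, priors, and data shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rpg_approx(b, c, K=200):
    """Approximate Polya-Gamma PG(b, c) draw via the truncated sum-of-gammas
    representation of Polson et al. (2013); BayesLogit provides exact samplers."""
    k = np.arange(1, K + 1)
    denom = (k - 0.5) ** 2 + (c / (2.0 * np.pi)) ** 2
    g = rng.gamma(shape=b, scale=1.0, size=K)
    return (g / denom).sum() / (2.0 * np.pi ** 2)

def gibbs_logit(y, n, X, b0, B0inv, iters=500):
    """Gibbs sampler for binomial logistic regression with PG augmentation.
    Conditional on omega, beta has the closed-form Gaussian full conditional
    N(V^-1 m, V^-1) with V = X' Omega X + B^-1 and m = X' kappa + B^-1 b."""
    p = X.shape[1]
    beta = np.zeros(p)
    kappa = y - n / 2.0
    draws = np.empty((iters, p))
    for t in range(iters):
        # 1. Draw the PG latent variables given the current beta.
        omega = np.array([rpg_approx(nj, xj @ beta) for nj, xj in zip(n, X)])
        # 2. Draw beta from its Gaussian full conditional (closed form).
        Vinv = np.linalg.inv(X.T @ (omega[:, None] * X) + B0inv)
        m = X.T @ kappa + B0inv @ b0
        beta = rng.multivariate_normal(Vinv @ m, Vinv)
        draws[t] = beta
    return draws
```

Because both conditionals are available in closed form, no Metropolis–Hastings accept/reject step or analytic approximation is needed, which is the computational stability the paper attributes to PG augmentation.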

Author Contributions

Conceptualization, J.K. and A.C.H.; data curation, A.C.H.; formal analysis, J.K. and A.C.H.; methodology, J.K.; project administration, J.K.; software, J.K. and A.C.H.; supervision, J.K.; validation, J.K.; visualization, J.K.; writing—original draft preparation, J.K.; writing—review and editing, A.C.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets presented in this article are not readily available because they are protected by Title 13, U.S.C., in the interests of respondent privacy.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Derivation of the β Posterior [6]

Without loss of generality, we omit the outcome category indicator $k$ from variable notations in this section; that is, $\beta_k$ will be written as $\beta$, $U_k$ as $U$, $\Omega_k$ as $\Omega$, etc. Recall that $Z = (z_1, \ldots, z_N)^T$, where $z_j$ denotes $(y_j - n_j/2)/\omega_j$, and suppose $\omega_j$ is sampled from $\mathrm{PG}(n_j, X_j \beta)$. Then $P(Z \mid \beta, X)\, P(\beta \mid b, B)$ is proportional to

$$
\begin{aligned}
&\exp\Big\{-\tfrac{1}{2}\Big[(Z - X\beta)^T \Omega (Z - X\beta) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&= \exp\Big\{-\tfrac{1}{2}\Big[(Z - X\beta)^T (\Omega^{1/2})^T \Omega^{1/2} (Z - X\beta) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&= \exp\Big\{-\tfrac{1}{2}\Big[\big(\Omega^{1/2}(Z - X\beta)\big)^T \Omega^{1/2}(Z - X\beta) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&= \exp\Big\{-\tfrac{1}{2}\Big[(\bar{Z} - \bar{X}\beta)^T (\bar{Z} - \bar{X}\beta) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&= \exp\Big\{-\tfrac{1}{2}\Big[(\bar{Z} - \bar{X}\hat{\beta} + \bar{X}\hat{\beta} - \bar{X}\beta)^T (\bar{Z} - \bar{X}\hat{\beta} + \bar{X}\hat{\beta} - \bar{X}\beta) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&= \exp\Big\{-\tfrac{1}{2}\Big[(\bar{Z} - \bar{X}\hat{\beta})^T (\bar{Z} - \bar{X}\hat{\beta}) + (\bar{X}\beta - \bar{X}\hat{\beta})^T (\bar{X}\beta - \bar{X}\hat{\beta}) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&= \exp\Big\{-\tfrac{1}{2}\Big[(\bar{Z} - \bar{X}\hat{\beta})^T (\bar{Z} - \bar{X}\hat{\beta}) + (\beta - \hat{\beta})^T \bar{X}^T \bar{X} (\beta - \hat{\beta}) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&= \exp\Big\{-\tfrac{1}{2}\Big[(Z - X\hat{\beta})^T \Omega (Z - X\hat{\beta}) + (\beta - \hat{\beta})^T X^T \Omega X (\beta - \hat{\beta}) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&\propto \exp\Big\{-\tfrac{1}{2}\Big[(\beta - \hat{\beta})^T X^T \Omega X (\beta - \hat{\beta}) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&\propto \exp\Big\{-\tfrac{1}{2}\big(\beta - V_\beta^{-1} m_\beta\big)^T V_\beta \big(\beta - V_\beta^{-1} m_\beta\big)\Big\},
\end{aligned}
$$

where $V_\beta = X^T \Omega X + B^{-1}$ and $m_\beta = X_{p \times N}^T \Omega_{N \times N} X_{N \times p} \hat{\beta}_{p \times 1} + B^{-1} b$. Note that $X^T \Omega X \hat{\beta} = X^T \Omega Z$ because $\hat{\beta}$ satisfies $\bar{X}^T (\bar{Z} - \bar{X}\hat{\beta}) = 0$, where $\bar{A} = \Omega^{1/2} A$, and $\bar{X}^T (\bar{Z} - \bar{X}\hat{\beta})$ is the same as $X^T \Omega Z - X^T \Omega X \hat{\beta}$; this orthogonality is also what eliminates the cross term when completing the square above.

Thus, the posterior distribution of $\beta \mid Z, X$ is

$$
N\big(V_\beta^{-1} m_\beta,\; V_\beta^{-1}\big),
$$

where $V_\beta = X^T \Omega X + B^{-1}$ and $m_\beta = X^T \Omega Z + B^{-1} b = X^T \kappa + B^{-1} b$, with $\kappa_i = y_i - n_i/2$.
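The completing-the-square identity above can be checked numerically: the original exponent and the completed quadratic should differ only by a constant that does not depend on β. The instance below uses arbitrary simulated values (dimensions and seed are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)

# Small random instance (dimensions are arbitrary for the check).
N, p = 8, 3
X = rng.normal(size=(N, p))
Z = rng.normal(size=N)
omega = rng.uniform(0.5, 2.0, size=N)
Omega = np.diag(omega)
b = rng.normal(size=p)
Binv = np.eye(p)

V = X.T @ Omega @ X + Binv          # V_beta
m = X.T @ Omega @ Z + Binv @ b      # m_beta (using X' Omega Z = X' Omega X beta-hat)

def lhs(beta):   # original exponent, up to the factor -1/2
    r = Z - X @ beta
    return r @ Omega @ r + (beta - b) @ Binv @ (beta - b)

def rhs(beta):   # completed square around V^{-1} m
    d = beta - np.linalg.solve(V, m)
    return d @ V @ d

# The two quadratics differ only by a constant independent of beta.
b1, b2 = rng.normal(size=p), rng.normal(size=p)
print(np.isclose(lhs(b1) - rhs(b1), lhs(b2) - rhs(b2)))   # True
```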

Appendix B. Derivation of the β Posterior for the VRA Model

Without loss of generality, we omit the outcome category indicator $k$ from variable notations in this section; that is, $\beta_k$ will be written as $\beta$, $U_k$ as $U$, $\Omega_k$ as $\Omega$, etc. Recall that $X$ is the matrix of covariates. Let $U$ denote the $N \times 1$ matrix of random effects variables, let $N(b, B)$ denote the normal prior for $\beta$, and let the likelihood $P(Z \mid \beta, U, X)$ be $N(U + X\beta, \Omega^{-1})$. Writing $\tilde{Z} = Z - U$, $P(Z \mid \beta, U, X)\, P(\beta \mid b, B)$ is proportional to

$$
\begin{aligned}
&\exp\Big\{-\tfrac{1}{2}\Big[(Z - U - X\beta)^T \Omega (Z - U - X\beta) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&= \exp\Big\{-\tfrac{1}{2}\Big[(\tilde{Z} - X\beta)^T (\Omega^{1/2})^T \Omega^{1/2} (\tilde{Z} - X\beta) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&= \exp\Big\{-\tfrac{1}{2}\Big[\big(\Omega^{1/2}(\tilde{Z} - X\beta)\big)^T \Omega^{1/2}(\tilde{Z} - X\beta) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&= \exp\Big\{-\tfrac{1}{2}\Big[(\bar{Z} - \bar{X}\beta)^T (\bar{Z} - \bar{X}\beta) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&= \exp\Big\{-\tfrac{1}{2}\Big[(\bar{Z} - \bar{X}\hat{\beta} + \bar{X}\hat{\beta} - \bar{X}\beta)^T (\bar{Z} - \bar{X}\hat{\beta} + \bar{X}\hat{\beta} - \bar{X}\beta) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&= \exp\Big\{-\tfrac{1}{2}\Big[(\bar{Z} - \bar{X}\hat{\beta})^T (\bar{Z} - \bar{X}\hat{\beta}) + (\bar{X}\beta - \bar{X}\hat{\beta})^T (\bar{X}\beta - \bar{X}\hat{\beta}) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&= \exp\Big\{-\tfrac{1}{2}\Big[(\bar{Z} - \bar{X}\hat{\beta})^T (\bar{Z} - \bar{X}\hat{\beta}) + (\beta - \hat{\beta})^T \bar{X}^T \bar{X} (\beta - \hat{\beta}) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&= \exp\Big\{-\tfrac{1}{2}\Big[(\tilde{Z} - X\hat{\beta})^T \Omega (\tilde{Z} - X\hat{\beta}) + (\beta - \hat{\beta})^T X^T \Omega X (\beta - \hat{\beta}) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&\propto \exp\Big\{-\tfrac{1}{2}\Big[(\beta - \hat{\beta})^T X^T \Omega X (\beta - \hat{\beta}) + (\beta - b)^T B^{-1} (\beta - b)\Big]\Big\} \\
&\propto \exp\Big\{-\tfrac{1}{2}\big(\beta - V_\beta^{-1} m_\beta\big)^T V_\beta \big(\beta - V_\beta^{-1} m_\beta\big)\Big\},
\end{aligned}
$$

where $V_\beta = X^T \Omega X + B^{-1}$, $m_\beta = X^T \Omega X \hat{\beta} + B^{-1} b$, $\bar{A}$ denotes $\Omega^{1/2} A$ for a variable $A$, and $y$ and $n$ denote $(y_1, \ldots, y_N)$ and $(n_1, \ldots, n_N)$, respectively, as described in Section 2. Note that $X^T \Omega X \hat{\beta} = X^T (y - n/2) - X^T \Omega U$ because $\hat{\beta}$ satisfies $\bar{X}^T (\bar{Z} - \bar{X}\hat{\beta}) = 0$, and hence $X^T \Omega \tilde{Z} = X^T \Omega X \hat{\beta}$. Now, $X^T \Omega \tilde{Z} = X^T \Omega (Z - U) = X^T (y - n/2) - X^T \Omega U$, since $\Omega Z = y - n/2$. Thus, $X^T (y - n/2) - X^T \Omega U = X^T \Omega X \hat{\beta}$.

Appendix C. Derivation of P U | M , X , β , ω

Let us assume a Gaussian prior distribution $N(0, \Sigma)$ for the random effects variables $u_j = (u_{j,1}, u_{j,2}, u_{j,3})^T$ for jurisdiction $j$. Recall that $M_{j,k}$ denotes the counts of persons with language conditions, and let $\eta_{j,k} \equiv X_j^T \beta_k + u_{j,k}$. The conjugate Gaussian posterior for $u_j$ is proportional to $P(M_j \mid u_j) \times P(u_j \mid \Sigma)$, as shown below:

$$
\begin{aligned}
&\frac{\exp\big(\eta_{j,1} \sum_{k=2}^{4} M_{j,k}\big)}{\big(1 + \exp \eta_{j,1}\big)^{\sum_{k=1}^{4} M_{j,k}}}
\cdot \frac{\exp\big(\eta_{j,2} \sum_{k=3}^{4} M_{j,k}\big)}{\big(1 + \exp \eta_{j,2}\big)^{\sum_{k=2}^{4} M_{j,k}}}
\cdot \frac{\exp\big(\eta_{j,3} \sum_{k=4}^{4} M_{j,k}\big)}{\big(1 + \exp \eta_{j,3}\big)^{\sum_{k=3}^{4} M_{j,k}}}
\times \exp\big(-\tfrac{1}{2} u_j^T \Sigma^{-1} u_j\big) \\
&\propto N\big(\tilde{M}_{j,1} \mid \eta_{j,1}, \omega_{j,1}^{-1}\big) \times N\big(\tilde{M}_{j,2} \mid \eta_{j,2}, \omega_{j,2}^{-1}\big) \times N\big(\tilde{M}_{j,3} \mid \eta_{j,3}, \omega_{j,3}^{-1}\big) \times \exp\big(-\tfrac{1}{2} u_j^T \Sigma^{-1} u_j\big) \\
&\propto \exp\Big\{-\tfrac{1}{2}\Big[\omega_{j,1}\big(\tilde{M}_{j,1} - \eta_{j,1}\big)^2 + \omega_{j,2}\big(\tilde{M}_{j,2} - \eta_{j,2}\big)^2 + \omega_{j,3}\big(\tilde{M}_{j,3} - \eta_{j,3}\big)^2 + u_j^T \Sigma^{-1} u_j\Big]\Big\} \\
&= \exp\Big\{-\tfrac{1}{2}\Big[\omega_{j,1}\big(u_{j,1} - M_{j,1}^u\big)^2 + \omega_{j,2}\big(u_{j,2} - M_{j,2}^u\big)^2 + \omega_{j,3}\big(u_{j,3} - M_{j,3}^u\big)^2 + u_j^T \Sigma^{-1} u_j\Big]\Big\} \\
&= \exp\Big\{-\tfrac{1}{2}\Big[\big(u_j - M_j^u\big)^T \Omega_j \big(u_j - M_j^u\big) + (u_j - 0)^T \Sigma^{-1} (u_j - 0)\Big]\Big\} \\
&\propto N\big(V_{j,u}^{-1} m_{j,u},\; V_{j,u}^{-1}\big),
\end{aligned}
$$

where $V_{j,u} = \Omega_j + \Sigma^{-1}$, $m_{j,u} = \Omega_j M_j^u$, $\Omega_j = \mathrm{diag}\big(\omega_{j,1}, \omega_{j,2}, \omega_{j,3}\big)$, $M_{j,k}^u = \tilde{M}_{j,k} - X_j \beta_k$, $\tilde{M}_{j,1} = \omega_{j,1}^{-1}\big(\sum_{k=2}^{4} M_{j,k} - \tfrac{1}{2}\sum_{k=1}^{4} M_{j,k}\big)$, $\tilde{M}_{j,2} = \omega_{j,2}^{-1}\big(\sum_{k=3}^{4} M_{j,k} - \tfrac{1}{2}\sum_{k=2}^{4} M_{j,k}\big)$, and $\tilde{M}_{j,3} = \omega_{j,3}^{-1}\big(\sum_{k=4}^{4} M_{j,k} - \tfrac{1}{2}\sum_{k=3}^{4} M_{j,k}\big)$. The last two lines result from the algebra in Appendix E.

Appendix D. Derivation of P ( Σ | U )

Assume a multivariate normal distribution $MVN(0, \Sigma)$ on $U$, so that

$$
P(U \mid \Sigma) = \prod_{j=1}^{N} (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\big(-\tfrac{1}{2} u_j^T \Sigma^{-1} u_j\big)
\propto |\Sigma|^{-N/2} \exp\Big(-\tfrac{1}{2} \sum_{j=1}^{N} u_j^T \Sigma^{-1} u_j\Big)
= |\Sigma|^{-N/2} \exp\Big\{-\tfrac{1}{2}\, \mathrm{tr}\Big[\Big(\sum_{j=1}^{N} u_j u_j^T\Big) \Sigma^{-1}\Big]\Big\}.
$$

An inverse Wishart $IW(V_0, s_0)$ prior distribution with known scale matrix $V_0$ and degrees of freedom $s_0$ for a $p \times p$ covariance matrix $\Sigma$ is

$$
P(\Sigma) = \frac{|V_0|^{s_0/2}}{2^{s_0 p/2}\, \Gamma_p(s_0/2)}\, |\Sigma|^{-(s_0 + p + 1)/2} \exp\big(-\tfrac{1}{2}\, \mathrm{tr}\big(V_0 \Sigma^{-1}\big)\big).
$$

The conditional posterior $P(\Sigma \mid U)$ is proportional to

$$
\begin{aligned}
P(U \mid \Sigma) \times P(\Sigma)
&\propto |\Sigma|^{-N/2} \exp\Big\{-\tfrac{1}{2}\, \mathrm{tr}\Big[\Big(\sum_{j=1}^{N} u_j u_j^T\Big) \Sigma^{-1}\Big]\Big\} \times |\Sigma|^{-(s_0 + p + 1)/2} \exp\big(-\tfrac{1}{2}\, \mathrm{tr}\big(V_0 \Sigma^{-1}\big)\big) \\
&= |\Sigma|^{-(N + s_0 + p + 1)/2} \exp\Big\{-\tfrac{1}{2}\, \mathrm{tr}\Big[\Big(\sum_{j=1}^{N} u_j u_j^T\Big) \Sigma^{-1} + V_0 \Sigma^{-1}\Big]\Big\} \\
&= |\Sigma|^{-(N + s_0 + p + 1)/2} \exp\Big\{-\tfrac{1}{2}\, \mathrm{tr}\Big[\Big(\sum_{j=1}^{N} u_j u_j^T + V_0\Big) \Sigma^{-1}\Big]\Big\},
\end{aligned}
$$

which is the kernel of $IW\big(\sum_{j=1}^{N} u_j u_j^T + V_0,\; s_0 + N\big)$.
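Assuming scipy's `invwishart` parameterization (degrees of freedom `df` and scale matrix `scale`), the conjugate update can be sketched as follows; the simulated random effects, prior settings, and seeds are illustrative.

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(2)

# Simulate N random-effects vectors u_j (p = 3 outcomes, as in the VRA model).
N, p = 50, 3
U = rng.normal(size=(N, p))

s0 = p + 2                # prior degrees of freedom (illustrative)
V0 = np.eye(p)            # prior scale matrix (illustrative)

# Conjugate update: Sigma | U ~ IW(V0 + sum_j u_j u_j^T, s0 + N).
scale_post = V0 + U.T @ U
df_post = s0 + N
Sigma_draw = invwishart.rvs(df=df_post, scale=scale_post, random_state=3)
print(Sigma_draw.shape)   # (3, 3)
```

Because the posterior is again inverse Wishart, this step slots directly into the Gibbs sampler with no tuning, which is part of what keeps the overall algorithm free of Metropolis–Hastings steps.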

Appendix E. Algebraic Basics: Sum of Quadratic Matrices

Appendix E.1. Sum of Two Quadratic Matrices

$$
\begin{aligned}
&(x - \mu_A)^T \Sigma_A^{-1} (x - \mu_A) + (x - \mu_B)^T \Sigma_B^{-1} (x - \mu_B) \\
&= x^T \Sigma_A^{-1} x - 2 \mu_A^T \Sigma_A^{-1} x + \mu_A^T \Sigma_A^{-1} \mu_A + x^T \Sigma_B^{-1} x - 2 \mu_B^T \Sigma_B^{-1} x + \mu_B^T \Sigma_B^{-1} \mu_B \\
&= x^T \big(\Sigma_A^{-1} + \Sigma_B^{-1}\big) x - 2 \big(\Sigma_A^{-1} \mu_A + \Sigma_B^{-1} \mu_B\big)^T x + \mu_A^T \Sigma_A^{-1} \mu_A + \mu_B^T \Sigma_B^{-1} \mu_B \\
&= x^T V x - 2 m^T x + R \\
&= \big(x - V^{-1} m\big)^T V \big(x - V^{-1} m\big) - m^T V^{-1} m + R,
\end{aligned}
$$

which is, up to a constant, the kernel of $N(V^{-1} m, V^{-1})$, where $V = \Sigma_A^{-1} + \Sigma_B^{-1}$, $m = \Sigma_A^{-1} \mu_A + \Sigma_B^{-1} \mu_B$, and $R = \mu_A^T \Sigma_A^{-1} \mu_A + \mu_B^T \Sigma_B^{-1} \mu_B$. Thus, if $x \mid \mu_A \sim N(\mu_A, \Sigma_A)$ and $x \sim N(\mu_B, \Sigma_B)$, then $x \mid \mu_A \sim N(V^{-1} m, V^{-1})$, with $V^{-1} = \big(\Sigma_A^{-1} + \Sigma_B^{-1}\big)^{-1}$ and $m = \Sigma_A^{-1} \mu_A + \Sigma_B^{-1} \mu_B$.

Appendix E.2. Sum of N Quadratic Matrices

Consider x to be a vector of the VRA’s three disjoint outcomes. Then,
$$
\begin{aligned}
\sum_{j=1}^{N} (x - \mu_j)^T \Sigma_j^{-1} (x - \mu_j)
&= (x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1) + (x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2) + \cdots + (x - \mu_N)^T \Sigma_N^{-1} (x - \mu_N) \\
&= x^T \Sigma_1^{-1} x - 2 \mu_1^T \Sigma_1^{-1} x + x^T \Sigma_2^{-1} x - 2 \mu_2^T \Sigma_2^{-1} x + \cdots + x^T \Sigma_N^{-1} x - 2 \mu_N^T \Sigma_N^{-1} x + R \\
&= x^T \big(\Sigma_1^{-1} + \Sigma_2^{-1} + \cdots + \Sigma_N^{-1}\big) x - 2 \big(\Sigma_1^{-1} \mu_1 + \Sigma_2^{-1} \mu_2 + \cdots + \Sigma_N^{-1} \mu_N\big)^T x + R \\
&= x^T V_N x - 2 m_N^T x + R \\
&= \big(x - V_N^{-1} m_N\big)^T V_N \big(x - V_N^{-1} m_N\big) - m_N^T V_N^{-1} m_N + R,
\end{aligned}
$$

which is, up to a constant, the kernel of $N(V_N^{-1} m_N, V_N^{-1})$, where $V_N = \sum_{j=1}^{N} \Sigma_j^{-1}$ and $m_N = \sum_{j=1}^{N} \Sigma_j^{-1} \mu_j$. Thus, $x \mid \mu_1, \ldots, \mu_N \sim N(V_N^{-1} m_N, V_N^{-1})$.
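The precision-weighted combination above can be verified numerically for a sum of N quadratics: subtracting the completed square from the original sum should leave a constant that does not depend on x. The dimensions and matrices below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)

def rand_spd(p):
    # Random symmetric positive-definite matrix.
    A = rng.normal(size=(p, p))
    return A @ A.T + p * np.eye(p)

p, Nq = 3, 5
mus = rng.normal(size=(Nq, p))
Sig_invs = [np.linalg.inv(rand_spd(p)) for _ in range(Nq)]

V = sum(Sig_invs)                                  # V_N
m = sum(Si @ mu for Si, mu in zip(Sig_invs, mus))  # m_N

def total(x):      # sum of N quadratics
    return sum((x - mu) @ Si @ (x - mu) for Si, mu in zip(Sig_invs, mus))

def completed(x):  # single quadratic around V^{-1} m
    d = x - np.linalg.solve(V, m)
    return d @ V @ d

x1, x2 = rng.normal(size=p), rng.normal(size=p)
print(np.isclose(total(x1) - completed(x1), total(x2) - completed(x2)))  # True
```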

References

  1. Slud, E.V.; Franco, C.; Hall, A.; Kang, J. Statistical Methodology (2021) for Voting Rights Act, Section 203 Determinations. 2022. Available online: https://www.census.gov/library/working-papers/2022/adrm/RRS2022-06.html (accessed on 9 April 2024).
  2. Joyce, P.M.; Malec, D.; Little, R.J.; Gilary, A.; Navarro, A.; Asiala, M.E. Statistical Modeling Methodology for the Voting Rights Act Section 203 Language Assistance Determinations. J. Am. Stat. Assoc. 2014, 109, 36–47.
  3. Slud, E.V.; Ashmead, R.; Joyce, P.; Wright, T. Statistical Methodology (2016) for Voting Rights Act, Section 203 Determinations. 2018. Available online: https://www.census.gov/library/working-papers/2018/adrm/RRS2018-12.html (accessed on 9 April 2024).
  4. Pinheiro, J.C.; Bates, D.M. Approximations to the Log-Likelihood Function in the Nonlinear Mixed-Effects Model. J. Comput. Graph. Stat. 1995, 4, 12–35.
  5. Carpenter, B.; Gelman, A.; Hoffman, M.D.; Lee, D.; Goodrich, B.; Betancourt, M.; Brubaker, M.; Guo, J.; Li, P.; Riddell, A. Stan: A Probabilistic Programming Language. J. Stat. Softw. 2017, 76, 1–32.
  6. Polson, N.G.; Scott, J.G.; Windle, J. Bayesian Inference for Logistic Models Using Pólya-Gamma Latent Variables. J. Am. Stat. Assoc. 2013, 108, 1339–1349.
  7. Polson, N.G.; Scott, J.G.; Windle, J. BayesLogit, R Package Version 2.1. 2019. Available online: https://cran.r-project.org/web/packages/BayesLogit/index.html (accessed on 9 April 2024).
  8. George, E.I.; McCulloch, R.E. Variable Selection via Gibbs Sampling. J. Am. Stat. Assoc. 1993, 88, 881–889.
  9. Park, T.; Casella, G. The Bayesian Lasso. J. Am. Stat. Assoc. 2008, 103, 681–686.
Figure 1. Residual plots.
Table 1. Hypothetical VRA proportions for an LMG.

| Jurisdiction | # of Voting-Age Persons | CITprop | LEPprop | ILLprop | $\pi_{CIT}$ | $\pi_{LEP}$ | $\pi_{ILL}$ | $X_j$ |
|---|---|---|---|---|---|---|---|---|
| Jurisdiction 1 | 1000 | 0.80 | 0.10 | 0.01 | 0.80 | 0.10 | 0.01 | $X_1$ |
| Jurisdiction 2 | 10 | 0.95 | 0.25 | 0.02 | 0.92 | 0.19 | 0.07 | $X_2$ |
| … | … | … | … | … | … | … | … | … |
| Jurisdiction N | 1 | 1.00 | | | 0.93 | 0.10 | 0.01 | $X_N$ |
Table 2. Likelihood contributions.

| Binomial Prob. | Ratio | Equivalent Prob. | Log-Likelihood Contribution for a Jurisdiction |
|---|---|---|---|
| $v_1$ | CIT/VOT | $1 - p_1$ | $M_1 \log p_1 + (n - M_1) \log(1 - p_1)$ |
| $v_2$ | LEP/CIT | $1 - \frac{p_2}{1 - p_1}$ | $M_2 \log \frac{p_2}{1 - p_1} + (n - M_1 - M_2) \log\big(1 - \frac{p_2}{1 - p_1}\big)$ |
| $v_3$ | ILL/LEP | $1 - \frac{p_3}{1 - p_1 - p_2}$ | $M_3 \log \frac{p_3}{1 - p_1 - p_2} + (n - M_1 - M_2 - M_3) \log\big(1 - \frac{p_3}{1 - p_1 - p_2}\big)$ |
Table 3. Comparing new Bayes model and older results for Bangladeshi LMG.

| | [1, 4] | [5, 12] | [13, 25] | [26, 50] | [51, 200] | [201, 1 × 10 6 ] |
|---|---|---|---|---|---|---|
| Delta (Old) | 611.9 | 9.7 | −143.0 | −47.5 | −319.9 | −64.3 |
| PctRel Δ (Old) | 47.2 | 0.4 | −6.2 | −1.9 | −3.1 | −0.4 |
| Stdiz Δ (Old) | 3.2 | 0.0 | −0.5 | −0.2 | −0.6 | −0.1 |
| Delta (New) | 223.6 | −117.3 | −126.1 | −62.4 | 42.4 | 8.5 |
| PctRel Δ (New) | 17.3 | −5.3 | −5.5 | −2.5 | 0.4 | 0.0 |
| Stdiz Δ (New) | 1.2 | −0.4 | −0.5 | −0.2 | 0.1 | 0.0 |
Table 4. Comparing new Bayes model and older results for Sri Lankan LMG.

| | [1, 4] | [5, 12] | [13, 25] | [26, 50] | [51, 200] | [201, 1 × 10 6 ] |
|---|---|---|---|---|---|---|
| Delta (Old) | 228.5 | −143.3 | −55.9 | −8.3 | −49.6 | −11.4 |
| PctRel Δ (Old) | 54.3 | −18.4 | −6.7 | −1.9 | −3.6 | −1.4 |
| Stdiz Δ (Old) | 2.5 | −1.0 | −0.3 | −0.1 | −0.2 | −0.1 |
| Delta (New) | 87.2 | −174.3 | 11.5 | 62.9 | −2.5 | −9.7 |
| PctRel Δ (New) | 20.7 | −22.4 | 1.4 | 14.4 | −0.2 | −1.2 |
| Stdiz Δ (New) | 0.9 | −1.2 | 0.1 | 0.6 | 0.0 | −0.1 |
Table 5. Comparing new Bayes model and older results across all LMGs.

| | [1, 4] | [5, 12] | [13, 25] | [26, 50] | [51, 200] | [201, 1 × 10 6 ] |
|---|---|---|---|---|---|---|
| Old | 104.9 | 50.1 | 43 | 28.7 | 21.7 | 12.9 |
| New | 79.7 | 44.0 | 39 | 22.0 | 17.0 | 8.5 |
Table 6. Comparing new Bayes (denoted by “New”) and older results across all LMGs.

| | (0, 4] | (4, 12] | (12, 25] | (25, 50] | (50, 200] | (200, 1 × 10 6 ] | All |
|---|---|---|---|---|---|---|---|
| # LMGs with \|Stdiz Δ \| > 1.96 (Old) | 15 | 5 | 2 | 1 | 1 | 0 | 24 |
| # LMGs with \|Stdiz Δ \| > 1.96 (New) | 7 | 6 | 2 | 1 | 0 | 0 | 16 |
| # LMGs with \|Stdiz Δ \| > 1.96 (Old and New) | 7 | 4 | 2 | 1 | 0 | 0 | 14 |
| # LMGs with \|Stdiz Δ \| > 1.96 (Old or New) | 15 | 7 | 2 | 1 | 1 | 0 | 26 |
| # LMGs with \|Stdiz Δ \| > 1.96 (Improved) | 14 | 4 | 2 | 1 | 1 | 0 | 22 |
| % LMGs with \|Stdiz Δ \| > 1.96 (Improved) | 93% | 57% | 100% | 100% | 100% | NA | 85% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
