1. Introduction
Implementation of Markov chain Monte Carlo (MCMC) algorithms often requires sampling from nonstandard probability distributions. In this article, we describe a method for (1) approximating the conditional distribution of independent binomial random variables given their sum, or (2) simulating a value from this conditional distribution. We develop and compare two MCMC algorithms for accomplishing this.
This study was motivated by an imputation problem using the data augmentation algorithm [1]. Ayres et al. [2] proposed use of the data augmentation algorithm to impute county-level vaccination frequencies in the United States. Obtaining these imputed values requires simulating from the distribution of independent binomial random variables conditioned on their sum, which is known. The method proposed in this paper has direct application to the imputation step of the data augmentation algorithm for the vaccine imputation problem. The problem of approximating the reproduction number of an infectious disease also requires the simulation of independent binomials conditioned on the sum.
Let $X_i \sim \text{Binomial}(n_i, p_i)$, $i = 1, \dots, k$, be independent random variables and let $n = \sum_{i=1}^{k} X_i$ denote the sum. The goal is to sample from the conditional distribution of $\mathbf{X} = (X_1, \dots, X_k)$ given $\sum_{i=1}^{k} X_i = n$. The Metropolis–Hastings (MH) algorithm [3,4] can be used to sample from the conditional probability mass function (PMF) $f(\mathbf{x} \mid n)$. The algorithm creates a Markov chain whose steady state distribution is the desired distribution. General methods that include the MH algorithm are called Markov chain Monte Carlo (MCMC) methods. The MH algorithm went largely unnoticed in the statistical community until Geman and Geman [5] applied it to a Bayesian analysis of image restoration. While most applications of the MH algorithm, or MCMC in general, involve the approximation of a posterior distribution in a Bayesian analysis, the method can be used to sample from almost any probability distribution. The MCMC revolution of the late 1980s opened the door to almost any Bayesian analysis, provided we could accept a collection of simulated observations from the posterior distribution in lieu of an analytic expression for the posterior. Gilks et al. [6] was influential in explaining the theory of MCMC and giving a number of applications. See [7] for a modern treatment of the various MCMC algorithms available today. For our situation, we apply the MH algorithm in two ways to simulate from the conditional distribution of binomial random variables given the sum.
The MH algorithm begins with a plausible value for the vector $\mathbf{x} = (x_1, \dots, x_k)$, in the sense that $0 \le x_i \le n_i$ for all $i$ and $\sum_{i=1}^{k} x_i = n$. We then propose a move to another state by adding 1 to one randomly selected component of $\mathbf{x}$ and subtracting 1 from another. Given the current state of the Markov chain, we select one ordered pair of components with the condition that we can subtract 1 from the first and add 1 to the second and reach another valid state. Note that for some vectors, there may be some components that are 0, in which case we cannot subtract 1, and there may be others that equal $n_i$, in which case we cannot add 1.
A related problem is that if $X_i \sim \text{Poisson}(\lambda_i)$ independently, then conditioned on $\sum_{i=1}^{k} X_i = n$ the vector $(X_1, \dots, X_k)$ is multinomial with size $n$ and probabilities $\pi_i = \lambda_i / \lambda$, $i = 1, \dots, k$, where $\lambda = \sum_{i=1}^{k} \lambda_i$. To see this, write the conditional as the ratio of the joint PMF and the marginal of the given, $\sum_{i=1}^{k} X_i \sim \text{Poisson}(\lambda)$:
$$P\left(X_1 = x_1, \dots, X_k = x_k \,\middle|\, \sum_{i=1}^{k} X_i = n\right) = \frac{\prod_{i=1}^{k} e^{-\lambda_i} \lambda_i^{x_i} / x_i!}{e^{-\lambda} \lambda^{n} / n!} = \binom{n}{x_1, \dots, x_k} \prod_{i=1}^{k} \pi_i^{x_i}, \tag{1}$$
which is the PMF for the multinomial distribution and has constraints $x_i \ge 0$ for all $i$ and $\sum_{i=1}^{k} x_i = n$.
If the successes in the various Bernoulli trials that give rise to the $X_i$ have small probabilities $p_i$ and the numbers of trials $n_i$ are large, then the binomial distribution is well approximated by the Poisson; that is, if $X_i \sim \text{Binomial}(n_i, p_i)$, then $X_i \mathrel{\dot\sim} \text{Poisson}(n_i p_i)$. Thus, as an approximation, we can say that
$$(X_1, \dots, X_k) \,\Big|\, \sum_{i=1}^{k} X_i = n \;\mathrel{\dot\sim}\; \text{Multinomial}(n, \boldsymbol{\pi}), \qquad \pi_i = \frac{n_i p_i}{\sum_{j=1}^{k} n_j p_j}. \tag{2}$$
We propose two applications of the MH algorithm with a starting value obtained using the Poisson approximation in (2).
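As a concrete illustration of (2), the following sketch draws a starting vector from the Poisson-approximation multinomial. All parameter values (`ns`, `ps`, and the sum `n`) are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
# A minimal sketch of drawing a starting value from the multinomial
# approximation in (2). All parameter values are assumed for illustration.
import numpy as np

rng = np.random.default_rng(0)

ns = np.array([10, 20, 15, 30])          # binomial sizes n_i (assumed)
ps = np.array([0.10, 0.05, 0.20, 0.15])  # success probabilities p_i (assumed)
n = 8                                    # the observed sum we condition on

lam = ns * ps                  # Poisson means n_i * p_i
pi = lam / lam.sum()           # cell probabilities pi_i in (2)

x0 = rng.multinomial(n, pi)    # approximate draw from X given the sum
print(x0, x0.sum())            # components always sum to n
```

Note that a multinomial draw can occasionally exceed a component's upper bound $n_i$; Section 5 discusses how such infeasible vectors are handled.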
The details of the random walk MH algorithm require a careful enumeration of the number of neighbors of each possible state in the finite Markov chain. In the next section, we address this combinatorics problem for the random walk MH algorithm.
The MH algorithm begins with an initial plausible value from the target distribution, which in our case is the conditional distribution of the vector $\mathbf{X} = (X_1, \dots, X_k)$ conditioned on the sum $\sum_{i=1}^{k} X_i = n$. Call this initial value $\mathbf{x}^{(0)}$. We then propose a move to a new value, called $\mathbf{x}^{*}$. This proposal distribution, which can depend on the current state $\mathbf{x}^{(t)}$ of the Markov chain, is denoted $q(\mathbf{x}^{*} \mid \mathbf{x}^{(t)})$. This move is accepted with probability
$$\alpha = \min\left\{ 1, \; \frac{f(\mathbf{x}^{*} \mid n)\, q(\mathbf{x}^{(t)} \mid \mathbf{x}^{*})}{f(\mathbf{x}^{(t)} \mid n)\, q(\mathbf{x}^{*} \mid \mathbf{x}^{(t)})} \right\}.$$
The next value of $\mathbf{X}$, called $\mathbf{x}^{(t+1)}$, is
$$\mathbf{x}^{(t+1)} = \begin{cases} \mathbf{x}^{*} & \text{with probability } \alpha, \\ \mathbf{x}^{(t)} & \text{with probability } 1 - \alpha. \end{cases}$$
Thus, we either take the proposal $\mathbf{x}^{*}$ (with probability $\alpha$) or we stay at the current state. This process then continues successively until the desired number of iterations, typically in the thousands to hundreds of thousands, has been achieved. The theory says that the steady state distribution of the Markov chain is the target distribution $f(\mathbf{x} \mid n)$. We should continue this iterative method until we believe we have reached the steady state distribution; this initial period is called the burn-in. If we want a single sample from the distribution of $\mathbf{X}$ given $\sum_{i=1}^{k} X_i = n$, we can simulate one additional step. If we are interested in the target distribution in full, we can simulate an additional large number of steps and use the resulting proportions as our estimate for the target distribution.
We describe two types of the MH algorithm. One is a random walk, where a move is proposed from the current state by increasing the count by one in one component and simultaneously decreasing it by one in another. The second type is an independence sampler, whereby the proposal made at each step is independent of the current state vector. In most cases where the MH algorithm is applied, the random walk works better, but in this circumstance the independence sampler seems to reach the steady state more quickly and is the preferred method.
Section 2 describes the random walk MH algorithm, and Section 3 gives a concrete example. Section 4 describes the independence sampler MH algorithm and Section 5 presents an example. Conclusions are given in Section 6.
2. Random Walk MH Algorithm
In our implementation, we propose to select one component of $\mathbf{x}$ and subtract 1 from it, and another component and add 1 to it. We must be careful, though, because some components of $\mathbf{x}$ may be 0, in which case we cannot select one of them to subtract 1. Also, some components may achieve the maximum possible value, $n_i$, in which case we cannot add 1. A careful counting of the possibilities is needed.
For example, consider the simple problem of $k = 3$ and $n = 5$. We want to sample from the conditional distribution of $(X_1, X_2, X_3)$ conditioned on $X_1 + X_2 + X_3 = 5$. Here each $n_i \ge 5$, so the upper bounds impose no additional restriction and every nonnegative integer solution is feasible. The number of nonnegative integer solutions of $x_1 + x_2 + x_3 = 5$ is equal to $\binom{7}{2} = 21$. This uses the familiar “stars and bars” method of counting the number of nonnegative integer solutions to such problems; see Section 6.5 of [8]. The 21 solutions are
005 104 014 203 113 023 302 212 122 032 401
311 221 131 041 500 410 320 230 140 050
We must be careful when counting the number of possible moves from a given state, because no component can go below 0 or above $n_i$. For example, if our current state is 023, we cannot choose the first component to subtract 1, because that would drop us below 0. We could add 1 to the first component and subtract 1 from either of the other components. A little analysis shows that the following states are reachable from 023 in one step:
014 113 122 032
By contrast, if our current state is 122, there are no restrictions on which components we could choose; in this case we could choose any two components and subtract 1 from one and add 1 to the other. The states
113 212 221 131 032 023
are reachable from 122. Even though 023 and 122 are reachable from one another, the probability of going from 023 to 122 is 1/4, while the probability of going from 122 to 023 is 1/6. This asymmetry must be accounted for in the MH algorithm. The graph of all 21 states with edges indicating reachability in one step is shown in the left panel of Figure 1.
The middle and right panels of Figure 1 are the graphs for other scenarios where some of the components have upper restrictions. These nodes lie on a subset of the $(k-1)$-dimensional simplex.
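The reachability claims above are easy to check by brute force. The following sketch (our own illustration, not code from the paper) enumerates the neighbors of the states 023 and 122 for the $k = 3$, $n = 5$ example.

```python
# A brute-force check (our own illustration, not code from the paper) of
# the neighbor lists quoted above for the k = 3, n = 5 example.
from itertools import permutations

def neighbors(x, sizes):
    """All states reachable by subtracting 1 from one component and
    adding 1 to another while keeping every component in [0, n_i]."""
    out = []
    for i, j in permutations(range(len(x)), 2):  # (subtract from i, add to j)
        if x[i] > 0 and x[j] < sizes[j]:
            y = list(x)
            y[i] -= 1
            y[j] += 1
            out.append(tuple(y))
    return out

sizes = (5, 5, 5)                      # n_i >= 5, so upper bounds never bind
print(neighbors((0, 2, 3), sizes))     # 4 neighbors -> each proposed w.p. 1/4
print(neighbors((1, 2, 2), sizes))     # 6 neighbors -> each proposed w.p. 1/6
```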
Define the following:
$$k = \text{the number of components of } \mathbf{x}, \qquad \ell = \#\{\, i : 0 < x_i < n_i \,\}, \qquad m = \#\{\, i : x_i = 0 \,\}. \tag{5}$$
The number of components of $\mathbf{x}$ that are “maxed out”, i.e., equal to $n_i$, is equal to $k - \ell - m$.
To illustrate the counting methods required to determine the number of possible moves from a given state, i.e., the number of neighbors in the network, suppose the current state is a vector $\mathbf{x}$ with $k$ components, of which $\ell$ satisfy $0 < x_i < n_i$ and $m$ are equal to 0. The number of possible moves by selecting two components satisfying $0 < x_i < n_i$, and then selecting one of the two components to subtract 1, is
$$N_1 = 2\binom{\ell}{2} = \ell(\ell - 1).$$
The number of possible moves obtained by selecting one component that is equal to 0 and one that is greater than 0 is
$$N_2 = m(k - m).$$
Note that once the two components are selected, there is only one way to move to a valid state: add 1 to the component that was 0, and subtract 1 from the other component.
The last way to select a move is to select one of the components that is maxed out (i.e., equal to $n_i$), and one that is not maxed out and also not equal to 0. (We must disallow selecting a 0 for the second choice; otherwise this would be a move covered by case 2.) Again, we have no choice but to subtract 1 from the maxed out component and add 1 to the component that is not maxed out. This can be done in
$$N_3 = (k - \ell - m)\,\ell$$
ways. Thus, the number of possible moves from $\mathbf{x}$ is
$$N = N_1 + N_2 + N_3 = \ell(\ell - 1) + m(k - m) + (k - \ell - m)\,\ell.$$
The next theorem gives an enumeration of the number of possible moves in general.
Theorem 1. The number of possible moves given the current state $\mathbf{x}$ is
$$N(\mathbf{x}) = \ell(\ell - 1) + m(k - m) + (k - \ell - m)\,\ell.$$
Proof. Suppose the current state is $\mathbf{x}$ with $k$, $\ell$, and $m$ as defined in (5). The states that can be reached in one step from $\mathbf{x}$ are those obtained by exactly one of the following rules:
1. Select two among the $\ell$ components that are strictly between 0 and $n_i$. Within the two selected, choose one to subtract 1; then add 1 to the other. (Note: there are no restrictions on subtracting or adding 1 to any of these components because they all satisfy $0 < x_i < n_i$.)
2. Select one component among the $m$ zeros (i.e., values of $x_i$ that are equal to 0). Add 1 to this component, and then select one among the $k - m$ nonzero components to subtract 1.
3. Select one component among the $k - \ell - m$ that are maxed out (i.e., values of $x_i$ that are equal to $n_i$). Subtract 1 from this component and then add 1 to one of the $\ell$ components that are not maxed out and also not 0, thus avoiding double counting here and in case 2.
Other ways of selecting two components from among the $k$ do not lead to any reachable states. For example, we cannot select two components that are equal to 0. If we did this, we could not subtract 1 from either component. Similarly, we cannot select two components that are maxed out, since we would be unable to add 1 to either.
Once we recognize the three ways to select two components that lead to another reachable state, we can count them directly since they all involve two steps. The number of ways of applying the first rule is
$$2\binom{\ell}{2} = \ell(\ell - 1).$$
The number of ways to apply the second rule is
$$m(k - m).$$
Finally, the number of ways to apply rule three is
$$(k - \ell - m)\,\ell.$$
To get to a reachable state from $\mathbf{x}$ we must apply one of the three rules, so we conclude that the number of reachable states is
$$N(\mathbf{x}) = \ell(\ell - 1) + m(k - m) + (k - \ell - m)\,\ell.$$
□
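As a sanity check on Theorem 1, the following sketch compares the closed-form count with a brute-force enumeration over every feasible state; the size vector used here is a hypothetical test case of our own.

```python
# A numerical sanity check of Theorem 1 (our own test, with hypothetical
# sizes): the brute-force neighbor count must match the closed form
# l(l-1) + m(k-m) + (k-l-m)l at every feasible state.
from itertools import permutations, product

def neighbor_count(x, sizes):
    """Count moves that subtract 1 from one component and add 1 to another."""
    return sum(1 for i, j in permutations(range(len(x)), 2)
               if x[i] > 0 and x[j] < sizes[j])

def theorem_count(x, sizes):
    """The closed-form count N(x) from Theorem 1."""
    k = len(x)
    l = sum(0 < xi < ni for xi, ni in zip(x, sizes))
    m = sum(xi == 0 for xi in x)
    return l * (l - 1) + m * (k - m) + (k - l - m) * l

sizes, n = (2, 3, 1, 4), 6   # hypothetical upper bounds n_i and sum n
states = [x for x in product(*(range(ni + 1) for ni in sizes))
          if sum(x) == n]
assert all(neighbor_count(x, sizes) == theorem_count(x, sizes) for x in states)
print(f"Theorem 1 verified on all {len(states)} feasible states")
```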
The algorithm for selecting one possible move from the state $\mathbf{x}$, with all possible moves having equal probability, is given as Algorithm 1 below.
Algorithm 1: Propose a Move to a Neighbor
Input: A state vector $\mathbf{x}$ with the properties that $0 \le x_i \le n_i$ for all $i$ and $\sum_{i=1}^{k} x_i = n$. Output: One feasible state vector $\mathbf{x}^{*}$ selected at random, i.e., with equal probability, from the set of neighbors of $\mathbf{x}$.
1: Compute $\ell$, $m$, and $k - \ell - m$, defined in Theorem 1. Let $N = N_1 + N_2 + N_3$, where $N_1 = \ell(\ell - 1)$, $N_2 = m(k - m)$, and $N_3 = (k - \ell - m)\,\ell$.
2: Simulate $u \sim \text{Uniform}(0, 1)$.
2a: If $u < N_1/N$, then select two components at random from among the members of the set $\{\, i : 0 < x_i < n_i \,\}$. Select one of these two with probability 0.5 and subtract 1; add 1 to the other. Call the result $\mathbf{x}^{*}$.
2b: If $N_1/N \le u < (N_1 + N_2)/N$, then randomly select one component of $\mathbf{x}$ from among the $m$ components in the set $\{\, i : x_i = 0 \,\}$. Add 1 to this component. Select at random one component from among those in $\{\, i : x_i > 0 \,\}$; this set contains $k - m$ elements. Subtract 1 from this component. Call the result $\mathbf{x}^{*}$.
2c: If $u \ge (N_1 + N_2)/N$, then select one component at random from among the elements of the set $\{\, i : x_i = n_i \,\}$, i.e., the set of those components that are maxed out; subtract 1 from this component. Then select one component from among the $\ell$ that are not maxed out and not 0; add 1 to this component. Call the result $\mathbf{x}^{*}$.
3: Return $\mathbf{x}^{*}$.
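A Python rendering of Algorithm 1 might look as follows. This is a sketch under our own naming conventions (`propose_move`, `sizes`), not the authors' code.

```python
# A sketch of Algorithm 1 in Python. The function and variable names
# (propose_move, sizes) are our own, not the authors' code.
import numpy as np

def propose_move(x, sizes, rng):
    """Return a neighbor of the state x chosen uniformly at random."""
    x = np.asarray(x)
    sizes = np.asarray(sizes)
    k = len(x)
    between = np.flatnonzero((x > 0) & (x < sizes))  # 0 < x_i < n_i
    zeros = np.flatnonzero(x == 0)
    maxed = np.flatnonzero(x == sizes)
    l, m = len(between), len(zeros)
    n1 = l * (l - 1)          # rule 1: ordered pairs of in-between components
    n2 = m * (k - m)          # rule 2: a zero paired with a positive component
    n3 = (k - l - m) * l      # rule 3: a maxed-out with an in-between component
    u = rng.uniform() * (n1 + n2 + n3)
    if u < n1:                                   # rule 1 (step 2a)
        i, j = rng.choice(between, size=2, replace=False)
    elif u < n1 + n2:                            # rule 2 (step 2b)
        j = rng.choice(zeros)                    # add 1 to a zero component
        i = rng.choice(np.flatnonzero(x > 0))    # subtract 1 from a positive one
    else:                                        # rule 3 (step 2c)
        i = rng.choice(maxed)                    # subtract 1 from a maxed-out one
        j = rng.choice(between)                  # add 1 to an in-between one
    y = x.copy()
    y[i] -= 1
    y[j] += 1
    return y
```

Because each branch is entered with probability proportional to the number of moves it covers, and each move within a branch is chosen uniformly, every neighbor is proposed with probability $1/N(\mathbf{x})$.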
3. Example of the Random Walk MH Algorithm
Consider the case of $k$ independent binomial random variables with size parameters $n_1, \dots, n_k$ and success probabilities $p_1, \dots, p_k$. The constraint is that $\sum_{i=1}^{k} x_i = n$. Suppose that the starting vector is $\mathbf{x}^{(0)}$. For this vector, the number of components for which $0 < x_i < n_i$ is $\ell$. The number of zeros is $m$, and the number of maxed out components is $k - \ell - m = 2$ (the third and fifth components). Direct calculation of $N_1$, $N_2$, and $N_3$ then gives the number $N = N_1 + N_2 + N_3$ of neighbors of $\mathbf{x}^{(0)}$, each of which is equally likely to be selected as the proposal in the MH algorithm.
The cutoff values for selecting which kind of move to take are $N_1/N$ and $(N_1 + N_2)/N$. The first uniform random variate $u$ fell between these cutoffs, which leads to the rule for case 2b: select one 0, to which we add 1, and then select one nonzero component to subtract 1. The algorithm selected one of the zero components as the component to add 1 and one of the nonzero components as the component to subtract 1; the proposed state $\mathbf{x}^{*}$ therefore differs from $\mathbf{x}^{(0)}$ in exactly those two components. This proposed state has 60 neighbors.
The log-likelihoods at $\mathbf{x}^{(0)}$ and at the proposed vector $\mathbf{x}^{*}$ are then computed, and the acceptance probability is
$$\alpha = \min\left\{ 1, \; \frac{f(\mathbf{x}^{*} \mid n)\, q(\mathbf{x}^{(0)} \mid \mathbf{x}^{*})}{f(\mathbf{x}^{(0)} \mid n)\, q(\mathbf{x}^{*} \mid \mathbf{x}^{(0)})} \right\} = 1.$$
Therefore, this move is accepted with probability 1. We set $\mathbf{x}^{(1)} = \mathbf{x}^{*}$. The steps of the MH algorithm are then repeated with $t = 1, 2, \dots$, according to Algorithm 2.
Figure 2 shows the first 5, 50, 500, 5000, and 50,000 iterations of the random walk Markov chain. Convergence to the steady state is slow, but seems to occur by about 20,000 iterations. The slow convergence is likely due to the small changes in the proposed move from $\mathbf{x}^{(t)}$ to $\mathbf{x}^{(t+1)}$ in the MH algorithm.
Algorithm 2: Random Walk MH
Input: Vectors $\mathbf{n} = (n_1, \dots, n_k)$ and $\mathbf{p} = (p_1, \dots, p_k)$, the constraint on the sum, $n$, and the desired number $N$ of samples. Output: A sequence of vectors $\mathbf{x}^{(1)}, \dots, \mathbf{x}^{(N)}$ whose steady state distribution is the conditional distribution of $\mathbf{X}$ given $\sum_{i=1}^{k} X_i = n$.
1: Let $\mathbf{x}^{(0)}$ be any nonnegative integer solution to $\sum_{i=1}^{k} x_i = n$ with $x_i \le n_i$ for all $i$. Set $t = 0$. A good starting value is a sample from the multinomial distribution given in (2).
2: Sample a proposal $\mathbf{x}^{*}$ using Algorithm 1.
3: Compute the acceptance probability
$$\alpha = \min\left\{ 1, \; \frac{f(\mathbf{x}^{*} \mid n)\, N(\mathbf{x}^{(t)})}{f(\mathbf{x}^{(t)} \mid n)\, N(\mathbf{x}^{*})} \right\},$$
where $N(\cdot)$ is the neighbor count from Theorem 1.
4: Simulate $u \sim \text{Uniform}(0, 1)$. If $u \le \alpha$, accept the proposed move and set $\mathbf{x}^{(t+1)} = \mathbf{x}^{*}$; otherwise set $\mathbf{x}^{(t+1)} = \mathbf{x}^{(t)}$.
5: Set $t = t + 1$. If $t < N$, go to step 2. Otherwise stop.
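The following sketch implements Algorithm 2, reusing the `propose_move` function from the sketch after Algorithm 1. The target log-likelihood is the sum of the component binomial log-PMFs (the conditioning constant cancels in the ratio), and the neighbor counts from Theorem 1 supply the proposal ratio. Parameter names and the feasibility guard on the starting value are our own assumptions.

```python
# A sketch of Algorithm 2 (random walk MH). It assumes propose_move from
# the previous sketch is in scope; all parameter names are our own.
import numpy as np
from scipy.stats import binom

def num_neighbors(x, sizes):
    """N(x) from Theorem 1."""
    k = len(x)
    l = int(np.sum((x > 0) & (x < sizes)))
    m = int(np.sum(x == 0))
    return l * (l - 1) + m * (k - m) + (k - l - m) * l

def random_walk_mh(sizes, probs, n, n_iter, rng):
    sizes, probs = np.asarray(sizes), np.asarray(probs)

    def loglik(v):
        # Joint binomial log-PMF; the conditioning constant cancels in ratios.
        return binom.logpmf(v, sizes, probs).sum()

    lam = sizes * probs
    x = rng.multinomial(n, lam / lam.sum())   # starting value from (2)
    while np.any(x > sizes):                  # guard: redraw if infeasible
        x = rng.multinomial(n, lam / lam.sum())
    chain = np.empty((n_iter, len(sizes)), dtype=int)
    for t in range(n_iter):
        prop = propose_move(x, sizes, rng)
        # alpha = min{1, f(x*) N(x) / (f(x) N(x*))}, computed on the log scale
        log_alpha = (loglik(prop) - loglik(x)
                     + np.log(num_neighbors(x, sizes))
                     - np.log(num_neighbors(prop, sizes)))
        if np.log(rng.uniform()) <= log_alpha:
            x = prop
        chain[t] = x
    return chain
```

Calling `random_walk_mh(ns, ps, n, 50_000, rng)` with parameters like those in the earlier sketch yields a chain whose long-run state frequencies approximate the conditional PMF.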
5. Example of the Independence Sampler MH Algorithm
Consider the same example given in Section 3, where we have $k$ binomial random variables with parameters $n_i$ and $p_i$, and the constraint is $\sum_{i=1}^{k} x_i = n$. Suppose, as before, that the starting vector is $\mathbf{x}^{(0)}$. The multinomial distribution from (2) has probabilities
$$\pi_i = \frac{n_i p_i}{\sum_{j=1}^{k} n_j p_j}, \qquad i = 1, \dots, k. \tag{7}$$
The proposed vector $\mathbf{x}^{*}$ is sampled from this multinomial distribution. The acceptance probability is then
$$\alpha = \min\left\{ 1, \; \frac{f(\mathbf{x}^{*} \mid n)\, g(\mathbf{x}^{(0)})}{f(\mathbf{x}^{(0)} \mid n)\, g(\mathbf{x}^{*})} \right\},$$
where $g$ is the multinomial proposal PMF. In this instance the move was accepted, and we set $\mathbf{x}^{(1)} = \mathbf{x}^{*}$.
It is possible for the proposed vector to be infeasible. This can occur when the proposal satisfies $x_i^{*} > n_i$ for some $i$. With the proposal from (7), the marginal distribution of the $i$th component is $\text{Binomial}(n, \pi_i)$, so a value exceeding $n_i$ is an unlikely but possible outcome whenever $n > n_i$, even though the unconditional distribution of $X_i$ is $\text{Binomial}(n_i, p_i)$, from which a value above $n_i$ is impossible. The algorithm handles an infeasible proposal by giving zero likelihood to it. In other words, if a move is proposed to an infeasible vector, the move is never accepted because the numerator in the acceptance probability is 0. For the example discussed previously in this section, we ran 1,000,000 simulations using the independence sampler and obtained zero infeasible $\mathbf{x}^{*}$ values. It is important to recognize that this possibility exists and that the algorithm does not crash when it is encountered.
If we continue this algorithm, we obtain a sequence of vectors $\mathbf{x}^{(t)}$ whose steady state distribution is the target. The first 5, 50, 500, and 5000 simulations are shown in Figure 3. Convergence to the steady state is faster with the independence sampler than with the random walk.
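A self-contained sketch of the independence sampler follows. It uses the multinomial in (2) as the proposal; infeasible proposals receive zero target probability and are therefore automatically rejected, as described above. All parameter values are hypothetical.

```python
# A self-contained sketch of the independence sampler MH. The multinomial
# in (2) serves as both starting distribution and proposal; infeasible
# proposals (some x_i > n_i) get zero target probability and are always
# rejected. All parameter values are hypothetical.
import numpy as np
from scipy.stats import binom, multinomial

def independence_mh(sizes, probs, n, n_iter, rng):
    sizes, probs = np.asarray(sizes), np.asarray(probs)
    lam = sizes * probs
    pi = lam / lam.sum()                  # proposal probabilities from (2)

    def log_f(v):
        # Unnormalized target log-PMF; -inf encodes an infeasible vector.
        if np.any(v > sizes):
            return -np.inf
        return binom.logpmf(v, sizes, probs).sum()

    x = rng.multinomial(n, pi)
    while np.any(x > sizes):              # feasible starting value
        x = rng.multinomial(n, pi)
    chain = np.empty((n_iter, len(sizes)), dtype=int)
    for t in range(n_iter):
        prop = rng.multinomial(n, pi)
        # alpha = min{1, f(x*) g(x) / (f(x) g(x*))}, g the multinomial PMF
        log_alpha = (log_f(prop) - log_f(x)
                     + multinomial.logpmf(x, n, pi)
                     - multinomial.logpmf(prop, n, pi))
        if np.log(rng.uniform()) <= log_alpha:
            x = prop
        chain[t] = x
    return chain

rng = np.random.default_rng(1)
chain = independence_mh([10, 20, 15, 30], [0.10, 0.05, 0.20, 0.15],
                        n=8, n_iter=5000, rng=rng)
print(chain[-1])   # one (approximate) draw from X given the sum
```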
6. Summary and Conclusions
This study was motivated by a problem in allocating county-level vaccine counts. Many states in the United States were diligent about recording the county of residence for each vaccine recipient, while other states had a high proportion of recipients whose county of residence was unknown. The problem, described in [2], was to allocate the state-wide unknowns to the various counties. Assigning these proportionally to the counties based on population is problematic, because a number of demographic and political variables are known to affect the willingness to get vaccinated. To address this, we applied a version of the data augmentation algorithm of Tanner and Wong [1] to simultaneously impute the missing county-level frequencies and estimate parameters in a logistic regression model for vaccine willingness. As part of the data augmentation algorithm, we are required to simulate from the distribution of independent binomials conditioned on their sum.
Users of the algorithm described in this article may have one of two goals: (1) approximating the conditional probability mass function of the vector $\mathbf{X}$ given $\sum_{i=1}^{k} X_i = n$, and (2) simulating one observation from the conditional distribution. For the first goal, the algorithm would have to be run for quite some time even after the steady state is (essentially) reached. In other applications, e.g., [2], the problem is to simulate one observation from this conditional distribution. The need arises while applying the imputation step of the data augmentation algorithm. In this latter case, the user would return a single observation, say the very last simulated value.
Users who want to approximate the joint PMF of the conditional distribution will be limited to small values of $n$ and $k$. The number of states in the Markov chain is equal to the number of points in the support of the conditional PMF of $\mathbf{X}$ given $\sum_{i=1}^{k} X_i = n$. For large $n$ and $k$, the number of states is too large to obtain a reasonable number of “visits” to each state, which is needed to approximate the conditional PMF. On the other hand, if the goal is to simulate a single observation from the conditional PMF of $\mathbf{X}$ given $\sum_{i=1}^{k} X_i = n$, the algorithm will scale well because each step of the MCMC involves simulation of just $k$ random variables. The problems that motivated this study involved the latter scenario, where we wanted a single simulation from the conditional PMF.
Algorithms similar to those described here could be used to approximate the joint distribution of any set of discrete random variables whose support is a subset of the integers. The speed of convergence of the independence sampler will depend on how close the true (unknown) distribution is to the multinomial distribution.