Abstract
A diffusion taking values in probability measures on a graph with vertex set V is studied. The masses on the vertices satisfy a stochastic differential equation on the simplex, driven by independent standard Brownian motions with skew symmetry, one for each pair of adjacent vertices. A Markov chain on integer partitions, dual to the Markov semigroup associated with the diffusion, is used to show that the support of an extremal stationary state of the adjoint semigroup is an independent set of the graph. We also investigate the diffusion with a linear drift, which gives a killing of the dual Markov chain on a finite integer lattice. The killed Markov chain is used to study the unique stationary state of the diffusion, which generalizes the Dirichlet distribution. Two applications of the diffusions are discussed: analysis of an algorithm to find an independent set of a graph, and Bayesian graph selection based on computation of the probability of a sample by using coupling from the past.
Keywords:
Bayesian graph selection; coupling from the past; integer partition; interacting particle system; independent set finding; measure-valued diffusion MSC:
60K35; 05C81; 60J70; 60J90; 65C05
1. Introduction
Consider a finite graph consisting of vertices , and edges E. Throughout this paper, a graph is undirected and connected. The neighbour of the vertex is denoted by , where means that i and j are adjacent. The degree of the vertex i is denoted by , which is the cardinality of the set . An independent set of is a subset of V, no two vertices of which are adjacent. In other words, a set of vertices is independent if and only if it is a clique in the graph complement of .
If two vertices of a graph have precisely the same neighbour, throughout this paper we call the graph obtained by identifying these two vertices while keeping the adjacency a reduced graph of .
Let be the totality of probability measures on the simplex
equipped with the topology of weak convergence. Itoh et al. [1] discussed a diffusion taking values in probability measures on a graph , whose state is identified with a probability measure through the masses on each vertex
which starts from and satisfies the stochastic differential equation of the form
where are independent standard Brownian motions with skew symmetry . The generator of the diffusion L operates on a function as
where , , , and otherwise, and is the totality of functions with a continuous derivative up to the second order and compact support in .
We say a face of the simplex corresponds to a set of vertices if the face is the interior of the convex hull of , denoted by , where is the standard basis of the vector space . If U consists of a single vertex, say , should be read as . An observation on the diffusion is as follows.
Proposition 1.
Every point in a face of the simplex corresponding to an independent set of a graph is a fixed point of the stochastic differential Equation (1). Namely,
is a fixed point.
Proof.
Itoh et al. [1] discussed the diffusion as an approximation of the following discrete stochastic model. Consider a system of N particles, where each particle is assigned to one vertex in V, and there is a continuous-time Markov chain
Here, is the number of particles assigned to the vertex at time s. At each instant of time, two of the N particles are chosen uniformly at random. If the chosen particles are at adjacent vertices, one of the two is chosen with equal probability and reassigned to the other particle's vertex. This causes a transition from to or with equal probability. This stochastic model seems to have various applications. Tainaka et al. [2] discussed this stochastic model as a model of a speciation process caused by geography. Ben-Haim et al. [3] considered a slightly modified version, in which a transition from to occurs on the one-dimensional graph, where , . They called it a “compromise process” because if we regard the vertices as political positions, then a transition is a compromise. The process converges weakly to the diffusion : if , then as in the space of right continuous functions with left limits (see Theorem 10.3.5 of [4]).
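For concreteness, the collision dynamics just described can be sketched in a few lines of Python. The adjacency-dictionary representation, the fixed number of steps, and the function name `simulate_collision_process` are illustrative choices, not part of the model in [1].

```python
import random

def simulate_collision_process(adj, counts, steps, seed=0):
    """Sketch of the discrete collision model: two particles are drawn
    uniformly at random, and if they sit on adjacent vertices, one of them
    (chosen with equal probability) jumps to the other particle's vertex.
    Returns the final particle counts per vertex."""
    rng = random.Random(seed)
    particles = [v for v, c in counts.items() for _ in range(c)]
    for _ in range(steps):
        i, j = rng.sample(range(len(particles)), 2)
        u, v = particles[i], particles[j]
        if v in adj[u]:           # only particles on adjacent vertices collide
            if rng.random() < 0.5:
                particles[i] = v  # (n_u, n_v) -> (n_u - 1, n_v + 1)
            else:
                particles[j] = u  # (n_u, n_v) -> (n_u + 1, n_v - 1)
    final = {v: 0 for v in adj}
    for p in particles:
        final[p] += 1
    return final

# Cycle graph C4 with one particle per vertex.
adj_c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
out = simulate_collision_process(adj_c4, {0: 1, 1: 1, 2: 1, 3: 1}, steps=200)
```

The total number of particles is conserved by every transition, which is the key structural property shared with the diffusion limit.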
Apart from an approximation of the discrete stochastic model discussed above, the diffusion seems to appear in various contexts. The diffusion on a complete graph appears as an approximation of a quite different discrete stochastic model, called the Wright–Fisher model in population genetics, which evolves by repetition of multinomial sampling of a fixed number of particles (see, e.g., Chapter 10 of [4], for details). The class of measure-valued diffusions is called Fleming–Viot processes. Such diffusions appear as prior distributions in Bayesian statistics.
As we will see below, the support of an extremal stationary state of the semigroup associated with the diffusion is a face of which corresponds to an independent set of . In this sense, the diffusion can be regarded as finding independent sets in graph theory. Some problems related to independent set finding, such as the maximum independent set problem, are known to be NP-hard, so it is believed that there is no efficient algorithm to solve them. Therefore, algorithms to find maximal independent sets are useful for obtaining practical solutions. For example, Luby [5] discussed a parallel algorithm for finding maximal independent sets, derived from a stochastic algorithm. His algorithm is based on a step that finds an independent set from a random permutation, executed by processors in time for large .
Let us assume there exists a strongly continuous Markov semigroup associated with the diffusion governed by the generator (2) such that
where is the totality of continuous functions on , and
The existence of such a semigroup for complete graphs was proven by Ethier [6]. For the solution of the stochastic differential Equation (1), we have
Let us denote by the adjoint semigroup on induced by . Consider the totality of all fixed points of :
We call each element of a stationary state of . A stationary state satisfies
The set is non-empty and convex. The totality of the extremal elements of stationary states is denoted by . Namely, a stationary state is uniquely represented as
for some , . In Theorem 1, we see that support of an extremal stationary state of the diffusion is a face of corresponding to an independent set of .
In this paper, we use the term support in a loose sense; namely, positivity of a stationary state is not assumed unless otherwise stated. In fact, Proposition 1 implies that if the diffusion starts from any point x in , the stationary state is , that is, an atom at x. In this situation, we say the support is . In other words, if a stationary state has no probability mass anywhere in an open set, then we say the set is not part of the support of the stationary state. Some examples are as follows.
Example 1.
Let , which is a complete graph consisting of r vertices. Since each single vertex constitutes a maximal independent set, a stationary state is represented as , where . For the solution of the stochastic differential Equation (1), is the absorption probability for the vertex . Since is a martingale, we know .
Example 2.
Let for an even positive integer r, which is a cycle graph consisting of r vertices, i.e., . The maximal independent sets are the set of all even integer vertices and that of all odd integer vertices. When , we have independent sets , , , , , and . The supports of extremal stationary states are the faces , , , and . The totality of the extremal stationary states is , where and are densities (not necessarily strictly positive) on and , respectively. Therefore, a stationary state is represented as .
Example 3.
Let for positive integers r and s, which is a complete bipartite graph consisting of two disjoint maximal independent sets of r and s vertices. For a graph whose maximal independent sets are and , a stationary state may be represented as .
Obtaining an explicit expression for the stationary states is a challenging problem. Itoh et al. [1] successfully obtained an explicit expression for the stationary states for a star graph , where a star graph is a complete bipartite graph , and the vertices of are numbered such that , . A stationary state may be represented as . If we identify vertices , the star graph is reduced to a complete graph . Using the arguments for a complete graph in Example 1, we know and . Itoh et al. [1] obtained an explicit expression for the diffusion starting from :
by using martingales introduced in Section 2. This result is for a specific graph but can be applied to other graphs reducible to . For example, the four-cycle graph discussed in Example 2 can be reduced to . Explicit expressions for , are immediately obtained.
This paper is organized as follows. In Section 2, the martingales used by Itoh et al. [1] are revisited in a slightly generalized form. An interpretation of the martingales is presented in Section 3. A duality relation between the Markov semigroup associated with the diffusion and a Markov chain on the set of ordered integer partitions is established. The dual Markov chain is studied and used to show that the support of an extremal stationary state of the adjoint semigroup is an independent set of the graph. In Section 4, we investigate the diffusion with a linear drift, which gives a killing of the dual Markov chain on a finite integer lattice. The Markov chain is studied and used to study the unique stationary state of the diffusion, which generalizes the Dirichlet distribution. In Section 5, two applications of the diffusions are discussed: analysis of an algorithm to find an independent set of a graph and Bayesian graph selection based on computation of the probability of a sample by using coupling from the past. Section 6 is devoted to discussion of open problems.
2. Invariants among Moments
For a graph , , an element with is denoted by . We use multi-index notation; a monomial is simply written as .
For star graphs , Itoh et al. [1] noticed the following homogeneous polynomials of arbitrary order :
are martingales, where the sum is taken over all ordered positive integer partitions of n satisfying with , and with , . This result generalizes to a generic graph; an example is a reducible graph, for which vertices in an independent set can be identified (a reduced graph is defined in Section 1).
Proposition 2.
Let be an independent set of a graph sharing an adjacent vertex. The homogeneous polynomial of any order :
is a martingale with respect to the natural filtration generated by the solution of the stochastic differential Equation (1), where with , .
Proof.
Applying Itô’s formula to monomials , we have
where the vertex j is adjacent to all vertices of . Then,
where
The right side of Equation (7) is proportional to
and it vanishes because the second summation does not depend on the index i and . □
Proposition 2 gives invariants among n-th order moments of the marginal distribution of the solution of the stochastic differential Equation (1) at a given time. More precisely, such a moment is represented as
Itoh et al. [1] used the invariants to derive the expression (6) for masses on atoms in the star graph .
Corollary 1.
Let be an independent set of a graph sharing an adjacent vertex. For moments of each order , we have
where with , .
A small example follows.
Example 4.
Let , which is the cycle graph consisting of four vertices discussed in Example 2. A maximal independent set of shares 1 or 3 of the adjacent vertices. The ordered positive integer partitions of four are , , and , which correspond to the fourth-order moments , , and , respectively. They constitute an invariant:
The existence of invariants among same-order moments is interesting, but we are also interested in computation of each moment. They can be computed by simple algebra, since the moments of each order satisfy a system of differential equations:
for each , where is the totality of the ordered positive integer partitions of an integer n with r positive integers:
However, it is obvious that solving the system (9) becomes prohibitive as the cardinality of the set grows. Computation of moments via stochastic simulation is discussed in Section 5.
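As a rough alternative to solving System (9), the moments can be estimated by simulating the stochastic differential Equation (1) directly. The sketch below uses an Euler–Maruyama scheme with one Brownian increment per undirected edge, which realizes the skew symmetry; the step size, the clamping of negative round-off, and the renormalization onto the simplex are numerical expedients, not part of the theory.

```python
import math
import random

def euler_maruyama_step(x, edges, dt, rng):
    """One Euler-Maruyama step for dx_i = sum_{j~i} sqrt(x_i x_j) dB_ij,
    with the skew symmetry B_ij = -B_ji realized by one increment per edge."""
    dx = [0.0] * len(x)
    for i, j in edges:
        dw = rng.gauss(0.0, math.sqrt(dt))
        a = math.sqrt(x[i] * x[j])
        dx[i] += a * dw   # x_j receives the opposite increment
        dx[j] -= a * dw
    y = [max(xi + di, 0.0) for xi, di in zip(x, dx)]  # clamp round-off
    s = sum(y)
    return [yi / s for yi in y]  # renormalize onto the simplex

def estimate_moment(x0, edges, a, t=1.0, dt=2e-3, paths=100, seed=1):
    """Crude Monte Carlo estimate of the moment E[prod_i x_i(t)^{a_i}]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(paths):
        x = list(x0)
        for _ in range(int(t / dt)):
            x = euler_maruyama_step(x, edges, dt, rng)
        total += math.prod(xi ** ai for xi, ai in zip(x, a))
    return total / paths

# Fourth-order moment on the cycle graph C4, starting from the barycentre.
edges_c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
moment = estimate_moment([0.25] * 4, edges_c4, a=(2, 0, 2, 0))
```

The scheme preserves the simplex constraint exactly before clamping, since every edge increment cancels pairwise.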
3. Dual Process on Integer Partitions
To study diffusions governed by the generator (2), we employ a tool called duality, which is a familiar tool in the study of interacting particle systems (see, e.g., Chapter 2 of [7]).
Consider a graph and let , be a continuous-time Markov chain on the set of ordered non-negative integer partitions of n with , which is denoted by , by the rate matrix :
where
The backward equation for the transition probability is
We have the following duality relation between the Markov semigroup and the Markov chain .
Lemma 1.
Proof.
Noting that
and the operation (4), we see that satisfies the differential equation
for each . This is uniquely solved by means of the Feynman–Kac formula, and the assertion follows. □
Since the total number of particles is conserved, i.e., , , the killing rate (13) is bounded. The killing rate is not of definite sign; however, a key observation is that if the support of a monomial a, denoted by , is an independent set of , then the killing rate is non-positive: . The converse is not always true.
To illustrate the Markov chain , let us ask specific questions. What is the moment of for the cycle graph discussed in Example 2? For a chain that starts from , there are two possible transitions: the one is absorbed into the state and the other is absorbed into the state , where the rates are unities (see Figure 1).
Figure 1.
Possible transitions of the chain on the cycle graph starting from .
The waiting time for the occurrence of either of these two transitions follows the exponential distribution of rate two. Since and , the right side of the duality relation (12) can be computed as
where s is the time that one of the two possible transitions occurs. The first term corresponds to the case that no transition occurs until time t. Let us call a transition event a collision.
Remark 1.
An analogous dual Markov chain of the diffusion approximation of a kind of Wright–Fisher model was effectively used by Shiga [8,9], where a transition occurs with the same rate as in (10). Such a transition event is called a “coalescent”. In contrast to a collision, the total number of particles decreases at a coalescent.
Here, we have a simple observation about the invariants among moments discussed in Section 2. If a chain starts from a state a such that is a maximal independent set of the graph , then the killing rate is non-positive, and the duality relation (12) is reduced to
Corollary 1 implies cancellation of the second term among moments. Moreover, considering a case that the diffusion starts from a point x in the face corresponding to , by Proposition 1, we know that after a collision, must contain a vertex that is not contained in .
Let us ask another question. What is the moment of for the star graph discussed in Section 1? Some consideration reveals that a chain starting from a will never be absorbed, and transitions occur among the three states: a, , and . Since and , the duality relation (12) gives
where is 1 if the argument is true and zero otherwise. Solving the backward Equation (11), we immediately obtain
However, computation of the right side of Equation (14) does not seem easy because the expectation depends on a sample path of the chain . Nevertheless, the moments can be obtained easily by solving the system of differential equations (9). In fact, we have
where
The observations above lead to the following proposition on the fate of the Markov chain .
Proposition 3.
Consider the Markov chain on the set of the ordered non-negative integer partitions .
- If a chain starts from a state a satisfying , then it is absorbed into an element of ;
- If a chain starts from a state a satisfying , then the transition probability converges to the uniform distribution on the set of ordered positive integer partitions
Proof.
- (i)
- A state a is absorbing if and only if the row vector of the rate matrix (10) is zero, which implies . Consider the set of vertices . Then, , is a death process and is absorbed into the state at Markov time with respect to the Markov chain , where for . Let us show that the process eventually decreases if . If , at least one vertex, say , satisfies . If the vertex j is connected with a vertex in , say k, the transition occurs with positive probability, and it makes . Otherwise, the vertex j should be connected with at least one vertex in , say l. The transition makes . If the vertex l is connected with a vertex in , the assertion is shown. Otherwise, the vertex l should be connected with at least one vertex in . The proof is completed by repeating this procedure until we reach a vertex in that is connected with a vertex in .
- (ii)
- If a chain starts from a state in the set , the argument for , i.e., that is a death process, shows that in a finite time the chain reaches a state, say a, in the set and never exits from . For such a case, consider restarting the chain from the state a. For simplicity, we consider the case of . Then, a state can be identified with the unique vertex satisfying . Since we are considering a connected graph, there exists a path for any with some , where , . Since all of the transitions occur with rate unity, the sample path has a positive probability. Hence, the Markov chain is irreducible and ergodic, and there exists a unique stationary distribution on the set . Since the uniform distribution on satisfies the backward Equation (11), it is the stationary distribution. Cases of can be shown in a similar manner by showing that up to particles can be moved from one vertex to another.
□
Proposition 3 shows that if the Markov chain starts from a state a satisfying , convergence of the chain can be divided into two phases: (1) exit from the set and (2) convergence to the uniform distribution on the set . The largest eigenvalue of the rate matrix is zero. To consider the mixing, we have to know the second-largest eigenvalue, say . By spectral decomposition of the transition probability, the mixing time of Phase 2, or the infimum of t satisfying
for is less than for some constant , where is the total variation distance between probability measures and .
Example 5.
Consider the case that (see the proof of Proposition 3 ()). It can be shown that the rate matrix reduces to the negative of the Laplacian matrix of ; the second-smallest eigenvalue of the Laplacian is called the algebraic connectivity of . For a connected graph, it is known that the smallest eigenvalue of the Laplacian is zero and the algebraic connectivity is bounded below by [10], where is the diameter of , or the maximum of the shortest path lengths between any pair of vertices in . For the star graph and the fourth-order monomials discussed above, the rate matrix is the negative of the Laplacian matrix:
and the eigenvalues are 0, , and , where , . The two eigenvalues appear in the transition probabilities (15). For a generic graph with large r, the mixing time is .
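The spectral statements of this example can be checked numerically for a small star graph. In the sketch below, the lower bound tested at the end, 4 divided by the number of vertices times the diameter, is our reading of the bound in [10] and should be treated as an assumption.

```python
import numpy as np

def laplacian(adj):
    """Graph Laplacian L = D - A from an adjacency dict {vertex: neighbours}."""
    n = len(adj)
    L = np.zeros((n, n))
    for i, nbrs in adj.items():
        L[i, i] = len(nbrs)
        for j in nbrs:
            L[i, j] = -1.0
    return L

# Star graph K_{1,4} (r = 5 vertices); vertex 0 is the centre, diameter 2.
adj_star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
eig = np.linalg.eigvalsh(laplacian(adj_star))  # eigenvalues in ascending order
algebraic_connectivity = eig[1]
# Known Laplacian spectrum of the star K_{1,r-1}: 0, 1 (multiplicity r-2), r.
```

Here `np.linalg.eigvalsh` exploits the symmetry of the Laplacian and returns the eigenvalues sorted, so the algebraic connectivity is simply the second entry.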
Assessment of Phase 1 seems harder. The death process , with is not a Markov process. The death rate is bounded below:
To obtain a rough estimate of the right side, let us suppose the state follows the uniform distribution on the set of ordered positive integer partitions , where
Then, the expectation of the lower bound is , where . When n is large, the dominant contribution to the expectation of the waiting time for the exit comes from the period , and it is . Hence, the expectation of the waiting time for the exit would be .
Shiga [8,9] and Shiga and Uchiyama [11] studied structures of extremal stationary states of the diffusion approximation of a kind of Wright–Fisher model in for a countable set S by using its dual Markov chain. Extremal states of the adjoint Markov semigroup on induced by associated with the diffusion can be studied by using the dual Markov chain . Note that positivity of a stationary state is not assumed, as explained in Section 1.
Theorem 1.
The support of an extremal stationary state of the adjoint Markov semigroup is a face of the simplex corresponding to an independent set of the graph , namely, .
Proof.
Consider a Markov chain with rate matrix (10) starting from a state . According to Proposition 3 , such an a is an absorbing state and the chain stays there. Lemma 1 gives
A stationary state satisfies for any . If , this condition reduces to
Therefore, if is not an independent set, then . Since we are considering a connected graph , V is not an independent set of . The condition reduces to . Since
the condition excludes from the support of . Suppose there exists a vertex such that is still not an independent set. Take a such that if and otherwise for each . Since
the condition excludes from the support of . Repeating this procedure yields an independent set for some , and the face is not excluded from the support of . □
The steps in the above proof appear in the following example.
Example 6.
Let , which is a cycle graph consisting of four vertices. The support of an extremal stationary state appearing in Example 2 is confirmed as follows. Remove the vertex from the vertex set V. Since is not an independent set, the face is excluded from the support of extremal stationary states. Then, remove from . Since is an independent set, the face is the support of an extremal stationary state.
A direct consequence of Theorem 1 on the moments is as follows.
Corollary 2.
For each , if the limit of an n-th order moment of the diffusion on a graph is positive, namely,
then is an independent set of .
4. Diffusion with Linear Drift
In this section, we consider the diffusion taking value in probability measures on a graph , satisfying the following stochastic differential equation with linear drift:
for . The drift term, , gives a killing of the dual process with a linear rate. As shown below, behaviours of the diffusion and the dual Markov chain are significantly different from those without drift discussed in previous sections.
In Itoh et al.’s discrete stochastic model described in Section 1, this drift corresponds to adding the following dynamics: at each instant of time, one of N particles is chosen uniformly randomly and assigned to another vertex chosen uniformly randomly with rate . In the Wright–Fisher model, this drift corresponds to a mutation mechanism [4].
Let be a continuous-time Markov chain on a finite integer lattice, or the set of non-negative integers
by the rate matrix :
where
The backward equation for the transition probability is
The following duality relation between the Markov semigroup associated with the diffusion denoted by and the Markov chain can be shown in the same manner as Lemma 1.
Lemma 2.
The Markov semigroup and the Markov chain with the rate matrix (17) satisfy
for each , where the killing rate is
In contrast to the rate matrix in (10), particles are erased, so the total number of particles decreases. It is clear from the rate matrix (17) that 0 is the unique absorbing state.
Proposition 4.
Let , which is a Markov time with respect to the Markov chain with the rate matrix (17). Then,
Proof.
Since if and only if , we consider the Markov chain of the cardinality . According to the rate matrix (17), it is a linear death process with rate . Noting that is the convolution of exponential random variables of rates , , we have the assertion. □
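A quick numerical illustration of Proposition 4, assuming the death rate from a state with k surviving particles is kα (our reading of the linear death process above): the expected absorption time is then the sum of the means of the successive exponential holding times, and Monte Carlo simulation reproduces this convolution formula.

```python
import random

def death_time(n, alpha, rng):
    """Sample the absorption time of a linear death process jumping
    k -> k - 1 at rate k * alpha (an assumed reading of the rates in (17))."""
    t, k = 0.0, n
    while k > 0:
        t += rng.expovariate(k * alpha)  # exponential holding time at level k
        k -= 1
    return t

def expected_death_time(n, alpha):
    """E[tau] for the convolution of exponentials of rates n*alpha, ..., alpha."""
    return sum(1.0 / (k * alpha) for k in range(1, n + 1))

rng = random.Random(2)
n, alpha = 5, 0.7
mc = sum(death_time(n, alpha, rng) for _ in range(20000)) / 20000
exact = expected_death_time(n, alpha)
```

With 20,000 replicates the Monte Carlo mean agrees with the harmonic-sum formula to well within its standard error.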
To illustrate the Markov chain , let us ask a specific question. What is the moment of for the cycle graph ? For a chain that starts from a, there are four possible sample paths (Figure 2):
Figure 2.
Possible sample paths of the chain on the cycle graph starting from .
- (i)
- No particles are erased;
- (ii)
- Particle 1 is erased, but Particle 2 survives;
- (iii)
- Particle 2 is erased, but Particle 1 survives;
- (iv)
- Both particles are erased.
The waiting time for either of the two particles to be erased follows the exponential distribution of rate . Since and , the right side of the duality relation (18) can be computed as
where and are the times that a particle is erased.
The set of stationary states of the adjoint Markov semigroup on induced by consists of a unique probability measure.
Theorem 2.
For the adjoint Markov semigroup , there exists the unique stationary state satisfying
for every .
Proof.
Since the Markov chain with the rate matrix (17) is absorbed into 0, Lemma 2 and Proposition 4 give
for all . Since for each , there exists a unique probability measure satisfying for all . □
The stationary state converges weakly to a common limit as irrespective of the graph.
Corollary 3.
The stationary state of the adjoint Markov semigroup satisfies
Proof.
Since the killing rate (19) is bounded and for large , the leading contribution to the expression (20) can be evaluated by the death process considered in Proposition 4, whose waiting time follows the exponential distribution of rate . Let . We have
where is a constant satisfying for all b satisfying . In the same way, is bounded below. The assertion follows by taking the limit of these bounds. □
Moreover, the stationary state has a continuous and strictly positive density.
Theorem 3.
For the adjoint Markov semigroup , the unique stationary state is absolutely continuous with respect to the Lebesgue measure on and admits a probability density that is strictly positive in and is of -class.
Proof.
We first show that has a density of -class. By Theorem 2, we have
for each . Therefore, has a -density represented as
We next show that the density is strictly positive in . Consider an approximation of by polynomials:
satisfying . Suppose there exists a point satisfying . Since the polynomials are strictly positive, for any small positive constants and , there exists an N such that
and is covered by open balls:
for all . Since is smooth, for every point , we can find a ball containing x and for some constant c. This implies , , but this contradicts the fact that , which follows from the expression (20) because the killing rate is bounded and the Markov time satisfies by Proposition 4. □
An immediate consequence is the following corollary, which is an analogous result to Corollary 2.
Corollary 4.
The moments of the stationary state of the adjoint Markov semigroup are positive, namely,
The moments of the stationary state can be obtained by the condition for the stationary state (5). It gives a system of recurrence relations:
for each with the boundary condition . In contrast to the system of ordinary differential equations (9), this system is not closed among moments of the same order; prior to solving the system for the moment of a given monomial a, we have to solve the systems for the moments of all lower orders. Therefore, solving System (22) seems a formidable task. The diffusion on a complete graph is an exception.
Example 7.
Let , which is the complete graph consisting of r vertices discussed in Example 1. The unique solution of the system of recurrence relations (22) is
Moreover, since this expression gives the moments of the symmetric Dirichlet distribution of parameter α, the stationary state is the Dirichlet distribution:
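Example 7 can be verified numerically: the moment formula for the symmetric Dirichlet distribution, written here with rising factorials, is standard, and sampling from the Dirichlet distribution reproduces it. The parameter values below are arbitrary illustrative choices.

```python
import numpy as np

def rising(x, k):
    """Rising factorial (x)_k = x (x + 1) ... (x + k - 1)."""
    out = 1.0
    for i in range(k):
        out *= x + i
    return out

def dirichlet_moment(alpha, a):
    """E[prod_i x_i^{a_i}] under the symmetric Dirichlet(alpha) on len(a) vertices:
    the standard formula prod_i (alpha)_{a_i} / (r * alpha)_n with n = sum(a)."""
    r, n = len(a), sum(a)
    num = 1.0
    for ai in a:
        num *= rising(alpha, ai)
    return num / rising(r * alpha, n)

rng = np.random.default_rng(3)
alpha, a = 0.5, (2, 1, 0)
exact = dirichlet_moment(alpha, a)  # = (0.5)_2 (0.5)_1 / (1.5)_3
samples = rng.dirichlet([alpha] * len(a), size=100000)
empirical = float(np.mean(np.prod(samples ** np.array(a), axis=1)))
```

With 100,000 samples the empirical moment matches the closed form to three decimal places.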
Remark 2.
The limit of the moments (23) is known as the Dirichlet-multinomial distribution up to multiplication of the multinomial coefficient. Renumbering the set by and taking the limit and with , the expression (23) reduces to the form
which is known as the Ewens sampling formula, or the exchangeable partition probability function of the Dirichlet prior process in Bayesian statistics (see, e.g., [12] for an introduction). Karlin and McGregor [13] derived this formula by using a system of recurrence relations based on the coalescents mentioned in Remark 1. In this sense, we have found an alternative system of recurrence relations (22), based on collisions, that Formula (24) satisfies.
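The Ewens sampling formula is easy to evaluate. Since Formula (24) is not reproduced here, the sketch below uses the standard form of the formula for an unordered partition and checks that the masses sum to one over all partitions of a small n.

```python
from math import factorial

def partitions(n, max_part=None):
    """Yield the integer partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest

def rising(x, k):
    out = 1.0
    for i in range(k):
        out *= x + i
    return out

def esf(theta, part):
    """Ewens sampling formula (standard form) for an unordered partition:
    P = n! / (theta)_n * prod_j (theta / j)^{a_j} / a_j!,
    where a_j is the number of parts of size j."""
    n = sum(part)
    p = factorial(n) / rising(theta, n)
    for j in set(part):
        aj = part.count(j)
        p *= (theta / j) ** aj / factorial(aj)
    return p

total = sum(esf(1.2, p) for p in partitions(5))
```

For theta = 1 the probability of the single-block partition of n is 1/n, a convenient spot check.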
5. Applications
In this section, we present applications of the results developed in previous sections.
5.1. Finding Independent Sets of Graphs
Itoh et al.’s discrete stochastic model described in Section 1 stops when the set of vertices occupied by at least one particle constitutes an independent set of a graph. The model is summarized as the following procedure.
The cardinality of the set of vertices to which at least one particle is assigned decreases, and the set eventually reduces to an independent set of . The integer M is needed to confirm that we can no longer choose particles from neighbouring vertices. If M is sufficiently large, Algorithm 1 provides an independent set with high probability.
| Algorithm 1 Finding an independent set of a graph |
|
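Since the pseudocode of Algorithm 1 is not reproduced here, the following Python sketch implements the procedure as described in the text: collision steps are repeated until M consecutive draws fail to find two particles on adjacent vertices. The value of M, the seed, and the round-robin initial placement are our illustrative choices.

```python
import random

def find_independent_set(adj, n_particles, M=1000, seed=0):
    """Sketch of Algorithm 1: run the collision dynamics until M consecutive
    draws fail to produce two particles on adjacent vertices, then return
    the occupied vertex set."""
    rng = random.Random(seed)
    verts = list(adj)
    particles = [verts[k % len(verts)] for k in range(n_particles)]
    failures = 0
    while failures < M:
        i, j = rng.sample(range(n_particles), 2)
        u, v = particles[i], particles[j]
        if v in adj[u]:            # a collision: one particle jumps
            failures = 0
            if rng.random() < 0.5:
                particles[i] = v
            else:
                particles[j] = u
        else:
            failures += 1
    return set(particles)

def is_independent(adj, s):
    """Check that no two vertices of s are adjacent."""
    return all(v not in adj[u] for u in s for v in s)

adj_c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
ind = find_independent_set(adj_c4, 4)
```

On the cycle graph C4, the returned set is one of the independent sets listed in Example 2, a singleton or one of the two diagonal pairs.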
A natural question is how many steps are needed to find an independent set. Answering this question seems hard, but regarding the diffusion satisfying the stochastic differential Equation (1) as an approximation of the procedure of Algorithm 1, we can deduce a rough idea. Because of the scaling in the diffusion limit, the unit time in the diffusion corresponds to iterations of Steps 3 and 4 of Algorithm 1.
According to the argument of Proposition 1, a sample path of the diffusion starting from a point is absorbed into lower dimensional faces and is eventually absorbed into a face corresponding to an independent set.
Proposition 5.
Let be a set of vertices that is not an independent set of a graph . For a sample path of the diffusion starting from a point , the Markov time
satisfies
where is the edge set of the induced subgraph of consisting of U, is the boundary of , and is a constant depending on x.
Proof.
By the argument in the proof of Theorem 1, we have
while . Choose . □
The author has not found any other property of the Markov time for generic graphs, but the diffusion on a complete graph is an exception; the probability distribution function can be obtained exactly.
Proposition 6.
Let be a complete graph . For a face , , the distribution of the Markov time (25) is represented as
Proof.
The inclusion–exclusion argument shows the following expression:
where is the total mass on and the summations are taken over the totality of distinct indices chosen from . For , an explicit expression can be obtained by solving a backward equation for the diffusion (Equation (4.15) in [14]):
A complete graph can be reduced to by any partition of the vertex set into two vertex sets (the reduction is defined in Section 1). For example, is reducible to consisting of Vertices , where the Vertex is obtained by identifying Vertices 1 and 2. In the same way, is also reducible to and . Therefore, an expression of is obtained from the right side of (27) by replacing with . The inclusion–exclusion argument gives
where both sides are zero if . Substituting the expression obtained by the expression (27) into (26) and collecting terms by using the equality (28), the assertion follows. □
According to Proposition 6, the probability distribution function of the exit time from is asymptotically for large t, where . Let a sequence of vertex sets occupied by at least one particle in Algorithm 1 be denoted by V, , , …, for a vertex . If the exit time for followed the exponential distribution of mean (of course, this is not exactly true), the expected waiting time until a sample path is absorbed into the vertex would be for large r. This rough argument suggests that the expected computation cost of Algorithm 1 would be for a complete graph because an iteration of Steps 3 and 4 can be executed in . Luby's algorithm for finding an independent set described in Section 1 demands using processors.
5.2. Bayesian Graph Selection
Consider a sample of n particles from the unique stationary state of the adjoint Markov semigroup on induced by the Markov semigroup associated with the diffusion that appeared in Theorem 2, such that particles of a graph are taken from the vertex . We assume the probability of taking a sample does not depend on the order of the particles; namely, they are exchangeable. Such probabilities constitute the multinomial distribution, namely, a probability measure on ordered non-negative integer partitions of n as
satisfying . The moment defined by (21) is the expectation of the sample probability up to the multinomial coefficient:
Before proceeding to discuss computational issues, we present a motivating problem in Bayesian statistics. The expected sample probability (29) is a mixture of multinomial distributions of parameters x over the stationary state of the adjoint Markov semigroup . In statistical terminology, the sample probability and the expectation (29) are called the likelihood and the marginal likelihood, respectively, and is called the prior distribution for the parameter .
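In the special case where the prior distribution is a Dirichlet distribution (as holds for the complete graph; see Example 7), the marginal likelihood has a closed form, the Dirichlet-multinomial distribution. The following is a minimal sketch; the parameter vector `alpha` and its correspondence with the notation of (21) are assumptions made for illustration only.

```python
from math import lgamma, exp

def dirichlet_multinomial(a, alpha):
    """Marginal likelihood of the counts `a` under a Dirichlet(alpha) prior,
    i.e. the Dirichlet-multinomial probability of the sample."""
    n = sum(a)
    A = sum(alpha)
    # multinomial coefficient n! / (a_1! ... a_K!), in log space
    log_p = lgamma(n + 1) - sum(lgamma(ai + 1) for ai in a)
    # ratio of multivariate beta functions B(alpha + a) / B(alpha)
    log_p += lgamma(A) - lgamma(A + n)
    log_p += sum(lgamma(al + ai) - lgamma(al) for al, ai in zip(alpha, a))
    return exp(log_p)
```

For a uniform prior (all components of `alpha` equal to one), this distribution is uniform over the ordered non-negative integer partitions of n, which gives a quick sanity check.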
Suppose we are interested in selecting a graphical model consisting of four vertices from three candidate models: a star graph , a cycle graph , and a complete graph (Figure 3).
Figure 3.
Three candidate graphical models: a star graph , a cycle graph , and a complete graph .
For this purpose, we employ stationary states as the prior distributions. As we saw in Example 7, the prior distribution for is the Dirichlet distribution, but closed-form expressions of the distribution functions are not known for the prior distributions of the other graphs. Suppose we have a sample consisting of two particles. If it is , by solving the recurrence relation (22), we obtain the expected sample probabilities under , , and as
respectively. On the other hand, if the sample is , they are
If is small, supports , while does not support . This is reasonable because the vertex set is an independent set of but not an independent set of and . On the other hand, the set is not an independent set of , but it is an independent set of and . The ratio of marginal likelihoods is called the Bayes factor, a standard model-selection criterion in Bayesian statistics (see, e.g., Section 6.1 of [15]). If a sample is , the Bayes factor of to or is . Therefore, is supported if is small, while all graphs are equivalent if is large. We do not discuss details of statistical aspects, including how to choose , but it is worth mentioning that a positive improves the stability of model selection, especially for small samples. In fact, if is small, the Bayes factor can change drastically when a single sample unit is added. Suppose we have a sample and take an additional sample unit. If it is , the expected sample probabilities for the sample under , , and are
respectively. The Bayes factor of to is unity, which means that the graphs and are equivalent. This conclusion is quite different from that deduced from the sample . In the limit of large , by Corollary 3, the expected sample probability of any graph follows the unique limiting distribution, namely, the multinomial distribution.
Now let us discuss computation of expected sample probabilities. A closed-form expression of the stationary state of the adjoint Markov semigroup is not known for generic graphs; nevertheless, we can compute the expected sample probabilities of any graph by solving the system of recurrence relations (22). Solving the system becomes prohibitive as the sample size n grows, but the following algorithm, which is a byproduct of Theorem 2, provides an unbiased estimator of the expected sample probability.
By Corollary 4, the output of Algorithm 2 is an unbiased estimator of , which gives the expected sample probability (29) by multiplying the multinomial coefficient.
Algorithm 2: Estimating the sample probability, or the marginal likelihood
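The averaging and rescaling implied by Corollary 4 can be sketched as follows. Here `run_algorithm2` is a hypothetical stand-in for a single run of Algorithm 2 on the counts `a`; the replicate averaging, multinomial-coefficient rescaling, and standard-error report are illustrative assumptions.

```python
from math import lgamma, exp, sqrt

def multinomial_coefficient(a):
    """n! / (a_1! ... a_K!) for counts a with n = sum(a)."""
    n = sum(a)
    return round(exp(lgamma(n + 1) - sum(lgamma(ai + 1) for ai in a)))

def estimate_marginal_likelihood(run_algorithm2, a, replicates=1000):
    """Average independent replicates of the unbiased estimator and
    multiply by the multinomial coefficient; also report a standard error."""
    xs = [run_algorithm2(a) for _ in range(replicates)]
    mean = sum(xs) / replicates
    var = sum((x - mean) ** 2 for x in xs) / (replicates - 1)
    c = multinomial_coefficient(a)
    return c * mean, c * sqrt(var / replicates)
```

Because each replicate is unbiased, the average remains unbiased, and the standard error shrinks at the usual Monte Carlo rate of one over the square root of the number of replicates.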
An attractive property of Algorithm 2 as a Markov chain Monte Carlo method is that it is a direct sampler; namely, it generates random variables that are independent and exactly follow the target distribution. In fact, this algorithm can be regarded as a variant of a direct sampling algorithm called coupling from the past (see, e.g., Chapter 25 of [16] for a concise summary). Regard the sample as having been generated from the infinite past, with time running backward; the time at which all particles have been erased is the time from which the sample path can be regarded as having come from the infinite past, because the path does not depend on any events older than that time. We have the following estimate of the number of steps needed to complete the procedure.
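For readers unfamiliar with coupling from the past, the following generic sketch illustrates the principle on an arbitrary finite-state chain. This is the textbook Propp–Wilson construction, not Algorithm 2 itself; the inverse-CDF grand coupling used here is one possible choice of random map.

```python
import random

def cftp_sample(P, rng):
    """Draw one exact sample from the stationary distribution of the
    finite-state chain with transition matrix P, by coupling from the past."""
    K = len(P)

    def step(state, u):
        # Advance `state` using the single uniform `u` (inverse-CDF coupling);
        # sharing u across all states turns the update into a random map.
        acc = 0.0
        for j, p in enumerate(P[state]):
            acc += p
            if u < acc:
                return j
        return K - 1

    us = []  # shared randomness; us[k] drives the step from time -(k+1) to -k
    T = 1
    while True:
        while len(us) < T:
            us.append(rng.random())
        # Run every possible starting state from time -T to 0
        # with the *same* randomness.
        states = list(range(K))
        for k in range(T - 1, -1, -1):
            states = [step(s, us[k]) for s in states]
        if len(set(states)) == 1:  # coalescence: output ignores the start
            return states[0]
        T *= 2  # go further into the past, reusing the old randomness
```

The key point, mirroring the backward-in-time argument above, is that once all trajectories have coalesced, the state at time zero no longer depends on anything older, so it is distributed exactly as a path from the infinite past, i.e., stationarily.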
Proposition 7.
For a sample of size n, the steps needed to complete Algorithm 2 to obtain an unbiased estimator of the expected sample probability (29) are for large .
Proof.
For a state a satisfying , the probability that a particle is erased at the next step is bounded below:
Therefore, the waiting time to erase a particle is stochastically smaller than the waiting time of an event following the geometric distribution of mean . The sum of the waiting times from to is . Steps 4 and 6 demand steps for large . Therefore, the assertion follows. □
We have focused on the diffusion with drift, but moments of the marginal distribution of the diffusion without drift at a given time, i.e., (8), can be computed by an analogue of Algorithm 2. The problem reduces to solving the system of differential equations (9). We omit the discussion, but a similar problem for a complete graph was discussed in [17].
6. Discussion
We have studied diffusions taking values in probability measures on a graph whose masses on each vertex satisfy the stochastic differential equations of the forms (1) and (16), by using their dual Markov chains on integer partitions and on finite integer lattices, respectively. Many problems remain to be solved, especially for (1). First of all, a formal proof of the existence of the semigroup associated with the generator (2) should be established, which demands pathwise uniqueness of the solution of (1). As emphasized in the text, some arguments, especially those after Propositions 3 and 6, are rough and restrictive; they could be improved. Stationary states of the Markov semigroup need further study. A counterpart of Theorem 1.5 of [11] or of Theorem 3 on the regularity of stationary states could be established by a detailed analysis of the diffusion. Further properties of the diffusion, such as absorption probabilities into a stationary state and the associated waiting times, are interesting. Obtaining explicit expressions for them is challenging, but such expressions would be helpful for further understanding the diffusion.
Two applications of the diffusions have been discussed: analysis of an algorithm to find an independent set of a graph, and Bayesian graph selection based on computation of the expected sample probability by using coupling from the past. Further applications and modelling targets may exist.
For the coalescents mentioned in Remark 1, the properties of a “genealogy” of a sample, which is a sample path of the dual Markov chain, are intensively studied because a genealogy itself is used as a stochastic model of DNA sequence variation (see, e.g., [18]). Random graphs such as Figure 1 and Figure 2 are counterparts of such genealogies. Study of the properties of such random graphs would be interesting.
Funding
The author is supported in part by JSPS KAKENHI grant 20K03742 and was supported in part by JST Presto Grant, Japan.
Data Availability Statement
Not applicable.
Acknowledgments
The author thanks Yoshiaki Itoh for introducing their work [1] to him. An earlier version of this work was presented at a workshop on coalescent theory at the Centre de Recherches Mathématiques, University of Montreal, in October 2013.
Conflicts of Interest
The author declares no conflict of interest.
References
- Itoh, Y.; Mallows, C.; Shepp, L. Explicit sufficient invariants for an interacting particle system. J. Appl. Probab. 1998, 35, 633–641.
- Tainaka, K.; Itoh, Y.; Yoshimura, J.; Asami, T. A geographical model of high species diversity. Popul. Ecol. 2006, 48, 113–119.
- Ben-Naim, E.; Krapivsky, P.L.; Redner, S. Bifurcations and patterns in compromise processes. Physica D 2003, 183, 190–204.
- Ethier, S.N.; Kurtz, T.G. Markov Processes: Characterization and Convergence; Wiley: Hoboken, NJ, USA, 1986.
- Luby, M. A simple parallel algorithm for the maximal independent set problem. SIAM J. Comput. 1986, 15, 1036–1053.
- Ethier, S.N. A class of degenerate diffusion processes occurring in population genetics. Comm. Pure Appl. Math. 1976, 29, 483–493.
- Liggett, T.M. Interacting Particle Systems; Springer: Berlin/Heidelberg, Germany, 1985.
- Shiga, T. An interacting system in population genetics. J. Math. Kyoto Univ. 1980, 20, 213–242.
- Shiga, T. An interacting system in population genetics, II. J. Math. Kyoto Univ. 1980, 20, 723–733.
- Mohar, B. The Laplacian Spectrum of Graphs. In Graph Theory, Combinatorics, and Applications; Alavi, Y., Chartrand, G., Oellermann, O.R., Schwenk, A.J., Eds.; Wiley: Hoboken, NJ, USA, 1991; Volume 2, pp. 871–898.
- Shiga, T.; Uchiyama, K. Stationary states and their stability of the stepping stone model involving mutation and selection. Probab. Theory Relat. Fields 1986, 73, 87–117.
- Mano, S. Partitions, Hypergeometric Systems, and Dirichlet Processes in Statistics; Springer: Tokyo, Japan, 2018.
- Karlin, S.; McGregor, J. Addendum to a paper of W. Ewens. Theor. Popul. Biol. 1972, 3, 113–116.
- Kimura, M. Diffusion models in population genetics. J. Appl. Probab. 1964, 1, 177–232.
- Bernardo, J.M.; Smith, A.F.M. Bayesian Theory; Wiley: Chichester, UK, 1994.
- Levin, D.A.; Peres, Y. Markov Chains and Mixing Times, 2nd ed.; American Mathematical Society: Providence, RI, USA, 2017.
- Mano, S. Duality between the two-locus Wright–Fisher diffusion model and the ancestral process with recombination. J. Appl. Probab. 2013, 50, 256–271.
- Durrett, R. Probability Models for DNA Sequence Evolution, 2nd ed.; Springer: New York, NY, USA, 2008.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).