A measure-on-graph-valued diffusion: a particle system with collisions, and its applications

A diffusion taking values in probability measures on a graph with a vertex set $V$, $\sum_{i\in V}x_i\delta_i$, is studied. The masses on the vertices satisfy the stochastic differential equation $dx_i=\sum_{j\in N(i)}\sqrt{x_ix_j}dB_{ij}$ on the simplex, where $\{B_{ij}\}$ are independent standard Brownian motions with skew symmetry and $N(i)$ is the neighbourhood of the vertex $i$. A Markov chain on integer partitions that is dual to the Markov semigroup associated with the diffusion is used to show that the support of an extremal stationary state of the adjoint semigroup is an independent set of the graph. We also investigate the diffusion with a linear drift, which gives a killing of the dual Markov chain on a finite integer lattice. The dual Markov chain is used to study the unique stationary state of the diffusion, which generalizes the Dirichlet distribution. Two applications of the diffusions are discussed: analysis of an algorithm for finding an independent set of a graph, and Bayesian graph selection based on computation of the probability of a sample by using coupling from the past.


Introduction
Consider a finite graph $G = (V, E)$ consisting of the vertex set $V = \{1, \ldots, r\}$, $r \in \{2, 3, \ldots\}$, and the edge set $E$. Throughout this paper, a graph is undirected and connected. The neighbourhood of the vertex $i \in V$ is denoted by $N(i) := \{j \in V : j \sim i\}$, where $j \sim i$ means that $i$ and $j$ are adjacent. The degree of the vertex $i$ is denoted by $d_i := |N(i)|$, the cardinality of the set $N(i)$. An independent set of $G$ is a subset of $V$ no two of whose vertices are adjacent. In other words, a set of vertices is independent if and only if it is a clique in the graph complement of $G$.
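The definitions above are straightforward to check mechanically. A minimal sketch (the dictionary representation of a graph and the function name are illustrative, not from the paper):

```python
# Sketch: checking whether a vertex subset is an independent set.
# A graph is represented as a dict mapping each vertex to its
# neighbourhood N(i); names are illustrative.

def is_independent(neighbours, subset):
    """Return True if no two vertices in `subset` are adjacent."""
    return all(j not in neighbours[i] for i in subset for j in subset)

# Four-cycle C4: 1-2-3-4-1
C4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(is_independent(C4, {2, 4}))  # True: a maximal independent set
print(is_independent(C4, {1, 2}))  # False: 1 and 2 are adjacent
```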
If two vertices of a graph $G$ have precisely the same neighbourhood, throughout this paper we call the graph obtained by identifying these two vertices, while keeping the adjacency, a reduced graph of $G$.
Let $\mathcal{P}(\Delta^{r-1})$ be the totality of probability measures on the simplex $\Delta^{r-1}$ equipped with the topology of weak convergence. Itoh et al. [6] discussed a diffusion taking values in probability measures on a graph $G$, whose state is identified with a probability measure $\sum_{i=1}^r x_i(t)\delta_i$. The masses on the vertices $\{x(t) = (x_1(t), \ldots, x_r(t)) \in \Delta^{r-1}, P_x : t \ge 0\}$ start from $x(0) = x$ and satisfy the stochastic differential equation of the form
$$dx_i(t) = \sum_{j \in N(i)} \sqrt{x_i(t)x_j(t)}\, dB_{ij}(t), \qquad i \in V, \tag{1}$$
where $\{B_{ij}\}_{i \sim j}$ are independent standard Brownian motions with the skew symmetry $B_{ij} = -B_{ji}$. The generator $L$ of the diffusion operates on a function $f \in C_0^2(\Delta^{r-1})$ as
$$Lf(x) = \frac{1}{2}\sum_{i,j \in V} \sigma_{ij}(x)\frac{\partial^2 f}{\partial x_i \partial x_j}, \tag{2}$$
where $\sigma_{ii}(x) = x_i\sum_{j \in N(i)} x_j$, $\sigma_{ij}(x) = -x_i x_j$ for $i \sim j$, and $\sigma_{ij}(x) = 0$ otherwise, and $C_0^2(\Delta^{r-1})$ is the totality of functions with continuous derivatives up to the second order and compact support in $\Delta^{r-1}$.
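The SDE (1) can be simulated directly. Below is a minimal Euler–Maruyama sketch, assuming the skew symmetry $B_{ij} = -B_{ji}$, so each increment cancels pairwise and the total mass $\sum_i x_i$ is conserved by every step; the clipping and renormalization are discretization safeguards, not part of the continuous model, and all names are illustrative:

```python
import numpy as np

# Euler-Maruyama sketch of dx_i = sum_{j~i} sqrt(x_i x_j) dB_ij with
# skew-symmetric driving noises B_ij = -B_ji. Clipping/renormalization
# is only a safeguard against discretization error.

def step_sde(x, edges, dt, rng):
    dx = np.zeros_like(x)
    for i, j in edges:
        db = rng.normal(0.0, np.sqrt(dt))  # increment of B_ij
        inc = np.sqrt(max(x[i], 0.0) * max(x[j], 0.0)) * db
        dx[i] += inc   # vertex i is driven by  B_ij
        dx[j] -= inc   # vertex j is driven by -B_ij = B_ji
    x = np.clip(x + dx, 0.0, None)
    return x / x.sum()

rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # the cycle graph C4
x = np.full(4, 0.25)
for _ in range(2000):
    x = step_sde(x, edges, dt=1e-3, rng=rng)
print(x, x.sum())   # a point of the simplex: non-negative, summing to 1
```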
We say a face of the simplex $\Delta^{r-1}$ corresponds to a set of vertices $U \subset V$ if the face is the interior of the convex hull of $\{e_i : i \in U\}$, denoted by $\mathrm{int}(\mathrm{conv}\{e_i : i \in U\})$, where $\{e_1, \ldots, e_r\}$ is the standard basis of the vector space $\mathbb{R}^r$. If $U$ consists of a single vertex $i$, $\mathrm{int}(\mathrm{conv}\{e_i\})$ should be read as $\{e_i\}$. An observation on the diffusion is as follows.
Proposition 1.1. Every point in a face of the simplex $\Delta^{r-1}$ corresponding to an independent set $V_I \subsetneq V$ of a graph $G$ is a fixed point of the stochastic differential equation (1). Namely, every point $x$ satisfying
$$x_i \ge 0,\ i \in V_I, \qquad \sum_{i \in V_I} x_i = 1, \qquad x_i = 0,\ i \notin V_I, \tag{3}$$
is a fixed point.
Proof. For each $i \in V$, $x_i(t)$ is a martingale with respect to the natural filtration generated by the diffusion $(x(t), P_x)$. The condition (3) implies $x_i = 0$ for each $i \notin V_I$; then $E(x_i(t)) = x_i = 0$, and since $x_i(t) \ge 0$, we have $x_i(t) = 0$, $\forall t \ge 0$, almost surely. Then, for each $i \in V_I$, every neighbour $j \in N(i)$ lies outside $V_I$ because $V_I$ is an independent set, so $x_j(t) = 0$ and (1) reduces to $dx_i(t) = 0$; hence $x_i(t) = x_i$, $\forall t \ge 0$. Therefore, (3) is a fixed point of (1).
Here, $n_i(s)$ is the number of particles assigned to the vertex $i \in V$ at time $s$. At each instant of time, two of the $N$ particles are chosen uniformly at random. If particles at adjacent vertices $i \sim j$ are chosen, one of the two particles is chosen with equal probability and assigned to the other particle's vertex. This causes a transition from $(i, j)$ to $(i, i)$ or $(j, j)$ with equal probabilities. This stochastic model seems to have various applications. Tainaka et al. [18] discussed this stochastic model as a model of a speciation process caused by geography.

Apart from being an approximation of the discrete stochastic model discussed above, the diffusion $(x(t), P_x)$ seems to appear in various contexts. The diffusion on a complete graph appears as an approximation of a quite different discrete stochastic model, called the Wright-Fisher model in population genetics, which evolves by repetition of multinomial sampling of a fixed number of particles (see, e.g., Chapter 10 of [5] for the details). Such measure-valued diffusions are called Fleming-Viot processes. They appear as prior distributions in Bayesian statistics.
As we will see below, the support of an extremal stationary state of the semigroup associated with the diffusion $(x(t), P_x)$ is a face of $\Delta^{r-1}$ corresponding to an independent set of $G$. In this sense, the diffusion can be regarded as an independent-set-finding procedure in graph theory. Some problems related to independent set finding, such as the maximum independent set problem, are known to be NP-hard, so it is believed that there is no efficient algorithm to solve them. Therefore, algorithms to find maximal independent sets are useful to obtain practical solutions. For example, Luby [11] discussed a parallel algorithm for finding maximal independent sets, which was derived from a stochastic algorithm to find a maximal independent set. His algorithm is based on a step that finds an independent set based on a random permutation, executed by $O(|E|)$ processors in time $O(1)$ for large $|E|$.
Let us assume there exists a strongly continuous Markov semigroup $\{T_t : t \ge 0\}$ on $C(\Delta^{r-1})$ associated with the diffusion $(x(t), P_x)$ governed by the generator (2), where $C(\Delta^{r-1})$ is the totality of continuous functions on $\Delta^{r-1}$. The existence of such a semigroup for complete graphs was proven by Ethier [4]. For the solution of the stochastic differential equation (1), we have
$$T_t f(x) = E_x[f(x(t))], \qquad f \in C(\Delta^{r-1}). \tag{4}$$
Let us denote by $\{T_t^* : t \ge 0\}$ the adjoint semigroup on $\mathcal{P}(\Delta^{r-1})$ induced by $\{T_t : t \ge 0\}$. Consider the totality of all fixed points of $\{T_t^*\}$:
$$S := \{\nu \in \mathcal{P}(\Delta^{r-1}) : T_t^*\nu = \nu,\ \forall t \ge 0\}.$$
We call each element of $S$ a stationary state of $\{T_t^*\}$. A stationary state $\nu$ satisfies
$$\langle\nu, Lf\rangle = \int Lf(x)\,\nu(dx) = 0, \qquad \forall f \in C_0^2(\Delta^{r-1}). \tag{5}$$
The set $S$ is non-empty and convex. The totality of the extremal elements of the stationary states will be denoted by $S_{\mathrm{ext}}$. Namely, a stationary state $\nu$ is uniquely represented as
$$\nu = \sum_{i=1}^{s} p_i \nu_i, \qquad \nu_i \in S_{\mathrm{ext}},$$
for some $p \in \Delta^{s-1}$, $s \in \{2, 3, \ldots\}$. In Theorem 3.5 we will see that the support of an extremal stationary state of the diffusion $(x(t), P_x)$ is a face of $\Delta^{r-1}$ corresponding to an independent set of $G$.
In this paper, we will use the term support in a loose sense. Namely, positivity of a stationary state is not assumed unless otherwise stated. In fact, Proposition 1.1 implies that if the diffusion $(x(t), P_x)$ starts from any point $x$ in $\mathrm{int}(\mathrm{conv}\{e_i : i \in V_I\})$, the stationary state is $\delta_x$, an atom at $x$. In this situation, we say the support is $\mathrm{int}(\mathrm{conv}\{e_i : i \in V_I\})$. In other words, if a stationary state does not have probability mass anywhere on an open set, then we say the set is not in the support of the stationary state. Some examples are as follows.
Obtaining an explicit expression for the stationary states is a challenging problem. Itoh et al. [6] successfully obtained an explicit expression for the stationary states for a star graph $S_2$, where a star graph $S_{r-1}$, $r \ge 3$, is a complete bipartite graph $K_{1,r-1}$, and the vertices of $S_{r-1}$ will be numbered such that $1 \sim i$, $i \in \{2, \ldots, r\}$. A stationary state may be represented as $\sum_{i=1}^3 p_i\delta_{e_i} + p_{2,3}\nu_{2,3}$. If we identify the vertices $\{2, 3\}$, the star graph $S_2$ is reduced to a complete graph $K_2$. Using the arguments for a complete graph in Example 1.2, we know $p_1 = x_1$ and $p_2 + p_{2,3} + p_3 = x_2 + x_3$. Itoh et al. [6] obtained the explicit expression (6) for the diffusion starting from $x \notin \{e_1, e_2, e_3\} \cup \mathrm{int}(\mathrm{conv}\{e_2, e_3\})$ by using martingales introduced in Section 2. This result is for a specific graph, but it can be applied to other graphs reducible to $S_2$. For example, the four-cycle graph $C_4$ discussed in Example 1.3 can be reduced to $S_2$, and explicit expressions for $p_i$, $i \in \{1, 2, 3, 4\}$, are immediately obtained.

This paper is organized as follows. In Section 2 the martingales used by Itoh et al. [6] are revisited in a slightly generalized form. An interpretation of the martingales is presented in Section 3, where a duality relation between the Markov semigroup associated with the diffusion and a Markov chain on the set of ordered integer partitions is established. The dual Markov chain is studied and used to show that the support of an extremal stationary state of the adjoint semigroup is an independent set of the graph. In Section 4 we investigate the diffusion with a linear drift, which gives a killing of the dual Markov chain on a finite integer lattice. The Markov chain is used to study the unique stationary state of the diffusion, which generalizes the Dirichlet distribution.
In Section 5 two applications of the diffusions are discussed: analysis of an algorithm to find an independent set of a graph, and a Bayesian graph selection based on computation of probability of a sample by using coupling from the past. Section 6 is devoted to discussion of open problems.

Invariants among moments
For a graph $G = (V, E)$, $r = |V|$, an element $a \in \mathbb{N}_0^V$ with $|a| := a_1 + \cdots + a_r < \infty$ will be denoted by $a = a_1e_1 + \cdots + a_re_r$. We will use multi-index notation; a monomial $\prod_i x_i^{a_i}$ is simply written as $x^a$. For star graphs $S_{r-1}$, Itoh et al. [6] noticed that the homogeneous polynomials of arbitrary order $n \ge r$,
$$\sum_{a_2+\cdots+a_r = n,\ a \ge 1} \binom{n-r+1}{a-1}\binom{n}{a}^{-1}(cx(t))^a, \qquad \binom{n}{a} = \frac{n!}{a_2!\cdots a_r!},$$
are martingales, where the sum is taken over all ordered positive integer partitions of $n$ satisfying $a_2 + \cdots + a_r = n$ with $a_i \ge 1$, $i \in \{2, \ldots, r\}$, and $c_2 + \cdots + c_r = 0$ with $c_i \in \mathbb{R}$, $i \in \{2, \ldots, r\}$. This result is generalized to a generic graph; an example is a reducible graph in which vertices in an independent set can be identified (a reduced graph is defined in Section 1).
Proposition 2.1. Let $V_I \subsetneq V$ be an independent set of a graph $G$ sharing an adjacent vertex. The homogeneous polynomial of any order $n \ge |V_I| + 1$,
$$\sum_{\sum_{i \in V_I} a_i = n,\ a \ge 1} \binom{n-|V_I|}{a-1}\binom{n}{a}^{-1}(cx(t))^a, \qquad \sum_{i \in V_I} c_i = 0,\ c_i \in \mathbb{R},$$
is a martingale with respect to the natural filtration generated by the solution $(x(t), P_x)$ of the stochastic differential equation (1).
Proof. Applying Itô's formula to the monomials $x^a = \prod_{i \in V_I} x_i^{a_i}$, we have
$$dx^a = \sum_{i \in V_I} a_i x^{a-e_i}\,dx_i + \frac{1}{2}\sum_{i \in V_I} a_i(a_i-1)x^{a-e_i}\sum_{j \in N(i)} x_j\,dt,$$
since the quadratic covariations satisfy $d\langle x_i, x_l\rangle_t = 0$ for distinct $i, l \in V_I$ and $d\langle x_i\rangle_t = x_i\sum_{j \in N(i)} x_j\,dt$; here the vertex shared by all vertices of $V_I$ appears among the neighbours $j$. Collecting the drift terms with the coefficients of the polynomial, the resulting expression is proportional to a sum in which the inner summation does not depend on the index $i$, and it vanishes because $\sum_{i \in V_I} c_i = 0$.
Proposition 2.1 gives invariants among the $n$-th order moments of the marginal distribution of the solution of the stochastic differential equation (1) at a given time. More precisely, such a moment is represented as
$$m_a(t) := E_x(x(t)^a).$$
Itoh et al. [6] used the invariants to derive the expression (6) for the masses on atoms in the star graph $S_2$.
Corollary 2.2. Let $V_I \subsetneq V$ be an independent set of a graph $G$ sharing an adjacent vertex. For the moments of each order $n \ge |V_I| + 1$, we have
$$\sum_{\sum_{i \in V_I} a_i = n,\ a \ge 1} \binom{n-|V_I|}{a-1}\binom{n}{a}^{-1}c^a\, m_a(t) = \sum_{\sum_{i \in V_I} a_i = n,\ a \ge 1} \binom{n-|V_I|}{a-1}\binom{n}{a}^{-1}c^a\, x^a, \qquad \forall t \ge 0.$$
A small example is as follows.
Example 2.3. Let $G = C_4$, the cycle graph consisting of four vertices discussed in Example 1.3. A maximal independent set of $C_4$, $V_I = \{2, 4\}$, shares $1$ or $3$ as the adjacent vertex. The ordered positive integer partitions of four are $(a_2, a_4) = (1, 3)$, $(2, 2)$, and $(3, 1)$, which correspond to the fourth order moments $m_{e_2+3e_4}(t)$, $m_{2e_2+2e_4}(t)$, and $m_{3e_2+e_4}(t)$, respectively. They constitute an invariant:
$$\frac{1}{4}m_{e_2+3e_4}(t) - \frac{1}{3}m_{2e_2+2e_4}(t) + \frac{1}{4}m_{3e_2+e_4}(t) = \mathrm{const.}$$
The existence of invariants among moments of the same order is interesting, but we are also interested in the computation of each moment. They can be computed by simple algebra, since the moments of each order satisfy a system of differential equations:
$$\frac{d}{dt}m_a(t) = \frac{1}{2}\sum_{i \in V} a_i(a_i-1)\sum_{j \in N(i)} m_{a-e_i+e_j}(t) - \Big(\sum_{\{i,j\} \in E} a_i a_j\Big)m_a(t) \tag{9}$$
for each $a \in \Pi_{n,r}^{\ge 0}$, where $\Pi_{n,r}^{\ge 0}$ is the totality of the ordered non-negative integer partitions of an integer $n$ into $r$ parts:
$$\Pi_{n,r}^{\ge 0} := \{a \in \mathbb{N}_0^r : a_1 + \cdots + a_r = n\}.$$
However, it is obvious that solving the system (9) becomes prohibitive as the cardinality of the set $\Pi_{n,r}^{\ge 0}$ grows. Computation of moments via stochastic simulation will be discussed in Section 5.
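For small orders the system (9) can be assembled and integrated numerically. A sketch, assuming the moment equations spelled out from the generator (2) (all names are illustrative); the identity $\sum_a \binom{n}{a} m_a(t) = E_x[(x_1(t)+\cdots+x_r(t))^n] = 1$ is conserved exactly by the system and serves as a sanity check:

```python
import itertools
import math
import numpy as np

# Sketch: integrating the moment system (9) for the cycle graph C4 and
# order n = 3, assembled from the generator (2):
#   dm_a/dt = (1/2) sum_i a_i(a_i-1) sum_{j in N(i)} m_{a-e_i+e_j}
#             - (sum_{{i,j} in E} a_i a_j) m_a.

neighbours = {0: (1, 3), 1: (0, 2), 2: (1, 3), 3: (0, 2)}   # C4
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
r, n = 4, 3

states = [a for a in itertools.product(range(n + 1), repeat=r) if sum(a) == n]
idx = {a: k for k, a in enumerate(states)}

M = np.zeros((len(states), len(states)))
for a in states:
    row = idx[a]
    for i in range(r):
        if a[i] >= 2:
            w = a[i] * (a[i] - 1) / 2
            for j in neighbours[i]:
                b = list(a); b[i] -= 1; b[j] += 1
                M[row, idx[tuple(b)]] += w
    M[row, row] -= sum(a[i] * a[j] for i, j in edges)

x = np.array([0.1, 0.2, 0.3, 0.4])
m = np.array([np.prod(x ** np.array(a)) for a in states])   # m_a(0) = x^a

multi = np.array([math.factorial(n) / np.prod([math.factorial(ai) for ai in a])
                  for a in states])
h = 1e-3
for _ in range(2000):        # explicit Euler integration of m' = M m
    m = m + h * (M @ m)
print(multi @ m)             # E[(x_1+...+x_r)^n] = 1 is conserved
```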

Dual process on integer partitions
To study the diffusion $(x(t), P_x)$ governed by the generator (2), we employ a tool called duality, which is familiar in the study of interacting particle systems (see, e.g., Chapter 2 of [9]).
Consider a graph $G = (V, E)$ and let $(a(t), P_a)$, $a(0) = a$, be a continuous-time Markov chain on the set of ordered non-negative integer partitions of $n$ with $r = |V|$, which is denoted by $\Pi_{n,r}^{\ge 0}$, defined by the rate matrix $\{R_{a,b}\}$:
$$R_{a,b} = \begin{cases} \dfrac{a_i(a_i-1)}{2}, & b = a - e_i + e_j,\ j \in N(i),\ i \in V,\\[4pt] -\sum_{c \ne a} R_{a,c}, & b = a,\\[2pt] 0, & \text{otherwise.} \end{cases} \tag{10}$$
The backward equation for the transition probability $P_a(a(t) = \cdot)$ is
$$\frac{d}{dt}P_a(a(t) = b) = \sum_{c \in \Pi_{n,r}^{\ge 0}} R_{a,c}P_c(a(t) = b). \tag{11}$$
We have the following duality relation between the Markov semigroup $\{T_t\}$ and the Markov chain $(a(t), P_a)$.
Lemma 3.1. The Markov semigroup $\{T_t\}$ associated with the generator (2) and the Markov chain $(a(t), P_a)$ with the rate matrix (10) satisfy
$$T_t x^a = E_a\Big[x^{a(t)}\exp\Big(-\int_0^t k(a(s))\,ds\Big)\Big] \tag{12}$$
for each $a \in \Pi_{n,r}^{\ge 0}$, where the killing rate is
$$k(a) = \sum_{\{i,j\} \in E} a_i a_j - \frac{1}{2}\sum_{i \in V} d_i\, a_i(a_i-1). \tag{13}$$
Proof. Noting that
$$Lx^a = \frac{1}{2}\sum_{i \in V} a_i(a_i-1)\sum_{j \in N(i)} x^{a-e_i+e_j} - \Big(\sum_{\{i,j\} \in E} a_i a_j\Big)x^a$$
and the operation (4), we see that $g(t, a) = T_t x^a$ satisfies the differential equation
$$\frac{d}{dt}g(t, a) = \sum_{b \in \Pi_{n,r}^{\ge 0}} R_{a,b}\,g(t, b) - k(a)g(t, a)$$
for each $a \in \Pi_{n,r}^{\ge 0}$. This is uniquely solved by means of the Feynman-Kac formula and the assertion follows.
Since the total number of particles is conserved, i.e. $|a(t)| = a_1(t) + \cdots + a_r(t) = |a|$, $\forall t$, the killing rate (13) is bounded. The killing rate does not have a definite sign; however, a key observation is that if the support of a vector $a$, denoted by $\mathrm{supp}(a) := \{i \in V : a_i > 0\}$, is an independent set of $G$, then the first sum in (13) vanishes and the killing rate is non-positive: $k(a) \le 0$. The converse is not always true.
To illustrate the Markov chain $(a(t), P_a)$, let us ask a specific question. What is the moment of $2e_2 + e_4$ for the cycle graph $C_4$ discussed in Example 1.3? For a chain starting from $2e_2 + e_4$, there are two possible transitions; one is absorbed into the state $e_1 + e_2 + e_4$ and the other is absorbed into the state $e_2 + e_3 + e_4$, where the rates are unities (see Figure 1). The waiting time for the occurrence of either of these two transitions follows the exponential distribution of rate two. Since $k(a) = -2$ and $k(e_1+e_2+e_4) = k(e_2+e_3+e_4) = 2$, the right hand side of the duality relation (12) can be computed by conditioning on the time $s$ at which one of the two possible transitions occurs; the first term corresponds to the case that no transition occurs until time $t$. Let us call such a transition event a collision.
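The rates in this example can be tabulated mechanically. A sketch, assuming the conservative-chain convention in which a particle jumps from $i$ to each neighbour at rate $a_i(a_i-1)/2$ and the killing rate $k(a) = \sum_{\{i,j\}\in E} a_i a_j - \frac{1}{2}\sum_i d_i a_i(a_i-1)$, reconstructed to match the values quoted above for $C_4$ (all names are illustrative):

```python
# Sketch of the dual-chain rates for the C4 example, assuming
#   R_{a, a-e_i+e_j} = a_i(a_i-1)/2 for j in N(i), and
#   k(a) = sum_{{i,j} in E} a_i a_j - (1/2) sum_i d_i a_i (a_i - 1),
# reconstructed to reproduce the quoted values; names are illustrative.

neighbours = {1: (2, 4), 2: (1, 3), 3: (2, 4), 4: (1, 3)}  # C4
edges = [(1, 2), (2, 3), (3, 4), (4, 1)]

def killing_rate(a):
    quad = sum(a.get(i, 0) * a.get(j, 0) for i, j in edges)
    coll = sum(len(neighbours[i]) * ai * (ai - 1) / 2 for i, ai in a.items())
    return quad - coll

def transitions(a):
    """Collision moves a -> a - e_i + e_j, j ~ i, at rate a_i(a_i-1)/2."""
    out = {}
    for i, ai in a.items():
        if ai >= 2:
            for j in neighbours[i]:
                b = dict(a); b[i] -= 1; b[j] = b.get(j, 0) + 1
                out[tuple(sorted(b.items()))] = ai * (ai - 1) / 2
    return out

a = {2: 2, 4: 1}                          # the state 2 e_2 + e_4
print(killing_rate(a))                    # -2.0, as quoted in the text
print(killing_rate({1: 1, 2: 1, 4: 1}))   # 2.0, as quoted in the text
print(len(transitions(a)))                # 2 possible transitions, rate 1 each
```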
Remark 3.2. An analogous dual Markov chain of the diffusion approximation of a kind of Wright-Fisher model was effectively used by Shiga [15, 16], where a transition $a \to a - e_i$ occurs with the same rate as in (10). Such a transition event is called a "coalescent". In contrast to a collision, the total number of particles decreases at a coalescent.
Here, we have a simple observation about the invariants among moments discussed in Section 2. If a chain $(a(t), P_a)$ starts from a state $a$ such that $\mathrm{supp}(a)$ is a maximal independent set $V_I$ of the graph $G$, then the killing rate is non-positive and the duality relation (12) is reduced to
$$T_t x^a = x^a E_a\Big[\exp\Big(-\int_0^t k(a(s))\,ds\Big);\ \text{no collisions occur}\Big] + E_a\Big[x^{a(t)}\exp\Big(-\int_0^t k(a(s))\,ds\Big);\ \text{collisions occur}\Big].$$
Corollary 2.2 implies cancellation of the second term among moments. Moreover, considering the case in which the diffusion $(x(t), P_x)$ starts from a point $x$ in the face corresponding to $V_I$, by Proposition 1.1 we know that after a collision $\mathrm{supp}(a(t))$ must contain a vertex which is not contained in $V_I$.
Let us ask another question. What is the moment of $a = 2e_1 + e_2 + e_3$ for the star graph $S_2$ discussed in Section 1? Some consideration reveals that a chain $(a(t), P_a)$ starting from $a$ is never absorbed, and transitions occur among the three states $a$, $b = e_1 + 2e_2 + e_3$, and $c = e_1 + e_2 + 2e_3$. Since $k(a) = 2$ and $k(b) = k(c) = 1$, the duality relation (12) gives
$$T_t x^a = E_a\Big[x^{a(t)}\exp\Big(-\int_0^t \{1 + 1(a(s) = a)\}\,ds\Big)\Big], \tag{14}$$
where $1(A)$ is $1$ if the argument $A$ is true and zero otherwise. Solving the backward equation (11), we immediately obtain
$$P_a(a(t) = a) = \frac{1}{3} + \frac{2}{3}e^{-3t}, \qquad P_a(a(t) = b) = P_a(a(t) = c) = \frac{1}{3} - \frac{1}{3}e^{-3t},$$
$$P_b(a(t) = b) = \frac{1}{3} + \frac{1}{2}e^{-t} + \frac{1}{6}e^{-3t}, \qquad P_b(a(t) = c) = \frac{1}{3} - \frac{1}{2}e^{-t} + \frac{1}{6}e^{-3t}. \tag{15}$$
However, computation of the right hand side of the equation (14) seems not easy, because the expectation depends on the whole sample path of the chain $(a(s) : 0 \le s \le t)$. Nevertheless, the moments can be obtained easily by solving the system of differential equations (9); restricted to the three states, it is a closed linear system of three equations.
The observations above lead to the following proposition on the fate of the Markov chain $(a(t), P_a)$.
Proposition 3.3. (i) A state $a$ is absorbing if and only if $a \in \{0, 1\}^V$. If a chain starts from a state $a$ satisfying $n = |a| \le r$, then the chain is eventually absorbed into such a state. (ii) If a chain starts from a state $a$ satisfying $n = |a| > r$, then the transition probability $P_a(a(t) = \cdot)$ converges to the uniform distribution on the set of ordered positive integer partitions $\Pi_{n,r}$.
Proof. (i) A state $a$ is absorbing if and only if the row vector of the rate matrix (10) vanishes, which occurs if and only if $a_i \le 1$ for every $i \in V$. Let $V_0(a(t)) := \{i \in V : a_i(t) = 0\}$ and let $j$ be a vertex with $a_j(t) \ge 2$. If the vertex $j$ is connected with a vertex in $V_0(a(t))$, a transition decreasing $|V_0(a(t))|$ can occur. Otherwise, the vertex $j$ should be connected with at least one vertex in $V \setminus V_0(a(t))$, say $l$. The transition $a(t) \to a(t_+) = a(t) - e_j + e_l$ makes $a_l(t_+) \ge 2$. If the vertex $l$ is connected with a vertex in $V_0(a(t))$, the assertion is shown. Otherwise, the vertex $l$ should be connected with at least one vertex in $V \setminus \{V_0(a(t)) \cup \{j\}\}$. The proof is completed by repeating this procedure until we reach a vertex in $V \setminus V_0(a(t))$ which is connected with a vertex in $V_0(a(t))$.
(ii) If a chain starts from a state in the set $\Pi_{n,r}^{\ge 0} \setminus \Pi_{n,r}$, the argument for (i), i.e., that $z(t)$ is a death process, shows that in a finite time the chain reaches a state, say $a$, in the set $\Pi_{n,r}$ and never exits from $\Pi_{n,r}$. For such a case, consider restarting the chain from the state $a$. For simplicity, we consider the case of $n = r + 1$. Then, a state $a(t)$ can be identified with the unique vertex $i \in V$ satisfying $a_i(t) = 2$. Since we are considering a connected graph, any two such states are connected by a sequence of transitions moving the extra particle along a path in $G$. Since all of the transitions occur with rate unity, such a sample path has positive probability. Hence, the Markov chain is irreducible and ergodic, and there exists a unique stationary distribution on the set $\Pi_{n,r}$. Since the uniform distribution on $\Pi_{n,r}$ satisfies the backward equation (11), it is the stationary distribution. Cases of $n > r + 1$ can be shown in a similar manner by showing that up to $n - r$ particles can be moved from a vertex to another vertex.
Proposition 3.3 shows that if the Markov chain $(a(t), P_a)$ starts from a state $a$ satisfying $|a| > |V|$, convergence of the chain can be divided into two phases: (1) exit from the set $\Pi_{n,r}^{\ge 0} \setminus \Pi_{n,r}$, and (2) convergence to the uniform distribution on the set $\Pi_{n,r}$. The largest eigenvalue of the rate matrix is zero. To consider the mixing, we have to know the second largest eigenvalue, say $\lambda_2$. By the spectral decomposition of the transition probability, the mixing time of phase (2), or the infimum of $t$ satisfying $\max_{a \in \Pi_{n,r}} \|P_a(a(t) = \cdot) - |\Pi_{n,r}|^{-1}\|_{TV} < \epsilon$ for $\epsilon > 0$, is less than $c|\lambda_2|^{-1}\log\epsilon^{-1}$ for some constant $c > 0$, where $\|\mu - \nu\|_{TV}$ is the total variation distance between probability measures $\mu$ and $\nu$.
Example 3.4. Consider the case that $n = r + 1$ (see the proof of Proposition 3.3 (ii)). It can be shown that the rate matrix $\{R_{a,b}\}$ reduces to the negative of the Laplacian matrix of $G$, the negative of whose second largest eigenvalue is called the algebraic connectivity of $G$. For a connected graph, it is known that the largest eigenvalue of the rate matrix is zero and the algebraic connectivity is bounded below by $4/(r\,\mathrm{diam}(G))$ [14], where $\mathrm{diam}(G)$ is the diameter of $G$, or the maximum of the shortest path lengths between any pair of vertices in $G$. For the star graph $S_2$ and the fourth order monomials discussed above, the rate matrix is the negative of the Laplacian matrix:
$$\{R_{a,b}\} = \begin{pmatrix} -2 & 1 & 1\\ 1 & -1 & 0\\ 1 & 0 & -1 \end{pmatrix},$$
and the eigenvalues are $0$, $-1\,(< -2/3)$, and $-3$, where $r = 3$ and $\mathrm{diam}(S_2) = 2$.
The two eigenvalues appear in the transition probabilities (15). For a generic graph with large $r$, the mixing time is $O(r\,\mathrm{diam}(G)\log\epsilon^{-1})$.
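The spectrum in this example can be checked numerically. A sketch (vertex 1 is the centre of $S_2 = K_{1,2}$; names are illustrative):

```python
import numpy as np

# Sketch: the negative Laplacian of the star S_2 = K_{1,2} and its
# spectrum, matching the eigenvalues 0, -1, -3 and the lower bound
# 4/(r diam(G)) = 4/(3*2) = 2/3 for the algebraic connectivity.

L = np.array([[ 2, -1, -1],
              [-1,  1,  0],
              [-1,  0,  1]], dtype=float)   # graph Laplacian of S_2

eig = np.sort(np.linalg.eigvalsh(-L))       # spectrum of the rate matrix
print(eig)                                  # close to [-3, -1, 0]

algebraic_connectivity = np.sort(np.linalg.eigvalsh(L))[1]
print(algebraic_connectivity >= 4 / (3 * 2))  # True (r = 3, diam = 2)
```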
Assessment of phase (1) seems harder. The death process $z(t) = |V_0(a(t))|$, $t > 0$, with $z(0) \le r - 1$, is not a Markov process, but its death rate is bounded below. To obtain a rough estimate, let us suppose the state $a(t)$ follows the uniform distribution on the set of ordered positive integer partitions $\Pi_{n,r-z}$ of the occupied vertices, where $z = z(t)$. Then, the expectation of the lower bound of the death rate is $(n-r+z)_{r-z+2}/(r-z)^3$, where $(n)_i = n(n+1)\cdots(n+i-1)$. When $n$ is large, the dominant contribution to the expectation of the waiting time for the exit comes from the period $\{t : z(t) = z(0)\}$, and it is $O(n^{z(0)-r-2})$. Hence, the expectation of the waiting time for the exit would be $O(n^{-3})$.
Shiga [15, 16] and Shiga and Uchiyama [17] studied structures of extremal stationary states of the diffusion approximation of a kind of Wright-Fisher model in $[0, 1]^S$ for a countable set $S$ by using its dual Markov chain. Extremal states of the adjoint Markov semigroup $\{T_t^*\}$ on $\mathcal{P}(\Delta^{r-1})$ induced by $\{T_t\}$ associated with the diffusion $(x(t), P_x)$ can be studied by using the dual Markov chain $(a(t), P_a)$. Note that positivity of a stationary state is not assumed, as explained in Section 1.
Theorem 3.5. The support of an extremal stationary state of the adjoint Markov semigroup $\{T_t^*\}$ is a face of $\Delta^{r-1}$ corresponding to an independent set of the graph $G$.
Proof. Consider a Markov chain $(a(t), P_a)$ with the rate matrix (10) starting from a state $a(0) = a \in \{0, 1\}^V$. According to Proposition 3.3 (i), such an $a$ is an absorbing state and the chain stays there. Lemma 3.1 gives
$$T_t x^a = x^a e^{-k(a)t}, \qquad k(a) = \sum_{\{i,j\} \in E} a_i a_j \ge 0.$$
A stationary state $\nu \in \mathcal{P}(\Delta^{r-1})$ satisfies $\langle\nu, T_t x^a\rangle = \langle T_t^*\nu, x^a\rangle = \langle\nu, x^a\rangle$ for any $x^a$. If $a \in \{0, 1\}^V$, this condition reduces to
$$\langle\nu, x^a\rangle = e^{-k(a)t}\langle\nu, x^a\rangle, \qquad \forall t \ge 0.$$
Therefore, if $\mathrm{supp}(a)$ is not an independent set, then $k(a) > 0$ and $\langle\nu, x^a\rangle = 0$. Since we are considering a connected graph $G$, $V$ is not an independent set of $G$, and the condition reduces to $\langle\nu, x_1\cdots x_r\rangle = 0$, which excludes $\mathrm{int}(\mathrm{conv}\{e_i : i \in V\})$ from the support of $\nu$. Suppose there exists a vertex $j_1$ such that $V^{(1)} = V \setminus \{j_1\}$ is still not an independent set. Set $a$ such that $a_i = 1$ if $i \in V^{(1)}$ and $a_i = 0$ otherwise for each $i \in \{1, \ldots, r\}$. The condition $\langle\nu, x^a\rangle = 0$ excludes $\mathrm{int}(\mathrm{conv}\{e_i : i \in V^{(1)}\})$ from the support of $\nu$. Repeating this procedure yields an independent set $V_I = V \setminus \{j_1, \ldots, j_s\}$ for some $s \in \mathbb{N}$, and the face $\mathrm{int}(\mathrm{conv}\{e_i : i \in V_I\})$ is not excluded from the support of $\nu$.
The steps in the above proof appear in the following example. A direct consequence of Theorem 3.5 on the moments is as follows.
Corollary 3.7. For each $n \in \mathbb{N}$, if the limit of an $n$-th order moment of the diffusion $(x(t), P_x)$ on a graph $G$ is positive, namely,
$$\lim_{t \to \infty} m_a(t) > 0, \qquad |a| = n,$$
then $\mathrm{supp}(a)$ is an independent set of $G$.

Diffusion with a linear drift
In this section we consider the diffusion $(x(t), \tilde P_x)$ taking values in probability measures on a graph $G = (V, E)$, $r = |V|$, satisfying the following stochastic differential equation with a linear drift:
$$dx_i(t) = \sum_{j \in N(i)} \sqrt{x_i(t)x_j(t)}\,dB_{ij}(t) + \frac{\alpha}{2}(1 - rx_i(t))\,dt, \qquad i \in V, \tag{16}$$
for $\alpha \in \mathbb{R}_{>0}$. The drift term, $\alpha(1 - rx_i)dt/2$, gives a killing of the dual process with a linear rate. As shown below, the behaviors of the diffusion and the dual Markov chain are significantly different from those without the drift discussed in the previous sections. In Itoh et al.'s discrete stochastic model described in Section 1, this drift corresponds to adding the following dynamics: at each instant of time, one of the $N$ particles is chosen uniformly at random and assigned to another vertex chosen uniformly at random with rate $\alpha(r-1)/(N-1)$. In the Wright-Fisher model, this drift corresponds to a mutation mechanism [5].
Let $(\tilde a(t), \tilde P_a)$, $\tilde a(0) = a$, be a continuous-time Markov chain on the integer lattice $I := \mathbb{N}_0^r$, the set of multi-indices of non-negative integers; since particles are only moved or erased, the chain started from $a$ stays in the finite sublattice $\{b \in I : |b| \le |a|\}$. The rate matrix $\{\tilde R_{a,b}\}$ is
$$\tilde R_{a,b} = \begin{cases} \dfrac{a_i(a_i-1)}{2}, & b = a - e_i + e_j,\ j \in N(i),\ i \in V,\\[4pt] \dfrac{\alpha a_i}{2}, & b = a - e_i,\ i \in V,\\[4pt] -\sum_{c \ne a}\tilde R_{a,c}, & b = a,\\[2pt] 0, & \text{otherwise,} \end{cases} \tag{17}$$
and the backward equation for the transition probability $\tilde P_a(\tilde a(t) = \cdot)$ is analogous to (11).
The following duality relation between the Markov semigroup associated with the diffusion $(x(t), \tilde P_x)$, denoted by $\{\tilde T_t : t \ge 0\}$, and the Markov chain $(\tilde a(t), \tilde P_a)$ can be shown in the same manner as Lemma 3.1.
Lemma 4.1. The Markov semigroup $\{\tilde T_t\}$ and the Markov chain $(\tilde a(t), \tilde P_a)$ with the rate matrix (17) satisfy
$$\tilde T_t x^a = \tilde E_a\Big[x^{\tilde a(t)}\exp\Big(-\int_0^t \tilde k(\tilde a(s))\,ds\Big)\Big] \tag{18}$$
for each $a \in I$, where the killing rate is
$$\tilde k(a) = \sum_{\{i,j\} \in E} a_i a_j - \frac{1}{2}\sum_{i \in V} d_i\, a_i(a_i-1) + \frac{\alpha(r-1)}{2}|a|. \tag{19}$$
In contrast to the rate matrix $\{R_{a,b}\}$ in (10), particles are erased, which makes the total number of particles $|\tilde a(t)| = \tilde a_1(t) + \cdots + \tilde a_r(t)$ decrease. It is clear from the rate matrix (17) that $0$ is the unique absorbing state.
Proposition 4.2. Let $\tau := \inf\{t > 0 : \tilde a(t) = 0\}$, which is a Markov time with respect to the Markov chain $(\tilde a(t), \tilde P_a)$ with the rate matrix (17). Then, $\tilde P_a(\tau < \infty) = 1$ and
$$\tilde E_a(\tau) = \frac{2}{\alpha}\sum_{i=1}^{|a|}\frac{1}{i}.$$
Proof. Since $\tilde a(t) = 0$ if and only if $|\tilde a(t)| = 0$, we consider the Markov chain of the cardinality $|\tilde a(t)|$. According to the rate matrix (17), it is a linear death process with rate $\alpha|\tilde a(t)|/2$. Noting that $\tau$ is distributed as a sum of independent exponential random variables of rates $\alpha i/2$, $i \in \{1, \ldots, |a|\}$, we have the assertion.
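The absorption-time formula implied by this proof (a sum of independent exponentials of rates $\alpha i/2$, hence a harmonic-sum expectation) is easy to verify by simulation. A sketch with illustrative parameter values:

```python
import numpy as np

# Sketch: |a(t)| drops from i to i-1 at rate alpha*i/2, so tau is a sum
# of independent exponentials and E[tau] = (2/alpha) sum_{i=1}^{|a|} 1/i.
# Parameter values are illustrative.

alpha, n0 = 1.5, 6
rng = np.random.default_rng(1)

exact = (2 / alpha) * sum(1 / i for i in range(1, n0 + 1))

samples = 200_000
taus = sum(rng.exponential(2 / (alpha * i), size=samples)
           for i in range(1, n0 + 1))
print(exact, taus.mean())   # the two values agree to within ~1%
```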
To illustrate the Markov chain $(\tilde a(t), \tilde P_a)$, let us ask a specific question. What is the moment of $a = e_1 + e_2$ for the cycle graph $C_4$? For a chain starting from $a$, there are four possible types of sample paths (Figure 2): (i) no particle is erased; (ii) the particle at vertex 1 is erased but the particle at vertex 2 survives; (iii) the particle at vertex 2 is erased but the particle at vertex 1 survives; (iv) both particles are erased.
The waiting time until either of the two particles is erased follows the exponential distribution of rate $\alpha$. Since $\tilde k(a) = 1 + 3\alpha$ and $\tilde k(e_1) = \tilde k(e_2) = 3\alpha/2$, the right hand side of the duality relation (18) can be computed by conditioning on the times $s > 0$ and $u > s$ at which a particle is erased.
The stationary state of the adjoint Markov semigroup $\{\tilde T_t^*\}$ on $\mathcal{P}(\Delta^{r-1})$ induced by $\{\tilde T_t\}$ consists of a unique probability measure.
Theorem 4.3. There exists a unique probability measure $\nu_\alpha$ satisfying $\lim_{t \to \infty}\tilde T_t f = \langle\nu_\alpha, f\rangle$ for all $f \in C(\Delta^{r-1})$, whose moments are
$$\langle\nu_\alpha, x^a\rangle = \tilde E_a\Big[\exp\Big(-\int_0^\tau \tilde k(\tilde a(s))\,ds\Big)\Big] \tag{20}$$
for all $a \in I$, since $\lim_{t \to \infty}\langle\delta_x, \tilde T_t x^a\rangle = \lim_{t \to \infty}\langle\tilde T_t^*\delta_x, x^a\rangle$ for each $x \in \Delta^{r-1}$.
Moreover, the stationary state ν α has a continuous and strictly positive density.
Theorem 4.5. For the adjoint Markov semigroup $\{\tilde T_t^*\}$, the unique stationary state $\nu_\alpha \in \mathcal{P}(\Delta^{r-1})$ is absolutely continuous with respect to the Lebesgue measure on $\Delta^{r-1}$ and admits a probability density that is strictly positive in $\mathrm{int}(\Delta^{r-1})$ and of $C^\infty(\Delta^{r-1})$-class.
We next show that the density $p_\alpha$ is strictly positive in $\mathrm{int}(\Delta^{r-1})$. Consider an approximation of $p_\alpha(x)$ by polynomials. Suppose there exists a point $\bar x \in \mathrm{int}(\Delta^{r-1})$ satisfying $p_\alpha(\bar x) = 0$. Since each monomial $\bar x^a$ is strictly positive, for any small positive constants $\epsilon_1$ and $\epsilon_2$ there exists $N$ such that a neighbourhood of $\bar x$ is covered by open balls on which the approximating polynomials of degree $n > N$ are small. Since $p_\alpha$ is smooth, for every point $x \in \mathrm{int}(\Delta^{r-1})$ we can find a ball containing $x$ on which $p_\alpha(x) < \epsilon_1 + c\epsilon_2$ for some constant $c$. This implies $p_\alpha(x) = 0$, $\forall x \in \mathrm{int}(\Delta^{r-1})$, but this contradicts the fact that $\langle\nu_\alpha, x^a\rangle > 0$, $\forall a \in I$, which follows from the expression (20), because the killing rate $\tilde k(\tilde a)$ is bounded and the Markov time satisfies $\tilde P(\tau < \infty) = 1$ by Proposition 4.2.
An immediate consequence is the following corollary, which is analogous to Corollary 3.7.
The moments of the stationary state can be obtained from the condition (5) for the stationary state. It gives a system of recurrence relations:
$$\Big(\sum_{\{i,j\} \in E} a_i a_j + \frac{\alpha r|a|}{2}\Big)m_a = \frac{1}{2}\sum_{i \in V} a_i(a_i-1)\sum_{j \in N(i)} m_{a-e_i+e_j} + \frac{\alpha}{2}\sum_{i \in V} a_i\, m_{a-e_i} \tag{22}$$
for each $a \in I$, with the boundary condition $m_0 = 1$. In contrast to the system of ordinary differential equations (9), this system is not closed among the moments of the same order. Before solving the system for the moment of a given monomial $x^a$, we have to solve the systems for the moments of orders lower than $|a|$. Therefore, it seems a formidable task to solve the system (22). The diffusion on a complete graph is an exception.
Example 4.7. Let $G = K_r$, the complete graph consisting of $r$ vertices discussed in Example 1.2. The unique solution of the system of recurrence relations (22) is
$$m_a = \frac{\prod_{i=1}^r (\alpha)_{a_i}}{(r\alpha)_{|a|}}, \qquad (z)_k := z(z+1)\cdots(z+k-1). \tag{23}$$
Moreover, since this expression gives the moments of the symmetric Dirichlet distribution of parameter $\alpha$, the stationary state is the Dirichlet distribution:
$$\nu_\alpha = \mathrm{Dir}(\alpha, \ldots, \alpha).$$
Remark 4.8. The limit of the moments (23) is known as the Dirichlet-multinomial distribution up to multiplication by the multinomial coefficient. Renumbering the set $\mathrm{supp}(a)$ by $\{1, \ldots, l\}$ and taking the limit $\alpha \to 0$ and $r \to \infty$ with $r\alpha = \theta > 0$, the expression (23) reduces to the form
$$\frac{\theta^l}{(\theta)_{|a|}}\prod_{i=1}^l (a_i - 1)!, \tag{24}$$
which is known as the Ewens sampling formula, or the exchangeable partition probability function of the Dirichlet prior process in Bayesian statistics (see, e.g., [13] for an introduction). Karlin and McGregor [7] derived this formula by using a system of recurrence relations based on the coalescents mentioned in Remark 3.2. In this sense, we have found an alternative system of recurrence relations, (22), based on collisions, which the formula (24) satisfies.
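The symmetric Dirichlet moment formula in this example can be checked against simulation. A sketch, assuming the rising-factorial expression $m_a = \prod_i(\alpha)_{a_i}/(r\alpha)_{|a|}$ (all names and parameter values are illustrative):

```python
import numpy as np

# Sketch: checking the moment formula for the complete graph,
#   m_a = prod_i (alpha)_{a_i} / (r*alpha)_{|a|},
# against Monte Carlo draws from the symmetric Dirichlet distribution.
# (z)_k denotes the rising factorial; names are illustrative.

def rising(z, k):
    out = 1.0
    for i in range(k):
        out *= z + i
    return out

alpha, r = 1.0, 3
a = (2, 1, 0)
n = sum(a)

formula = np.prod([rising(alpha, ai) for ai in a]) / rising(r * alpha, n)

rng = np.random.default_rng(2)
xs = rng.dirichlet([alpha] * r, size=200_000)
mc = np.mean(np.prod(xs ** np.array(a), axis=1))
print(formula, mc)   # 1/30 and a Monte Carlo estimate close to it
```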

Applications
In this section, we present applications of the results developed in the previous sections.

Finding independent sets of graphs
Itoh et al.'s discrete stochastic model described in Section 1 stops when the set of vertices occupied by at least one particle constitutes an independent set of a graph. The model is summarized as the following procedure.
Algorithm 5.1.
Input: A graph $G = (V, E)$, the number of particles $N \ge r$, and an iteration limit $M \in \mathbb{N}$.
Output: A candidate for an independent set of $G$.
Step 1: Assign the $N$ particles to vertices such that at least one particle is assigned to each vertex.
Step 2: Set $c \leftarrow 0$.
Step 3: Choose two distinct particles uniformly at random. Let $i \in V$ and $j \in V$ be the vertices to which the particles are assigned.
Step 4: If $i \sim j$, then assign both particles to $i$ or to $j$ with probability $1/2$ each, and go to Step 2. Otherwise, $c \leftarrow c + 1$.
Step 5: If c < M, go to Step 3.
Step 6: Output the list of vertices to which at least one particle is assigned.
The cardinality of the set of vertices to which at least one particle is assigned is non-increasing, and the set eventually reduces to an independent set of $G$. The integer $M$ is needed to decide that we can no longer choose particles from neighbouring vertices. If $M$ is sufficiently large, Algorithm 5.1 provides an independent set with high probability.
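The procedure above can be sketched directly in code (function and variable names are illustrative, and the stopping counter plays the role of $M$ described above):

```python
import random

# Sketch of Algorithm 5.1: particles collide until the occupied
# vertices form an independent set; names are illustrative.

def find_independent_set(neighbours, n_particles, M=10_000, seed=0):
    rng = random.Random(seed)
    vertices = list(neighbours)
    # Step 1: at least one particle on each vertex.
    particles = vertices + [rng.choice(vertices)
                            for _ in range(n_particles - len(vertices))]
    c = 0                                             # Step 2
    while c < M:                                      # Step 5
        p, q = rng.sample(range(len(particles)), 2)   # Step 3
        i, j = particles[p], particles[q]
        if j in neighbours[i]:                        # Step 4: a collision
            target = i if rng.random() < 0.5 else j
            particles[p] = particles[q] = target
            c = 0
        else:
            c += 1
    return set(particles)                             # Step 6

C4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
out = find_independent_set(C4, n_particles=8)
print(out)  # an independent set of C4, e.g. {2, 4}, {1, 3}, or a single vertex
```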
A natural question is how many steps are needed to find an independent set. Answering this question seems hard, but regarding the diffusion satisfying the stochastic differential equation (1) as an approximation of the procedure of Algorithm 5.1, we can obtain a rough idea. Because of the scaling in the diffusion limit, one unit of time in the diffusion corresponds to $N(N-1)/2$ iterations of Steps 3 and 4 of Algorithm 5.1.
According to the argument of Proposition 1.1, a sample path of the diffusion $(x(t), P_x)$ starting from a point $x \in \mathrm{int}(\Delta^{r-1})$ is absorbed into lower dimensional faces and eventually absorbed into a face corresponding to an independent set.
Proposition 5.2. Let $U \subset V$ be a set of vertices which is not an independent set of a graph $G$. For a sample path of the diffusion $(x(t), P_x)$ starting from a point $x \in \mathrm{int}(\mathrm{conv}\{e_i : i \in U\})$, the Markov time $\tau_U := \inf\{t > 0 : x(t) \in \mathrm{Bd}(\mathrm{conv}\{e_i : i \in U\})\}$ satisfies
$$P_x(\tau_U > t) \ge c_x e^{-|E_U|t}, \qquad t \ge 0,$$
where $E_U$ is the edge set of the induced subgraph of $G$ consisting of $U$, $\mathrm{Bd}(\mathrm{conv}\{e_i : i \in U\})$ is the boundary of $\mathrm{conv}\{e_i : i \in U\}$, and $c_x \in (0, 1]$ is a constant depending on $x$.
Proof. By the argument in the proof of Theorem 3.5, we have
$$E_x\Big[\prod_{i \in U} x_i(t)\Big] = \Big(\prod_{i \in U} x_i\Big)e^{-|E_U|t},$$
and since $\prod_{i \in U} x_i(t) \le |U|^{-|U|}$ on $\Delta^{r-1}$ while the event $\{\prod_{i \in U} x_i(t) > 0\}$ is contained in $\{\tau_U > t\}$, the assertion follows.
The author has not found any other property of the Markov time $\tau_U$ for generic graphs, but the diffusion on a complete graph is an exception; the probability distribution function can be obtained exactly.
Proof. The inclusion-exclusion argument shows the expression (27). A complete graph can be reduced to $K_2$ by any partition of the vertex set into two vertex sets (the reduction is defined in Section 1). For example, $K_3$ is reducible to the $K_2$ consisting of the vertices $\{1+2, 3\}$, where the vertex $1+2$ is obtained by identifying the vertices $1$ and $2$. In the same way, $K_3$ is also reducible to $\{1+3, 2\}$ and $\{2+3, 1\}$. Therefore, an expression for $\tilde p_{i_1,\ldots,i_u}(t)$ is obtained from the right hand side of (27) by replacing $x_1$ with $x_{i_1} + \cdots + x_{i_u}$. The inclusion-exclusion argument gives the equality (28), where both sides are zero if $l < s$. Substituting the expression obtained from (27) into (26) and collecting terms by using the equality (28), the assertion follows.
According to Proposition 5.3, the probability distribution function of the exit time from $\mathrm{int}(\Delta^{s-1})$ is asymptotically $1 - (2s-1)!!\,2^{s-1}x e^{-s(s-1)t/2}$ for large $t$, where $(2s-1)!! = (2s-1)(2s-3)\cdots 1$. Let the sequence of vertex sets occupied by at least one particle in Algorithm 5.1 be denoted by $V, U^{(r-1)}, U^{(r-2)}, \ldots, U^{(1)} = \{i\}$ for a vertex $i \in V$. If the exit time for $U^{(s)}$ followed the exponential distribution of rate $|E_{U^{(s)}}| = s(s-1)/2$ (of course, not exactly true), the expectation of the waiting time until a sample path is absorbed into a single vertex would be $2(1 - r^{-1})$, which is about $2$ for large $r$. This rough argument suggests that the expectation of the computation cost of Algorithm 5.1 would be $O(rN^2)$ for a complete graph, because an iteration of Steps 3 and 4 can be executed in $O(r)$ time. Luby's algorithm for finding an independent set described in Section 1 demands $O(1)$ time using $O(r^2)$ processors.

Bayesian graph selection
Consider taking a sample of n particles from the unique stationary state of the adjoint Markov semigroup {T*} on P(∆_{r−1}) induced by the Markov semigroup {T_t} associated with the diffusion (x(t), P_x) appearing in Theorem 4.3, such that a_i particles are taken from the vertex i ∈ V of a graph G = (V, E). We assume the probability of taking a sample does not depend on the order of particles, namely, the sampling is exchangeable. Such probabilities constitute the multinomial distribution, namely, a probability measure on ordered non-negative integer partitions of n, q_a := (n choose a) x^a, a ∈ Π_{n,r}^{≥0}, r = |V|, satisfying Σ_a q_a(t) = 1. The moment m_a defined by (21) is the expectation of the sample probability q_a up to the multinomial coefficient:

Figure 3: Three candidate graphical models: a star graph S_3, a cycle graph C_4, and a complete graph K_4.
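The sample probability q_a = (n choose a) x^a is the ordinary multinomial probability; for concreteness, a minimal sketch (the function names are illustrative):

```python
from math import comb, prod

def multinomial_coefficient(a):
    """Multinomial coefficient n!/(a_1! ... a_r!) for n = sum(a),
    computed as a product of binomial coefficients."""
    n, c = 0, 1
    for ai in a:
        n += ai
        c *= comb(n, ai)
    return c

def sample_probability(a, x):
    """q_a = (n choose a) * x^a: the probability of observing the
    counts a at the vertices, given masses x on the simplex."""
    return multinomial_coefficient(a) * prod(xi ** ai for xi, ai in zip(x, a))

# Example: n = 2 particles on r = 4 vertices with uniform masses x_i = 1/4.
q = sample_probability((1, 0, 1, 0), (0.25, 0.25, 0.25, 0.25))
assert abs(q - 2 * 0.25 ** 2) < 1e-12  # (2 choose 1,0,1,0) = 2
```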
Before proceeding to discuss computational issue, we present a motivating problem in Bayesian statistics. The expected sample probability (29) is a mixture of multinomial distributions of parameters x over the stationary state ν α of the adjoint Markov semigroup {T * }. In statistical terminology, the sample probability q a and the expectation (29) are called the likelihood and the marginal likelihood, respectively, and ν α is called the prior distribution for the parameter x ∈ ∆ r−1 .
Suppose we are interested in selecting a graphical model consisting of four vertices from three candidate models: a star graph S_3, a cycle graph C_4, and a complete graph K_4 (Figure 3). For this purpose, we employ the stationary states ν_α as the prior distributions. As we have seen in Example 4.7, the prior distribution for K_4 is the Dirichlet distribution, but closed-form expressions of the prior distributions for the other graphs are not known. Suppose we have a sample consisting of two particles. If it is e_1 + e_3, by solving the recurrence relation (22), we obtain the expected sample probabilities under S_3, C_4, and K_4, respectively. On the other hand, if the sample is e_2 + e_4, they are obtained in the same way. If α is small, e_1 + e_3 supports C_4, while e_2 + e_4 does not support K_4. This is reasonable, because the vertex set {1, 3} is an independent set of C_4, but not of S_3 or K_4. On the other hand, the set {2, 4} is not an independent set of K_4, but is an independent set of S_3 and C_4. The ratio of marginal likelihoods is called the Bayes factor, which is a standard model selection criterion in Bayesian statistics (see, e.g., Section 6.1 of [2]). If the sample is e_1 + e_3, the Bayes factor of C_4 to S_3 or K_4 is 1 + 1/(4α). Therefore, C_4 is supported if α is small, while all graphs are equivalent if α is large. We do not discuss details of statistical aspects, including how to choose α, but it is worth mentioning that positive α improves the stability of model selection, especially for small samples. In fact, if α is small, the Bayes factor changes drastically when a sample unit is added. Suppose we have a sample e_1 + e_3 and take an additional sample unit. If it is e_2, the expected sample probabilities for the sample e_1 + e_2 + e_3 under S_3, C_4, and K_4 are

3α(1 + 12α) / {32(1 + 3α)(1 + 4α)}, 3α(1 + 12α) / {32(1 + 3α)(1 + 4α)}, and 3α^2 / {4(1 + 2α)(1 + 4α)},

respectively.
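The expressions for the sample e_1 + e_2 + e_3 given above can be checked against the large-α limit of Corollary 4.4, where the expected sample probability of any graph tends to the multinomial probability with uniform masses, 3!(1/4)^3 = 3/32. A small numerical check, assuming the three expressions as displayed:

```python
from fractions import Fraction

def p_S3_C4(alpha):
    # Expected probability of the sample e1 + e2 + e3 under S3 and C4
    # (the two expressions coincide).
    return 3 * alpha * (1 + 12 * alpha) / (32 * (1 + 3 * alpha) * (1 + 4 * alpha))

def p_K4(alpha):
    # Expected probability of the sample e1 + e2 + e3 under K4.
    return 3 * alpha ** 2 / (4 * (1 + 2 * alpha) * (1 + 4 * alpha))

# Multinomial limit with uniform masses: 3!/(1!1!1!0!) * (1/4)^3 = 3/32.
limit = Fraction(3, 32)
big = Fraction(10 ** 9)
assert abs(p_S3_C4(big) - limit) < Fraction(1, 10 ** 6)
assert abs(p_K4(big) - limit) < Fraction(1, 10 ** 6)
```

Exact rational arithmetic avoids floating-point error; both expressions approach 3/32 at rate O(1/α).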
The Bayes factor of C_4 to S_3 is unity, which means that the graphs C_4 and S_3 are equivalent. This conclusion is quite different from that deduced from the sample e_1 + e_3. In the limit of large α, by Corollary 4.4, the expected sample probability of any graph follows the unique limit distribution, the multinomial distribution.

Now let us discuss computation of expected sample probabilities. A closed-form expression of the stationary state ν_α of the adjoint Markov semigroup {T*} is not known for generic graphs; nevertheless, we can compute the expected sample probabilities of any graph by solving the system of recurrence relations (22). Solving the system becomes prohibitive as the sample size n grows, but the following algorithm, which is a byproduct of Theorem 4.3, provides an unbiased estimator of the expected sample probability.

Input: A sample taken from a graph G = (V, E), consisting of a vector a ∈ Π_{n,r}, r = |V|, and the parameter value α > 0.
Step 2: Get a random number T following the exponential distribution of rate Σ_{i=1}^r d_i a_i(a_i − 1)/2 + α|a|/2.
Step 5: Get a random number U following the uniform distribution on [0, 1].
Step 6: If U falls in the j-th interval of a_i(a_i − 1), let a_i ← a_i − 1 and a_j ← a_j + 1. Otherwise, if U falls in the interval of αa_i, let a_i ← a_i − 1.
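Steps 2, 5, and 6 determine an embedded jump chain. The following is a minimal sketch of that chain only, not the author's implementation: it assumes transitions are chosen with probabilities proportional to the interval lengths in Step 6 (weight a_i(a_i − 1) for a move from i to each neighbour j, weight αa_i for an erasure at i), and it omits the waiting times of Step 2 and the estimator's output. The function name and graph encoding are illustrative.

```python
import random

def simulate_dual_chain(adj, a, alpha, rng):
    """Simulate the embedded jump chain until every particle has been
    erased, and return the number of jumps.

    adj[i]: list of neighbours of vertex i; a[i]: particle count at i;
    alpha > 0: drift parameter.  A move of a particle from i to a
    neighbour j carries weight a_i*(a_i - 1); an erasure of a particle
    at i carries weight alpha*a_i, as in Step 6.
    """
    a = list(a)
    steps = 0
    while sum(a) > 0:
        moves, weights = [], []
        for i, ai in enumerate(a):
            for j in adj[i]:
                moves.append((i, j))
                weights.append(ai * (ai - 1))
            moves.append((i, None))          # erase a particle at vertex i
            weights.append(alpha * ai)
        i, j = rng.choices(moves, weights=weights, k=1)[0]
        a[i] -= 1
        if j is not None:
            a[j] += 1
        steps += 1
    return steps

# Example: a path graph 0 - 1 - 2 with two particles on the middle vertex.
adj = {0: [1], 1: [0, 2], 2: [1]}
steps = simulate_dual_chain(adj, [0, 2, 0], alpha=1.0, rng=random.Random(0))
assert steps >= 2  # each of the two particles must be erased once
```

Since α > 0, an erasure has positive probability at every step, so the chain terminates almost surely; this is the event underlying the coupling-from-the-past interpretation below.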
By Corollary 4.6, the output of Algorithm 5.4 is an unbiased estimator of m_a(α), which gives the expected sample probability (29) after multiplication by the multinomial coefficient.
An attractive property of Algorithm 5.4 as a Markov chain Monte Carlo method is that it is a direct sampler; namely, it generates independent random variables that exactly follow the target distribution. In fact, this algorithm can be regarded as a variant of a direct sampling algorithm called coupling from the past (see, e.g., Chapter 25 of [10] for a concise summary). Regarding the sample as generated from the infinite past with time running backward, the time at which all particles have been erased is the time from which the sample path can be regarded as coming from the infinite past, because the sample path does not depend on any events older than that time. We have the following estimate of the number of steps needed to complete the procedure.
Proposition 5.5. For a sample of size n, the expected number of steps needed to complete Algorithm 5.4 to obtain an unbiased estimator of the expected sample probability (29) is O{|E|(n + n(n − 1)/(2α))} for large |E|.
Proof. For a state a satisfying |a| = m, the probability that a particle is erased at the next step is bounded below. Therefore, the waiting time until a particle is erased is stochastically smaller than the waiting time of an event following the geometric distribution of mean (m + α − 1)/α. The sum of these waiting times from m = 1 to m = n is O(n + n(n − 1)/(2α)). Steps 4 and 6 demand d_1 + · · · + d_r + r = 2|E| + r = O(|E|) steps for large |E|. Therefore, the assertion follows.
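The sum of the geometric means in the proof can be verified directly; a small exact check (the function name is illustrative):

```python
from fractions import Fraction

def expected_erasure_steps(n, alpha):
    """Sum over m = 1..n of the geometric mean waiting time
    (m + alpha - 1)/alpha for erasing a particle from a state with
    m particles, as in the proof of Proposition 5.5."""
    return sum((Fraction(m) + alpha - 1) / alpha for m in range(1, n + 1))

# The sum equals n + n*(n - 1)/(2*alpha), matching the stated
# O(n + n(n - 1)/(2*alpha)) bound.
n, alpha = 10, Fraction(1, 2)
assert expected_erasure_steps(n, alpha) == n + Fraction(n * (n - 1), 2) / alpha
```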
We have focused on the diffusion with drift, but the moments of the marginal distribution of the diffusion without drift at a given time, i.e., (8), can be computed by an analogue of Algorithm 5.4. The problem reduces to solving the system of differential equations (9). We omit the discussion, but a similar problem for a complete graph K_4 was discussed in [12].

Discussion
We have studied diffusions taking values in probability measures on a graph whose masses on the vertices satisfy the stochastic differential equations of the forms (1) and (16), by using their dual Markov chains on integer partitions and on finite integer lattices, respectively. Many problems remain to be solved, especially for (1). First of all, a formal proof of the existence of the semigroup {T_t} associated with the generator (2) should be established, which demands pathwise uniqueness of the solution of (1). As we have emphasized in the text, some arguments, especially those after Propositions 3.3 and 5.3, are rough and restrictive; they could be improved. Stationary states of the Markov semigroup need further study. A counterpart of Theorem 1.5 of [17] or Theorem 4.5 on regularity of stationary states could be established by detailed analysis of the diffusion. Further properties of the diffusion, such as absorption probabilities into a stationary state and the waiting times, are interesting. Obtaining explicit expressions for them is challenging, but such expressions would be helpful for further understanding of the diffusion.
Two applications of the diffusions have been discussed: analysis of an algorithm to find an independent set of a graph, and Bayesian graph selection based on computation of the expected sample probability by using coupling from the past. Further applications and modeling targets may exist.
For the coalescents mentioned in Remark 3.2, properties of a "genealogy" of a sample, which is a sample path of the dual Markov chain, have been intensively studied because a genealogy itself is used as a stochastic model of DNA sequence variation (see, e.g., [3]). Random graphs such as those in Figures 1 and 2 are counterparts of such genealogies. Studying the properties of such random graphs would be interesting.