Undirected Structural Markov Property for Bayesian Model Determination

Abstract: This paper generalizes the structural Markov properties for undirected decomposable graphs to arbitrary undirected graphs. This allows us to exploit the conditional independence properties of joint prior laws to analyze and compare multiple graphical structures while taking advantage of their common conditional independence constraints. The work provides theoretical support for full Bayesian posterior updating of the structure of a graph using data from a given distribution. We further investigate the ratio of graph laws in order to simplify the acceptance probability of Metropolis–Hastings sampling algorithms.


Introduction
A probabilistic graphical model (PGM), or structured probabilistic model (SPM), is a statistical model that consists of a graph and a distribution family, in which the graph encodes the conditional independence information between random variables. Such models are always associated with independence models, that is, the sets of conditional independence constraints encoded by graphs via the global Markov property. They arise naturally in multivariate analysis and offer versatility and convenience in analyzing complex, large-scale data.
It is known that different classes of graphs with different interpretations of independence have been developed in the past decades, and the reader can refer to [1][2][3][4] for details. One of the most important classes of graphs in graphical models is undirected graphs (UGs). Their corresponding Markov models are often known as undirected graphical models or Markov networks [1,2]. These models have been found to have many applications in a wide range of areas such as econometrics, medical science, artificial intelligence [5][6][7] and so on. Our research in this paper is related to the work in the area of the structure determination of these models with the Bayesian method.
The main objective of Bayesian structure learning is to learn the structure of a graph from data. Meanwhile, Bayesian structure learning requires a clear illustration of a prior distribution about graphical structures, which is termed as a graph law. Statisticians have proposed some approaches to calculate the prior law of a graph. The simplest graph law is the uniform distribution in [8]. Additionally, the Erdős-Rényi random graph model is also used to indicate the graph law in [9]. Furthermore, a characterization of graph law with the form of exponential family is proposed by [10]. However, how to simplify this prior law is a significant task for us, especially in the posterior inference of graphical structures. In view of this, the structural Markov property is first proposed for the purpose of characterizing the conditional independence of the structure of a graph. The structural Markov properties require that the structures of distinct components of graphs are conditionally independent given the existence of a separating component; see [10]. These properties reflect the conditional independence at the structural level. It has been proved that a graph law is structural Markov if and only if it is a member of the clique exponential family given the support condition as the set of decomposable undirected graphs; see [10]. Further, a weaker support condition of equivalent characterization for graph laws is given via closure operation of graphical structures in [11].
Indeed, the structural Markov property is an extension of the hyper Markov property, which was proposed in [12] and reflects the global Markov property at the parameter level. These hyper Markov properties are used to describe the conditional independence properties of a distribution of random variables or statistical quantities in graphical models. The hyper Markov laws arise naturally as sampling distributions of maximum likelihood estimators and as prior or posterior distributions in Bayesian inference.
Recently, a weaker version of the structural Markov properties for decomposable graphs was introduced in [13], where the authors provided an analogous clique-separator factorization for the graph law. These weakly structural Markov properties require that the separator is complete. It has been shown that this provides a more flexible family of graph prior laws to use in full Bayesian posterior updating.
It should be pointed out that the work in [8,10,13] focuses only on decomposable graphical models. However, based on conditional independence and graphical separation, the structural Markov properties can be extended to non-decomposable undirected graphical models. The aim of this paper is to fill this gap in the field of graphical models. Further, we focus on a full Bayesian method for the posterior updating of graph laws using observed data from a given distribution, and we prove that this full Bayesian posterior graph law is feasible and reasonable. Finally, we illustrate our theory with detailed investigations of two important cases, based on graphical Gaussian models and multinomial models, respectively.
The paper is organized as follows. In Section 2, we introduce the terminology and concepts used in this paper. Section 3 first investigates the structural Markov properties for non-decomposable graphs, and then exploits the joint prior laws of a random sample distribution for full Bayesian inference. Section 4 gives two examples, the inverse Wishart distribution and the Dirichlet distribution, to study the posterior updating of graph laws in detail. We then discuss some details of the computation for structural Markov graph laws in Section 5. Finally, Section 6 concludes the paper.

Preliminaries
For terms and symbols, we follow the references [10,12] as many theoretical frameworks of this paper are constructed and developed based on them. Several concrete notions and terminologies used in this paper will be given in the following for clarity and consistency.

Graphical Terminologies and Notation
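A graph G = (V(G), E(G)) consists of a vertex set V(G) and an edge set E(G). An edge (u, v) between two vertices u and v is undirected if (u, v) is an unordered pair. A graph G is said to be an undirected graph if all its edges are undirected. Unless otherwise specified, G is always assumed to be undirected, simple and connected throughout the paper.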
For A ⊆ V(G), the induced subgraph of G on A will be denoted by G_A. All subgraphs in this paper are induced subgraphs. A is complete (or a clique) if any two different vertices u, v ∈ A are adjacent, i.e., (u, v) ∈ E(G). A graph is a clique if its vertex set is a clique. For A, B ⊆ V(G), a clique G_A is a maximal clique if G_B is incomplete for any superset B ⊃ A. Two vertices u and v are neighbors if (u, v) ∈ E(G). For A ⊆ V(G), the boundary bd(A) is the set of vertices in V(G) \ A that are neighbors of vertices in A. G is collapsible onto A if every connected component of V(G) \ A has a complete boundary in G.
For any subsets A, B and C of V(G), we say that C separates A from B, and write A ⊥ B | C [G], if any path in G between some u ∈ A and v ∈ B contains a vertex in C. We then call C a separator of A and B. Separators that are cliques are called clique separators.
For any disjoint subsets A, B and S of V(G), we say (A, B, S) forms a decomposition if (i) V(G) = A ∪ B ∪ S, (ii) S separates A from B in G, and (iii) S is a clique separator in G. A decomposition (A, B, S) is said to be proper if the sets A ∪ S and B ∪ S are both proper subsets of V(G).
Definition 1 ([14]). Let G = (V, E) be an undirected graph. G is reducible if its vertex set contains a clique separator; otherwise, G is said to be prime. For example, G is prime if G is a clique, while G is reducible if G is a disconnected graph. An induced subgraph G_U is a maximal prime subgraph of G if G_U is prime and G_W is reducible for every W with U ⊂ W ⊆ V(G).
In Figure 1, G_1 is prime since it contains no clique separator, whereas G_2 is reducible because S = {a, c} is a clique separator in G_2. In particular, G is decomposable if G_{A∪S} and G_{B∪S} are complete, or they are both decomposable subgraphs of G. Note that the prime decomposition of arbitrary undirected graphs is a generalization of that of chordal graphs. For instance, in Figure 2, G is a non-decomposable undirected graph with V(G) = {a, b, c, d, e}, which has two maximal prime subgraphs G_{U_1} and G_{U_2}, with U_1 = {a, b, c} and U_2 = {b, c, d, e}, respectively, and a clique separator S = U_1 ∩ U_2 = {b, c}. Clearly, ({a}, {d, e}, {b, c}) forms a prime decomposition of G. Additionally, G_{U_1} is complete since all its pairs of vertices are joined, while G_{U_2} is incomplete because b and d, as well as c and e, are not joined.
All the maximal prime subgraphs of an undirected graph can be arranged into a perfect sequence. If there exists a proper decomposition of an undirected graph G, then G admits a perfect sequence (U_1, U_2, . . . , U_k) of maximal prime subgraphs such that, for each j = 2, . . . , k, there exists some h ∈ {1, 2, . . . , j − 1} with
$$S_j = U_j \cap (U_1 \cup \cdots \cup U_{j-1}) \subseteq U_h,$$
where the sets S_j are in fact clique separators. Specifically, G is decomposable if all its maximal prime subgraphs are complete (cliques).
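The separation and clique-separator notions above can be checked mechanically. The following is a minimal Python sketch, not part of the original paper: the helper names (`make_graph`, `is_complete`, `separates`) are hypothetical, and the edge list for the graph of Figure 2 is an assumption reconstructed from the textual description (U_1 = {a, b, c} complete; in U_2 the pairs b, d and c, e are not joined).

```python
from collections import deque


def make_graph(edges):
    """Build an adjacency dict {vertex: set of neighbours} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_complete(adj, subset):
    """True if every pair of distinct vertices in `subset` is adjacent."""
    subset = list(subset)
    return all(v in adj[u] for i, u in enumerate(subset) for v in subset[i + 1:])


def separates(adj, a, b, c):
    """True if every path from a vertex in `a` to a vertex in `b` meets `c`."""
    a, b, c = set(a), set(b), set(c)
    start = a - c                      # vertices of A already in C are blocked
    seen, queue = set(start), deque(start)
    while queue:
        u = queue.popleft()
        if u in b:                     # reached B without passing through C
            return False
        for w in adj[u] - c - seen:    # never step onto a vertex of C
            seen.add(w)
            queue.append(w)
    return True


# Assumed edge set for the graph described as Figure 2.
G = make_graph([("a", "b"), ("a", "c"), ("b", "c"),
                ("c", "d"), ("d", "e"), ("b", "e")])

# S = {b, c} is complete and separates {a} from {d, e}: a clique separator.
print(separates(G, {"a"}, {"d", "e"}, {"b", "c"}), is_complete(G, {"b", "c"}))
```

Under these assumptions, ({a}, {d, e}, {b, c}) passes both checks, while {b} alone does not separate a from d because the path a, c, d avoids it.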
In a PGM, a vertex v denotes a random variable X_v, which takes values in a space X_v. Let X = X_{V(G)} = (X_v)_{v∈V(G)} be a p-dimensional random vector on the product space ∏_{v∈V(G)} X_v, with P or θ representing its distribution. All distributions considered in this paper are assumed to be positive and closed under marginalization and conditioning with respect to the type of the joint distribution family. For simplicity, we use P to represent the set of all positive distributions over X. For A, B ⊆ V(G), θ_A will denote the marginal distribution of X_A and θ_{B|A} the conditional distribution of X_B given X_A.
Let U be the set of undirected graphs with fixed vertex set V(G). A probability distribution of a random graph G, which takes values in U, is said to be a law, denoted by G. Further, define U(A, B, S) to be the set of undirected graphs for which (A, B, S) is a prime decomposition.

Independence Model and Collapsibility
Given a finite set N, an independence model, denoted by I, is a set of triplets of the form ⟨A, B | C⟩ with A, B, C ⊆ N, which are termed conditional independence statements. A graphical independence model is an independence model induced by a graph. For a graph G ∈ U, the graphical independence model of G can be defined as
$$I(G) = \{\langle A, B \mid C\rangle : A \perp B \mid C\,[G],\ A, B, C \subseteq V(G)\}.$$
That is, I(G) is the set of triples ⟨A, B | C⟩ encoding the global Markov property over G.
It should be pointed out that conditional independence in a statistical model [15,16] shares the same properties as graph separation [2]; i.e., a graphical independence model I(G) has the following properties:
1. for all A, B ⊆ V(G), ⟨A, B | A⟩ ∈ I(G), ⟨A, B | B⟩ ∈ I(G) and ⟨A, B | A ∩ B⟩ ∈ I(G);
2. if ⟨A, B | C⟩ ∈ I(G), then ⟨B, A | C⟩ ∈ I(G);
3. if ⟨A, B | C⟩ ∈ I(G), and U ⊆ A, then ⟨U, B | C⟩ ∈ I(G);
4. if ⟨A, B | C⟩ ∈ I(G), and U ⊆ A, then ⟨A, B | C ∪ U⟩ ∈ I(G);
5. if ⟨A, B | C⟩ ∈ I(G), and ⟨A, W | B ∪ C⟩ ∈ I(G), then ⟨A, B ∪ W | C⟩ ∈ I(G).
In particular, the following property holds when A, B, C are disjoint.
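If ⟨A, B | C⟩ ∈ I(G) and ⟨A, C | B⟩ ∈ I(G), then ⟨A, B ∪ C | ∅⟩ ∈ I(G).
Further, a graphical independence model I(G) has a natural projection operation onto a subset D ⊆ V(G), denoted by I(G)_D, which restricts I(G) to the triples ⟨A, B | C⟩ with A, B, C ⊆ D. G is said to be CI-collapsible onto D if I(G)_D = I(G_D). CI-collapsibility reflects the consistency between the conditional independence relations induced by G_D and those induced by G but restricted to D.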
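We say a distribution P is Markov with respect to G if, for A, B, C ⊆ V(G), it holds that
$$A \perp B \mid C\,[G] \implies X_A \perp\!\!\!\perp X_B \mid X_C\,[P],$$
where X_A ⊥⊥ X_B | X_C [P] represents the assertion that X_A is independent of X_B given X_C under P.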
In order to ensure that various distributions, and those of statistical quantities, are Markov with respect to G, we now review graphical models within the framework of undirected graphs. A graphical model, denoted by P(G), is a statistical model such that
$$P(G) = \{P \in P : I(G) \subseteq I(P)\}.$$
For the Markov distribution family P(G), we say that it is faithful to G if there exists a distribution P* ∈ P(G) such that I(P*) = I(G), where
$$I(P) = \{\langle A, B \mid C\rangle : X_A \perp\!\!\!\perp X_B \mid X_C\,[P]\}.$$
All the graphical models considered in this paper are assumed to be faithful to G. Such an assumption is called the "Faithfulness Assumption" [17]. This assumption is mild, since Gaussian distribution families and multinomial distribution families satisfy it.
Moreover, a statistical model P(G) also admits a natural projection operation on D ⊆ V(G), denoted by P(G)_D, which is defined as follows:
$$P(G)_D = \{P_D : P \in P(G)\},$$
where P_D is the marginal distribution of X_D under P. Generally, P(G)_D is not equal to P(G_D), but clearly P(G)_D ⊇ P(G_D).

Definition 4 (M-collapsibility). Let G be a fixed undirected graph in U and D ⊆ V(G). G is said to be M-collapsible onto D if P(G)_D = P(G_D).
M-collapsibility indicates that the marginal distribution family is identical to the distribution family induced by G_D.
Theorem 1. Let G be a fixed undirected graph in U and D ⊆ V(G). Then, the following statements are equivalent.

1. G is graphical collapsible onto D;
2. G is CI-collapsible onto D;
3. G is M-collapsible onto D.

Proof. See Appendix A.
Let H_j = ⋃_{i=1}^{j} U_i denote the history set for each j = 1, 2, . . . , k. By Theorem 1, we can obtain the following result.
Proposition 1. Let G be a fixed graph in U with a perfect sequence (U_1, U_2, . . . , U_k) of maximal prime subgraphs. Then, the following statements hold for each j ∈ {1, 2, . . . , k}. 1.
G can be graphical collapsible onto H j ; 2.
Proof. This can be easily obtained from the meaning of collapsibility and Theorem 1.
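Graphical collapsibility onto a set D, as defined in the preliminaries (every connected component of V(G) \ D has a complete boundary), is straightforward to test. The sketch below is illustrative only: the function names are hypothetical, and the adjacency structure is the assumed edge set for the graph described as Figure 2. It checks collapsibility onto each history set of the perfect sequence (U_1, U_2).

```python
def components(adj, vertices):
    """Connected components of the subgraph induced on `vertices`."""
    vertices = set(vertices)
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for w in (adj[u] & vertices) - comp:
                comp.add(w)
                stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps


def boundary(adj, subset):
    """bd(A): vertices outside `subset` adjacent to some vertex inside it."""
    subset = set(subset)
    nbrs = set()
    for u in subset:
        nbrs |= adj[u]
    return nbrs - subset


def is_complete(adj, subset):
    """True if every pair of distinct vertices in `subset` is adjacent."""
    subset = list(subset)
    return all(v in adj[u] for i, u in enumerate(subset) for v in subset[i + 1:])


def is_collapsible(adj, d):
    """True if every component of V(G) minus `d` has a complete boundary."""
    rest = set(adj) - set(d)
    return all(is_complete(adj, boundary(adj, comp))
               for comp in components(adj, rest))


# Assumed adjacency for the graph described as Figure 2.
G = {"a": {"b", "c"}, "b": {"a", "c", "e"}, "c": {"a", "b", "d"},
     "d": {"c", "e"}, "e": {"b", "d"}}

# History sets H_1 = U_1 and H_2 = U_1 ∪ U_2 of the perfect sequence.
histories = [{"a", "b", "c"}, {"a", "b", "c", "d", "e"}]
print([is_collapsible(G, h) for h in histories])
```

For contrast, the same check fails for D = {a, b, d}: the component {c} of the remainder has boundary {a, b, d}, in which a and d are not adjacent.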

Basic Concepts and Properties
We begin with the definition of the structural Markov property of [10]. Specifically, if G is decomposable in U, Definition 5 degenerates to that defined in [10]. The structural Markov property indicates that the structures of different induced subgraphs are conditionally independent when the event {G ∈ U(A, B, S)} happens; see Figure 3 as an illustration.
whenever S is complete and separates A from B in G.
Proof. By Definition 5, the existence of the remaining edges in G_{A∪S} is independent of those in G_{B∪S} since S is complete and separates A from B in G. Therefore, we are left with the statement of marginal independence G_{A∪S} ⊥⊥ G_{B∪S}, since the term G_S is redundant. Hence, the result follows.
Proposition 2 indicates that different components of undirected graphs are conditionally independent provided that their corresponding separators are complete. To illustrate this, Figure 4 shows a non-decomposable graph G in which A ∩ B separates A from B while A ∩ B is incomplete. The two subgraphs G_A and G_B may share edges in A ∩ B, which makes the existence of the remaining edges in G_A dependent on those in G_B. These dependencies disappear as soon as A ∩ B is complete.
It also implies that an arbitrary undirected graph can be denoted by the graph product of its induced subgraphs as The structural Markov property can be well-characterized by the above operation.

Proposition 3.
Let π be the density of a graph law G with respect to the counting measure on U.
For any subset C ⊆ A, define G_A^{(C)} to be the graph on A that is complete on C and empty otherwise.
Proposition 4. Let G be a fixed graph in U with a perfect sequence (U_1, U_2, . . . , U_k) of maximal prime subgraphs. If G has a structural Markov graph law G with density π, then π can be factorized as (1).
Proof. See Appendix A.

Joint Distribution Law
In this section, we will investigate how the structural Markov laws interact with the hyper Markov laws when they are considered as the joint prior laws.
Hyper Markov laws are motivated by the property that graph decomposition allows one to decompose a prior or posterior distribution into the product of marginal distributions on corresponding maximal prime subgraphs. For a fixed graph G ∈ U(A, B, S), any prior or posterior distribution of θ ∈ P (G) is uniquely characterized by its marginals θ A∪S and θ B∪S , taking values in P (G A∪S ) and P (G B∪S ), respectively.
Following [12], to be specific, a probability distribution of a random distribution θ, which takes values in P (G), is said to be a law, denoted by L. For A ⊆ V(G), the marginal law of θ A will be denoted by L A and L B|A will denote the conditional law of θ B|A .
Here, we give the definitions of the weak and strong hyper Markov properties.
Definition 6 ([12], Weak and strong hyper Markov). Suppose that G is a fixed graph in U(A, B, S) and θ ∈ P(G). Let L(θ) be a law of θ. We say that L(θ) is weak hyper Markov over G if
$$\theta_{A\cup S} \perp\!\!\!\perp \theta_{B\cup S} \mid \theta_S .$$
Further, we say that L(θ) is strong hyper Markov over G if
$$\theta_{B\cup S \mid A\cup S} \perp\!\!\!\perp \theta_{A\cup S} .$$
Let X be a random sample from θ ∈ P(G). The conditional independence property of the joint distribution law (P, L) for the pair (X, θ) on G ∈ U can be characterized as follows.
Proposition 5. Let G be a fixed undirected graph in U with a prime decomposition (A, B, S). X is a random sample from θ ∈ P (G). Then, the joint distribution law of (X, θ) satisfies: 1.
if L(θ) is weak hyper Markov with respect to G, then It is worth mentioning that the hyper Markov property does not hold for the cases where separators are not complete. For instance, the graph G 1 in Figure 1  Let Θ be the family of Markov distributions over U and L the family of hyper Markov laws over U. For the sake of discussion, it is necessary for us to reconsider the notion of hyper compatibility, which was first proposed by [10], to characterize families of laws for every graph.
Definition 7 (Hyper compatibility). Let L, L′ ∈ L be the laws of θ ∈ Θ with respect to G and G′ in U, respectively. For A ⊆ V(G), we say L is hyper compatible on U if L_A(θ) = L′_A(θ) whenever G and G′ are collapsible onto A and G_A = G′_A.
Here L is always assumed to be hyper compatible over U. Based on the arguments above, some significant conditional independence properties of such joint law (L, G) can be investigated as the following. Proof. See Appendix A.
Theorem 2 reflects the conditional independence properties at both parameter and structural level.
Further, for any G ∈ U, let X be a random sample from a distribution θ ∈ Θ on U. If G is assigned the prior law G and θ is assigned the prior law L, then a joint distribution law is thereby created for (X, θ, G).
Proof. See Appendix A.
The conditional independence property of any such joint distribution law of (X, θ, G) can be characterized as follows.

Proof. See Appendix A.
Theorem 3 reflects that a random sample can be determined by both hyper and structural parameters, which will play a significant role in full Bayesian inference.
Proof. It can be easily obtained from Theorem 3.
Corollary 1 can be considered as a generalization of Proposition 5 since G is a random undirected graph on U with a prime decomposition (A, B, S). Without loss of generality, when the event {G ∈ U(A, B, S)} happens, i.e., given a graph G with a prime decomposition (A, B, S), we can deduce from Corollary 1 that

Posterior Updating for Graph Law
Our research in this section aims to identify the structure of models via the Bayesian approach. Based on our results in Section 3.2, in the following, we will use data from a certain distribution to learn the structure of a graph.
We assume that G has a structural Markov graph law G over U. For θ ∈ Θ, let θ have a law from a hyper compatible family L. Let X^(n) = (X_1, X_2, . . . , X_n) denote a random sample of n observations from θ. If we focus on the density of the posterior graph law π(G | x^(n), θ) with its conjugate prior graph law π(G), then the full Bayesian posterior graph law follows: where Z is a normalizing constant and ϑ is a hyperparameter that characterizes the law of θ. In general, it is difficult to estimate the structure of a graph G since the hyperparameter ϑ is unknown.
In the following, we investigate the properties of structural Markov laws when used as priors for models.

Proposition 8.
If the prior graph law G(G) is structural Markov on U, then the posterior graph law, obtained by conditioning on data X (n) = x (n) , is structural Markov on U.
Proof. By the conditional independence and Theorem 3, we can easily find that G_{A∪S} ⊥⊥ G_{B∪S} | (X^(n), θ, {G ∈ U(A, B, S)}).

Proposition 9.
Assume that the prior graph law G(G) is structural Markov and L(θ) is strong hyper Markov on U. Then, the following properties hold: 1.
The posterior graph law obtained by conditioning on data X (n) = x (n) is structural Markov with respect to U; 2.
The marginal data distribution of X (n) is Markov with respect to U; 3.
The posterior law of θ conditioning on X (n) = x (n) is Markov with respect to U.

This implies (i).
To prove (ii), by the conditional independence and Theorem 3, we have
Our Bayesian approach calls for a strong hyper Markov prior law on θ with respect to G ∈ U. By Proposition 9, the posterior law of θ, given G, has a density of the following form: where U is the set of maximal prime subgraphs of G and S is the set of corresponding clique separators. If G(G) is structural Markov and L(θ) is strong hyper Markov with respect to G, then the posterior graph law of G is given by (3). It is worthwhile to point out that (3) indicates that the posterior graph law of G preserves the structural Markov property under hyper compatible laws. This result coincides with Proposition 8. Further, the updating may be performed locally by (3), which implies that the posterior graph law on each maximal prime subgraph of G depends only on the posterior of the hyper compatible laws on that maximal prime subgraph.

Graphical Gaussian Models and the Inverse Wishart Law
A graphical Gaussian model is defined by a p-dimensional multivariate Gaussian distribution with the expected value µ and covariance matrix Σ, i.e., P(X) = N p (µ, Σ).
For simplicity, we assume that the model has zero mean in the following. Define K = Σ^{−1} ∈ M_p^+ to be the precision matrix, where M_p^+ denotes the set of p × p positive definite matrices. For any matrix M ∈ M_p^+, M_A will denote the |A| × |A| submatrix (M_uv)_{(u,v)∈A^2}. It has been shown that the global, local and pairwise Markov properties are equivalent in graphical Gaussian models; see [2]. We therefore conclude that the graphical Gaussian distribution P is Markov with respect to G if and only if
$$K_{uv} = 0 \quad \text{for all } u \neq v \text{ with } (u, v) \notin E(G). \qquad (4)$$
Let x^(n) be the observed value of the n × p sample matrix X^(n), a random sample of size n from the graphical Gaussian distribution N_p(0, Σ), and let S = (x^(n))^T x^(n) denote the observed p × p sum-of-products matrix. Then, for any U ∈ U,
$$p(x^{(n)}_U \mid \Sigma_U) = (2\pi)^{-n|U|/2}\, |\Sigma_U|^{-n/2} \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left(\Sigma_U^{-1} S_U\right)\right\},$$
where |U| is the cardinality of U, and |Σ_U| is the determinant of Σ_U. It is similar for p(x^(n)_S | Σ_S), S ∈ S.
The inverse Wishart distribution is also termed the inverse Wishart law, denoted by IW(δ, Φ). It is used as the prior for the graphical Gaussian distribution N_p(0, Σ). Conditioning on (4), Σ has a hyper inverse Wishart prior law, denoted by HIW(δ, Φ). The marginal density of Σ_U given Φ_U is of the form
$$\pi(\Sigma_U \mid \Phi_U) \propto |\Sigma_U|^{-(\delta + 2|U|)/2} \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left(\Sigma_U^{-1} \Phi_U\right)\right\}.$$
It is already shown in [12] that the hyper inverse Wishart law satisfies the strong hyper Markov property, which allows us to compute the posterior updating of Σ from the margins on the maximal prime subgraphs of the graph G. That is, for any U ∈ U,
$$L(\Sigma_U \mid x^{(n)}_U) = IW(\delta + n,\ \Phi_U + S_U).$$
We conclude that
$$L(\Sigma \mid x^{(n)}) = HIW(\delta + n,\ \Phi + S).$$
Consequently, if we assign a prior law of form (1) for G, then from Proposition 8 we can conclude that the posterior graph law of G, given data X^(n) = x^(n) from the Gaussian distribution N_p(0, Σ), can be obtained through (3) with a density of the following form:

Multinomial Models and the Dirichlet Law
Suppose that all the variables (X_1, X_2, . . . , X_p) are discrete-valued. Let the contingency table over V(G) be denoted by I = I_1 × I_2 × · · · × I_p, where I_h is a finite set for each h ∈ {1, 2, · · · , p}. An element i ∈ I is referred to as a cell of this table. Thus, X = (X_1, X_2, . . . , X_p) takes values in I and is a discrete-valued random vector whose distribution θ is assumed to be Markov with respect to G. Then,
$$\theta(i) = \frac{\prod_{U \in \mathcal{U}} \theta(i_U)}{\prod_{S \in \mathcal{S}} \theta(i_S)},$$
where θ(i_U) ∈ (0, 1), θ(i_S) ∈ (0, 1) and ∑_{i∈I} θ(i) = 1.
Let x^(n) be observations of X^(n), a random sample from θ. X^(n) is an n × p matrix in which each row is one observation of X. The distribution of the cell counts of X^(n) is the multinomial distribution with index n and probabilities θ, denoted by M(n, θ). Then, the likelihood function p(x^(n) | θ, G) has the form
$$p(x^{(n)} \mid \theta, G) = \frac{\prod_{U \in \mathcal{U}} p(x^{(n)}_U \mid \theta_U)}{\prod_{S \in \mathcal{S}} p(x^{(n)}_S \mid \theta_S)}, \qquad p(x^{(n)}_U \mid \theta_U) \propto \prod_{i_U \in I_U} \theta(i_U)^{n(i_U)},$$
where I_U = ∏_{u∈U} I_u, θ_U = (θ(i_U))_{i_U∈I_U}, and n(i_U) counts the number of rows of x^(n)_U that fall in the marginal cell i_U. It is similar for p(x^(n)_S | θ_S), S ∈ S.
The Dirichlet distribution is also termed the Dirichlet law, denoted by D(α), where α = (α(i))_{i∈I} are hyperparameters. It is used as the prior for the multinomial distribution M(n, θ). The Dirichlet law satisfies the strong hyper Markov property; see [12]. Thus, we have
$$\pi(\theta_U \mid \alpha_U) \propto \prod_{i_U \in I_U} \theta(i_U)^{\alpha(i_U) - 1},$$
and then the posterior law can be written as
$$\pi(\theta_U \mid x^{(n)}_U) \propto \prod_{i_U \in I_U} \theta(i_U)^{\alpha(i_U) + n(i_U) - 1}.$$
Based on the above arguments, we can conclude that L(θ_U | x^(n)_U) = D(α_U + n_U). Further, if we assign a prior law of form (1) for G, by Proposition 8, the posterior graph law of G, given data X^(n) = x^(n) obtained from θ, has density in the following way:
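The local update L(θ_U | x_U) = D(α_U + n_U) can be illustrated on a single margin U. The following Python sketch is an illustration under stated assumptions rather than the authors' implementation: the function names and the toy data are hypothetical, and the marginal likelihood uses the standard Dirichlet-multinomial identity (the ratio of Dirichlet normalizing constants, for the observed sequence of rows), which the text does not spell out.

```python
from collections import Counter
from math import lgamma, log


def marginal_counts(rows, columns):
    """Counts n(i_U): how often each marginal cell occurs; `columns` indexes U."""
    return Counter(tuple(row[c] for c in columns) for row in rows)


def dirichlet_posterior(alpha, counts):
    """Posterior hyperparameters alpha_U + n_U for every cell of the margin."""
    return {cell: a + counts.get(cell, 0) for cell, a in alpha.items()}


def log_marginal_likelihood(alpha, counts):
    """log p(x_U) with theta_U ~ D(alpha_U) integrated out (sequence form)."""
    def log_norm(a):
        # log of the Dirichlet normalising constant B(a)
        return sum(lgamma(v) for v in a.values()) - lgamma(sum(a.values()))
    return log_norm(dirichlet_posterior(alpha, counts)) - log_norm(alpha)


# Hypothetical toy data: four observations of two binary variables.
rows = [(0, 0), (0, 1), (1, 1), (0, 0)]
alpha_U = {(0,): 1.0, (1,): 1.0}          # uniform Dirichlet prior on margin U = {X_1}
n_U = marginal_counts(rows, columns=(0,))

print(dirichlet_posterior(alpha_U, n_U))   # posterior hyperparameters per cell
print(log_marginal_likelihood(alpha_U, n_U))
```

Because the posterior graph law factorizes over maximal prime subgraphs and separators, such per-margin quantities are the only ingredients a sampler would need to recompute when a single edge changes.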

Dataset Description
In this section, we present the results for one application to a real dataset. We analyze a labor force survey dataset, which is available from [18]. This dataset is used to analyze the multivariate associations among income, education and family background on 1002 males in the American labor force. Here, we briefly describe these variables in this dataset.

Experiments and Results
We consider the posterior graph law of G in Equation (5). A Gibbs sampler can then be formed by using the following conditional posteriors:
For the prior graph law of G, following Example 3.5 in [10], we consider an Erdős-Rényi random graph model prior in which each edge (u, v) is present independently with probability ϕ, where the parameter ϕ ∈ (0, 1) is the prior probability that an edge exists. In this case, we set ϕ = 0.5. We use the inverse Wishart law IW(δ, Φ) as a prior for the covariance matrix over the graph G, with δ = 7 and Φ = I_7, the 7 × 7 identity matrix.
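As a small illustration of the Erdős-Rényi prior with edge probability ϕ, the log prior probability of a graph depends only on its number of edges. The sketch below assumes p = 7 vertices as in this experiment; the function name `log_er_prior` is hypothetical.

```python
from math import comb, log


def log_er_prior(num_vertices, num_edges, phi=0.5):
    """Log prior of a graph whose edges are present independently w.p. phi."""
    max_edges = comb(num_vertices, 2)       # number of vertex pairs
    if not 0 <= num_edges <= max_edges:
        raise ValueError("num_edges is out of range for this vertex count")
    return num_edges * log(phi) + (max_edges - num_edges) * log(1 - phi)


# With phi = 0.5 every graph on 7 vertices has the same prior mass 0.5 ** 21.
print(log_er_prior(7, 9), log_er_prior(7, 12))
```

With ϕ = 0.5 the prior is uniform over graphs, so differences between posterior probabilities come entirely from the data; other values of ϕ penalize or favor denser graphs.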
By using the function above, we simulate n = 1002 observations. The experiments are implemented with an R package and run for 5000 iterations, of which the first 2500 are treated as burn-in. The experimental results on this dataset are displayed in Figures 5 and 6. The estimated posterior probabilities of the sizes of the graphs are shown on the left of Figure 5, which shows that our algorithm mainly visits graphs with between nine and twelve edges. The figure on the right exhibits the estimated posterior probabilities of all visited graphs of various sizes, and also shows that more than 15 different graphs are visited. The graph in Figure 6 is the graph with the highest posterior probability among the visited graphs. The results also suggest that the respondents' income is related to their own education and age, and that the income of the respondents' parents is related only to their education.

Computations
In this section, we aim to design an algorithm to take samples that we are interested in, such as decomposable undirected graphs, from the structural Markov graph law G on U.

Ratio for Graph Law
Model comparison plays an important role in statistical analysis, especially in computing the ratio of the distributions of variables in different states. We treat the graph itself as a random variable and construct this ratio between two undirected graphs G and G′, where G′ is obtained from G by removing or adding one edge. This ratio can be written as
$$\Lambda(G' : G) = \frac{\pi(G')}{\pi(G)}.$$
The main objective of this section is to simplify this calculation under the assumption that the graph law G is structural Markov on U. For convenience, we define η_U = π(G_U) and ζ_S = π(G_S) for U ∈ U and S ∈ S.
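As a worked special case, consider the ratio of graph-law densities, written Λ(G′ : G) = π(G′)/π(G), under an Erdős-Rényi graph law over all undirected graphs on p vertices with edge probability ϕ. This choice of law is an assumption made here for illustration, with m = p(p − 1)/2 denoting the number of vertex pairs; it shows how all terms not involving the changed edge cancel.

```latex
\[
\Lambda(G' : G)
  = \frac{\pi(G')}{\pi(G)}
  = \frac{\phi^{|E(G')|}\,(1-\phi)^{\,m-|E(G')|}}
         {\phi^{|E(G)|}\,(1-\phi)^{\,m-|E(G)|}}
  = \left(\frac{\phi}{1-\phi}\right)^{|E(G')|-|E(G)|}
  =
  \begin{cases}
    \dfrac{\phi}{1-\phi},   & \text{if } G' \text{ adds one edge to } G,\\[2ex]
    \dfrac{1-\phi}{\phi},   & \text{if } G' \text{ removes one edge from } G.
  \end{cases}
\]
```

In particular, for ϕ = 0.5 the ratio equals 1 for every single-edge move, so the prior contributes nothing to the acceptance probability.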
Figure 7 shows the special case where G′ is obtained from G by removing the edge (u, v), which lies in exactly one prime component U ∈ U of G.

Proposition 10.
Let G be a fixed graph in U with a perfect sequence (U_1, U_2, . . . , U_k) of maximal prime subgraphs. Suppose that G′ is obtained from G by removing the edge (u, v). Then, 1.
if u and v are contained in exactly one maximal prime subgraph U j of G, then 2. if u and v are contained in both two neighboring maximal prime subgraphs U j , U j+1 of G, then where W = U j ∪ U j+1 in G.

Proof. See Appendix A
Figure 8 shows the case where G′ is obtained from G by adding the edge (u, v) between two neighboring prime components U_i and U_j of G such that u ∈ U_i and v ∈ U_j.
Proposition 11. Let G be a fixed graph in U with a perfect sequence (U_1, U_2, . . . , U_k) of maximal prime subgraphs. Suppose that G′ is obtained from G by adding the edge (u, v). Then, 1.
if u and v are contained in exactly one incomplete prime subgraph U h , then

2.
if U_i ∋ u and U_j ∋ v are two distinct maximal prime subgraphs of G, then there are some prime components U_i = U_{h_1}, U_{h_2}, . . . , U_{h_m} = U_j such that
Proof. See Appendix A.
In particular, if G is a decomposable graph in U, then we have the following results.
Lemma 1 ([19]). Let G be a decomposable graph in U with a perfect sequence of cliques (U_1, U_2, . . . , U_k). Suppose that G′ is decomposable and is obtained from G by removing or adding one edge (u, v). Then,
1. if G′ is obtained from G by removing the edge (u, v), then u and v must belong to a single clique U_j of G;
2. if G′ is obtained from G by adding the edge (u, v), then there exist two different cliques U_i ∋ u and U_j ∋ v such that S = U_i ∩ U_j is complete and separates U_i and U_j.

Corollary 2.
Let G be a decomposable graph in U with a perfect sequence of cliques (U_1, U_2, . . . , U_k). Suppose that G′ is decomposable and is obtained from G by removing or adding one edge (u, v). Then,
1. if G′ is obtained from G by removing the edge (u, v) within U_j, then
2. if G′ is obtained from G by adding the edge (u, v) such that u ∈ U_i and v ∈ U_j, then the ratio Λ(G′ : G) is
Proof. We first give the proof of 1. If (u, v) ∈ E(G) and (u, v) ∉ E(G′), by Lemma 1, the deleted edge (u, v) must belong to a single clique U_j. It is worthwhile to point out that all of U_u, U_v and U_0 are cliques in G and G′. Then, which, combined with (A19), gives the result. The proof of 2 is similar.

Sampling Decomposable Graphs from Structural Markov Graph Laws
We now take a random graph on U as the initial state to design the Markov chain Monte Carlo (MCMC) sampler for sampling from a structural Markov graph law. This technique relies on small perturbations to the edge set of a graph, indicating that one edge could be removed or added.
A reversible jump MCMC sampler, which relies on single edge additions and removals, was introduced in [8] for posterior sampling of decomposable graphical models. We now use this methodology to sample from a structural Markov law.
Let G denote the current state and G′ the proposed state, where G′ is obtained from G by removing or adding one edge. The chain proposes the move from G to G′ according to a proposal kernel q(G, G′), and the move is accepted with a probability chosen to ensure detailed balance with respect to the target distribution π(G). Then, the Metropolis-Hastings acceptance ratio can be written as
$$\alpha(G', G) = \min\left\{1,\ \frac{\pi(G')\, q(G', G)}{\pi(G)\, q(G, G')}\right\}. \qquad (9)$$
In fact, Equation (9) is not the only choice yielding detailed balance. In particular, it can be adjusted in order to reduce the error caused by an excessive ratio. We take the proposal kernel to be symmetric, that is, q(G, G′) = q(G′, G). Consequently, the acceptance probability depends only on the relative densities, so that we only need to compute α(G′, G) = min{1, Λ(G′ : G)}.
We randomly select a pair of vertices u, v ∈ V(G). If (u, v) ∈ E(G), then it is removed. If (u, v) ∉ E(G), then it is added. Let G^{+(u,v)} denote the graph obtained from G by adding the edge (u, v), and similarly G^{−(u,v)} the graph obtained by removing it. Let G^{(t)} denote the state of G at time t and let U* be the set of decomposable undirected graphs with vertex set V(G). We begin with an ER random graph as the initial state; a Metropolis-Hastings algorithm for sampling decomposable graphs from a structural Markov graph law G can then be constructed as in Algorithm 1.

Algorithm 1 A Metropolis-Hastings algorithm for sampling decomposable graphs from a structural Markov graph law.
Input: An ER random graph G ∈ U.
Output: A set of decomposable graphs from U.
Set G^{(0)} = G
for t = 0, 1, 2, . . . do
    randomly select a pair of vertices u, v ∈ V(G)
    if (u, v) ∈ E(G^{(t)}) and G^{−(u,v)} ∈ U* then
        set G^{(t+1)} = G^{−(u,v)} with probability min{1, Λ(G^{−(u,v)} : G^{(t)})}
    else if (u, v) ∉ E(G^{(t)}) and G^{+(u,v)} ∈ U* then
        set G^{(t+1)} = G^{+(u,v)} with probability min{1, Λ(G^{+(u,v)} : G^{(t)})}
    otherwise set G^{(t+1)} = G^{(t)}
end for

Based on our results in Section 5.1, the acceptance probability in this algorithm can be obtained by evaluating only the marginal likelihoods of the corresponding subsets of V(G) at each step when sampling from a posterior graph law as in Proposition 8 or Proposition 9.
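A single-edge Metropolis-Hastings sampler of this kind can be sketched in plain Python as follows. The sketch makes several assumptions that are not in the paper: the target law is passed in as a user-supplied function `log_pi` on edge sets, decomposability is tested by repeatedly eliminating a simplicial vertex (a standard chordality check), and the names `is_decomposable` and `mh_sampler` are hypothetical. It does not use the local ratio results of Section 5.1; it simply evaluates `log_pi` on the whole graph.

```python
import itertools
import math
import random


def is_decomposable(vertices, edges):
    """Chordality test: a graph is decomposable iff simplicial vertices
    can be eliminated one by one until no vertex is left."""
    adj = {v: set() for v in vertices}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(adj)
    while remaining:
        for v in remaining:
            nbrs = adj[v] & remaining
            # v is simplicial if its remaining neighbours form a complete set.
            if all(b in adj[a] for a, b in itertools.combinations(nbrs, 2)):
                remaining.remove(v)
                break
        else:
            return False  # no simplicial vertex: a chordless cycle exists
    return True


def mh_sampler(vertices, edges, log_pi, n_steps, seed=0):
    """Single-edge add/remove sampler restricted to decomposable graphs."""
    rng = random.Random(seed)
    vertices = list(vertices)
    current = frozenset(frozenset(e) for e in edges)
    samples = []
    for _ in range(n_steps):
        e = frozenset(rng.sample(vertices, 2))  # random pair u, v
        proposal = current - {e} if e in current else current | {e}
        if is_decomposable(vertices, proposal):
            # Symmetric proposal: accept with probability min{1, pi(G')/pi(G)}.
            log_ratio = log_pi(proposal) - log_pi(current)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                current = proposal
        samples.append(current)  # a rejected move repeats the current state
    return samples
```

For instance, `mh_sampler(range(5), [], lambda E: -0.5 * len(E), 1000)` runs the chain from the empty graph on five vertices with a placeholder law that penalizes each edge; the placeholder is chosen only to make the example runnable.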

Conclusions
The main contribution of this paper is to extend the structural Markov properties of [10] to non-decomposable undirected graphs. It is shown that an arbitrary undirected graph can be decomposed into several prime subgraphs. Based on the prime decomposition of undirected graphs and conditional independence, the structural Markov properties can be naturally extended to arbitrary undirected graphs.
We then propose a full Bayesian method for estimating the structure of a graph, which requires that the observed data come from a certain distribution. Using our results, we have shown that the posterior updating of a graph law is determined by the margins on the prime components, which greatly simplifies the computation of the posterior graph law.
It should be pointed out that our research focuses only on undirected graphs. However, other classes of graphs, such as chain graphs or ancestral graphs, may have further interesting and valuable properties that reflect the conditional independence of the graph structure in the problem of model determination. We will study them in detail in future work.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proofs of Some Main Theorems and Propositions
Proof of Theorem 1. The equivalence of (i) and (ii) follows from Corollary 2.5 of [20], so it suffices to show that (ii) ⇔ (iii). We first prove (ii) ⇒ (iii). Firstly, we know that P(G) = {P : I(G) ⊆ I(P)}.
So, R ∈ P(G_D) follows from (A1) and (A2), which gives P(G)_D ⊆ P(G_D). Hence, the result follows together with P(G)_D ⊇ P(G_D). Conversely, under the "Faithfulness Assumption", there is a P* ∈ P(G) such that I(P*) = I(G), implying I(P*_D) = I(P*)_D = I(G)_D. By M-collapsibility, we know that P*_D ∈ P(G_D), which gives I(G_D) ⊆ I(P*_D). Hence, we have I(G_D) ⊆ I(G)_D. The result follows since it is easy to obtain that I(G)_D ⊆ I(G_D).
Proof of Proposition 3. By the graph product operation, since S is complete and separates A from B, for any G, G′ ∈ U(A, B, S), (A, B, S) is a prime decomposition of the graph G_{A∪S} ⊗ G′_{B∪S} with vertex set V(G), and likewise of G′_{A∪S} ⊗ G_{B∪S}. This implies that (i) holds. As for (ii), if G is structural Markov on U, then π(G) = π(G_{A∪S} ⊗ G_{B∪S}) = π(G_{A∪S} | {G ∈ U(A, B, S)}) π(G_{B∪S} | {G ∈ U(A, B, S)}), and similarly for π(G′). From (i), combined with (A8), we have π(G_{A∪S} ⊗ G′_{B∪S}) = π(G_{A∪S} | {G ∈ U(A, B, S)}) × π(G′_{B∪S} | {G ∈ U(A, B, S)}).
So, the result follows from (A16) and (A18). A similar proof can be given for the case of the strong hyper Markov law.
Proof of Proposition 10. We first prove (i). Suppose that G, G′ ∈ U. If G is structural Markov on U, then we have
The proof of (ii) is as follows. It is obvious that W is prime in G′. Consequently, G′ has a perfect sequence of maximal prime subgraphs (U_1, . . . , U_{j−1}, W, U_{j+2}, . . . , U_k), and Equation (7) then follows from (i).
Proof of Proposition 11. The proof of (i) follows similar steps to that of Proposition 10. To prove (ii), let T be the junction tree whose vertices are all the maximal prime subgraphs of G; for the construction of T, see [21]. Since u and v are in two different maximal prime subgraphs of G, we connect U_i and U_j in T and obtain a unique cycle. Without loss of generality, the vertices on this cycle are denoted by (U_i = U_{h_1}, U_{h_2}, . . . , U_{h_m} = U_j), where U_{h_t} and U_{h_{t+1}} are connected by an edge in T. Then, it is easy to see that U \ {U_i = U_{h_1}, U_{h_2}, . . . , U_{h_m} = U_j} ∪ T is the set of all the maximal prime subgraphs of G′. So, by applying (i), Equation (8) follows.
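The cycle used in this proof is the unique tree path between the two maximal prime subgraphs, closed by the newly added connection. A minimal Python sketch of recovering that path by breadth-first search is given below; the adjacency-dictionary representation of the junction tree and the function name `tree_path` are assumptions made for illustration, not the construction of [21].

```python
from collections import deque


def tree_path(tree_adj, start, goal):
    """Unique path between two nodes of a tree, found by breadth-first search.

    tree_adj maps each node (a label of a maximal prime subgraph) to the
    list of its neighbours in the junction tree.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in tree_adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    # Walk back from the goal to the start along the recorded parents.
    path = []
    node = goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]
```

Called with the two subgraphs containing u and v, the returned list plays the role of the sequence of subgraphs on the cycle, from the one containing u to the one containing v.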