Dynamic Programming BN Structure Learning Algorithm Integrating Double Constraints under Small Sample Condition

The Bayesian Network (BN) structure learning algorithm based on dynamic programming can obtain the globally optimal solution. However, when the samples do not fully contain the information of the real structure, especially when the sample size is small, the learned structure is inaccurate. Therefore, this paper studies the planning mode of dynamic programming, restricts its process with edge and path constraints, and proposes a dynamic programming BN structure learning algorithm with double constraints for small-sample conditions. The algorithm uses the double constraints to limit the planning process of dynamic programming and to reduce the planning space. It then uses the double constraints to restrict the selection of the optimal parent sets, ensuring that the optimal structure conforms to the prior knowledge. Finally, the methods with and without integrated prior knowledge are simulated and compared. The simulation results verify the effectiveness of the proposed method and show that integrating prior knowledge can significantly improve the efficiency and accuracy of BN structure learning.


Introduction
Many problems in the real world involve uncertainty, and artificial intelligence today deals with problems of uncertainty such as image recognition, speech recognition, intelligent decision-making, and so on. A Bayesian Network (BN) [1], as a type of graphical model, has become a powerful tool for treating uncertainty problems because of its strict mathematical foundation, its visual and understandable graphical topology, and its natural expression of real-world problems. In recent years, Bayesian networks have been successfully applied in various fields such as medical diagnosis [2,3], fault diagnosis [4], decision analysis [5], gene analysis [6,7], target identification [8], threat assessment [9,10], and system reliability analysis [11,12].
However, before a BN can be used to solve problems in engineering practice, the BN structure needs to be constructed. Compared with the approximate BN structure search algorithms based on constraints and heuristics, the exact solution of BN structure learning has recently become a popular topic in academic research. Exact methods include the branch and bound method [13], integer programming [14,15], and Dynamic Programming (DP) [16,17]. Although the traditional DP-based BN structure learning algorithm can obtain the globally optimal solution, the learned structure is inaccurate when the samples do not completely contain the information of the real structure, especially when the sample size is small. Complexity is also the bottleneck of current DP methods. However, in practice, a great deal of deterministic prior knowledge is available for BN modeling. The prior distribution over BN structures is learned in a Bayesian network model averaging method [18]. BN structure learning is transformed into a constrained objective-function extremum problem over node orders in [19]. Campos [20] considered various deterministic constraints, analyzed the interactions between them, and realized them with the hill-climbing method and the PC algorithm. Nicholson [21] incorporated expert-elicited structural information into the CaMML causal discovery program; the results show that, with prior knowledge, CaMML has excellent properties. Castelo [22] conducted BN structure learning by specifying prior knowledge. Borboudakis [23] took the probabilities of edge and path existence as prior knowledge through rigorous mathematical derivation and learned structures with the BD score and the hill-climbing method. Node order prior knowledge is integrated into the dynamic programming process in [24]. Path constraints are used to learn the BN structure with integer programming in [25]. Li proposed a constraint-based hill-climbing approach to incorporate all these constraints [26]. Cussens [27] considered integer linear programming (ILP) as constrained optimization and treated all constraints as cutting planes.
As can be seen, the use of prior knowledge can improve not only the learning accuracy but also the learning efficiency. However, the use of edge and path prior knowledge within DP structure learning has not yet been studied. Therefore, this paper proposes a BN structure learning algorithm based on DP that combines expert prior knowledge and sample information effectively. The proposed algorithm incorporates edge constraints and path constraints to limit the search process of DP and to delete parts of the planning space, so that every search step meets the prior-knowledge requirements, thus reducing the complexity of the algorithm.
The rest of the paper is organized as follows. Section 2 introduces the theoretical basis of Bayesian networks. Section 3 describes in detail the dynamic programming BN structure learning algorithm integrating prior knowledge. In Section 4, the proposed algorithm is simulated and analyzed in terms of effectiveness and complexity. Section 5 concludes the paper.

Theoretical Basis of Bayesian Network
Prior to the general definition of Bayesian networks, several basic concepts in graph theory need to be introduced.
X and Y are two nodes in a directed graph G. X → Y means there is an edge from X to Y, where X is called the parent node of Y and Y the child node of X. For any node X, Ch(X) denotes the set of all its child nodes and Pa(X) the set of all its parent nodes. If X has no parent node, X is a root node; the set of all root nodes of G is Root(G). If X has no child node, X is a leaf node; the set of all leaf nodes of G is Leaf(G). If there are k nodes X1, . . ., Xk in G such that Xi → Xi+1 for each i = 1, . . ., k − 1, then there is a directed path from X1 to Xk, written X1 ⇒ Xk. For any X ⇒ Y, X is called an ancestor node of Y and Y a descendant node of X. Likewise, An(X) denotes the set of all ancestor nodes of X and De(X) the set of all its descendant nodes. If a node in G is its own ancestor, the graph has a directed cycle. A directed graph without any directed cycle is a Directed Acyclic Graph (DAG).
A Bayesian Network consists of a DAG and a Conditional Probability Table (CPT); its complete definition is as follows.
Definition 1 [28]. A Bayesian Network is a pair ⟨G, Θ⟩, in which G = (V, E) represents the structure of the Bayesian network: a DAG where V = {X1, X2, . . ., Xn} is a set of random variables and E is a set of directed edges indicating the causal associations between variables.
Definition 2 [28]. (node order) A node order o is a linear arrangement of some variables, in which Xi ≺ Xj means Xi is in front of Xj. The node order o is a node order of G if and only if, for arbitrary {Xi, Xj} ⊆ vari(o) with Xi ≺ Xj in o, Xj is not an ancestor node of Xi.
Theorem 1 [28]. With the BIC score as the criterion, in an optimal Bayesian network any node has at most log(2N/log N) parent nodes, where N is the number of samples. This article refers to log(2N/log N) as n_mp (maximum number of parents).

Dynamic Programming Algorithm
A Bayesian network structure learning algorithm based on dynamic programming is a process of exactly solving a mathematical programming problem; its computational complexity is exponential, so it is limited by the number of nodes. The state transition equations of dynamic programming are

Score(V) = max_{X ∈ V} [ Score(V\{X}) + BestScore(X, V\{X}) ],   (1)

BestScore(X, V\{X}) = max_{Pa(X) ⊆ V\{X}} score(X, Pa(X)),   (2)

where V is the set of variables, X is a leaf node in the optimal structure, and score(·) is a decomposable scoring function [29]. Equations (1) and (2) connect the whole structure with its substructures, and the optimal network on the remaining nodes V\{X} is recursively constructed through the above process until only one variable remains. All the subsets form a Hasse diagram, showing the whole process of dynamic programming. When the DP algorithm computes from top to bottom, the root node is determined first, and leaf nodes are gradually added until the remaining node set is the universal variable set. When the DP algorithm computes from bottom to top, leaf nodes are determined first, and root nodes are gradually added until the remaining node set is empty. Because the Hasse diagram contains the node order information of the network, it is also called the order graph. There is another, similar graph called the parent graph [17]. Figure 1 shows the node order graph for n = 4 nodes and the parent graph of node X1.
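As a concrete illustration, the recursion of Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal, exhaustive version for intuition only: `local_score` is a stand-in for any decomposable score such as BIC, and none of the n_mp pruning or bit-array machinery described later is used.

```python
from itertools import combinations

def learn_dp(variables, local_score):
    """Exact structure learning; local_score(X, parents) -> float, higher is better."""
    full = frozenset(variables)
    # best[S] = (score of optimal network over S, chosen sink X, parents of X)
    best = {frozenset(): (0.0, None, frozenset())}
    for size in range(1, len(variables) + 1):
        for S in map(frozenset, combinations(variables, size)):
            candidates = []
            for X in sorted(S):                       # try each X as the sink of S
                rest = S - {X}
                # BestScore(X, rest): exhaustive search over all parent subsets
                p_score, p_set = max(
                    ((local_score(X, P), P)
                     for k in range(len(rest) + 1)
                     for P in map(frozenset, combinations(sorted(rest), k))),
                    key=lambda sp: sp[0])
                candidates.append((best[rest][0] + p_score, X, p_set))
            best[S] = max(candidates, key=lambda c: c[0])
    # recover the parent sets by peeling sinks off the full variable set
    parents, S = {}, full
    while S:
        _, X, P = best[S]
        parents[X] = set(P)
        S = S - {X}
    return parents
```

The subsets S enumerated here are exactly the nodes of the order graph; the exponential 2^n table `best` is what the pruning of the next sections shrinks.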


Expression of Constraints
In this paper, deterministic prior knowledge is directly transformed into constraints. Prior knowledge and prior constraints are equivalent concepts; for convenience of expression, "constraints" is used to refer to prior knowledge below. C denotes a set of constraints on edges or paths, which are expressed as follows:
- edge(X, Y) is used to express an edge constraint between X and Y: X → Y means X must be a parent node of Y, and X ↛ Y means X cannot be a parent node of Y;
- path(X, Y) is used to express a path constraint between X and Y: X ⇒ Y means X must be an ancestor node of Y, and X ⇏ Y means X cannot be an ancestor node of Y. In these two cases, X is called the head node and Y the tail node;
- a node order o and a constraint set C are consistent if and only if, for arbitrary {Xi, Xj} ⊆ vari(C) with a constraint in C requiring Xi before Xj, there is Xi ≺ Xj in o.
This paper uses the constraints in the following two steps: (1) to limit the construction process of the node order graph — specifically, illegal nodes are deleted from the node order graph, which reduces the complexity, especially the space complexity; and (2) to construct a sparse parent node graph and a query algorithm, so that the result of each optimal parent node query satisfies the constraints. Theorem 2 is given as the basis for realizing the constraints.
Theorem 2. Given a set C of constraints on edges or paths, every structure traversed in the dynamic programming process must already satisfy the constraints of C that involve only variables placed so far.
Proof of Theorem 2. Suppose edge(X1, Y1) ∈ C and an intermediate structure G does not contain edge(X1, Y1). Then, due to the non-aftereffect property of the dynamic programming method, no extended structure of G can satisfy the constraints C, so there must be edge(X1, Y1) in G. The case path(X2, Y2) ∈ C can be proven in the same way. The proof is completed.

Pruning Node Order Graph
With a given edge constraint X1 → X2, node {X2} in the node order graph needs to be deleted because it violates the constraint: every structure produced from the optimal substructure of this node satisfies the node order X2 ≺ X1, which is obviously inconsistent with the constraint X1 ≺ X2, so it is unnecessary to compute node {X2} when constructing the node order graph. As this example shows, if a node in the node order graph violates a constraint, it needs to be deleted. The theorems are given as follows:
Theorem 3. Given a constraint set C, a node U in the node order graph, and its set of node orders o_U, U needs to be deleted from the node order graph if and only if there is a constraint X → Y or X ⇒ Y in C with Y ∈ U and X ∉ U.
Proof of Theorem 3. In any subsequent node of U, such a variable X can only be added as a leaf node, so obviously, for any o ∈ o_U, the order places X after Y, contradicting the constraint. □
Theorem 4. Given an edge constraint set C and the variable set V of the problem domain, when traversing any node U during the construction of the node order graph, let G_s = sub(G_C, vari(C)\U). If every new node U ∪ {X} generated from U satisfies X ∈ (V\vari(C)) ∪ root(G_s), then all the nodes constructed in the final node order graph satisfy the constraint C and all deleted nodes violate the constraint C.
Proof of Theorem 4. The node order graph is constructed from an empty set, which satisfies the constraint C. At this time, no node in the node order graph is deleted.When traversing to any node U, suppose all the existing nodes in the node order graph satisfy C and all the deleted nodes violate C.Then, it is necessary to prove that the new nodes constructed in the node order graph satisfy the constraints and the deleted nodes violate the constraints.
The newly constructed nodes satisfy the constraints. If X ∈ V\vari(C), then, because U satisfies the constraint and the new variable X has nothing to do with constraint C, U ∪ {X} obviously also satisfies the constraint. If X ∈ root(sub(G_C, vari(C)\U)), the newly added variable X is one of the remaining variables in vari(C) and is a root node in the subgraph of the edge constraint graph over the remaining variables. Hence, there is no ancestor node Y ∈ vari(C)\(U ∪ {X}) of X among the remaining nodes with Y ⇒ X in C. Therefore, according to Theorem 3, any newly added node satisfies the constraints.
No deleted node satisfies the constraints. Suppose H is deleted. H can be formed by combining H\{X} with X, where X is any variable in H. If some H\{X} exists and satisfies the constraint, then X is a non-root node in the corresponding subgraph; but that subgraph must have a corresponding root node Y with Y ⇒ X and Y ∉ H, so according to Theorem 3, H does not satisfy the constraint. If no X makes H\{X} satisfy the constraint, assume that H satisfies the constraint; then, according to Theorem 3, a node order of constraint C, i.e., X ≺ Y ≺ . . . ≺ Z, can be constructed using H ∩ vari(C). Take the last variable Z (if no earlier variable exists, then Z = Y); then H\{Z} satisfies the constraint, which contradicts the condition. Therefore, the hypothesis is invalid, which proves that every deleted node violates the constraints. The proof is completed.
Theorem 3 provides the basis for pruning the node order graph. The most direct way to prune is to test each node U against Theorem 3; however, even simplified judgment algorithms are not efficient. Therefore, Theorem 4 gives a way to construct the node order graph so that all nodes in the graph satisfy the constraints. Following the method of Theorem 4, we can make full use of the constraints to prune the node order graph, reduce the space complexity, and obtain the optimal structure once the node order graph is constructed.
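The expansion rule of Theorem 4 can be sketched as follows. This is an illustrative Python fragment, not the paper's implementation; the constraint graph G_C is assumed to be encoded as a mapping from each constrained variable to the set of its constraint-parents.

```python
# Order-graph expansion under Theorem 4: from a node U, a new variable X may
# be appended only if X lies outside vari(C) or is a root of the constraint
# graph restricted to the not-yet-placed constrained variables.

def allowed_extensions(U, V, constraint_parents):
    constrained = set(constraint_parents)            # vari(C)
    remaining = constrained - U
    allowed = set(V) - U - constrained               # variables outside vari(C)
    for X in remaining:
        # X is a root of the subgraph on `remaining` iff no constraint-parent
        # of X is still waiting to be placed before it
        if not (constraint_parents[X] & remaining):
            allowed.add(X)
    return allowed

def build_order_graph(V, constraint_parents):
    """Layer-by-layer construction of the pruned node order graph."""
    layers = [{frozenset()}]
    for _ in range(len(V)):
        new_layer = set()
        for U in layers[-1]:
            for X in allowed_extensions(U, V, constraint_parents):
                new_layer.add(U | {X})
        layers.append(new_layer)
    return layers
```

With the single constraint X1 → X2 the node {X2} never appears in layer 1, matching the example that opens this subsection.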
As the scores of sets in the node order graph need to be queried repeatedly, in order to increase the efficiency of set querying, this paper designs a hash function on sets in which different sets correspond to different hash values. The hash function is designed as follows. Suppose the set of all variables in the problem domain is {X1, . . ., Xn}. Set a binary number b with n digits. For a set U in the node order graph, if Xi ∈ U, set the ith place of b to 1; otherwise set it to 0. Finally, convert b to decimal, which gives the corresponding hash value.
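The hash described above is simply the characteristic bitmask of the set; a short sketch (the low-to-high bit convention used here is one consistent choice):

```python
# Subset hash: bit i of the integer is 1 iff the (i+1)-th variable of the
# problem domain belongs to the set, and the integer itself is the hash value.

def subset_hash(subset, variables):
    h = 0
    for i, x in enumerate(variables):
        if x in subset:
            h |= 1 << i          # set bit i for variable X_{i+1}
    return h
```

Distinct subsets map to distinct integers in 0..2^n − 1, so the value can index an array or hash table of scores directly.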
The specific algorithm flow of the node order graph construction is shown in Algorithm 1.

Construction and Query of Sparse Parent Node Graph
The construction algorithm of the sparse parent node graph is as follows. As the sparse parent node graph stores the information of the first n_mp layers of nodes in the complete parent node graph, we first construct a complete parent node graph based on the constraints according to Theorem 4, and then store each constructed node into the sparse parent node graph. When the first n_mp layers are completely constructed, the sparse parent node graph is obtained. The constrained expansion step reads:

for each node U in the PreviousLayer do
    V_r ← V_c \ U
    G_r ← the subgraph of G_C induced on V_r (the other variables and their incident arcs are removed)
    R ← root variables of G_r
    NewLayer ← NewLayer ∪ {U ∪ {X} | X ∈ R}
end for

Here is an example to illustrate how to construct a complete parent node graph based on a constraint. Figure 2 gives some constraints among variables. To calculate a node of X3 in the graph, such as the node {X1, X2}, we would only need to compare the scores score(X3, {X1}), score(X3, {X2}), and score(X3, {X1, X2}) when there is no constraint. Because X1 is a required parent in the constraint, only score(X3, {X1}) and score(X3, {X1, X2}) are compared now. In addition, the crossed-out nodes in Figure 2b do not need to be solved, because these nodes contain the variable X4, and X4 is by no means a parent node of X3.
Based on the above ideas, the specific algorithm flow of the sparse parent node graph construction, defined as PBDP-EDGE, is shown in Algorithm 2.
The query algorithm idea of the sparse parent node graph is as follows. Suppose δ is a query constraint of X; that is, every feasible parent set of X within U must satisfy the constraint δ. In other words, the first (best-scoring) parent set Y that satisfies δ is the best parent node set in U. The query constraint δ comprises the following two conditions: (1) Y ⊆ U; (2) CPa(X) ∩ U ⊆ Y and Y ∩ CNPa(X) = ∅, where CPa(X) denotes the set of variables constrained to be parents of X and CNPa(X) the set of variables constrained not to be parents of X.

Constructing Sparse Parent Node Graph Based on Edge Constraint
Input: V — set of all variables; C — set of constraints; score(·,·) — decomposable score function
Output: SPG — sparse parent node graph

for layer ← 0 to n do
    for each node U such that U ⊆ V\{X} and |U| == layer do
        append [BestScore(X, U), U] to [score_X, parents_X]
    end for
end for

The specific implementation of the query is as follows. Here, parents_X(Xi) denotes the bit array whose jth bit is 1 if the jth stored candidate parent set of X contains Xi. First, set a bit array valid_X of all 1s with the same length as parents_X. Then, according to the first condition in δ, perform valid_X ← valid_X & ∼parents_X(Xi) for each Xi ∈ V\U. Next, according to the second condition in δ, perform valid_X ← valid_X & parents_X(Xi) for each Xi ∈ CPa(X) ∩ U; the purpose of this step is to ensure that all the remaining sets include all the variables in CPa(X) ∩ U. Finally, perform valid_X ← valid_X & ∼parents_X(Xi) for each Xi ∈ CNPa(X) ∩ U; this step eliminates the sets that include any variable in CNPa(X) ∩ U. The first set among the remaining ones is the best parent node set. Algorithm 3 shows the specific algorithm of the best parent node set query.

The Optimal Parent Node Set Based on Query Constraints
Input: V — set of all variables; C — set of constraints; SPG — sparse parent node graph
Output: bestparents(·,·) — the best parent node set; bestscore(·,·) — the corresponding score
Here is an example to illustrate the implementation process. As shown in Table 1, from U = {X1, X2, X4, X5} we find the best parent node set of X3, with CPa(X3) = {X1, X2} and CNPa(X3) = {X4}. The remaining candidate sets in the table all satisfy the first condition of δ; that is, they are all subsets of {X1, X2, X4, X5}. If there were no constraint, {X1, X5} would be the best parent node set. Next, we realize the second condition. Because CPa(X3) ∩ U = {X1, X2}, perform parents_X3(X1) & parents_X3(X2); the result is shown in the seventh row of the table, in which a value of 1 means that the set contains both X1 and X2. Because CNPa(X3) = {X4}, compute ∼parents_X3(X4); the result is shown in the eighth row, in which a value of 1 means the set does not contain X4. Finally, AND the seventh and eighth rows to obtain the final valid_X. The first surviving set, {X1, X2, X5}, is the best parent node set satisfying the constraint.
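The row operations in this example can be sketched with Python integers standing in for the bit arrays. The names (`cand_sets`, `contains`) are illustrative, not the paper's; candidate parent sets are assumed pre-sorted by score, best first, as in the sparse parent node graph.

```python
# Bit-array query for the best edge-constrained parent set of X.
# cand_sets: candidate parent sets of X, sorted by score descending.
# cpa / cnpa: variables constrained to be / not to be parents of X.

def best_parent_set(cand_sets, U, V, cpa, cnpa):
    m = len(cand_sets)
    contains = {x: 0 for x in V}
    for i, P in enumerate(cand_sets):       # bit i marks "candidate i contains x"
        for x in P:
            contains[x] |= 1 << i
    valid = (1 << m) - 1                    # all candidates start valid
    for x in set(V) - set(U):               # condition (1): P must be a subset of U
        valid &= ~contains[x]
    for x in set(cpa) & set(U):             # condition (2a): must include CPa(X) ∩ U
        valid &= contains[x]
    for x in set(cnpa) & set(U):            # condition (2b): must avoid CNPa(X) ∩ U
        valid &= ~contains[x]
    if valid == 0:
        return None
    # lowest surviving bit = first, i.e., best-scoring, candidate
    return cand_sets[(valid & -valid).bit_length() - 1]
```

Each constraint costs one bitwise AND over all candidates at once, which is what makes the repeated queries during DP cheap.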

Pruning Node Order Graph
The algorithm for pruning the node order graph by path constraints is the same as that for edge constraints. First, construct a constraint graph G_C, then use Algorithm 1 to prune the node order graph, where the path constraint graph G_C of a constraint set C is a directed acyclic graph over the variables vari(C) in which, for arbitrary {X1, X2} ⊆ vari(C), there is an edge X1 → X2 whenever C requires X1 ⇒ X2.

Construction and Query of Sparse Parent Node Graph
As the parent node of Y must contain at least one X or one descendant node of X, when constructing the sparse parent node graph, for Y, it is necessary to store all the parent node sets with the number of variables below n mp .For other variables, the sparse parent node graph is constructed according to the unconstrained condition.The specific construction algorithm of the sparse parent node graph, defined as PBDP-PATH, is shown in Algorithm 4.
The query algorithm idea for path constraints is as follows. For a given path constraint X ⇒ Y, to find the best parent node set S of Y within U: if X ∈ U, there must be at least one Z ∈ {X} ∪ des(X) with Z ∈ S, where des(X) is the set of descendant nodes of X in the structure over U; if the constraint is X ⇏ Y, then Z ∉ S for all Z ∈ {X} ∪ des(X). The specific query method is as follows. Initialize a bit array valid of all 1s with the same length as parents_Y. Perform valid ← valid & ∼parents_Y(Xk) for each Xk ∈ V\U. For each required Xi, set an auxiliary bit array Cvalid of all 0s, find the descendant set des(Xi) of Xi, and, for each Z ∈ {Xi} ∪ des(Xi), perform the OR operation Cvalid ← Cvalid | parents_Y(Z); then perform the AND operation valid ← valid & Cvalid. For each forbidden Xj, find des(Xj) and, for each Z ∈ {Xj} ∪ des(Xj), perform valid ← valid & ∼parents_Y(Z). Algorithm 5 shows the specific algorithm flow of the best parent node set query.
An example is given below to illustrate the implementation process. Figure 3 shows an example of path constraints, in which C contains X1 ⇒ Y and X2 ⇒ Y. We need to find the best parent node set S of Y from U = {X1, X2, X3, X4}. Table 2 shows the specific solution process. In the table, the candidate parent sets listed are all subsets of U, so the first condition of δ is already satisfied. If there were no constraint, {X2, X4} would be the best parent node set. With the constraints, perform the OR operation over the rows of the elements of {X1} ∪ des(X1) to obtain Cvalid_1, as in line 7, and over the rows of the elements of {X2} ∪ des(X2) to obtain Cvalid_2, as in line 8. Then, perform the AND operation valid ← valid & Cvalid_1 & Cvalid_2. A value of 1 in valid indicates a set containing elements from both {X1} ∪ des(X1) and {X2} ∪ des(X2); therefore, {X3, X4} is the best solution.
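The same bitmask idea extends to path constraints; here is a sketch under the assumption that the descendant sets des(·) within U are supplied by the caller (they come from the current structure over U in the paper's algorithm).

```python
# Path-constraint variant of the best-parent query: a required path X_i => Y
# is satisfied if the parent set contains X_i or any of its descendants; a
# forbidden path X_j =/=> Y removes all such candidates.

def best_parents_path(cand_sets, U, V, required, forbidden, des):
    m = len(cand_sets)
    contains = {x: 0 for x in V}
    for i, P in enumerate(cand_sets):       # bit i marks "candidate i contains x"
        for x in P:
            contains[x] |= 1 << i
    valid = (1 << m) - 1
    for x in set(V) - set(U):               # parent sets must lie inside U
        valid &= ~contains[x]
    for xi in required:                     # X_i => Y must hold
        cvalid = 0
        for z in {xi} | des.get(xi, set()):
            cvalid |= contains.get(z, 0)    # OR over X_i and its descendants
        valid &= cvalid
    for xj in forbidden:                    # X_j =/=> Y must hold
        for z in {xj} | des.get(xj, set()):
            valid &= ~contains.get(z, 0)
    if valid == 0:
        return None
    return cand_sets[(valid & -valid).bit_length() - 1]
```

On a Figure 3-like instance (des(X1) ∋ X3, des(X2) ∋ X4, both assumptions of this sketch), the unconstrained winner {X2, X4} is filtered out and {X3, X4} survives, matching the table walk-through above.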
Validity Verification
In this section, to verify the effectiveness of the proposed method, an 18-node network is first generated with the Matlab constructor. Then, the constructed network, the Asia network, and the Sachs network are simulated and verified with 20 samples each. To verify that this method can really integrate constraints, some extreme simulation conditions are set.

1. The simulation is carried out with the Asia network. All the edge prior knowledge is given, which is verified by the PBDP-EDGE structure. Part of the path prior knowledge is given, specifically 1 ⇒ 6, 2 ⇒ 6, 2 ⇒ 8, 3 ⇒ 7, 3 ⇒ 8, 4 ⇒ 7, and 4 ⇒ 8, which is verified by the PBDP-PATH structure. The results are shown in Figure 4. The real network structure of the Asia network is shown in Figure 4a. It can be seen from Figure 4b that the training samples contain very little information: only a few edges can be learned, and a complete structure cannot be constructed. Figure 4c shows that the correct structure can be learned even with a small sample size, which demonstrates the correctness and effectiveness of the edge prior-knowledge algorithm proposed in this paper. Figure 4d shows that all the learned structures contain the given paths (1 ⇒ 6, 2 ⇒ 6, 2 ⇒ 8, 3 ⇒ 7, 3 ⇒ 8, 4 ⇒ 7, and 4 ⇒ 8).
2. The simulation is carried out with the Sachs network. All the edge prior knowledge is given, which is verified by the PBDP-EDGE structure. Part of the path prior knowledge is given, specifically 1 ⇒ 2, 2 ⇒ 8, 3 ⇒ 4, 4 ⇒ 6, 5 ⇒ 6, and 9 ⇒ 11, which is verified by the PBDP-PATH structure. The results are shown in Figure 5. The real network structure of the Sachs network is shown in Figure 5a. As can be seen from Figure 5b, the training samples contain little information, only a few edges can be learned, and a complete structure cannot be constructed. Figure 5c shows that partially correct structures can be learned even with a small sample size, indicating the correctness and effectiveness of the edge prior-knowledge algorithm. Figure 5d shows that all the learned structures contain the given paths.
3. The simulation is carried out with the constructed 18-node network. All the edge prior knowledge is given, which is verified by the PBDP-EDGE structure. Part of the path prior knowledge is given, specifically 1 ⇒ 4, 1 ⇒ 17, 2 ⇒ 18, 2 ⇒ 13, 2 ⇒ 5, 3 ⇒ 5, 3 ⇒ 9, 6 ⇒ 8, 5 ⇒ 10, 5 ⇒ 12, 7 ⇒ 5, 10 ⇒ 13, 13 ⇒ 9, and 15 ⇒ 10, which is verified by the PBDP-PATH structure. The results are shown in Figure 6. The real structure of the constructed network is shown in Figure 6a. As can be seen from Figure 6b, the training samples contain little information, only a few edges can be learned, and a complete structure cannot be constructed. Figure 6c shows that partially correct structures can be learned even with a small sample size, indicating the correctness and effectiveness of the edge-constraint algorithm. Figure 6d shows that all the learned structures contain the given paths (1 ⇒ 4, 1 ⇒ 17, 2 ⇒ 18, 2 ⇒ 13, 2 ⇒ 5, 3 ⇒ 5, 3 ⇒ 9, 6 ⇒ 8, 5 ⇒ 10, 5 ⇒ 12, 7 ⇒ 5, 10 ⇒ 13, 13 ⇒ 9, and 15 ⇒ 10).
Therefore, the above simulation results can prove that the method proposed in this paper is correct and reliable and can be realized no matter what kind of prior knowledge is given.

The edge-constraint integration is simulated with the Hailfinder network, a large-scale network, and half of the real edges are randomly selected as prior knowledge. The training sample sizes are 200, 500, and 1000, respectively. Table 3 shows the simulation results, where PBDP (Priors-Based DP) denotes the method integrating prior knowledge, the time cost is measured in seconds, the space cost refers to the size of the arrays to be set, and the proportion represents the time and space ratios between the PBDP method and the DP method.
The path constraint is simulated in the same way, with the results shown in Table 4.It can be seen from Tables 3 and 4 that the integrating edge constraint and path constraint can not only improve the scores, but also effectively reduce the complexity of time and space.To sum up, this method can use edge constraints and path constraints to effectively reduce the time and space complexity of the Dynamic Programming algorithm and improve its timeliness significantly.

Conclusions
In this paper, the specific process of dynamic programming is analyzed, and its restrictive relationship with edge constraints and path constraints is determined. The prior constraints are used to restrict and guide each step of dynamic programming, and deterministic prior knowledge is integrated into DP-based BN structure learning. A dynamic programming BN structure learning algorithm integrating prior knowledge is proposed, and its implementation is described in detail. Simulation results show that the algorithm can use edge and path prior knowledge to effectively reduce the time and space complexity of the dynamic programming algorithm. They also reveal the complementary relationship between prior knowledge and learning in BN modeling: only by making full use of both prior knowledge and training sample information can an ideal model be obtained. This paper also provides some implications for breaking through the node-number limit of the dynamic programming method.


Figure 1 .
Figure 1.Node order graph and parent node graph of X 1 : (a) Node order graph of four nodes; (b) Parent node graph of X 1 .


Figure 2 .
Figure 2. Construction of complete parent node graph of X 3 with given edge constraint: (a) Edge constraint; (b) Construction of complete parent node graph of X 3 .

Algorithm 2 .
Construction algorithm of sparse parent node graph.

Algorithm 4.
Construction algorithm of sparse parent node graph: Constructing Sparse Parent Node Graph Based on Path Constraints. Input: V — set of all variables; C — set of constraints; score(·,·) — decomposable score function. Output: SPG — sparse parent node graph.


 , 28 
, 34  , 46  , 56  , and 9 11  , which is verified by the PBDP-PATH structure.The results are shown in Figure 5. (a) Real network structure (b) A structure learned without prior knowledge (c) The PBDP-EDGE structure (d) The PBDP-PATH structure
⇒ Y means X is the ancestor node of Y. X ⇒ Y means X cannot be an ancestor node of Y.In these two cases, Y is called a tail node and is a head node.path(X, Y) is used to express any X path constraint between X and Y; • Suppose there is an arbitrary node order o and constraint set C in which o and C are consistent.If and only if for arbitrary {X 1 be the parent node of Y. edge(X, Y) is used to express any edge constraint between X and Y;• X

Algorithm 2 (fragment). Constructing Sparse Parent Node Graph Based on Edge Constraint.
Input: V — set of all variables; C — set of constraints
. . .
12. sort [score_X, parents_X] by score_X in descending order
13. end for
14. return SPG ← [score_·, parents_·]

Table 1 .
Example of query process based on constraints.

Algorithm 3 (fragment). Query algorithm of best parent node set: Best Parent Node Set Based on the Path Constraint Query.
Input: V — set of all variables; C — set of path constraints; SPG — sparse parent node graph
Output: bestparents(·,·) — the best parent node set; bestscore(·,·) — the corresponding score
1. valid ← allScores_X
2. for each . . . do
3.     valid ← valid & ∼parents_X(Y_i)
. . .
10. end for
11. valid ← valid & Cvalid
12. end for
13. for each Y_k such that (Y_k ⇏ X) ∈ C do
14.     for each S holding that Y_k ⇒ S in G do
15.         valid ← valid & ∼parents_X(S)
16.     end for
17. end for
18. index ← firstSetBit(valid)
19. return scores_X[index]

Table 2 .
Example of query process based on path constraint.

Table 3 .
Simulation comparison of integrating constraints of edge.

Table 4 .
Simulation comparison of integrating path constraints.