A New Ensemble Learning Algorithm Combined with Causal Analysis for Bayesian Network Structural Learning

Abstract: The Bayesian Network (BN) has been widely applied to causal reasoning in artificial intelligence, and the Search-Score (SS) method has become a mainstream approach to mine causal relationships for establishing BN structures. Aiming at the problems of local optima and low generalization in existing SS algorithms, we introduce Ensemble Learning (EL) and causal analysis to propose a new BN structural learning algorithm named C-EL. Combining the Bagging method with causal Information Flow theory, an EL mechanism for BN structural learning is established. The base learners of EL are trained with various SS algorithms. Then, a new causality-based weighted ensemble method is proposed to achieve the fusion of different BN structures. To verify the validity and feasibility of C-EL, we compare it with six different SS algorithms. The experiment results show that C-EL has high accuracy and a strong generalization ability. More importantly, it is capable of learning more accurate structures under the small training sample condition.

In the era of Big Data, mathematical modeling for knowledge expression and inference, such as data mining and analysis, plays a crucial role in obtaining scientific evaluations and decisions, especially for complex problems with uncertain information and causal relationships. The Bayesian Network (BN), built on probability theory and graph theory, is a powerful model for uncertainty expression and causality reasoning [1] and has received a lot of attention in areas such as defense, medicine and finance. BN modeling includes structural learning and parameter learning. Structural learning, considered the foundation of BN, constructs a directed network topology based on objective data and prior knowledge [2]. At present, one of the research hotspots of BN is how to learn network structures accurately and efficiently from large datasets.
There are mainly two types of structural learning approaches: the Conditional Independence Testing (CIT) method and the Search-Score (SS) method. The key to CIT is to choose appropriate measure functions for the dependency test among network nodes, so as to find variable groups with independence relationships. Typical testing algorithms are mostly based on the Chi-Square test and conditional mutual information, such as the SGS, EP and TPDA algorithms [3]. CIT is intuitive and easy to understand. Unfortunately, as the number of network nodes increases, the complexity of CIT grows exponentially, causing low efficiency and unreliable results. By contrast, SS is more efficient in dealing with large-scale networks. It treats structural learning as an optimization problem [4]: a search algorithm is used to find the optimal network structure in a space constructed from all nodes, and the process is guided by a scoring function. SS is simple, convenient and feasible, so it has become the mainstream method of BN structural learning [5].
As seen from the principle of SS methods, the scoring function and the search algorithm are the two most important parts: the scoring function evaluates structures, and the optimal structure is found by the search algorithm. With different scoring functions and search algorithms proposed, scholars have designed a variety of SS algorithms. Cooper [6] proposed the well-known K2 algorithm based on the Hill-Climbing (HC) algorithm, which needs the node order as prior information. Lam [7] improved the K2 algorithm into the K3 algorithm by using the Minimum Description Length (MDL) as the scoring function. In the case of small training samples, the K3 algorithm can obtain more accurate results than the K2 algorithm. Then, Heckerman [8] put forward another new scoring function, Bayesian Dirichlet equivalent (BDe), to improve the K2 algorithm. This SS algorithm does not need the node order in advance, which expands its field of application.
However, there is a general consensus that the search space is too large to find the optimum. These SS algorithms have low search efficiency and are easily trapped in local optima [9]. To solve this problem, heuristic search algorithms from the field of intelligent optimization have been introduced to search for the optimal structure. The Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO) and Artificial Bee Colony (ABC) algorithms have been applied to BN structural learning and achieved satisfactory results thanks to their global search ability [10][11][12]. However, obtaining the globally optimal structure is still a great challenge, as structural learning from objective data is a typical NP-hard problem [13].
To further enhance the accuracy and efficiency of SS algorithms, other improvements have also been made by modifying the scoring function or combining it with the CIT method. Campos [14] designed a new scoring function based on a Mutual Information Test (MIT), in which the accuracy of a network structure is measured by the Kullback-Leibler distance between the structure and the observed data. Bouchaala [15] put forward a new scoring function, Implicit Score (IS), which uses hidden estimation and avoids calculating the probability distributions of variables in advance. Fan [16] analyzed the characteristics of the K2 and MCMC algorithms and combined the advantages of both to propose an improved structural learning algorithm, which can obtain a stable structure efficiently without prior knowledge. Based on unconstrained optimization and GA, Wang [17] constructed a restricted GA-based model for BN structural learning, leading to a significant reduction in search space. Li [18,19] introduced causal Information Flow theory to carry out causal analysis of structures in advance and improve the initial network structures in the search process, which significantly enhances search efficiency and effectively avoids premature convergence.
To some extent, the above algorithms improve search spaces and scoring functions, increasing the accuracy and efficiency of structural learning. However, algorithm optimization is restricted because learning a network structure from a large-scale dataset remains the famous NP-hard problem [20]. Although a better network structure may be found, it is still easy to fall into a local optimum. In addition, more and more search algorithms and scoring functions have been proposed, generating all kinds of SS algorithms. When applied to networks of different sizes and types, SS algorithms perform differently. Thus, it is difficult to select a reasonable search algorithm and scoring function. More importantly, each SS algorithm has a weak generalization ability: the performance of an algorithm is not stable when it deals with different networks. (A more detailed discussion of the performance of different SS algorithms is given in Section 2.) Ensemble Learning (EL), an effective method for improving model performance in Machine Learning (ML), holds that if models are approximately independent of each other, the performance of an integrated model is better than that of an individual model [21]. EL is able to improve the accuracy and generalization of learning systems, so it has become a hot topic in ML and has been applied successfully to face recognition, medical diagnosis and weather forecasting [22,23]. Considering the differences between various SS algorithms, we try to establish a new EL-based SS algorithm to improve the accuracy and generalization of structural learning. As is well known, the EL mechanism includes the base learner and the integration method. Base learners are usually easy to train, but the integration method is hard to design and has a significant impact on EL effects [24]. BN expresses causal relationships among variables quantitatively. The high complexity of BN structures makes traditional fusion methods no longer applicable. Aiming at the causality of BN, we introduce the causal Information Flow (IF) theory [25], an emerging causal analysis theory, to propose an improved weighted fusion method that fits the structure aggregation.
On the basis of EL and causal IF, we put forward a new BN structural learning algorithm, namely C-EL. We establish the EL mechanism for BN structures by means of the Bagging method. Then, causal IF is introduced to construct a new weighted integration rule, and the weight is calculated from the causality of the BN. The proposed model can achieve the fusion of various algorithms. The experiment results show that C-EL has high accuracy and a strong generalization ability. More importantly, it is capable of learning more accurate structures under the small training sample condition. The rest of the paper is organized as follows: Section 2 explains the theoretical formulations and Section 3 discusses the existing problems of SS algorithms through contrast experiments. A detailed elaboration of the C-EL algorithm is presented in Section 4. The application of the proposed algorithm to structural learning and the analysis of results are elaborated in Section 5. Section 6 concludes the presented studies.

Bayesian Network
The Bayesian Network (BN), also known as the Bayesian reliability network, is not only a graphical expression of causal relationships among variables but also a probabilistic reasoning technique [1]. It can be represented by a binary B = <G, θ>:
• G = (V, E) is the network structure. V is a set of nodes representing variables in the problem domain; E is a set of arcs, and a directed arc represents the causal dependency between two variables.
• θ is the network parameter, that is, the probability distribution of each node. θ expresses the degree of mutual influence among nodes and presents quantitative characteristics in the knowledge domain.
Assuming a set of variables V = (v_1, ..., v_n), the mathematical basis of BN is the Bayes Theorem shown as Equation (1):

$$P(v_i \mid v_j) = \frac{P(v_j \mid v_i)\,P(v_i)}{P(v_j)} \quad (1)$$

where P(v_i) is the prior probability, P(v_j | v_i) is the conditional probability and P(v_i | v_j) is the posterior probability. Based on P(v_i), P(v_i | v_j) can be derived by the Bayes Theorem under the relevant conditions. The joint probability distribution of all nodes in the BN can be derived from Equation (1) under the assumption of conditional independence [19]:

$$P(v_1, v_2, \cdots, v_n) = \prod_{i=1}^{n} P(v_i \mid A(v_i)) \quad (2)$$

where v_i is a node and A(v_i) is the set of parent nodes of v_i. Equation (2), the core of Bayesian inference, enables the calculation of the probability distribution of a set of query variables based on the evidence of input variables. BN modeling mainly includes structural learning and parameter learning. Structural learning is the basis and prerequisite of parameter learning; it mines causal relationships from data and expresses them in the form of a network. It can be described as a process in which, based on an observed dataset D of the node set, the network structure G that best matches D is found through intelligent learning algorithms [26,27]. SS algorithms are widely used for structural learning and achieve satisfactory results. We give a specific introduction to them in the next subsection.
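For concreteness, Equations (1) and (2) can be sketched on a toy two-node network. The CPT numbers below are illustrative assumptions, not values taken from the paper's datasets:

```python
# Sketch: joint probability of a two-node BN (Smoking -> Bronchitis) via Equation (2),
# and the posterior P(Smoking | Bronchitis) via the Bayes Theorem in Equation (1).
# All probability values here are made up for illustration.

p_smoking = {True: 0.3, False: 0.7}                      # prior P(v_i)
p_bronchitis = {True: {True: 0.6, False: 0.4},           # P(Bronchitis | Smoking)
                False: {True: 0.1, False: 0.9}}

def joint(smoking, bronchitis):
    # Equation (2): P(v1, v2) = P(v1) * P(v2 | A(v2))
    return p_smoking[smoking] * p_bronchitis[smoking][bronchitis]

def posterior_smoking(bronchitis):
    # Equation (1): P(Smoking | Bronchitis) = P(Bronchitis | Smoking) P(Smoking) / P(Bronchitis)
    evidence = joint(True, bronchitis) + joint(False, bronchitis)
    return joint(True, bronchitis) / evidence
```

With these numbers, observing Bronchitis raises the probability of Smoking from the prior 0.3 to 0.18/0.25 = 0.72.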

Search-Score Based Method
The basic idea of the SS method is as follows: firstly, the scoring function is defined to measure the matching degree between a structure and the training data; then, an initial network is constructed and the search algorithm is applied to search for structures until the scoring function converges. The process can be described by the mathematical expression

$$G^* = \arg\max_{G_i} f(G_i : D) \quad (3)$$

where f(G_i : D) is the score of a network G_i based on the dataset D, and the arg max picks out the top-scoring structure G^*. The scoring function and the search algorithm are the two important components of SS methods, which directly determine the accuracy and efficiency of structural learning. The earliest scoring function is the Bayesian Information Criterion (BIC) proposed by Schwarz in 1978 [28]. Later, a variety of scoring functions were developed for BN structural learning, such as the K2 criterion, MDL, BDe and MIT mentioned in the Introduction. In addition to scoring functions, search algorithms also play a pivotal role in the SS method [29]. As structure searching is the famous NP-hard problem, many intelligent algorithms, such as the GA, PSO, ACO and ABC algorithms, have been applied to search optimization. Besides, some scholars have constructed hybrid search algorithms by combining CIT, such as the sparse candidate algorithm and the max-min hill-climbing algorithm [30]. The search space of these algorithms is reduced, and the convergence speed is improved, by an independence relationship test.
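The search loop described above can be sketched as greedy hill-climbing over single-arc changes. This is a simplified illustration, not the paper's implementation: `score` stands in for any of the BIC/MDL/BDe functions, and acyclicity is checked with a plain depth-first search:

```python
import itertools

def is_acyclic(adj, n):
    # DFS cycle check on an adjacency matrix (adj[i][j] == 1 means arc i -> j)
    state = [0] * n  # 0 = unvisited, 1 = on the DFS stack, 2 = done
    def dfs(u):
        state[u] = 1
        for v in range(n):
            if adj[u][v]:
                if state[v] == 1 or (state[v] == 0 and not dfs(v)):
                    return False
        state[u] = 2
        return True
    return all(state[u] or dfs(u) for u in range(n))

def hill_climb(n, score):
    # Greedy search: start from the empty graph, toggle one arc at a time,
    # keep a change if it is legal and improves the score, stop at a local optimum.
    adj = [[0] * n for _ in range(n)]
    best = score(adj)
    improved = True
    while improved:
        improved = False
        for i, j in itertools.permutations(range(n), 2):
            adj[i][j] ^= 1                      # toggle arc i -> j
            if is_acyclic(adj, n) and score(adj) > best:
                best, improved = score(adj), True
            else:
                adj[i][j] ^= 1                  # revert the toggle
    return adj, best
```

The local-optimum behavior criticized in the text is visible here: the loop terminates as soon as no single-arc change improves the score, regardless of whether a better structure exists further away.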
Thus, there are all kinds of SS algorithms based on different scoring functions and search algorithms. How does the performance of different algorithms differ? What is the difference in the learning results of different networks with the same algorithm? Is there an algorithm with a strong generalization ability? In the next section, a quantitative analysis of SS algorithms is carried out through simulation experiments on the issues above.

Analysis of Search-Score Based Methods
In this section, we select different scoring functions and search algorithms for BN structural learning to discuss the performance of various SS algorithms from the point of view of accuracy and generalization. On the basis of studying many documents [12,14,18], the three most classical scoring functions, whose mathematical expressions are shown below, and the two most widely used search algorithms (HC and GA) are adopted for the experiments, forming a total of six different SS algorithms: "BIC+HC", "BIC+GA", "MDL+HC", "MDL+GA", "BDe+HC" and "BDe+GA".

• BIC criterion:

$$f_{BIC}(G \mid D) = \sum_{i=1}^{n} \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} m_{ijk} \log \frac{m_{ijk}}{m_{ij}} - \frac{\log m}{2} \sum_{i=1}^{n} q_i (r_i - 1) \quad (4)$$

• MDL criterion:

$$f_{MDL}(G \mid D) = -\sum_{i=1}^{n} \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} m_{ijk} \log \frac{m_{ijk}}{m_{ij}} + \frac{\log m}{2} \sum_{i=1}^{n} q_i (r_i - 1) \quad (5)$$

• BDe criterion:

$$f_{BDe}(G \mid D) = \log P(G) + \sum_{i=1}^{n} \sum_{j=1}^{q_i} \left[ \log \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + m_{ij})} + \sum_{k=1}^{r_i} \log \frac{\Gamma(\alpha_{ijk} + m_{ijk})}{\Gamma(\alpha_{ijk})} \right] \quad (6)$$

where n is the number of network nodes; q_i is the number of parent-node configurations of node v_i; r_i is the number of values of node v_i; m_ijk is the number of samples in which v_i takes its k-th value while its parent nodes take their j-th configuration, with m_ij = Σ_k m_ijk and m the total number of samples; α_ijk is the hyper-parameter of the Dirichlet distribution, whose subscripts have the same meaning as those of m_ijk, with α_ij = Σ_k α_ijk; and Γ(·) is the Gamma function.
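As a count-based sketch, the BIC criterion above can be computed directly from sample counts. The `data`/`parents`/`arity` interface is an illustrative assumption, not the paper's code; the other criteria differ only in the penalty term or in replacing counts with Gamma-function terms:

```python
import math
from collections import Counter

def bic_score(data, parents, arity):
    """BIC score of a structure (Equation (4)): per-node log-likelihood term
    minus the penalty (log m / 2) * q_i * (r_i - 1).
    data: list of tuples of discrete values, one tuple per sample;
    parents[i]: list of parent indices of node i; arity[i] = r_i."""
    m, n = len(data), len(arity)
    score = 0.0
    for i in range(n):
        # m_ijk: count of (parent configuration j, value k) pairs for node i
        m_ijk = Counter((tuple(row[p] for p in parents[i]), row[i]) for row in data)
        # m_ij: count of parent configuration j alone
        m_ij = Counter(tuple(row[p] for p in parents[i]) for row in data)
        score += sum(c * math.log(c / m_ij[j]) for (j, _), c in m_ijk.items())
        q_i = 1
        for p in parents[i]:
            q_i *= arity[p]
        score -= (math.log(m) / 2) * q_i * (arity[i] - 1)
    return score
```

On data where one variable determines another, the structure containing that arc scores higher than the empty structure, despite the larger penalty.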
In the structural learning experiments, three representative BNs in different scales are used for algorithm verification: the Asia network composed of 8 nodes and 8 arcs, the Child network composed of 20 nodes and 25 arcs, and the Alarm network composed of 37 nodes and 46 arcs.The three standard networks have been widely used in previous studies of BN structural learning.According to the structure and conditional probability distribution of each BN, data sampling is conducted by means of the Full_BNT toolbox [31], and 1500 training samples of each network are collected randomly.Table 1 shows the training datasets of Asia network.
The Hamming distance (H), defined by Equation (7), is used to measure the quality of the learned structures [32]:

$$H = L + E + R \quad (7)$$

The smaller H is, the more accurate the learned network structure. In order to reduce the randomness of the search algorithms, structural learning is usually conducted multiple times. Following previous studies [17,18], we carry out each experiment ten times, and the average results are given in Table 2 and Figure 1.
where L represents the number of lost arcs in the learned structures, E represents the number of excrescent arcs, and R represents the number of reverse arcs. From Table 2 and Figure 1, we can find that: (1) For the same network structure, the accuracy obtained by different algorithms varies greatly. The difference in H between the best algorithm and the worst algorithm is obvious (Asia: 3.2/10.4; Child: 5.8/14.7; Alarm: 17.6/68.2). (2) For the same SS algorithm, the accuracy on different networks also varies greatly, as with "BIC+GA" and "BDe+HC" marked in yellow. For example, "BIC+GA" performs well in the structural learning of Child, while it performs badly on Asia, indicating that the generalization capabilities of the above algorithms are weak. (3) For Asia, Child and Alarm, the best learning algorithms are "BDe+HC", "MDL+GA" and "BDe+GA", respectively. In other words, no single algorithm can learn the optimal structure of all networks. In conclusion, different algorithms used for BN structural learning have different learning performances, and the performance of the same algorithm is not stable when it deals with different networks.
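Equation (7) can be computed directly from the learned and reference adjacency matrices. The sketch below counts an arc whose direction is flipped as reverse rather than as one lost plus one excrescent arc, matching the definition above:

```python
def hamming_distance(learned, true):
    """H = L + E + R (Equation (7)): lost, excrescent and reverse arcs.
    Both arguments are 0/1 adjacency matrices of the same size."""
    n = len(true)
    lost = excrescent = reverse = 0
    for i in range(n):
        for j in range(n):
            if true[i][j] and not learned[i][j]:
                if learned[j][i] and not true[j][i]:
                    reverse += 1      # arc present but its direction is flipped
                else:
                    lost += 1         # arc missing entirely
            elif learned[i][j] and not true[i][j] and not true[j][i]:
                excrescent += 1       # arc absent from the true structure
    return lost + excrescent + reverse
```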
Because there are so many kinds of algorithms, it is not easy to choose an appropriate scoring function and search algorithm. Considering the visible differences in performance between various SS algorithms, we combine Ensemble Learning with causal Information Flow theory to propose a new learning algorithm for BN structures (C-EL), aiming at improving the accuracy and stability of structural learning. The theoretical scheme and technical flow of the C-EL algorithm are elaborated in detail in the next section.

Causality-Based Ensemble Learning Algorithm for BN Structure
In this section, we first give a brief introduction to Ensemble Learning and analyze its suitability for BN structural learning. Then, we put forward a new weighted integration rule based on the causal Information Flow theory. Finally, we elaborate on the technical process of the proposed algorithm based on the above parts. The three parts are internally and logically connected, and we explain each of them specifically.

Ensemble Learning
Ensemble Learning (EL) is a kind of Machine Learning (ML) paradigm. It uses multiple algorithms to learn and integrates all learning results with a certain rule, in order to improve the accuracy and generalization of the learning system. The individual algorithm is called the base learner, and the rule of algorithm fusion is called the integration mechanism [33].
Research shows that the performance of an integrated learner is significantly higher than that of a single learner. The availability of EL can be attributed to two aspects [34]. On the one hand, it has been proved that the training of a neural network or decision tree is an NP-hard problem, so heuristic search algorithms are usually adopted to solve it. Unfortunately, the optimal model is difficult to obtain in this way, but multi-model integration can, in theory, get closer to the optimal result. On the other hand, the hypothesis spaces of ML algorithms are artificial, and the actual objective in application scenarios may not lie in the hypothesis space. EL can enrich the hypothesis space and express objectives that cannot be obtained by an individual ML algorithm, further improving the generalization ability of the learning system significantly. In other words, the above advantages of EL can make up for the existing deficiencies of low accuracy and stability in SS algorithms.
Bagging and Boosting are the most basic methods in EL [33]. BN structural learning, which focuses on expressing causal relationships among variables, is different from conventional ML, and its time complexity is even higher. Therefore, we adopt the Bagging method, with its parallel computation ability, to construct the EL mechanism for structures, which contributes to improving the learning efficiency. Bagging, the abbreviation of Bootstrap Aggregating proposed by Breiman [35], uses the Bootstrapping algorithm for EL. Bootstrapping is a sampling method with replacement, used to generate a different training dataset for each base learner. The flow of Bagging (Algorithm 1: EL with the Bagging method) is presented as follows:

Input: Original data M; base learner L
Output: Integrated model P

Step 1: Sampling. Training sets are extracted from M with Bootstrapping; a total of k rounds are taken, and n samples are extracted in each round.
Step 2: Training. Each time, one training set is used to train a model, so the k training sets generate k models.
Step 3: Integration. For classification, the final result is obtained by voting among the k models obtained in the previous step; for regression, the mean value of the models' outputs is taken as the final result.
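The three steps of Algorithm 1 can be sketched as follows, here for the classification case with majority voting. The `train` argument is an assumed interface: any callable that turns a training set into a prediction function:

```python
import random
from collections import Counter

def bootstrap(data, n, rng):
    # Step 1: sampling with replacement (Bootstrapping), n samples per round
    return [rng.choice(data) for _ in range(n)]

def bagging_fit(data, train, k, n, seed=0):
    # Step 2: train one model per bootstrapped training set, k rounds in total
    rng = random.Random(seed)
    return [train(bootstrap(data, n, rng)) for _ in range(k)]

def bagging_predict(models, x):
    # Step 3: integration by majority vote over the k models
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

For regression, Step 3 would instead average the k model outputs; the structure-integration rule used by C-EL replaces this simple vote, as discussed next.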
The fusion rule for base learners in Step 3 is of vital importance and has a significant impact on the EL effect. The common integration methods are voting and weighted voting [36]. A BN is a complex network expressing causal relationships, so a simple voting method may not be applicable. Aiming at the causality of BN, we introduce causal Information Flow to conduct a causal analysis of the network structures and calculate weights for structure integration, as presented in the next subsection.

Causal Information Flow
Information Flow (IF) is a real physical notion recently formalized by Professor Liang [20] to express the causality between two variables in a quantitative way, where causality is measured by the rate of information transfer from one variable series to another. IF realizes the formalization and quantification of causal analysis.
Following Liang, given two variable series X_1 and X_2, the maximum likelihood estimator of the rate of the IF from X_2 to X_1 is

$$T_{2 \to 1} = \frac{C_{11} C_{12} C_{2,d1} - C_{12}^2 C_{1,d1}}{C_{11}^2 C_{22} - C_{11} C_{12}^2} \quad (8)$$

where C_ij denotes the covariance between X_i and X_j, and C_{i,dj} is determined as follows. Let Ẋ_j be the finite-difference approximation of dX_j/ds using the Euler forward scheme:

$$\dot{X}_{j,n} = \frac{X_{j,n+k} - X_{j,n}}{k \Delta s} \quad (9)$$

with k = 1 or k = 2 (for details about how to determine k, see [25]) and Δs being the step length. C_{i,dj} in Equation (8) is the covariance between X_i and Ẋ_j.
In order to quantify the relative importance of a detected causality, Liang [37] developed an approach to normalize the IF:

$$\tau_{2 \to 1} = \frac{T_{2 \to 1}}{\left|T_{2 \to 1}\right| + \left|\dfrac{dH_1^*}{dt}\right| + \left|\dfrac{dH_1^{noise}}{dt}\right|} \quad (10)$$

where H_1^* represents the phase-space expansion along the X_1 direction and H_1^{noise} represents the random effect. The normalized IF calculated with Equation (10) can be zero or non-zero. Ideally, if τ_{2→1} = 0, then X_2 does not cause X_1; otherwise, there is a causal link between X_2 and X_1. Further, if τ_{2→1} > 0, then X_2 makes X_1 unstable; on the contrary, τ_{2→1} < 0 indicates that X_2 makes X_1 stable. In particular, at the 0.1 significance level, |τ_{2→1}| > 1% indicates that the causal relationship is significant.
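Equations (8) and (9) translate into a few lines of numerical code. This is a sketch of the maximum likelihood estimator only; the normalization of Equation (10) needs the additional noise terms and is omitted here:

```python
import numpy as np

def liang_information_flow(x1, x2, k=1, ds=1.0):
    """Estimate the rate of information flow T_{2->1} from series x2 to x1,
    following Equation (8); the derivative uses the Euler forward scheme of
    Equation (9)."""
    # Finite-difference approximation of dX1/ds
    dx1 = (x1[k:] - x1[:-k]) / (k * ds)
    x1t, x2t = x1[:-k], x2[:-k]            # align the series with the derivative
    C = np.cov(np.vstack([x1t, x2t]))      # covariance matrix C_ij
    c11, c12, c22 = C[0, 0], C[0, 1], C[1, 1]
    c1d1 = np.cov(x1t, dx1)[0, 1]          # C_{1,d1}
    c2d1 = np.cov(x2t, dx1)[0, 1]          # C_{2,d1}
    num = c11 * c12 * c2d1 - c12 ** 2 * c1d1
    den = c11 ** 2 * c22 - c11 * c12 ** 2
    return num / den
```

On a synthetic pair of series in which X_2 drives X_1 but not vice versa, the estimator returns a clearly larger magnitude in the driving direction than in the reverse direction.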

Global Causality Measure
Based on the causal IF, we define a criterion to evaluate the causality of networks. A BN structure can be represented by an adjacency matrix X = (x_ij). For a network with n nodes, x_ij = 1 represents that there is an arc from v_i to v_j, while x_ij = 0 represents that there is no arc from v_i to v_j.
Definition: based on X = (x_ij) and the normalized IF τ_ij, construct the global causality measure of a network as

$$CM = \sum_{i=1}^{n} \sum_{j=1}^{n} x_{ij} \left|\tau_{ij}\right| \quad (11)$$

The larger the value of CM, the more significant the causality of the network, so CM is taken as the measure to evaluate the reliability of a network. Then, the global causality measure CM is normalized to obtain the weight, and the different network structures can be integrated with these weights.
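A minimal sketch of the measure and the weight normalization, assuming CM accumulates the absolute normalized IF over the arcs present in each structure (this reading of the definition is our assumption):

```python
def causality_measure(adj, tau):
    # CM: accumulate |tau_ij| over the arcs x_ij = 1 of the structure
    n = len(adj)
    return sum(abs(tau[i][j]) for i in range(n) for j in range(n) if adj[i][j])

def structure_weights(adjs, tau):
    # Normalize the CM of each learned structure to obtain its ensemble weight
    cms = [causality_measure(a, tau) for a in adjs]
    total = sum(cms)
    return [c / total for c in cms]
```

A structure whose arcs coincide with strong IF values thus receives a proportionally larger weight in the fusion.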

Causality-Based Ensemble Learning Algorithm
Based on the new causal IF-based integration mechanism, we propose the Ensemble Learning algorithm for BN structures (C-EL). Figure 2 shows the algorithm process.
Firstly, training sets {D_1, D_2, ..., D_n} are extracted from the original data with the Bootstrapping algorithm. On the different training sets, different SS algorithms are adopted for structural learning, and different BN structures {BN_1, BN_2, ..., BN_n} are obtained. Then, the causal IF and the adjacency matrices are used to calculate the global causality measure of each network, {CM_1, CM_2, ..., CM_n}, and these are converted to weights by normalization. Thirdly, the multiple BN structures are integrated with the weights, yielding the transition matrix, in which each element represents the causal strength between the corresponding variables. Arcs with weak causal strength are filtered out according to a set threshold value, and the direction of each arc is judged according to the sign of the IF. Finally, the integrated structural matrix is obtained.
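The integration step can be sketched as a weighted sum of adjacency matrices followed by threshold filtering. The 0.5 threshold is an illustrative assumption, and the orientation rule here follows the pairwise |τ| comparison used in the analysis of Section 5, which is our reading of the direction-judging step:

```python
def integrate_structures(adjs, weights, tau, threshold=0.5):
    """Fuse several learned BN structures into one.
    adjs: list of 0/1 adjacency matrices; weights: normalized structure weights;
    tau: normalized IF matrix used to orient arcs; threshold: minimum fused strength."""
    n = len(adjs[0])
    # Transition matrix: each element is the weighted causal strength of an arc
    trans = [[sum(w * a[i][j] for w, a in zip(weights, adjs)) for j in range(n)]
             for i in range(n)]
    fused = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            strength = trans[i][j] + trans[j][i]
            if strength > threshold:           # filter arcs with weak causal strength
                # orient the arc toward the direction with the larger |tau|
                if abs(tau[i][j]) >= abs(tau[j][i]):
                    fused[i][j] = 1
                else:
                    fused[j][i] = 1
    return fused
```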


Experiments and Analysis
In order to test the effectiveness of the C-EL algorithm, we use multiple standard BN datasets to conduct simulation experiments in this section, comparing the performance of our proposed algorithm with other SS algorithms.

Experimental Data
We again choose the three standard BNs (Asia, Child and Alarm) of Section 3 for the experiments. The Full_BNT toolbox is used for data sampling on each network, randomly obtaining 20,000 samples as the original data. The six SS algorithms ("BIC+HC", "BIC+GA", "MDL+HC", "MDL+GA", "BDe+HC" and "BDe+GA") are taken as the base learners in the EL mechanism.

Structural Learning with C-EL
For each network, the Bootstrapping algorithm is adopted to extract six different training sets from the original data, each with 1500 samples. The six SS algorithms are used to learn the structures of Asia, Child and Alarm, respectively; thereby, the adjacency matrix of each network is obtained. Figure 3 shows the adjacency matrices of Asia based on the different structural learning algorithms. Then, the causal IF between every two nodes in the network is calculated according to Equations (8)–(10). Table 3 shows the IF between every pair of nodes in the Asia network.
Coupled with the results for the causal IF in Table 3, we can analyze the causality with examples as follows: τ_{Smoking→Bronchitis} = 0.0109 and τ_{Bronchitis→Smoking} = −0.0131 both pass the significance test, and τ_{Smoking→Bronchitis} > τ_{Bronchitis→Smoking}. It can be concluded that there is a unidirectional causation between "Smoking" and "Bronchitis", that is, "Smoking→Bronchitis". For another example, τ_{Smoking→TB} = 0.0002 and τ_{TB→Smoking} = −0.0087 both fail to pass the significance test, so the causality between "Smoking" and "TB" is weak. Thus, the causal IF is able to measure the strength of causality between every two BN nodes. Based on the adjacency matrices and the causal IF, the global causality measures of the networks learned by the different SS algorithms can be calculated according to Equation (11); they are then normalized to obtain the weights of the different structures, as shown in Table 4.

Analysis and Discussion of Results
(1) Comparison with Other Algorithms. In order to verify the effectiveness of the C-EL algorithm, we compare it with the above six SS algorithms. The number of correct arcs (C), lost arcs (L), excrescent arcs (E) and reverse arcs (R), together with H, are used as evaluation criteria to measure the accuracy of the learned structures. Given the randomness of the search algorithms, each experiment is run ten times, and the average results are shown in Table 5 and Figure 6. Evaluated by the H index, C-EL obviously performs better than the other six SS algorithms, on average 39.71% lower than the second-best algorithm. Besides, as the network scale increases, the advantage of C-EL becomes more and more notable. Even when there are a huge number of nodes and arcs in the network, as in Alarm, C-EL can still maintain high learning accuracy, demonstrating that the introduction of Ensemble Learning does improve the accuracy of BN structural learning. Figure 6 displays the results of all evaluation indexes based on the different algorithms. For the three different BNs, C-EL has the highest C (Asia: 7.9; Child: 23.6; Alarm: 40.2) and the smallest L (Asia: 0; Child: 2.7; Alarm: 5.1), E (Asia: 1.4; Child: 3.1; Alarm: 5.4), R (Asia: 0.2; Child: 2.2; Alarm: 4.3) and H (Asia: 1.6; Child: 7.8; Alarm: 14.8), indicating that the three structures obtained by C-EL are the best. Therefore, C-EL appears to have an excellent generalization capability; its stable performance in structural learning of different BNs makes it unique.

(2) Sensitivity to Sample Size. In order to test the sensitivity of the proposed C-EL algorithm to sample size, we also randomly acquire 500 and 1000 training samples for structural learning. By comparing with the other six SS algorithms, the impact of sample size on the performance of C-EL is analyzed. Figure 7 shows the comparative results of the three BNs with the different learning algorithms. As shown in Figure 7, as the sample size decreases, the H of the structures learned by all algorithms increases, showing that the accuracy of structural learning decreases. It is worth noting that C-EL is less affected by sample size than the individual algorithms.

Conclusions
Aiming at the deficiencies of local optima and low generalization in most SS algorithms, we introduce Ensemble Learning (EL) and causal Information Flow (IF) theory to propose a new structural learning algorithm that integrates several algorithms. In the established EL mechanism, we first adopt the Bagging method to train the base learners and obtain different learned structures. Then, causal IF is introduced to construct a new integration rule, which fully takes the causality of BN into consideration. The proposed C-EL algorithm can achieve the fusion of various structures. We applied it to structural learning at different network scales, and the experiment results show that: (1) Compared with individual algorithms, the network structure learned by C-EL is more accurate. With respect to the three BNs, the accuracy of the proposed algorithm improves on the optimal individual algorithm by 48.38% (Asia), 42.22% (Child) and 28.51% (Alarm), respectively. (2) Compared with individual algorithms, C-EL has a far stronger generalization ability. For existing SS algorithms, the accuracy of one algorithm varies greatly when dealing with different BNs; in other words, an algorithm with great performance on one BN may not perform as well on another. By contrast, the performance of C-EL is stable, and it maintains high accuracy for different BNs. The introduction of EL provides a new idea for structural learning, achieving the fusion of various structural learning algorithms. The C-EL algorithm has higher accuracy and a stronger generalization ability than existing SS algorithms. More importantly, it is able to learn more accurate structures from a relatively small sample, which can promote the application of BN. However, we have not touched on the efficiency of the C-EL algorithm in this paper; we will continue to improve the proposed C-EL algorithm in further study.

Figure 1 .
Figure 1. Experimental results of three BNs with different SS algorithms.

Figure 2 .
Figure 2. Algorithm process of C-EL.

Figure 3 .
Figure 3. Adjacency matrices of Asia based on different structural learning algorithms.

Figure 4 .
Figure 4. Causal transition matrix and adjacency matrix of Asia.


Figure 5 .
Figure 5. Learning structure and actual structure of Asia.

Figure 6 .
Figure 6. Results of all evaluation indexes of three BNs with different algorithms.


Figure 7 .
Figure 7. H of three BNs with different algorithms and sample sizes.

Figure 8 .
Figure 8. Change of Hamming distance of three BNs with threshold value.

(3)
Compared with individual algorithms, C-EL is less affected by sample size. It is capable of learning structures under the small training sample condition, especially for the structural learning of large-scale networks.

Table 1 .
Training datasets of the Asia network.

Table 2 .
Mean H of three Bayesian Networks (BNs) with different Search-Score (SS) algorithms (the numbers in brackets represent the ranking).



Table 3 .
Causal Information Flow (IF) between every two nodes in Asia.

Table 4 .
Causality measure and weight of three BNs with different SS algorithms.


Table 5 .
C and H of three BNs with different algorithms.