2.1. Problem of BN Structure Learning
A BN is a graphical tool for representing an n-dimensional probability distribution. It can be described by a directed acyclic graph (DAG) G = <X, A, Θ>. In G, each node Xi ∈ X represents a random variable of interest, while each arc aij ∈ A represents a direct dependence relationship between the variables Xi and Xj. In addition, the parameter θi = P(Xi|πi), where Θ = {θi}, denotes the conditional probability distribution of Xi given its parent set πi. From the conditional distributions, the joint probability can be uniquely determined by
P(X1, X2, …, Xn) = ∏i=1…n P(Xi|πi).(1)
Given a training database D = {x1, x2, …, xm} composed of m cases, where each case contains n variables and xi is an instance of the domain variables X, the problem of BN structure learning is to find the BN topology structure that best matches the dataset D.
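As a concrete illustration of the factorization in (1), the following sketch evaluates the joint probability of a full instantiation by multiplying the per-node conditional probabilities. The three-node structure and all CPT values below are hypothetical, chosen only to show the mechanics:

```python
# Minimal sketch of Eq. (1): the joint probability of a BN factorizes
# over per-node conditional distributions P(Xi | parents(Xi)).
# Structure: A -> B, A -> C (hypothetical, binary variables).
parents = {"A": [], "B": ["A"], "C": ["A"]}

# CPTs: map (value, tuple-of-parent-values) -> probability.
cpt = {
    "A": {(1, ()): 0.3, (0, ()): 0.7},
    "B": {(1, (1,)): 0.9, (0, (1,)): 0.1, (1, (0,)): 0.2, (0, (0,)): 0.8},
    "C": {(1, (1,)): 0.5, (0, (1,)): 0.5, (1, (0,)): 0.4, (0, (0,)): 0.6},
}

def joint_probability(assignment):
    """P(X1, ..., Xn) = product over i of P(Xi | pi_i)."""
    p = 1.0
    for var, pa in parents.items():
        pa_vals = tuple(assignment[v] for v in pa)
        p *= cpt[var][(assignment[var], pa_vals)]
    return p

print(joint_probability({"A": 1, "B": 1, "C": 0}))  # 0.3 * 0.9 * 0.5 = 0.135
```

Summing the product over all 2^3 instantiations recovers 1, confirming the factorization defines a valid distribution.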
As noted previously, algorithms for learning the BN structure from data mainly include the constraint-based methods and the score+search methods. The constraint-based methods, which build on dependency analysis, are close to the semantics of BNs and relatively simple to implement [8]. However, it is hard to guarantee the precision of the obtained structure, and the computation of high-order conditional independence tests is complex and unreliable. For this reason, most of the developed structure learning algorithms fall into the latter category, namely the score+search methods [3,5], which treat the problem of BN structure learning as a combinatorial optimization problem.
The score+search methods first use a scoring metric to evaluate how well a candidate BN structure matches the given dataset, and then search for the network structure with the maximum score. Popular scoring metrics include Akaike's information criterion (AIC), the Bayesian information criterion (BIC), the minimum description length (MDL) score, and the Bayesian Dirichlet equivalence (BDe) metric (usually called the K2 metric) [3]. Here, the BIC scoring metric, which is derived from the penalized maximum likelihood, is used to evaluate the degree to which a structure matches the dataset:
fBIC(BS, D) = ∑i=1…n ∑j=1…qi ∑k=1…ri Nijk·log(Nijk/Nij) − f(m)·dim(BS),(2)
where
BS is the candidate BN structure;
ri is the number of possible values of the variable Xi;
qi is the number of possible configurations (instantiations) of its parent set πi;
Nijk is the number of cases in D in which the variable Xi takes its k-th value and πi is instantiated to its j-th value, and Nij = ∑k=1…ri Nijk;
dim(BS) = ∑i=1…n (ri − 1)·qi is the dimension (the number of parameters needed to specify the model) of the BN; and f(m) is a non-negative penalization function that depends on the size of the dataset and can be computed as f(m) = 0.5·log m.
Using fBIC(BS, D) instead of P(BS|D), the BIC scoring metric [3] is defined as:
fBIC(BS, D) = ∑i=1…n fBIC(Xi, πi),(3)
where
fBIC(Xi, πi) = ∑j=1…qi ∑k=1…ri Nijk·log(Nijk/Nij) − f(m)·qi·(ri − 1).(4)
One desirable and important feature of scoring metrics is their decomposability in the presence of full data, and (3) shows that the BIC metric used here is decomposable. With a decomposable metric, a local search procedure that changes one arc at each move can efficiently evaluate the improvement obtained by this change [2,3], because it can reuse most of the computations made in previous stages. Moreover, the score of a BN can be computed as the combination of the scores obtained for smaller factors.
2.2. ACO
As a representative bio-inspired meta-heuristic algorithm, ACO was first put forward by Dorigo in the 1990s [19] to solve the traveling salesman problem (TSP). Since then, ACO has proven to be a general framework for optimization problems in a wide range of fields [23,24,25], such as job-shop scheduling, data mining, routing problems, and other complex optimization problems. When observing the foraging behavior of real ant colonies, researchers discovered that real ants deposit a chemical substance, called a pheromone, while walking. The pheromone both accumulates and evaporates, through which the ant colony carries on indirect communication and finally achieves its cooperative goal. Ants can smell the pheromone and choose their way probabilistically based on its amount: the larger the amount of pheromone deposited on a route, the greater the probability that ants select that route. Meanwhile, on shorter routes the pheromone accumulates faster than on longer ones, so the faster the amount of pheromone increases on a short route, the greater the probability that ants travel it. In the initial stage, when pheromone is absent, ants choose their routes fully at random, but after a transitory period the shortest routes are visited more and more frequently and pheromone accumulates faster and faster on them, which in turn attracts more and more ants to these routes.
The mathematical model of ACO is described as follows. Let Mant be the number of ants, and let the matrix τ(t) = {τij(t)} be the pheromone, of which the element τij(t) is the level of pheromone deposited on the arc from node i to node j at time t. The initial level of pheromone on each directed arc is a constant value, i.e., τij(0) = τ0. Each ant builds a possible solution to the problem by moving through a finite sequence of neighbor nodes, and these moves are directed by the ant's internal state, problem-specific local information, and the shared information about the pheromone [19]. The k-th ant located at the i-th node moves to the j-th node with the transition probability:
pijk(t) = [τij(t)]^α·[ηij]^β / ∑u∈allowedk [τiu(t)]^α·[ηiu]^β, if j ∈ allowedk; pijk(t) = 0, otherwise,(5)
where ηij represents the heuristic information about the problem, allowedk denotes the feasible domain of the k-th ant at the i-th node, and α and β are parameters that determine the relative importance of the pheromone with respect to the heuristic information.
In addition, in order to achieve a trade-off between exploitation and exploration [2], a different transition rule is introduced and the next node j is selected as:
j = arg maxu∈allowedk {τiu(t)·[ηiu]^β}, if q ≤ q0; j = J, otherwise,(6)
where q is a random number uniformly distributed in [0,1]; q0 ∈ [0,1] is the parameter that determines the relative importance of exploitation versus exploration; and J is a node randomly selected according to the transition probability in (5) with α = 1.
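The two selection rules above can be sketched together: with probability q0 the ant exploits the greedy argmax, otherwise it samples from the transition probabilities. The function names and the small pheromone/heuristic tables are hypothetical illustrations:

```python
import random

# Sketch of ACO node selection, Eqs. (5)-(6):
# exploitation = deterministic argmax of tau * eta^beta;
# exploration = roulette-wheel sampling from the transition probabilities.

def transition_probabilities(i, allowed, tau, eta, alpha, beta):
    """Eq. (5): p_ij proportional to tau_ij^alpha * eta_ij^beta over allowed j."""
    weights = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in allowed}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def select_next_node(i, allowed, tau, eta, alpha, beta, q0, rng=random):
    q = rng.random()
    if q <= q0:  # exploitation: greedy choice (Eq. (6), first branch)
        return max(allowed, key=lambda j: tau[i][j] * (eta[i][j] ** beta))
    # exploration: sample J according to Eq. (5) with alpha = 1
    probs = transition_probabilities(i, allowed, tau, eta, 1, beta)
    r, acc = rng.random(), 0.0
    for j, p in probs.items():
        acc += p
        if r <= acc:
            return j
    return j  # fallback for floating-point rounding
```

Setting q0 close to 1 makes the colony greedy; setting it close to 0 makes the search mostly stochastic.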
As the ants move and build possible solutions, the pheromone matrix is updated according to both global and local updating processes [20]. As to the local updating process, when building a solution, if an ant moves from node i to node j, then the pheromone level on the corresponding arc ij is updated as follows:
τij = (1 − ψ)·τij + ψ·τ0,(7)
where τ0 is the initial pheromone level on all arcs, and ψ ∈ (0,1] denotes the parameter that controls the pheromone evaporation. After all ants have constructed a solution, only the ant that obtained the best solution reinforces the pheromone level on the arcs that constitute the best solution, S+, obtained by the ant colony so far. The global updating rule can be expressed by
τij = (1 − ρ)·τij + ρ·Δτij, where Δτij = 1/f(S+) if the arc ij ∈ S+ and Δτij = 0 otherwise,(8)
where ρ ∈ (0,1] is the parameter that controls the pheromone evaporation, and f(S+) is the cost associated with the best solution S+. Algorithm 1 shows the complete ACO algorithm applied to optimization problems [19].
Algorithm 1: ACO algorithm.
/* Initialization */
1  Set the iteration counter g = 0;
2  Generate Mant ants, and initialize the pheromone matrix;
/* Iterative search */
3  while termination criteria are not satisfied do
4    Set the iteration counter g = g + 1;
5    for i = 1:Mant do
     /* Build a possible solution */
6      while the solution is not completed do
7        Randomly select a state/node according to the probabilistic transition rule;
8        Update the pheromone according to the local updating rule;
9      end while
10   end for
     /* Pheromone updating */
11   Select the best solution and perform the global updating process;
12 end while
13 Return the best solution S+.
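A compact, runnable sketch of Algorithm 1 applied to a tiny TSP instance follows. The 4-city distance matrix and all parameter values are hypothetical; only the structure (tour construction, local update as in (7), global update as in (8)) follows the pseudocode above:

```python
import random

# Toy 4-city symmetric distance matrix (hypothetical).
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
N, M_ANT, BETA, Q0 = 4, 8, 2.0, 0.9
PSI = RHO = 0.1       # evaporation parameters for local/global updates
TAU0 = 1.0            # initial pheromone level

def tour_length(tour):
    return sum(DIST[tour[i]][tour[(i + 1) % N]] for i in range(N))

def build_tour(tau, rng):
    """One ant builds a tour (Algorithm 1, lines 6-9)."""
    tour = [rng.randrange(N)]
    while len(tour) < N:
        i = tour[-1]
        allowed = [j for j in range(N) if j not in tour]
        weight = lambda j: tau[i][j] * (1.0 / DIST[i][j]) ** BETA
        if rng.random() <= Q0:                    # exploitation
            j = max(allowed, key=weight)
        else:                                     # biased exploration
            total = sum(weight(j) for j in allowed)
            r, acc = rng.random() * total, 0.0
            for j in allowed:
                acc += weight(j)
                if r <= acc:
                    break
        tau[i][j] = (1 - PSI) * tau[i][j] + PSI * TAU0   # local update, Eq. (7)
        tour.append(j)
    return tour

def aco_tsp(iterations=50, seed=1):
    rng = random.Random(seed)
    tau = [[TAU0] * N for _ in range(N)]
    best = None
    for _ in range(iterations):
        tours = [build_tour(tau, rng) for _ in range(M_ANT)]
        it_best = min(tours, key=tour_length)
        if best is None or tour_length(it_best) < tour_length(best):
            best = it_best
        for i in range(N):                        # global update, Eq. (8)
            a, b = best[i], best[(i + 1) % N]
            tau[a][b] = (1 - RHO) * tau[a][b] + RHO / tour_length(best)
    return best, tour_length(best)
```

On this toy instance the colony reliably converges to the optimal tour of length 18, since only the global-best arcs receive reinforcement.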
2.3. ACO Applied to BNs
Using the basic ACO algorithm, the best network can be found in the space of possible networks based on the score+search framework [1,2]. Beginning with an empty network, the ant colony progressively searches for good single-step changes to build a complete BN. Each ant randomly connects two variables and determines whether the arc should be included in the BN structure. In the construction process illustrated in Figure 1 [2,14,15], the ant builds the solution incrementally, starting from an empty network G0, by selecting an arc aij = {Xi→Xj} and adding it to the current network, i.e., Gh+1 = Gh ∪ aij. When no arc can be added that achieves a higher score of the BN structure, the construction process of the ant stops and the final solution Gg is obtained. The pheromone placed on all candidate arcs, together with the heuristic information, is used to guide the network construction process. The random rule by which ant k selects the arc aij from the current optional arcs is
aij = arg max auv∈allowedk {τuv·[ηuv]^β}, if q ≤ q0; aij = Aij, otherwise,(9)
where Aij is an arc randomly selected according to the following probabilities:
Pk(auv) = τuv·[ηuv]^β / ∑ars∈allowedk τrs·[ηrs]^β, for auv ∈ allowedk,(10)
where allowedk is the set composed of all candidate arcs that do not create a directed cycle and have positive heuristic information, and q0 ∈ [0,1] is the threshold value set by the user.
The objective function maximized by ACO is the BIC scoring metric in (3). Thus, the heuristic information ηij of the arc aij at time t is defined as the local score gain
ηij(t) = fBIC(Xi, πi ∪ {Xj}) − fBIC(Xi, πi).(11)
The pheromone level τij on the arc aij changes according to the local and global updating rules described in (7) and (8), while the increment is computed by
Δτij = 1/|fBIC(G+, D)|, if aij ∈ G+; Δτij = 0, otherwise,(12)
where G+ is the best BN structure found by all ants so far. The basic ACO algorithm applied to learning the BN structure is presented in Algorithm 2 [2].
Algorithm 2: Basic ACO based BN learning.
/* Initialization */
1  Set the iteration counter t = 0;
2  Generate Mant ants;
3  Initialize the pheromone matrix τ(0): for all arcs aij, set τij(0) = τ0;
4  Set G+ to be an empty graph;
/* Iterative search */
5  while termination criteria are not satisfied do
6    Set the iteration counter t = t + 1;
7    for k = 1:Mant do
8      Generate an empty network Gk: for i = 1 to n, set πi = ϕ;
9      Calculate the heuristic information: for i, j = 1 to n (i ≠ j), set ηij = fBIC(Xi, Xj) − fBIC(Xi, ϕ);
10     while max ηij > 0 do
       /* Add an arc */
11       Select an arc aij from the feasible domain allowedk according to (9) and (10);
12       if ηij > 0 then set πi = πi ∪ {Xj} and construct the network Gk = Gk ∪ aji;
13       Set ηij = −∞;
       /* Avoid directed cycles */
14       for u, v = 1 to n do
15         if Gk ∪ auv includes a directed cycle, then set ηuv = −∞;
16       end for
       /* Recalculate the heuristic information */
17       for u = 1 to n do
18         if ηiu > −∞ then set ηiu = fBIC(Xi, πi ∪ {Xu}) − fBIC(Xi, πi);
19       end for
       /* Local updating */
20       Update the pheromone: τij = (1 − ψ)·τij + ψ·τ0;
21     end while
22   end for
     /* Pheromone updating */
23   Select Gt = arg maxk fBIC(Gk, D);
24   if fBIC(Gt, D) ≥ fBIC(G+, D), then set G+ = Gt;
25   Update the pheromone matrix according to (8) and (12) using fBIC(G+, D);
26 end while
27 Return the best BN structure G+.
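The inner construction loop of Algorithm 2 can be sketched as follows for a single ant. To keep the sketch short, the pheromone-guided selection rule is reduced to pure exploitation (equivalent to q0 = 1 with uniform pheromone); the dataset, function names, and structure encoding are hypothetical:

```python
import math
from collections import Counter

# One ant's construction step: repeatedly add the arc with the largest
# positive heuristic gain eta = f_BIC(Xi, pi_i + {Xj}) - f_BIC(Xi, pi_i),
# skipping any arc that would create a directed cycle.

def bic_local(data, child, parents, arity):
    """Decomposed BIC local score f_BIC(child, parents)."""
    m = len(data)
    nijk = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    nij = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(n * math.log(n / nij[cfg]) for (cfg, _), n in nijk.items())
    q = 1
    for p in parents:
        q *= arity[p]
    return loglik - 0.5 * math.log(m) * q * (arity[child] - 1)

def creates_cycle(parents, child, new_parent):
    """Adding new_parent as a parent of child closes a cycle iff
    child is already an ancestor of new_parent."""
    stack, seen = [new_parent], set()
    while stack:
        v = stack.pop()
        if v == child:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False

def ant_construct(data, variables, arity):
    """Greedy arc-addition until no arc yields a positive score gain."""
    parents = {v: [] for v in variables}
    while True:
        best_gain, best_arc = 0.0, None
        for child in variables:
            base = bic_local(data, child, parents[child], arity)
            for pa in variables:
                if pa == child or pa in parents[child]:
                    continue
                if creates_cycle(parents, child, pa):
                    continue
                gain = bic_local(data, child, parents[child] + [pa], arity) - base
                if gain > best_gain:
                    best_gain, best_arc = gain, (pa, child)
        if best_arc is None:   # no arc improves the score: stop
            return parents
        pa, child = best_arc
        parents[child].append(pa)
```

On data where two variables are perfectly correlated, the ant adds exactly one arc between them and then stops, since the second (reverse) arc would create a cycle and no further addition has positive gain.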