## 2. The Basis of Formal Concept Lattice

This section introduces the basic notions of concept lattices [25].

In FCA, a formal context is a triple K = (G, M, I), in which G is the set of objects, M is the set of attributes, and I ⊆ G × M is the binary relation between G and M. For an object $g\in G$ and an attribute $m\in M$, gIm denotes that the object g has the attribute m.

Let G and M be the object set and the attribute set of the formal context, respectively. Two mappings are defined between $A\subseteq G$ and $B\subseteq M$:

$f(A)=\{m\in M\mid \forall x\in A,\ xIm\}$, $g(B)=\{x\in G\mid \forall m\in B,\ xIm\}$.

In the formal context, if a pair C = (A, B) satisfies f(A) = B and g(B) = A, then we call C = (A, B) a formal concept. Here, A is a subset of G whose objects form the extent of the formal concept, and B is a subset of M whose attributes form the intent of the formal concept.
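The two mappings and the concept test can be illustrated with a minimal Python sketch. The small context below (objects 1 to 5, attributes a to d) is a hypothetical example chosen for illustration, and the names `f`, `g` and `is_formal_concept` simply mirror the notation of the text.

```python
# Hypothetical toy context K = (G, M, I); the relation I is a set of (object, attribute) pairs.
G = {1, 2, 3, 4, 5}
M = {"a", "b", "c", "d"}
I = {
    (1, "a"), (1, "b"), (1, "c"), (1, "d"),
    (2, "a"), (2, "b"), (2, "c"),
    (3, "c"), (3, "d"),
    (5, "a"), (5, "b"), (5, "c"), (5, "d"),
}


def f(A):
    """Attributes shared by every object in A."""
    return {m for m in M if all((x, m) in I for x in A)}


def g(B):
    """Objects that have every attribute in B."""
    return {x for x in G if all((x, m) in I for m in B)}


def is_formal_concept(A, B):
    """(A, B) is a formal concept when f(A) = B and g(B) = A."""
    return f(A) == B and g(B) == A
```

In this toy context, ({1, 2, 5}, {a, b, c}) passes the test, whereas ({1, 2}, {a, b, c}) does not, because g({a, b, c}) also contains object 5.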

The set of all formal concepts of K is denoted as CS(K). Take two concepts, C_{1} = (A_{1}, B_{1}) and C_{2} = (A_{2}, B_{2}). If A_{1} ⊆ A_{2}, we say that C_{1} is a subconcept of C_{2} and C_{2} is a superconcept of C_{1}, written (A_{1}, B_{1}) ≤ (A_{2}, B_{2}). If (A_{1}, B_{1}) < (A_{2}, B_{2}) and there is no C_{3} = (A_{3}, B_{3}) satisfying (A_{1}, B_{1}) < (A_{3}, B_{3}) < (A_{2}, B_{2}), we call C_{1} a child of C_{2} and C_{2} a parent of C_{1}. Under this partial order, CS(K) forms a lattice L(K), which is known as the concept lattice of K = (G, M, I).
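As a rough illustration of CS(K) and of the parent and child relation, the following Python sketch enumerates all concepts of a small hypothetical context by brute force and then derives the pairs that a Hasse diagram would draw. It is not an incremental algorithm: it closes every attribute subset, so it is exponential in |M| and only suitable for tiny examples. The context and all function names are assumptions made for this sketch.

```python
from itertools import combinations

# Hypothetical context, stored as attribute -> set of objects having it.
G = frozenset({1, 2, 3, 4, 5})
ATTR_EXTENTS = {"a": {1, 2, 5}, "b": {1, 2, 5}, "c": {1, 2, 3, 5}, "d": {1, 3, 5}}


def extent_of(B):
    """g(B): objects that have every attribute in B."""
    result = set(G)
    for m in B:
        result &= ATTR_EXTENTS[m]
    return frozenset(result)


def intent_of(A):
    """f(A): attributes shared by all objects in A."""
    return frozenset(m for m, ext in ATTR_EXTENTS.items() if A <= ext)


def all_concepts():
    """CS(K) by brute force: close every subset of attributes."""
    concepts = set()
    attrs = sorted(ATTR_EXTENTS)
    for k in range(len(attrs) + 1):
        for B in combinations(attrs, k):
            A = extent_of(B)
            concepts.add((A, intent_of(A)))
    return concepts


def hasse_edges(concepts):
    """(child, parent) pairs: A1 is a proper subset of A2 with no extent strictly between."""
    edges = set()
    for c1 in concepts:
        for c2 in concepts:
            if c1[0] < c2[0] and not any(c1[0] < c3[0] < c2[0] for c3 in concepts):
                edges.add((c1, c2))
    return edges
```

Comparing extents is enough to decide the order here, since A_{1} ⊆ A_{2} is the defining condition of the subconcept relation.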

A concept lattice can be visualized by a Hasse diagram, which draws a line segment or curve going upward from the child concept to the parent concept for each parent–child relationship.

Next, we introduce some definitions and propositions that underlie the incremental construction algorithm [1,4,18,19,20].

Let M_{i} = {m_{1}, …, m_{i}} ⊂ M, I_{i} = I∩(G × M_{i}), M_{i+1} = M_{i}∪{m*}, and I_{i+1} = I∩(G × M_{i+1}), where m* is a newly added attribute. Given the formal context K_{i} = (G, M_{i}, I_{i}), the corresponding concept lattice is L(K_{i}). After adding m*, the formal context becomes K_{i+1} = (G, M_{i+1}, I_{i+1}) and the new concept lattice is L(K_{i+1}).

**Definition** **1.** For a concept C = (A, B) in L(K_{i}), if A ⊆ g(m*), then C is a modified concept. A modified concept is updated to (A, B∪{m*}) in L(K_{i+1}).

**Definition** **2.** Let L_{1} and L_{2} be the concept lattices before and after inserting the new attribute m, respectively. The object set of m is denoted as m’, and (A, B) is a formal concept in L_{2}. Then,

- (1) (A, B) is a new concept if A is not an extent of any concept in L_{1}.

- (2) (A, B) is a modified concept if A ⊆ m’ and A is an extent of a concept in L_{1}.

- (3) (A, B) is an old concept if it is unchanged from L_{1} to L_{2}.

- (4) Assuming that (X, Y) is a new concept and (A, B) is an old concept, if they satisfy A∩m’ = X ≠ A, the concept (A, B) is a generator of the concept (X, Y). Otherwise, it is a general old concept.
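The four cases of Definition 2 can be sketched in Python by working on extents only. The sketch assumes the standard fact that the extents of L_{2} are the extents of L_{1} together with their intersections with m’. It also assumes that the canonical generator of a new concept is the generator with the smallest extent, which matches the worked example in Section 4 but is not stated in Definition 2 itself. The function name `classify` and the sample extents are hypothetical.

```python
def classify(old_extents, m_extent):
    """Classify the extents of L1 with respect to a new attribute whose object set is m_extent."""
    old = {frozenset(A) for A in old_extents}
    m_extent = frozenset(m_extent)

    # Case (2): extents contained in m' belong to modified concepts.
    modified = {A for A in old if A <= m_extent}

    # Cases (1) and (4): A ∩ m' = X ≠ A with X not an extent of L1 makes A a generator of X.
    generators = {}
    for A in old - modified:
        X = A & m_extent
        if X not in old:
            generators.setdefault(X, set()).add(A)

    # Assumption: the canonical generator is the generator with the smallest extent.
    canonical = {X: min(gens, key=len) for X, gens in generators.items()}

    # Remaining old concepts are general old concepts.
    all_generators = set().union(*generators.values())
    general_old = old - modified - all_generators
    return {
        "new": set(generators),
        "modified": modified,
        "canonical": canonical,
        "general_old": general_old,
    }


# Hypothetical extents of L1, and a new attribute with object set {1, 2, 3}.
OLD = [{1, 2, 3, 4, 5}, {1, 2, 3, 5}, {1, 2, 5}, {1, 3, 5}, {1, 5}]
result = classify(OLD, {1, 2, 3})
```

With these inputs every old extent is a generator, and the new extent {1, 2, 3} has two generators, of which {1, 2, 3, 5} is the one selected as canonical under the stated assumption.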

**Proposition** **1.** If (A_{1}, B_{1}) is the canonical generator of a new concept (A_{2}, B_{2}), (A_{3}, B_{3}) is a non-canonical generator of (A_{2}, B_{2}), and A_{1} ⊂ A_{3}, then any concept (A, B) with A ⊂ A_{3} and A ⊄ A_{1} is neither a modified concept nor a canonical generator of any concept.

**Proposition** **2.** If (A_{3}, B_{3}) is an old concept with A_{3}∩g(m*) = A_{1}, and (A_{1}, B_{1}) ∈ L(K_{i+1}) is a modified concept, then any concept (A, B) with A ⊂ A_{3} and A ⊄ A_{1} is neither a modified concept nor a canonical generator of any concept.

## 4. A New Rapid AddExtent Algorithm

First, we summarize the AddExtent algorithm. To add an attribute m whose object set is Extent, new concepts and modified concepts are searched for recursively, starting from the greatest upper bound. In each call, the recursive function GetMaximalConcept locates the concept MaximalConcept for the extent passed to AddExtent. If the extent of MaximalConcept equals the extent parameter of AddExtent, MaximalConcept is a modified concept and is returned as the greatest upper bound of the new concept. Otherwise, MaximalConcept is identified as a canonical generator, and a new concept called NewConcept is generated and eventually returned. Before that, every child of MaximalConcept is taken in turn as the initial GeneratorConcept, and the intersection of that child's extent with Extent is taken as the initial object set extent; AddExtent receives these two new parameters and carries out a new round of recursion to find further modified concepts and new concepts. Next, the links between NewConcept and GeneratorConcept and between NewConcept and its children are established. According to Proposition 1 and Proposition 2, any concept whose extent is smaller than that of MaximalConcept is neither a canonical generator nor a modified concept. That is, every modified concept or canonical generator of a NewConcept is found as a MaximalConcept, so each call of AddExtent yields one canonical generator or one modified concept. To find the other canonical generators and modified concepts, AddExtent is applied recursively to every child of the MaximalConcept.

This paper proposes a new rapid AddExtent algorithm, FastAddExtent, which achieves higher efficiency by avoiding a large share of the comparisons.

The details of the FastAddExtent algorithm are as follows. Within one recursion of AddExtent, the children of a MaximalConcept may share descendants, so one concept may be compared several times. Using the recursive function GetMaximalConcept to find MaximalConcept causes similar repetition. To reduce the number of comparisons and recursive calls, four data fields are added to every concept.

#### 4.1. The Overall Procedure

The FastAddExtent algorithm proposed in this paper, like the original AddIntent algorithm, constructs a concept lattice recursively. FastAddExtent refines the procedure by adding four data fields to each concept: visited, NewConcept, doExtent, and MaximalConcept. The visited field stores an integer. When a concept is accessed, the ID of the new attribute is assigned to its visited field, so a concept whose visited value equals the ID of the new attribute is known to have been visited already. The NewConcept field stores the new concept returned while adding one attribute. If a concept has been visited, its NewConcept field is assigned to the candidate directly, which avoids unnecessary recursive calls and comparisons. The doExtent field stores the extent passed to the FastAddExtent procedure, and the canonical generator or modified concept that was found is stored in the MaximalConcept field of the GeneratorConcept. The doExtent and MaximalConcept fields bring the parameter GeneratorConcept closer to the ClosureConcept, which shortens the search. These refinements are also the differences between FastAddExtent and FastAddIntent.
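A minimal Python sketch of the four added fields, and of how they might be consulted before recursing, is given below. The class layout, the snake_case field names, the default values and the helper `resolve_candidate` are all assumptions made for illustration. In particular, the sketch assumes attribute IDs start at 1, so that the default `visited = 0` never matches a real attribute.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Concept:
    extent: frozenset
    intent: frozenset = frozenset()
    children: list = field(default_factory=list)
    # The four added fields (names adapted to Python style):
    visited: int = 0                              # ID of the attribute that last accessed this concept
    new_concept: Optional["Concept"] = None       # NewConcept: concept returned for that attribute
    do_extent: frozenset = frozenset()            # doExtent: extent passed to the last call on this concept
    maximal_concept: Optional["Concept"] = None   # MaximalConcept: closure concept found by that call


def resolve_candidate(candidate, meet, n):
    """Return (concept, needs_recursion) for a candidate while adding attribute n.

    If the candidate was already visited for attribute n, its cached result is
    reused and no recursion is needed. Otherwise, if meet is contained in the
    extent stored by an earlier call, the search can start from the stored
    closure concept instead of the candidate itself.
    """
    if candidate.visited == n:
        return candidate.new_concept, False
    if candidate.maximal_concept is not None and meet <= candidate.do_extent:
        return candidate.maximal_concept, True
    return candidate, True
```

The second shortcut relies on the stored closure concept having an extent that contains do_extent, and therefore also contains any meet that is a subset of do_extent.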

Algorithm 1 is described below. Lines marked {*} are newly added compared with the AddExtent algorithm, while lines marked {#} are modified.

| **Algorithm** **1:** Procedure FastAddExtent(extent, generatorConcept, L, n) {#} |
|---|
| 1: tempConcept = generatorConcept {*} |
| 2: generatorConcept = GetClosureConcept(extent, generatorConcept, L, n) |
| 3: tempConcept.doExtent = extent {*} |
| 4: tempConcept.MaximalConcept = generatorConcept {*} |
| 5: **if** generatorConcept.Extent == extent **then** |
| 6: **return** generatorConcept |
| 7: **end if** |
| 8: GeneratorChildren = generatorConcept.Children |
| 9: newChildren = ∅ |
| 10: **for each** candidate **in** GeneratorChildren |
| 11: meet = candidate.Extent ∩ extent |
| 12: **if** meet != candidate.Extent **then** |
| 13: **if** candidate.visited == n **then** {*} |
| 14: candidate = candidate.NewConcept {*} |
| 15: **else** |
| 16: **if** meet ∩ candidate.doExtent == meet **then** {*} |
| 17: candidate = candidate.MaximalConcept {*} |
| 18: **end if** |
| 19: NC = FastAddExtent(meet, candidate, L, n) {#} |
| 20: candidate.NewConcept = NC {*} |
| 21: candidate.visited = n {*} |
| 22: candidate = NC {*} |
| 23: **end if** |
| 24: **end if** |
| 25: addChild = true |
| 26: **for each** Child **in** newChildren |
| 27: **if** candidate.Extent ⊆ Child.Extent **then** |
| 28: addChild = false |
| 29: **exit for** |
| 30: **else if** Child.Extent ⊆ candidate.Extent **then** |
| 31: **remove** Child **from** newChildren |
| 32: **end if** |
| 33: **end for** |
| 34: **if** addChild **then** |
| 35: **add** candidate **to** newChildren |
| 36: **end if** |
| 37: **end for** |
| 38: newConcept = (extent, generatorConcept.Intent) |
| 39: L = L∪{newConcept} |
| 40: **for each** Child **in** newChildren |
| 41: removeLink(Child, generatorConcept, L) |
| 42: SetLink(Child, newConcept, L) |
| 43: **end for** |
| 44: SetLink(newConcept, generatorConcept, L) |
| 45: generatorConcept.NewConcept = newConcept {*} |
| 46: **return** newConcept |

The following paragraphs mainly explain the differences between the FastAddExtent algorithm and the AddExtent algorithm. The unchanged parts and functions of AddExtent are described in [26] and are not repeated here.

The concepts accessed while the algorithm runs include new concepts, modified concepts, canonical generators, and non-canonical generators. Since the algorithm is recursive, a new concept may be found again as a modified concept in some recursive calls. A parameter n, equal to the ID of the added attribute, is added to the parameters of the FastAddExtent algorithm. This value n is assigned to each accessed concept, indicating that the concept was last accessed while adding attribute n. In line 45, the newly created concept newConcept is assigned to the NewConcept field of its canonical generator generatorConcept. In lines 19 to 22, NC is the concept returned by the recursive call of FastAddExtent on the candidate; NC is assigned to the candidate's NewConcept field, the visited field of the candidate is set to the current attribute ID, and NC then takes the place of the candidate. The operations in lines 19 to 22 serve lines 13 and 14: when the visited value of a candidate equals the ID of the new attribute, the concept has already been visited, and candidate.NewConcept is assigned directly to the candidate, which eliminates unnecessary recursive calls and comparisons.

We now use an example to show how the added lines reduce the running time.

Table 1 shows the formal context before adding the attribute e, while Table 2 shows the formal context after adding the attribute e. Correspondingly, Figure 1 depicts the concept lattice based on Table 1 and Figure 2 depicts the concept lattice based on Table 2.

All the concepts shown in Figure 3 are labeled for easy reference as follows:

c_{0} ({1, 2, 3, 4, 5}, ∅)

c_{1} ({1, 2, 3, 5}, {c})

c_{2} ({1, 2, 5}, {a, b, c})

c_{3} ({1, 3, 5}, {c, d})

c_{4} ({1, 5}, {a, b, c, d})

c_{5} ({1, 2, 3}, {c, e})

c_{6} ({1, 2}, {a, b, c, e})

c_{7} ({1, 3}, {c, d, e})

c_{8} ({1}, {a, b, c, d, e})

In the process of adding attribute e, whose object set is {1, 2, 3}, c_{1} is the canonical generator of the new concept c_{5}. c_{1} has two candidates, c_{2} and c_{3}, and the extent of each candidate is intersected with {1, 2, 3}. The recursive calls on these candidates give c_{2}.NewConcept = c_{6} and c_{3}.NewConcept = c_{7}, and the visited values of c_{2} and c_{3} are both 5. Suppose that c_{6} is built earlier than c_{7}. While c_{6} is being built, c_{4} is a candidate of c_{2}, and c_{8} is generated with c_{4} as its canonical generator, so that c_{4}.NewConcept = c_{8} and c_{4}.visited = 5. When c_{7} is created, its candidate is again c_{4}. Because c_{4}.visited = 5, c_{4} has already been visited, and c_{4}.NewConcept is assigned directly to the candidate. This eliminates a recursive call and the comparisons that would follow it, which reduces the running time.
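The example can be replayed with a simplified Python sketch of the memoisation in lines 13, 14 and 20 to 22 of Algorithm 1. This is not the full FastAddExtent procedure: the doExtent and MaximalConcept shortcut (lines 3, 4, 16 and 17) and line 45 are omitted, and `get_maximal_concept` is a plain descent standing in for GetClosureConcept. The context is inferred from the concept labels c_{0} to c_{8} listed above, since Tables 1 and 2 are not reproduced here. All Python names are hypothetical.

```python
class Concept:
    """Lattice node with two of the four added fields (visited, new_concept)."""
    def __init__(self, extent, intent=()):
        self.extent = frozenset(extent)
        self.intent = set(intent)
        self.children = []
        self.visited = None      # ID of the attribute that last accessed this node
        self.new_concept = None  # concept returned for that attribute

memo_hits = []  # (attribute ID, candidate extent) for every reuse of a cached result

def get_maximal_concept(extent, generator):
    """Descend to the smallest concept whose extent still contains extent."""
    descended = True
    while descended:
        descended = False
        for child in generator.children:
            if extent <= child.extent:
                generator, descended = child, True
                break
    return generator

def remove_link(child, parent):
    if child in parent.children:
        parent.children.remove(child)

def fast_add_extent(extent, generator, lattice, n):
    generator = get_maximal_concept(extent, generator)
    if generator.extent == extent:
        return generator  # modified concept
    new_children = []
    for candidate in list(generator.children):
        if not candidate.extent <= extent:
            if candidate.visited == n:  # lines 13-14: reuse the cached result
                memo_hits.append((n, sorted(candidate.extent)))
                candidate = candidate.new_concept
            else:  # lines 19-22: recurse, then cache on the candidate
                result = fast_add_extent(candidate.extent & extent, candidate, lattice, n)
                candidate.new_concept, candidate.visited = result, n
                candidate = result
        add_child = True
        for child in list(new_children):
            if candidate.extent <= child.extent:
                add_child = False
                break
            if child.extent <= candidate.extent:
                new_children.remove(child)
        if add_child:
            new_children.append(candidate)
    new_concept = Concept(extent, generator.intent)
    lattice.append(new_concept)
    for child in new_children:
        remove_link(child, generator)
        new_concept.children.append(child)
    generator.children.append(new_concept)
    return new_concept

def build_lattice(objects, attr_extents):
    """Insert attributes one by one; attribute IDs are 1, 2, ... in dict order."""
    top = Concept(objects)
    lattice = [top]
    for n, (m, ext) in enumerate(attr_extents.items(), start=1):
        concept = fast_add_extent(frozenset(ext), top, lattice, n)
        stack = [concept]  # add m to the intent of the concept and all its descendants
        while stack:
            node = stack.pop()
            node.intent.add(m)
            stack.extend(node.children)
    return lattice

# Context inferred from the labels c0..c8 (an assumption of this sketch).
ATTR_EXTENTS = {"a": {1, 2, 5}, "b": {1, 2, 5}, "c": {1, 2, 3, 5},
                "d": {1, 3, 5}, "e": {1, 2, 3}}
lattice = build_lattice({1, 2, 3, 4, 5}, ATTR_EXTENTS)
```

In this run, `memo_hits` records the reuse of the cached result for the concept with extent {1, 5} while attribute 5 (e) is added, which corresponds to the reuse of c_{4}.NewConcept described above.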