Conceptual Coverage Driven by Essential Concepts: A Formal Concept Analysis Approach

Abstract: Formal concept analysis (FCA) is a mathematical theory that is typically used as a knowledge representation method. The approach starts with an input binary relation specifying a set of objects and attributes, finds the natural groupings (formal concepts) described in the data, and then organizes the concepts in a partial order structure or concept (Galois) lattice. Unfortunately, the total number of concepts in this structure tends to grow exponentially as the size of the data increases. Therefore, there are numerous approaches for selecting a subset of concepts to provide full or partial coverage. In this paper, we rely on the battery of mathematical models offered by FCA to introduce a new greedy algorithm, called CONCISE, to compute minimal and meaningful subsets of concepts. Thanks to its theoretical properties, the CONCISE algorithm is shown to avoid the sluggishness of its competitors while offering the ability to mine both partial and full conceptual coverage of formal contexts. Furthermore, experiments on massive datasets also underscore the preservation of the quality of the mined formal concepts through interestingness measures agreed upon by the community.


Introduction
With the rapid development of 5G, the Internet of Things (IoT), and artificial intelligence (AI) in recent years, increasing numbers of large datasets are becoming available in a wide variety of communities. In this respect, identifying the cohesive structures in these various datasets facilitates the discovery of valuable hidden patterns. Formal concept analysis (FCA) provides a robust mathematical foundation based on lattice theory for identifying the cohesive structures of a social network [1,2]. In social network analysis, we can model the topological information as a bipartite graph; i.e., a graph with two types of vertices and whose links are only between vertices of different types. Then, by identifying the formal concepts (also known as biclusters [3] or bicliques [4]) from bipartite graphs, the network structure is transformed into a concept form, which indicates the relationships hidden in the data. However, it has often been observed that the overwhelming number of formal concepts is an actual burden for valuable data analysis. Indeed, it is of utmost importance to determine a representative subset from this vast set of formal concepts that can be extracted from even modestly sized formal contexts [5][6][7][8]. Hence, the primary problem in FCA is to find a minimal contextual structure that is concise and maintains the structural consistency.
For a sizable formal context, the number of formal concepts in a concept (Galois) lattice can be vast, and such complexity is often managed by selecting the most interesting concepts according to a particular metric. This issue has been the focus of many works. Of note, Mouakher and Ben Yahia recently proposed the pioneering QUALITYCOVER algorithm [7]. The algorithm was mainly guided by the quality of the extracted association

1. Extraction of full and partial conceptual coverage: Our approach finds a minimal subset of formal concepts that fully or partially cover the relations in the formal context. Partial coverage has received considerably less attention than it deserves. The main conclusion drawn is that partial coverage may be an interesting way to eliminate the "noise" or outliers from an obtained coverage.
2. Scalability: Thanks to the introduced theoretical properties, our approach is shown to be very scalable since it is able to process very large datasets with a reasonable running time.
3. Quality of the drawn knowledge: Alongside the compactness feature, the new approach highlights worthy statistics for the interestingness measures of stability, separation, and object uniformity.
The remainder of this paper is organized as follows. First, Section 2 provides background on formal concept analysis, and Section 3 gives a brief overview of related work. Then, in Section 4, we thoroughly describe and illustrate our greedy algorithm, called CONCISE, for the extraction of optimal full and partial coverage of a formal context. Next, in Section 5, we detail the theoretical complexity of our algorithm. Finally, Section 6 presents the empirical study, and the conclusion and future work are given in Section 7.

Basic Settings
Formal concept analysis (FCA) and its mathematical foundations [9] have been used as a theoretical basis for various tasks (e.g., [10][11][12]). In our context, we introduce a new approach for extracting full and partial conceptual coverage based on FCA. Let us recall its basic notions.

Definition 1 (FORMAL CONTEXT). A formal context K = (O, I, R) consists of two sets O and I and a binary (incidence) relation R between O and I. The elements of O are called the objects and the elements of I are called the attributes of the context. In order to express that an object o is in a relation R with an attribute i, we write o R i or (o, i) ∈ R and read this as "the object o has the attribute i".

Example 1. In the remainder, we consider the formal context K depicted by Table 1. A small context can be easily represented by a cross table; i.e., by a rectangular table, the rows of which are headed by the object names and the columns headed by the attribute names. A cross (×) in row o and column i means that the object o has the attribute i. Table 1 illustrates the relationship between a set of patients O = {1, 2, 3, 4, 5, 6, 7, 8, 9} and a set of symptoms I = {a, b, c, d, e, f, g, h}.
Definition 2 (BIPARTITE GRAPH). A graph G = (N, E) is a bipartite graph [13] if there is a bipartition {U, V} of N such that every edge of E intersects both parts of the partition; i.e., ∀e ∈ E : e ∩ U ≠ ∅ ∧ e ∩ V ≠ ∅.

It may be noticed that formal contexts are closely related to bipartite graphs, where both objects and attributes are nodes in the graph and edges connect each object with its attributes. This link enables us to apply the whole tool-set of FCA to bipartite graphs and vice versa. Figure 1 illustrates the formal context depicted in Table 1.
An interesting link between the power sets P(I) and P(O) associated with the set of items I and the set of objects O is defined as follows:

Definition 3 (GALOIS OPERATORS). For a set A ⊆ O of objects, we define

$$A' = \{\, i \in I \mid (o, i) \in R \ \text{for all } o \in A \,\}$$

(the set of attributes common to the objects in A). Correspondingly, for a set B ⊆ I of attributes, we define

$$B' = \{\, o \in O \mid (o, i) \in R \ \text{for all } i \in B \,\}$$

(the set of objects which have all attributes in B). The operators are known as concept-forming (also known as derivation) operators [9,14].
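To make the two derivation operators concrete, the following Python sketch applies them to a small context. The context below is not copied from Table 1: it is rebuilt from the concepts listed in the examples of this paper, so it should be read as an assumption, as should the helper names `intent`, `extent`, and `closure`.

```python
# Assumed context (object -> attribute set), rebuilt from the concepts
# quoted in the paper's examples; Table 1 itself is not reproduced here.
CONTEXT = {
    1: set("bg"), 2: set("acg"), 3: set("defgh"), 4: set("adefgh"),
    5: set("aefgh"), 6: set("bfg"), 7: set("afg"), 8: set("aeg"),
    9: set("abcdefgh"),
}
ATTRIBUTES = set().union(*CONTEXT.values())


def intent(objects):
    """A' : the attributes shared by every object in `objects`."""
    common = set(ATTRIBUTES)
    for o in objects:
        common &= CONTEXT[o]
    return common


def extent(attributes):
    """B' : the objects that have every attribute in `attributes`."""
    return {o for o, attrs in CONTEXT.items() if set(attributes) <= attrs}


def closure(attributes):
    """B'' : the smallest intent that contains `attributes`."""
    return intent(extent(attributes))


print(sorted(extent("h")), "".join(sorted(closure("h"))))
```

Applying `extent` and then `intent` to a single attribute yields its attribute concept, which is the building block used later for detecting essential concepts.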
Definition 4 (SUPPORT OF A PATTERN). Let K = (O, I, R) be a formal context and P be a non-empty pattern. The conjunctive support of a pattern P [15], denoted by Supp(P), is equal to the number of objects/items containing all items/objects of P.

Definition 5 (FORMAL CONCEPT). A formal concept of K is a pair ⟨A, B⟩ with A ⊆ O and B ⊆ I such that A′ = B and B′ = A. The sets A and B are called the extent and the intent of the concept, respectively. Less formally, we can say that a formal concept is a set of objects together with the attributes these objects have in common under the restriction that we cannot add an additional attribute without removing an object and we cannot add an additional object without removing an attribute.

Definition 6 (PSEUDOCONCEPT). The pseudoconcept [16] associated with the element (a, b), denoted as PC(a, b), is a binary relation computed by obtaining the Cartesian product of the maximal set of attributes fulfilling the object a and the maximal set of objects having attribute b, restricted to R. Formally,

$$PC(a, b) = \{\, (o, i) \in \{b\}' \times \{a\}' \mid (o, i) \in R \,\}.$$

Plainly, PC(a, b) is the union of all the formal concepts containing element (a, b). We also define the size of a given pseudoconcept PC(a, b) as follows:

$$Size(PC(a, b)) = \frac{|PC(a, b)|}{|\{b\}'| \times |\{a\}'|}. \qquad (5)$$

Example 4. With respect to the formal context shown by Table 1, the pseudoconcept associated with element (3, h) is computed as follows:

PC(3, h) = {(3, d), (3, e), (3, f), (3, g), (3, h), (4, d), (4, e), (4, f), (4, g), (4, h), (5, e), (5, f), (5, g), (5, h), (9, d), (9, e), (9, f), (9, g), (9, h)}

The corresponding size of PC(3, h) is computed as follows:

$$Size(PC(3, h)) = \frac{19}{4 \times 5} = \frac{19}{20}.$$

Definition 7 (OBJECT/ATTRIBUTE CONCEPT). Let K = (O, I, R) be a formal context with associated concept (Galois) lattice B(O, I, R). An object concept and attribute concept were introduced in [9]. Hence, the following mappings were defined:

$$\gamma : O \to B(O, I, R),\ \gamma(o) = \langle \{o\}'', \{o\}' \rangle \qquad \mu : I \to B(O, I, R),\ \mu(i) = \langle \{i\}', \{i\}'' \rangle$$

where γ relates O to B(O, I, R) and associates each object o with an object concept ⟨A, B⟩ where B is the set of all attributes of o and A is the set of all objects having all the attributes of B. Analogously, µ relates I to B(O, I, R) by associating each attribute i with an attribute concept ⟨A, B⟩ where A is the set of all objects having i and B is the set of all attributes valid for all objects of A.
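The pseudoconcept of Definition 6 and its size can be sketched in a few lines of Python. The context is an assumption (rebuilt from the concepts quoted in the examples rather than copied from Table 1), and `objects_having`, `pseudoconcept`, and `size` are illustrative names, not functions of the CONCISE implementation.

```python
from fractions import Fraction

# Assumed context (object -> attribute set); not copied from Table 1.
CONTEXT = {
    1: set("bg"), 2: set("acg"), 3: set("defgh"), 4: set("adefgh"),
    5: set("aefgh"), 6: set("bfg"), 7: set("afg"), 8: set("aeg"),
    9: set("abcdefgh"),
}


def objects_having(i):
    """{i}' : the objects that have attribute i."""
    return {o for o, attrs in CONTEXT.items() if i in attrs}


def pseudoconcept(o, i):
    """Elements of R that lie inside the rectangle {i}' x {o}'."""
    return {(x, y) for x in objects_having(i) for y in CONTEXT[o]
            if y in CONTEXT[x]}


def size(o, i):
    """Share of the rectangle {i}' x {o}' that is filled by R."""
    rectangle = len(objects_having(i)) * len(CONTEXT[o])
    return Fraction(len(pseudoconcept(o, i)), rectangle)


print(len(pseudoconcept(3, "h")), size(3, "h"))
```

Exact fractions are used so that the values can be compared directly with those quoted in the running example; note that `Fraction` reduces 16/18 to 8/9.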
Definition 8 (CONCEPTUAL COVERAGE). Given a formal context K = (O, I, R) and a threshold δ, a conceptual coverage [17] is defined as a set of formal concepts C_K = {C_1, C_2, ..., C_n} in the concept (Galois) lattice B(O, I, R) [9,14]. The conceptual coverage C_K is said to be full (δ = 1) if every element (x, y) of the context K is included in at least one concept of C_K. It is said to be partial (δ < 1) whenever the elements (x, y) covered by C_K make up a fraction δ of the formal context K.

Example 6. If we consider the formal context depicted by Table 1 and δ = 1, C_K = {⟨169, bg⟩, ⟨349, defgh⟩, ⟨29, acg⟩, ⟨245789, ag⟩, ⟨345679, fg⟩, ⟨3459, efgh⟩, ⟨34589, eg⟩} is one full coverage since every element is covered by at least one formal concept.
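The coverage ratio that is compared against δ can be computed directly from the definition. In this hedged sketch the context is an assumption rebuilt from the coverage of Example 6 itself, so the full-coverage check is a consistency check rather than an independent verification; `coverage_ratio` is an illustrative name.

```python
from fractions import Fraction

# Assumed context (object -> attribute set); not copied from Table 1.
CONTEXT = {
    1: set("bg"), 2: set("acg"), 3: set("defgh"), 4: set("adefgh"),
    5: set("aefgh"), 6: set("bfg"), 7: set("afg"), 8: set("aeg"),
    9: set("abcdefgh"),
}
RELATION = {(o, i) for o, attrs in CONTEXT.items() for i in attrs}

# Concepts of Example 6, written as (extent, intent) pairs.
EXAMPLE_6 = [
    ({1, 6, 9}, "bg"), ({3, 4, 9}, "defgh"), ({2, 9}, "acg"),
    ({2, 4, 5, 7, 8, 9}, "ag"), ({3, 4, 5, 6, 7, 9}, "fg"),
    ({3, 4, 5, 9}, "efgh"), ({3, 4, 5, 8, 9}, "eg"),
]


def coverage_ratio(concepts):
    """Fraction of the elements of R covered by at least one concept."""
    covered = set()
    for ext, itt in concepts:
        covered |= {(o, i) for o in ext for i in itt}
    return Fraction(len(covered & RELATION), len(RELATION))


print(coverage_ratio(EXAMPLE_6), coverage_ratio(EXAMPLE_6[:3]))
```

A coverage is accepted for a threshold δ as soon as this ratio reaches δ; the first three concepts alone cover 25 of the 38 elements of the assumed context.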
In the following, we present the most relevant works that address the extraction of the full and partial conceptual coverage of a formal context.

Related Work
In the literature, extracting the minimal coverage of formal concepts (i.e., the set covering problem) is not entirely new and has been the subject of several previous works. Some of these approaches, called full-coverage approaches, focus on covering the entire formal context, whereas others, called partial-coverage approaches, aim to cover only a subset of the formal context.

Full-Coverage Approaches
Kcherif et al. [18] introduced a rectangular decomposition approach based on Riguet's difunctional relation. Indeed, computing this difunctional was reduced to detecting a particular set of elements called isolated points, allowing the determination of the minimal conceptual coverage of a given binary relation. Later, an extended isolated points-based approach was applied on textual data in [19]. The authors proposed an algorithm called MINGENCOVERAGE for covering a formal context (as a formal representation of a text) based on isolated labels. The algorithm studied the connections between minimal generators and isolated points, which reduced the search space and improved its performance. Mouakher and Ben Yahia [7] introduced a new approach based on a greedy algorithm called QUALITYCOVER to build a full conceptual coverage. The authors defined a new gain function based on correlation metrics for high-quality coverage. The major drawback of this approach is scalability. Another related work is [20], which investigated the same problem using bipartite graphs. The authors proposed a new algorithm called FAST-COVER, which provided a concise conceptual coverage using the graph structure. Later, Elloumi et al. proposed an approach that builds the conceptual coverage progressively from N-composite isolated points [21]. Belohlavek and Vychodil [22] tackled the same issue by attempting to solve the Boolean factor analysis problem. The authors proposed a greedy approximation algorithm, called GRECOND, aiming to find approximately optimal decompositions of binary matrices. In the same trend, Belohlavek and Trnecka [23], via the GREESS algorithm, proposed an approach for decomposing a binary matrix into a Boolean product of factors. Recently, Tatiana and Martin [24] proposed an MDL-based from-below factorization algorithm called MDLGRECOND. The algorithm uses the minimum description length (MDL) principle as a criterion for factor selection and produces a small subset of formal concepts with a low information loss rate.

Partial-Coverage Approaches
The partial-coverage approaches have received less attention from the FCA community. To the best of our knowledge, the GREESS algorithm [23], mentioned in the previous subsection, is one of the most well-known approaches able to generate partial coverage. This problem is usually treated as the δ-approximate role mining problem, which has been proven to be NP-complete [25]. In this case, users and permissions correspond to FCA objects and FCA attributes, respectively, and measuring δ for a selected subset of concepts uses the coverage ratio c to evaluate role mining algorithms [25]. Some studies have been proposed by the role mining community [25,26]. In [27], the authors addressed the same issue and presented a novel bottom-up approach called the δ-Approx Important Role Mining approach, in which the permissions are classified based on the number of users assigned to them. It has been shown that this approach is effective in decreasing the number of roles. Torim et al. [28] proposed three heuristic algorithms using concept chains instead of formal concepts for partial context coverage. Their approach was mainly based on the selection of a subset of the most interesting concepts. This study was extended in [8], where the authors proposed a novel concept chain coverage method and applied it to the service usage data of a telecommunications company. The idea behind concept chain coverage is to cover the data not with single concepts but with chains of related concepts. Recently, Kristo et al. [29] introduced a greedy algorithm for generating efficient partial coverage. The latter algorithm is a revised version of GRECOND [22], and the choice of the selected concept is based on minimizing the cumulative coverage.
In this paper, we revisit the QUALITYCOVER algorithm [7] and propose an efficient implementation for full and partial conceptual coverage called CONCISE.

The CONCISE Algorithm: A Conceptual Coverage Driven by Essential Concepts
In this section, we present the description of the CONCISE algorithm. First, we explain the importance of the essential concepts. Then, we detail the use of these fundamental elements in the pseudocode of the algorithm.

Essential Formal Concepts
Essential concepts, also called mandatory concepts (MCs), play a crucial role in data mining as they allow the discovery of regular structures from data based on formal concept analysis (FCA). They qualify as essential because they belong to any conceptual coverage of a formal context [22]. From the relational algebra (RA) perspective, an essential concept contains at least one isolated point, as introduced by Riguet [30]. As a mathematical background, FCA and RA have already been combined and used to discover regularities in data [18]. A formal concept represents the regular atomic structure for decomposing a binary relation. Moreover, the computing of Riguet's difunctional relation [30] results in a set of isolated points describing invariant structures that could be used for database decomposition and textual feature selection (TFS) [19]. Furthermore, an isolated point belongs to a unique formal concept that exists in any conceptual coverage. Therefore, any FCA-based knowledge discovery process necessarily considers such concepts. Several approaches have been proposed to locate the essential concepts in a formal context to build conceptual coverage. This paper presents alternatives for conceptual coverage construction, and we discuss their main characteristics and features. Nevertheless, finding the most efficient strategy remains a challenging perspective.
Definition 9 (ISOLATED POINT). Let us consider a formal context K = (O, I, R). An element (o, i) ∈ R is said to be an isolated point if it belongs to only one formal concept.

Definition 10 (ESSENTIAL CONCEPT). A formal concept is called essential if it contains at least one isolated point.

Theorem 1. A formal concept C = ⟨A, B⟩ is essential if and only if it is both an object concept and an attribute concept.

Proof. ⇒. Let ⟨A_1, B_1⟩ be a formal concept that introduces the objects of a nonempty set O_1 and the attributes of a nonempty set I_1. Let (o, i) ∈ O_1 × I_1. By definition, {o}′ = B_1 and {i}′ = A_1. Hence, for any formal concept ⟨A_2, B_2⟩ such that o ∈ A_2 and i ∈ B_2, we have A_2 ⊆ A_1 and B_2 ⊆ B_1. As ⟨A_1, B_1⟩ is a formal concept and thus maximal, A_2 ⊆ A_1 also implies B_1 ⊆ B_2, so ⟨A_2, B_2⟩ = ⟨A_1, B_1⟩. The element (o, i) therefore belongs to a single formal concept, i.e., it is an isolated point, and ⟨A_1, B_1⟩ is essential.

⇐. Let ⟨A_1, B_1⟩ be an essential concept and (o, i) be an isolated point that only belongs to ⟨A_1, B_1⟩. We find that {o}′ = B_1 and {i}′ = A_1. This means that ⟨A_1, B_1⟩, by definition, introduces both o and i and is thus an object concept and attribute concept.

Example 7. In Table 2, we find the following: ⟨56, ae⟩, ⟨24, bc⟩, and ⟨134, ad⟩ introduce both an object and an attribute and are essential concepts.

Corollary 1. Let ⟨A, B⟩ be a formal concept with o ∈ A and i ∈ B. The element (o, i) is an isolated point if o is a minimal generator [31] of A and i is a minimal generator of B.

Proof. The proof is straightforward since, by definition, a minimal generator is the smallest element for which the closure computation leads to the closed element. Thus, since o and i are minimal generators, which is equivalent to ⟨A, B⟩ being an object concept and an attribute concept, respectively, (o, i) is an isolated point.
The following theorem introduces the formal characterization of an isolated point.

Theorem 2. Let ⟨X, Y⟩ be a formal concept with o ∈ X and i ∈ Y. The element (o, i) is an isolated point, and hence ⟨X, Y⟩ is essential, if and only if |{o}″| = |{i}′|.

Proof. The proof shows that for an essential formal concept ⟨X, Y⟩, an element (o, i) exists such that |{o}″| = |{i}′|. Since o ∈ X and i ∈ Y, we have {o}″ ⊆ X ⊆ {i}′. Moreover, we find |{o}″| = |{i}′|, which means that the object o exactly generates the extent part X; that is, {o}″ = X. In addition, this also means that {i}′ = X; i.e., i ∈ Y is an item that appears in exactly the objects of X. In consequence, i is also a minimal generator of Y.

Corollary 2. Let us consider a formal concept C = ⟨X, Y⟩. If |X| = 1, then ⟨X, Y⟩ is an essential formal concept.

Proof. If the extent part is reduced to a singleton, C is the object concept of this single object. It remains to show that there exists i ∈ Y such that C is the attribute concept of i, i.e., that (o, i) is an isolated point. Since the cardinality of the extent part of C is equal to 1, this object, say o, fulfills this property, with {o}′ = Y.

Example 8. According to the formal context given in Table 1, the list of essential concepts can be easily checked: {⟨169, bg⟩; ⟨29, acg⟩; ⟨349, defgh⟩}. If we consider the formal context given by Table 2, all of its formal concepts are essential.
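The detection of isolated points can be sketched as follows. The test used here is the one described for the essential-concept detection step of CONCISE (an object whose number of attributes equals the cardinality of the intent of the attribute concept); the context is an assumption rebuilt from the concepts quoted in the examples, and the function names are illustrative only.

```python
# Assumed context (object -> attribute set); not copied from Table 1.
CONTEXT = {
    1: set("bg"), 2: set("acg"), 3: set("defgh"), 4: set("adefgh"),
    5: set("aefgh"), 6: set("bfg"), 7: set("afg"), 8: set("aeg"),
    9: set("abcdefgh"),
}


def extent(attributes):
    """B' : objects having every attribute in `attributes`."""
    return {o for o, attrs in CONTEXT.items() if set(attributes) <= attrs}


def intent(objects):
    """A' : attributes shared by all `objects` (non-empty input assumed)."""
    return set.intersection(*(CONTEXT[o] for o in objects))


def isolated_points():
    """Elements (o, i) of R with |{o}'| == |{i}''|."""
    return [(o, i) for o, attrs in sorted(CONTEXT.items())
            for i in sorted(attrs)
            if len(attrs) == len(intent(extent({i})))]


def essential_concepts():
    """Attribute concepts ({i}', {o}') generated by the isolated points."""
    return {(frozenset(extent({i})), frozenset(CONTEXT[o]))
            for o, i in isolated_points()}


print(isolated_points())
```

Because i ∈ {o}′ implies {i}″ ⊆ {o}′, comparing cardinalities is enough to decide whether the two sets coincide, which avoids a full set comparison.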

Remark 1. Let us consider the particular formal context given by Table 3. As this table shows, no essential formal concepts can be mined.

In the following, we provide a formal characterization of this type of formal context, namely the "worst case", and prove that we cannot mine essential formal concepts from it. A "worst case" formal context is defined as follows:

Definition 11. A "worst case" context is a triplet K = (O, I, R) where I is a finite set of items of size n, O represents a finite set of objects of size (n + 1), and R is a binary (incidence) relation (i.e., R ⊆ O × I). In such a context, each item belongs to n distinct objects. Each object, among the first n objects, contains (n − 1) distinct items, and the last object is fulfilled by all items.

Thus, in a "worst case" context, each object concept/attribute concept is equal to its unique minimal generator. Hence, from a "worst case" context of a dimension equal to n × (n + 1), 2^n formal concepts can be extracted. Even if the worst case is rarely encountered in practice, "worst case" datasets have been shown to allow the behavior of an algorithm to be scrutinized on extremely sparse concepts and hence to assess its scalability [15]. Table 4 presents an example of a "worst case" dataset for n = 4.

Corollary 3. No essential concepts can be extracted from a worst case formal context.

Proof. Let us consider a worst case formal context K = (O, I, R). By construction of a worst case dataset, and with regard to Theorem 2, we have the following: ∀i ∈ I, |{i}′| = n, and ∀(o, i) ∈ R, we always find that |{o}″| ≠ |{i}′|. Thus, no essential concepts can be drawn from a worst case dataset.
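A worst case context is easy to generate and inspect programmatically. The sketch below builds it following Definition 11 (items and objects are numbered from 0, an arbitrary choice), enumerates the intents by brute force over all attribute subsets, and looks for isolated points with the cardinality test |{o}′| = |{i}″|. All function names are illustrative, and the brute-force enumeration is only meant for small n.

```python
from itertools import combinations


def worst_case_context(n):
    """Objects 0..n-1 each miss exactly one item; object n has all items."""
    items = set(range(n))
    context = {k: items - {k} for k in range(n)}
    context[n] = set(items)
    return context


def extent(context, attrs):
    return {o for o, has in context.items() if attrs <= has}


def intent(context, objs, items):
    common = set(items)
    for o in objs:
        common &= context[o]
    return common


def all_intents(context, n):
    """Closures of every attribute subset (exponential, small n only)."""
    items = set(range(n))
    return {frozenset(intent(context, extent(context, set(c)), items))
            for r in range(n + 1) for c in combinations(range(n), r)}


def isolated_points(context, n):
    items = set(range(n))
    return [(o, i) for o, has in context.items() for i in has
            if len(has) == len(intent(context, extent(context, {i}), items))]


ctx = worst_case_context(4)
print(len(all_intents(ctx, 4)), isolated_points(ctx, 4))
```

For n = 4 every subset of items is closed, which gives the 2^n = 16 concepts mentioned above, and no element passes the isolated-point test.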

Description of the CONCISE Algorithm
In the following, we present the description and the pseudocode of the CONCISE algorithm. According to the pseudocode described by Algorithm 1, we start by computing the basic information from the ground set items of the given formal context. Then, this process computes the corresponding formal concept for each item. The different steps followed to obtain minimal conceptual coverage are detailed in the remainder of this section.
                /* remove all covered elements from the concept BestFC */
                forall (x, y) ∈ BestFC do
                    K ← K − (x, y);
                end
            end
        end
    end
end
return C_K

The CONCISE algorithm proceeds according to the following steps:

Step 1: Detect the essential concepts
After closing the items through the COMPUTE_INTRODUCTORY_CLOSURE procedure, the efficient detection of the set of essential formal concepts (if they exist) is conducted by the COMPUTE_ESSENTIAL_CONCEPTS function. The corresponding pseudocode is provided by Algorithm 2. The algorithm iterates over the seed set of attributes I. In Lines 4-8, with regard to Corollary 2, if the cardinality of the extent part is equal to 1, then its induced formal concept is considered an essential concept, and we remove all the covered elements from the formal context. Otherwise, we iterate over the extent part, seeking an object whose support is equal to the cardinality of the intent part. Finally, we run the second and third steps if the essential concepts do not reach the threshold δ covering the formal context.

Step 2: Compute the size of noncovered elements
For each noncovered element (o, i), we proceed by obtaining its corresponding pseudoconcept through the GET_PSEUDOCONCEPT function and assessing its size by calling the COMPUTE_SIZE function (c.f. Lines 10-11). We provide a more straightforward reformulation of the size in the COMPUTE_SIZE function based on the following corollary.

Corollary 4. Let us consider the element (o, i) ∈ R. The size of its corresponding pseudoconcept according to Equation (5) can be rewritten as follows:

$$Size(PC(o, i)) = \frac{\sum_{x \in \{i\}'} |\{x\}' \cap \{o\}'|}{|\{i\}'| \times |\{o\}'|}.$$

Example 9. If we consider the formal context depicted by Table 1, then for element (3, h) we have {h}′ = {3, 4, 5, 9} and {3}′ = {d, e, f, g, h}, and its corresponding pseudoconcept PC(3, h) is calculated as

$$PC(3, h) = R \cap (\{3, 4, 5, 9\} \times \{d, e, f, g, h\}).$$

The size of this pseudoconcept is computed as follows:

$$Size(PC(3, h)) = \frac{5 + 5 + 4 + 5}{4 \times 5} = \frac{19}{20}.$$

The following pseudocode given by Algorithm 3 illustrates the COMPUTE_SIZE function.

Step 3: Greedily cover the remaining concepts
This step is repeated as long as the fixed threshold δ of covered elements (cf. Line 14) is not reached. Then, for each uncovered element, we call the CALCULATE_BEST_FC function (cf. Line 17) to obtain the best candidate to add to the concept coverage. This best candidate is selected according to a quality metric. In the CALCULATE_BEST_FC function, we use the bond measure [32], and the chosen concept is the concept that maximizes this measure. This correlation measure computes the ratio between the conjunctive support and the disjunctive support. In [7], it was shown that this metric results in formal concepts with high quality. The bond measure of a nonempty pattern P ⊆ I is defined as follows:

$$bond(P) = \frac{Supp(P)}{Supp(\vee P)},$$

where Supp(∨P) denotes the disjunctive support of P, i.e., the number of objects having at least one item of P. If we consider the formal concept C = ⟨X, Y⟩, the formula of the bond can be expressed as follows:

$$bond(C) = \frac{|X|}{\left| \bigcup_{y \in Y} \{y\}' \right|}. \qquad (9)$$

Equation (9) shows that for a formal concept C = ⟨X, Y⟩ such that |Y| = 1, we have

$$bond(C) = 1.$$

Therefore, if the cardinality of the intent part is equal to 1, then it is the best formal concept, in terms of the bond metric, from all the formal concepts included in the pseudoconcept induced by element (o, i).
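The bond measure described above, the ratio between conjunctive and disjunctive support, can be sketched as follows. The context is again an assumption rebuilt from the concepts quoted in the examples, and `bond` is an illustrative helper rather than part of the CALCULATE_BEST_FC function.

```python
from fractions import Fraction

# Assumed context (object -> attribute set); not copied from Table 1.
CONTEXT = {
    1: set("bg"), 2: set("acg"), 3: set("defgh"), 4: set("adefgh"),
    5: set("aefgh"), 6: set("bfg"), 7: set("afg"), 8: set("aeg"),
    9: set("abcdefgh"),
}


def bond(pattern):
    """Conjunctive support divided by disjunctive support of `pattern`."""
    items = set(pattern)
    # Objects that have every item of the pattern.
    conjunctive = sum(1 for attrs in CONTEXT.values() if items <= attrs)
    # Objects that have at least one item of the pattern.
    disjunctive = sum(1 for attrs in CONTEXT.values() if items & attrs)
    return Fraction(conjunctive, disjunctive)


print(bond("g"), bond("efgh"), bond("eh"))
```

For a single-item pattern both supports count the same objects, so the bond equals 1, in line with the remark on concepts whose intent has cardinality 1.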
Algorithm 4 describes the pseudocode of the CALCULATE_BEST_FC function. As outlined by Line 3, we have to explore exactly |{o}′| formal concepts. Indeed, it is useless to explore all the formal concepts obtained by combining the seed attributes. From them, we return the best concept in terms of the bond measure. We do not need to generate the formal concepts since we can decide on their extent. Then, we assess the bond metric value of each generated concept using Equation (9) (cf. Line 7). The formal concept having the highest bond value is the returned BestFC (cf. Line 10).

Example 10. In this example, we illustrate the different phases of the CONCISE algorithm for building minimal conceptual coverage. Let us consider the formal context K given by Table 1 with a threshold δ = 1. The procedure of the algorithm is depicted in Table 5.
Step 1: During this step, we first call the COMPUTE_INTRODUCTORY_CLOSURE procedure, and we obtain Table 6. Then, we invoke the COMPUTE_ESSENTIAL_CONCEPTS function, and we find that (1, b), (2, c), and (3, d) are isolated points. Thus, we have three essential formal concepts and C_K = {⟨169, bg⟩, ⟨29, acg⟩, ⟨349, defgh⟩}. Since C_K does not fully cover K (δ = 1), we proceed to the second step.
Step 2: In this step, we compute the pseudoconcept of each noncovered element of the formal context by invoking the GET_PSEUDOCONCEPT function. Next, the size of each pseudoconcept is assessed through the COMPUTE_SIZE function. Then, the elements are sorted in decreasing order of size via the SORT_ELEMENTS procedure.
Step 3: The different outputs obtained during this step are also detailed in Table 5. After sorting the elements, we find that (3, h) and (5, h) are ranked first with a size value equal to 19/20. Since element (3, h) has already been covered by an essential concept, element (5, h) is processed, and the best formal concept is ⟨3459, efgh⟩. Thus, we update the list of concept coverage as follows: C_K = {⟨169, bg⟩, ⟨29, acg⟩, ⟨349, defgh⟩, ⟨3459, efgh⟩}. All the elements covered by this list of formal concepts are removed from the initial list. Then, element (8, e) with a size value equal to 14/15 comes into play, and the formal concept ⟨34589, eg⟩ is added to C_K. Then, element (7, a) with a size value equal to 16/18 comes to the top. Consequently, the formal concept ⟨245789, ag⟩ is added to C_K, and the latter becomes equal to C_K = {⟨169, bg⟩, ⟨29, acg⟩, ⟨349, defgh⟩, ⟨3459, efgh⟩, ⟨34589, eg⟩, ⟨245789, ag⟩}. After removing the covered elements, we find on the top of the remaining elements the couple (7, f) (as shown by Table 5). The best concept obtainable from the latter is ⟨345679, fg⟩. Thanks to the latter formal concept, all the elements of the formal context are covered, and the final cover of 7 formal concepts is as follows: C_K = {⟨169, bg⟩, ⟨29, acg⟩, ⟨349, defgh⟩, ⟨3459, efgh⟩, ⟨34589, eg⟩, ⟨245789, ag⟩, ⟨345679, fg⟩}. Table 5 shows the procedure of the CONCISE algorithm (δ = 1) on the formal context given by Table 1.
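The three steps of the running example can be retraced with a compact Python sketch. It is a simplified illustration under stated assumptions, not the CONCISE pseudocode: the context is rebuilt from the concepts quoted in the examples; the pseudoconcept sizes are computed once before the greedy loop; and instead of reproducing CALCULATE_BEST_FC, the sketch directly takes the attribute concept µ(i), which has the largest extent and the smallest intent among the concepts containing (o, i) and therefore maximizes the bond among them. All names are hypothetical.

```python
from fractions import Fraction

# Assumed context (object -> attribute set); not copied from Table 1.
CONTEXT = {
    1: set("bg"), 2: set("acg"), 3: set("defgh"), 4: set("adefgh"),
    5: set("aefgh"), 6: set("bfg"), 7: set("afg"), 8: set("aeg"),
    9: set("abcdefgh"),
}
RELATION = {(o, i) for o, attrs in CONTEXT.items() for i in attrs}


def attribute_concept(i):
    """mu(i) = ({i}', {i}'') as a pair of frozensets."""
    ext = frozenset(o for o, attrs in CONTEXT.items() if i in attrs)
    itt = frozenset.intersection(*(frozenset(CONTEXT[o]) for o in ext))
    return ext, itt


def size(o, i):
    """Share of the rectangle {i}' x {o}' that is filled by R."""
    objs = [x for x, attrs in CONTEXT.items() if i in attrs]
    filled = sum(len(CONTEXT[x] & CONTEXT[o]) for x in objs)
    return Fraction(filled, len(objs) * len(CONTEXT[o]))


def cells(concept):
    ext, itt = concept
    return {(o, i) for o in ext for i in itt}


def greedy_cover(delta=1):
    cover, remaining = [], set(RELATION)
    # Step 1: (o, i) is isolated when |{o}'| == |{i}''|; keep its concept.
    for o, i in sorted(RELATION):
        concept = attribute_concept(i)
        if len(CONTEXT[o]) == len(concept[1]) and concept not in cover:
            cover.append(concept)
            remaining -= cells(concept)
    # Step 2: rank the uncovered elements by decreasing pseudoconcept size.
    ranked = sorted(remaining, key=lambda e: (-size(*e), e))
    # Step 3: add concepts until the covered fraction reaches delta.
    for o, i in ranked:
        if 1 - Fraction(len(remaining), len(RELATION)) >= delta:
            break
        if (o, i) in remaining:
            concept = attribute_concept(i)
            cover.append(concept)
            remaining -= cells(concept)
    return cover


for ext, itt in greedy_cover():
    print(sorted(ext), "".join(sorted(itt)))
```

Calling `greedy_cover(0.9)` stops as soon as 90% of the elements are covered, which illustrates how the threshold δ turns the same loop into a partial-coverage procedure. Ties between elements of equal size are broken here by the element itself, an arbitrary choice.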

Theoretical Complexity
We now derive an upper bound on the worst-case time complexity of the CONCISE algorithm. First, let n, m, and k denote the numbers of objects, items, and entries marked with × of the input formal context K, respectively. To simplify the analysis, we assume that max(n, m) ≤ k, which is a reasonable condition. Moreover, computing {o}′ and {i}′ takes O(m) and O(n) time, respectively. Therefore, the complexity of the COMPUTE_INTRODUCTORY_CLOSURE procedure is estimated by n × O(n). The complexity of the COMPUTE_ESSENTIAL_CONCEPTS function is n × (O(n) + O(m)). Then, the GET_PSEUDOCONCEPT and the COMPUTE_SIZE functions can be performed in O(n · m) time in the worst case. The cost of these functions in the loop (Lines 8-11) is n × O(n · m). We have chosen the QUICKSORT algorithm to sort the elements (o, i) of the formal context with respect to the size of the associated pseudoconcepts. This sort has a complexity of O(n · log(n)) according to [33]. The number of elements in the formal context is equal to k. Thus, there are at most k iterations in the case of full coverage, i.e., δ = 1. The CALCULATE_BEST_FC function takes n × (O(n) + O(m)) in the worst case. In summary, the theoretical complexity of the CONCISE algorithm is polynomial and equal to O(k²).

Experimental Evaluation
In this section, we present our results, showing the efficiency of our proposed algorithm. The solution was implemented and executed on a machine with 32 cores, 64 GB of memory and an Ubuntu Linux operating system. The CPUs are modern and have the AVX-512 instructions available, which can provide more than 10-fold increases in speed in some data processing tasks.

Benchmark Datasets
In this study, we used some benchmark datasets for experimental investigations of the performance and robustness of our proposed algorithm. As shown in Table 7, we considered the Apj and Americas-small datasets. The remaining datasets were furnished by the UC Irvine Machine Learning Database Repository [34]. The table presents the number of objects, the number of attributes, and the number of all formal concepts that may be drawn from the dataset using the LCM algorithm [35] for each dataset. The datasets are listed in increasing order with regard to the number of formal concepts.

Performance of the CONCISE Algorithm
In the following, we evaluate the CONCISE algorithm. In the first step, we compare its minimal coverage (or compacity) with that of the GREESS algorithm. In fact, according to [7], the latter generates the best coverages in terms of compacity. Then, we assess the quality of full and partial coverages using different metrics.

Definition 12 (STRESS). Stress measures the conciseness of the presentation of a matrix (two-mode data) and can be seen as a purity function that compares the values in a matrix with their neighbors. The stress measures used here are computed as the sum of squared distances of each matrix entry from its adjacent entries. In [36], Niermann defined two types of neighborhoods for an n × m matrix X = (x_ij):

• The Moore neighborhood (M Stress) comprises the (at most) eight adjacent entries. The local stress measure for element x_ij is defined as

$$\sigma_{ij} = \sum_{k=\max(1, i-1)}^{\min(n, i+1)} \ \sum_{l=\max(1, j-1)}^{\min(m, j+1)} (x_{ij} - x_{kl})^2.$$

• The Neumann neighborhood (N Stress) comprises the (at most) four adjacent entries, resulting in the local stress of x_ij:

$$\sigma_{ij} = \sum_{k=\max(1, i-1)}^{\min(n, i+1)} (x_{ij} - x_{kj})^2 + \sum_{l=\max(1, j-1)}^{\min(m, j+1)} (x_{ij} - x_{il})^2.$$

As depicted by Table 8, the CONCISE algorithm gives equal or more compact coverages than the GREESS algorithm on 12 out of 16 datasets. Furthermore, for the Soybean-large and Dermatology datasets, CONCISE outputs 103 and 128 formal concepts, respectively, while GREESS flags 126 and 158 formal concepts, respectively. A close look at Table 8 reveals that CONCISE performs better than GREESS (except with the Mushroom dataset) when N stress and M stress are higher. Although the GREESS algorithm outperforms CONCISE on some datasets, it could not provide results for the Americas-large, Dual-matching-40, and Ac-90k datasets.
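Both stress measures of Definition 12 can be computed with one small routine. The sketch below assumes that the global stress is the sum of the local stresses over all entries, as stated in the definition, and uses 0-based indices; the function name `stress` and its `neighborhood` parameter are illustrative choices.

```python
def stress(matrix, neighborhood="moore"):
    """Sum of squared differences between each entry and its neighbours.

    `neighborhood` is "moore" (up to eight neighbours) or "neumann"
    (up to four neighbours: same row or same column only).
    """
    n, m = len(matrix), len(matrix[0])
    total = 0
    for i in range(n):
        for j in range(m):
            for k in range(max(0, i - 1), min(n, i + 2)):
                for l in range(max(0, j - 1), min(m, j + 2)):
                    # Diagonal neighbours are skipped for Neumann stress.
                    if neighborhood == "neumann" and k != i and l != j:
                        continue
                    total += (matrix[i][j] - matrix[k][l]) ** 2
    return total


example = [[1, 1],
           [0, 1]]
print(stress(example, "moore"), stress(example, "neumann"))
```

The entry itself falls inside both index ranges but contributes a zero term, so it does not need to be excluded explicitly.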

Comparison between the Full and Partial Coverage of the CONCISE Algorithm
The principal added value of the CONCISE algorithm is that it provides full and partial coverage of formal concepts using a threshold δ. We evaluate the obtained coverage regarding the number of concepts, quality metrics, and running time below.
The impact of the variation of δ on the number of concepts obtained by the CONCISE algorithm is shown in Figure 2 and Table 9. Figure 2 shows that the number of concepts decreases drastically when switching from full coverage (δ = 1) to partial coverage (δ = 0.9). For example, the Apj dataset is completely covered by 774 concepts, while only 321 concepts are needed to cover 90%. This difference is smaller between the different thresholds of the partial coverages. The Americas-large dataset is covered by 182 concepts when δ = 0.8, and only 10 concepts are omitted when δ = 0.7. The number of concepts remains the same for the Americas-small dataset from the threshold δ = 0.7. The Ac-90k dataset represents a particular case because the first found concept covers approximately 90% of the formal context.

In the following, we evaluate the CONCISE algorithm in terms of quality. Several measures for concept interestingness were recently reviewed by Kuznetsov et al. [37]. In this study, we use the two most common measures, which are stability and separation [28]. Then, we propose a new measure called object uniformity.
Definition 13 (STABILITY). Stability seems to be the most widely used metric in the FCA community and is applied in numerous applications [38], e.g., biclustering and the detection of scientific subcommunities. Jay et al. [39] also showed that

We can simplify Equation (12) as follows:

where k represents the number of "isolated" elements of A. The higher the stability index of a concept, the lower the influence that any single object has on its intent. Concepts with high stability are more robust to the random removal of objects.
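For intuition, stability can be computed by brute force on a tiny context. The sketch below assumes the standard definition of the stability index, namely the fraction of subsets of the extent whose derived intent equals the intent of the concept. That definition, the helper names, and the toy context are assumptions for illustration; the sketch is exponential in the extent size and is not the DFSP algorithm.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set

Context = Dict[str, Set[str]]  # object -> set of its attributes


def intent_of(objects: Iterable[str], context: Context, all_attributes: Set[str]) -> FrozenSet[str]:
    """Attributes shared by every given object (all attributes for the empty set)."""
    common = set(all_attributes)
    for g in objects:
        common &= context[g]
    return frozenset(common)


def stability(extent: Set[str], intent: Set[str], context: Context) -> float:
    """Fraction of subsets of the extent that still derive exactly the intent."""
    all_attributes: Set[str] = set().union(*context.values())
    members = sorted(extent)
    target = frozenset(intent)
    hits = 0
    for size in range(len(members) + 1):
        for subset in combinations(members, size):
            if intent_of(subset, context, all_attributes) == target:
                hits += 1
    return hits / 2 ** len(members)


# Hypothetical context, not the one from the paper's tables.
context = {"1": {"a", "b"}, "2": {"a", "b"}, "3": {"a", "c"}}
s_ab = stability({"1", "2"}, {"a", "b"}, context)     # 3 of 4 subsets keep {a, b}
s_a = stability({"1", "2", "3"}, {"a"}, context)      # 3 of 8 subsets keep {a}
```

The concept with intent {a, b} is the more stable of the two because removing either of its objects leaves the intent unchanged.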
In our experiments, we used the DFSP algorithm [40] to compute the stability of the obtained coverage. This method is considered an efficient algorithm for computing the exact stability. Table 10 shows that the CONCISE algorithm obtains excellent stability values, especially on the Apj, Breast-cancer, and Tic-tac-toe datasets, where the stability is higher than 0.8. Moreover, for most of the datasets, the stability is better under partial coverage. For example, the stability for the Americas-small dataset rises from 0.598 for δ = 1 to 0.779 for δ = 0.5. However, CONCISE performs poorly on the Chess, Dual-matching-40, and Ac-90k datasets, where the stability does not exceed 0.189 even with lower thresholds.

Definition 14 (SEPARATION). Separation [41] is meant to describe how well a concept sorts out the objects it covers from other objects and how well it sorts out the attributes it covers from other attributes of the context. Thus, this metric characterizes how specific the relationship between the objects and attributes of the concept is with respect to the formal context. It is defined as the ratio between the area covered by the concept and the total area covered by its objects and attributes. The separation index of the formal concept $\langle A, B \rangle$ is defined as follows:

$$s(A, B) = \frac{|A|\,|B|}{\sum_{a \in A} |a'| + \sum_{b \in B} |b'| - |A|\,|B|}$$

where $a'$ denotes the set of attributes of object $a$ and $b'$ denotes the set of objects having attribute $b$. The higher the separation index of a concept, the smaller the number of similar concepts in the formal context.
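The verbal definition of separation (area of the concept divided by the total area covered by its objects and attributes) can be sketched as follows. Counting the concept's own rectangle only once in the denominator is an assumption consistent with that description; the function name and the toy context are hypothetical.

```python
from typing import Dict, Set

Context = Dict[str, Set[str]]  # object -> set of its attributes


def separation(extent: Set[str], intent: Set[str], context: Context) -> float:
    """Area of the concept over the area covered by its rows and columns."""
    area = len(extent) * len(intent)
    # Number of crosses in the rows of the extent's objects.
    row_total = sum(len(context[g]) for g in extent)
    # Number of crosses in the columns of the intent's attributes.
    col_total = sum(1 for attrs in context.values() for m in intent if m in attrs)
    # The concept's rectangle lies in both totals, so subtract it once (assumed).
    denominator = row_total + col_total - area
    return area / denominator if denominator else 0.0


# Hypothetical context, not the one from the paper's tables.
context = {"1": {"a", "b"}, "2": {"a", "b"}, "3": {"a", "c"}}
sep_ab = separation({"1", "2"}, {"a", "b"}, context)    # 4 / (4 + 5 - 4)
sep_a = separation({"1", "2", "3"}, {"a"}, context)     # 3 / (6 + 3 - 3)
```

The concept with intent {a, b} scores higher because its objects and attributes have few crosses outside the concept's own rectangle.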
The results for the separation metric are reported in Table 11, which shows that the best separation is not obtained with the same threshold on every dataset. For example, for the Apj, Americas-small, Paleo, and Americas-large datasets, the separation is best when δ = 0.9. However, for the DBLP, DNA, Mushroom, Soybean-large, and Chess datasets, the maximum separation is obtained with a threshold of 0.6 or 0.5. Note that varying the threshold does not affect the separation value for the House-vote, Tic-tac-toe, and Dual-matching-40 datasets. In the following, we introduce a new quality metric for formal concepts called object uniformity.
Definition 15 (OBJECT UNIFORMITY). We know that the intent part is the maximal set of attributes located at the intersection of all the objects of the extent part. If we consider each object of the extent part, we would like to assess to what extent the pseudoconcept is different from the formal concept $\langle X, Y \rangle$. Please note that all of these pseudoconcepts share the same extent part. To assess such uniformity or cohesion, we introduce the following metric, called object uniformity. If we consider the formal concept $C = \langle X, Y \rangle$, then we define the following metric:

Example 12. Let us consider the formal concept $C_1 = \langle 13456, a \rangle$ extracted from the formal context given by Table 2. Then, as shown by Table 12, we have:

If we also consider the formal concept $C_2 = \langle 134, ad \rangle$, then we have, according to Table 13:

Table 14 shows the results obtained for object uniformity with the different thresholds. As with the separation metric, no single threshold gives the best results for all the datasets. For example, better results are obtained on the Apj, Americas-large, Soybean, and Chess datasets with a threshold equal to 0.5. Conversely, better results are obtained on the Breast-cancer, Paleo, Spect-test, Mushroom, and Dermatology datasets with full coverage. It is also important to mention that, on average, there is no significant difference between the results obtained when varying the thresholds. For instance, this difference is equal to 0.001 on the House-vote and Ac-90k datasets.

Table 15 shows that the proposed algorithm is very efficient and provides excellent results with all thresholds. For example, the proposed algorithm can process 636 and 474 concepts in 0.52 and 0.809 s, respectively, on the Americas-large and Apj datasets. Furthermore, the highest running time among all datasets is observed on the Dual-matching-40 dataset, which was handled in 1011.790 s in the worst case.
Moreover, it is worth pointing out that the GREESS algorithm was unable to handle the same dataset within 48 h. We did not compare the running times of the two algorithms because they were not implemented in the same programming language. The efficiency of the CONCISE algorithm is due to its implementation in C++ and its use of parallelism. The source code is publicly available at https://github.com/AmiraMouakher/Concise (accessed on 15 September 2021).

Conclusions and Perspectives
This paper proposed a greedy approximation algorithm, called CONCISE, to find a minimal subset of formal concepts that fully or partially covers the relation of a formal context. The proposed method avoids computing the entire set of formal concepts associated with a given formal context. Moreover, the CONCISE algorithm yielded high-quality full and partial coverages in a reasonable running time, even for large datasets. In the near future, we plan to pay close attention to the following issues:

1. Shallow embedding: From the standpoint of Boolean matrix factorization, the presented concise coverage leads to a gainful approach for unveiling the smallest set of hidden factors, also known as shallow embedding, in contrast to the deep approach learned by deep learning-based techniques. The most important question to answer would be how to find the optimal coverage value, i.e., the one that maximizes both the conciseness and the pertinence of the factors by removing the noisy ones.

2. Scalability for big data bipartite graphs: Many real-world datasets are growing so quickly that the community has realized that any "centralized" option will be simply pointless in the very short term. In this respect, we can start to implement a new version of CONCISE on top of the big data frameworks Apache Spark and Graphs to handle very large streaming bipartite graphs.