Efficient Associate Rules Mining Based on Topology for Items of Transactional Data

Bo Li; Zheng Pei; Chao Zhang; Fei Hao

doi:10.3390/math11020401

¹ School of Computer and Software Engineering, Xihua University, Chengdu 610039, China

² School of Science, Xihua University, Chengdu 610039, China

³ Intelligent Policing Key Laboratory of Sichuan Province, Sichuan Police College, Luzhou 646000, China

⁴ School of Computer Science, Shaanxi Normal University, Xi’an 710119, China

Mathematics 2023, 11(2), 401; https://doi.org/10.3390/math11020401

This article belongs to the Special Issue Artificial Intelligence and Data Science


Abstract

A challenge in association rule mining is to reduce the time and space complexity of mining rules with predefined minimum support and confidence thresholds from huge transaction databases. In this paper, we propose an efficient method, based on a topology on the itemset, for mining association rules from transaction databases. To do so, we derive a binary relation on the itemset, construct a topology on the itemset from this binary relation, and construct the quotient lattice of the topology according to the transactions of itemsets. Furthermore, we prove that all closed itemsets are included in the quotient lattice of the topology, and that generators or minimal generators of every closed itemset can be easily obtained from an element of the quotient lattice. Formally, the topology on the itemset represents a more general associative relationship among items of transaction databases; the quotient lattice of the topology displays the hierarchical structure on all itemsets and provides a method to approximate any template of the itemset. Accordingly, we provide efficient algorithms to generate Min-Max association rules or to reduce generalized association rules based on the lower approximation and the upper approximation of a template, respectively. The experimental results demonstrate that the proposed method is an efficient alternative for generating or reducing association rules from transaction databases.

Keywords: knowledge discovery in database (KDD); frequent itemsets; closed itemsets; association rules; the topology for itemsets

MSC: 68W99

1. Introduction

In knowledge discovery in databases (KDD), association rule mining (ARM) from transaction databases, proposed in [1,2], has received considerable attention and found wide application, for example in medical diagnosis [3,4] and marketing planning [5,6]. ARM can be formally explained as follows. Let $A = (U, A)$ be a transaction database, where $U$ is a non-empty finite set of transactions and $A$ is a non-empty finite set of items. Each transaction $u_{i} \in U$ determines a mapping $u_{i} : A \longrightarrow \{0, 1\}$, i.e., for any item $a_{j} \in A$, the transaction $u_{i}$ contains (or does not contain) the item $a_{j}$ if $u_{i}(a_{j}) = 1$ (or $u_{i}(a_{j}) = 0$). Each subset of $A$ is called an itemset. An association rule, which describes a co-occurrence relation among items, is an implication of the form $A_{1} \to A_{2}$, where the itemsets $A_{1}$ and $A_{2}$ of $A$ satisfy $A_{1} \cap A_{2} = \emptyset$ and are called the antecedent and the consequent, respectively. Theoretically, ARM from a transaction database is an NP-hard problem because the itemsets $A_{1}$ and $A_{2}$ are selected from the $2^{|A|}$ members of the power set of $A$. In real-world practice, existing mining methods usually extract a large number of association rules, which are difficult to handle. Many kinds of association rules and mining methods have been proposed; although these methods differ, their processing is nearly the same, i.e., how to evaluate the usefulness of association rules and how to select the antecedent and consequent of association rules.

Quality measures are used to evaluate the usefulness of association rules, and various measures have been provided to discover significant and specific association rules. In a transaction database $A = (U, A)$, for any $A^{'} \subseteq A$, denote $\tau(A^{'}) = \{u_{i} \in U \mid \forall a_{j} \in A^{'}, u_{i}(a_{j}) = 1\}$, and let $\neg A^{'}$ denote the absence of $A^{'}$. Then the following quality measures can be defined:

Support [1,2]: $\mathrm{supp}_{A}(A_{1} \cup A_{2}) = \frac{|\tau(A_{1} \cup A_{2})|}{|U|}$;
Confidence [1,2]: $\mathrm{conf}_{A}(A_{1} \to A_{2}) = \frac{\mathrm{supp}_{A}(A_{1} \cup A_{2})}{\mathrm{supp}_{A}(A_{1})}$;
Netconfidence [7]: $\mathrm{netc}_{A}(A_{1} \to A_{2}) = \frac{\mathrm{supp}_{A}(A_{1} \cup A_{2}) - \mathrm{supp}_{A}(A_{1})\,\mathrm{supp}_{A}(A_{2})}{\mathrm{supp}_{A}(A_{1})\,(1 - \mathrm{supp}_{A}(A_{1}))}$;
Conviction [8]: $\mathrm{conv}_{A}(A_{1} \to A_{2}) = \frac{\mathrm{supp}_{A}(A_{1})\,\mathrm{supp}_{A}(\neg A_{2})}{\mathrm{supp}_{A}(A_{1} \cup \neg A_{2})}$;
Added value [9]: $\mathrm{add}_{A}(A_{1} \to A_{2}) = \mathrm{conf}_{A}(A_{1} \to A_{2}) - \mathrm{supp}_{A}(A_{2})$;
Accuracy [9]: $\mathrm{acc}_{A}(A_{1} \to A_{2}) = \mathrm{conf}_{A}(A_{1} \to A_{2}) + \mathrm{conf}_{A}(\neg A_{1} \to \neg A_{2})$;
Interestingness [10]: $\mathrm{inte}_{A}(A_{1} \to A_{2}) = \frac{\mathrm{supp}_{A}(A_{1} \cup A_{2})}{\mathrm{supp}_{A}(A_{1})} \times \frac{\mathrm{supp}_{A}(A_{1} \cup A_{2})}{\mathrm{supp}_{A}(A_{2})} \times (1 - \mathrm{supp}_{A}(A_{1} \cup A_{2}))$;
Comprehensibility [10]: $\mathrm{comp}_{A}(A_{1} \to A_{2}) = \frac{\log(1 + |\tau(A_{2})|)}{\log(1 + |\tau(A_{1} \cup A_{2})|)}$;
Lift [11]: $\mathrm{lift}_{A}(A_{1} \to A_{2}) = \frac{\mathrm{supp}_{A}(A_{1} \cup A_{2})}{\mathrm{supp}_{A}(A_{1})\,\mathrm{supp}_{A}(A_{2})}$.
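To make some of these measures concrete, the following Python sketch evaluates support, confidence, lift, added value and netconfidence on a small transaction table. The six-transaction table and all function names (`tau`, `supp`, `conf`, `lift`, `added_value`, `netconf`) are hypothetical and chosen only for illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Hypothetical toy database (an assumption for illustration):
# transaction id -> set of items contained in that transaction.
TRANSACTIONS = {
    "u1": {"a1", "a5"},
    "u2": {"a1", "a3"},
    "u3": {"a3", "a5"},
    "u4": {"a2", "a3"},
    "u5": {"a1", "a2", "a3", "a4", "a5"},
    "u6": {"a3", "a4", "a5"},
}


def tau(itemset):
    """Transactions that contain every item of `itemset`."""
    return {u for u, items in TRANSACTIONS.items() if set(itemset) <= items}


def supp(itemset):
    """Relative support |tau(itemset)| / |U|."""
    return Fraction(len(tau(itemset)), len(TRANSACTIONS))


def conf(a1, a2):
    """Confidence of a1 -> a2 (undefined when supp(a1) is 0)."""
    return supp(set(a1) | set(a2)) / supp(a1)


def lift(a1, a2):
    """Lift of a1 -> a2 (undefined when either support is 0)."""
    return supp(set(a1) | set(a2)) / (supp(a1) * supp(a2))


def added_value(a1, a2):
    """Added value: confidence minus the support of the consequent."""
    return conf(a1, a2) - supp(a2)


def netconf(a1, a2):
    """Netconfidence (undefined when supp(a1) is 0 or 1)."""
    s1 = supp(a1)
    return (supp(set(a1) | set(a2)) - s1 * supp(a2)) / (s1 * (1 - s1))


if __name__ == "__main__":
    rule = ({"a1"}, {"a3"})
    for measure in (conf, lift, added_value, netconf):
        print(measure.__name__, measure(*rule))
```

A lift below 1 or a negative added value indicates that the antecedent makes the consequent less likely than its base rate in this toy table.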

By combining several quality measures into fitness functions, ARM is transformed into an optimization problem, and various optimization algorithms can be used to extract interesting or valid association rules from $A = (U, A)$. In [12], schema constraints and the opportunistic confidence constraint are enforced to mine generalized association rules in the analyzed data. In [13], confidence, comprehensibility, and interestingness are treated as a multi-objective problem, and a particle swarm optimization algorithm is proposed to extract the best rules. In [10], support, comprehensibility and interestingness are treated as a multi-objective problem, and a Pareto-based genetic algorithm is used to extract useful and interesting rules from any market-basket type database. In [14], Shafer’s theory of evidence is used to provide two information measures for the quality evaluation of the set of frequent itemsets (or frequent patterns). In [15], off-the-shelf constraint programming techniques are employed for modeling and solving a wide variety of constraint-based frequent itemset mining problems. In [16], based on an exclusive causal-leverage measure, a data mining algorithm is developed to mine the causal relation between drugs and their associated adverse drug reactions. In [17], a particle swarm optimization algorithm is provided to improve computational efficiency as well as to automatically determine suitable support and confidence threshold values of association rules. In [18], grammar-guided genetic programming models are proposed to deal with the association rule mining problem from a multi-objective perspective. In [19], a new multi-objective evolutionary model, which maximizes the comprehensibility, interestingness and performance of the objectives, is presented to mine a set of quantitative association rules with a good trade-off between interpretability and accuracy. In [20], a principal components analysis is applied to a set of measures that evaluate the quality of quantitative association rules, and the QARGA algorithm is provided to find quantitative association rules in a wide variety of datasets.
In [21], a new confidence degree of association rules and a discrete bi-level parametric programming are proposed to extract association rules from huge databases. In [22], redundancy analysis, sampling and multivariate statistical analysis are provided to ascertain the discovered rules and discard the non-significant rules. In [23], a systematic assessment of various numerical association rule mining methods and a meta-study of thirty numerical association rule mining algorithms are provided; the authors investigate how far discretization techniques have been used in numerical association rule mining methods. In [24], a multi-objective particle swarm optimization is proposed using an adaptive archive grid based on the Pareto optimal strategy for numerical association rule mining. The works in [25,26,27,28] adopt the idea of parallelization and improve the Apriori algorithm based on the MapReduce model.

By analyzing the associative relation among items, suitable itemsets can be selected to generate association rules. The most widely used itemsets are frequent itemsets and closed itemsets [15,29,30,31,32]: an itemset is frequent if its support is not less than the minimum support value, and an itemset is closed if and only if no proper superset of the itemset has the same support as the itemset. In [33], frequent itemsets are employed to construct a taxonomy over items, and a breadth-first search is adopted to enumerate all frequent itemsets. Closed frequent itemsets can be used to uniquely determine the set of all frequent itemsets and applied to generate a condensed set of association rules [34,35,36]. Formally, all closed itemsets of $A = (U, A)$ can be organized as a closed itemset lattice, and the CHARM-L algorithm is proposed to explicitly generate the frequent closed itemset lattice [37]. In [38], the parent-child relation of the closed itemset lattice is exploited, and the cross-level closed itemset lattice is constructed to mine the most relevant minimal cross-level association rules. In [39], an efficient post-processing method is presented to prune redundant rules by virtue of the property of the Galois connection, which inherently constrains rules with respect to objects. At present, itemsets have been widely investigated [40,41], and various itemsets and their generating algorithms for specific association rule mining have been proposed, such as expressive generalized itemsets [42], free itemsets [43], disjunction-free itemsets [44,45], non-derivable itemsets [46,47], disjunctive closed itemsets [48], etc. In fact, we notice that frequent itemsets and closed itemsets are rooted in the co-occurrence relation among items. From the mathematical point of view, topology may be a more suitable tool to express the relation among items: a topology for a set expresses a relation among its subsets, subsets of the set are granulated as members of the topology, and the topology is generated by its base, which means that the base for the topology is the basic relation among elements of the set. Topology-based pattern mining has been widely investigated in rough set theory and knowledge models. For example, in [49], a nested topology on a crisp reference set is provided to interpret a fuzzy subset; the approach provides a unified framework for most fuzzy thresholding algorithms. In [50], the lower and upper approximations of rough set theory are shown to be pre-topological invariants and dual to each other, and the corresponding generalized topological closure via the upper approximation is constructed.
In [51], a knowledge representation and reasoning method is proposed for identity-based spatial change; the change process is presented by a multistage graph, the binary relation model $BC$ for identity change is defined, and qualitative reasoning for $BC$ and the combination of $BC$ with topological relations are investigated.

In this paper, we analyze the topology and the topology base for items, express more general associative relations among items, and extract useful association rules based on the topology and its base. This work is inspired by our previous work on formal concept analysis. In [52], we proposed a novel method based on the topology for attributes of a formal context to generate all formal concepts; the topology for attributes was used to explain the associative relation among attributes of the formal context, and the formal concept lattice was constructed from the topology for attributes based on an equivalence relation. In [53], we used the topology for attributes of multi-valued information systems to generate Min-Max association rules. In this paper, we construct the topology for the set of items of a transaction database, which can be used to explain a more general associative relation among items. Moreover, we prove that frequent itemsets, closed itemsets and closed frequent itemsets are included in the topology for the set of items, provide two kinds of lattices on the topology to display the hierarchical structures on itemsets, and propose efficient algorithms, based on the base for the topology, to generate or reduce many kinds of association rules.

The organization of this paper is as follows: In Section 2, we briefly review some basic notions used in association rules mining. In Section 3, we present the topology for the set of items and lattices on itemsets of a transactional database, in which some important properties of the topology, the quotient lattice and minimal generators of closed frequent itemsets, are provided. In Section 4, we propose several algorithms to generate or reduce association rules, and analyze confidences of generalized association rules. In Section 5, we choose zoo, mushroom, connect-4 and chess as transactional databases to show the proposed method in generating association rules. We conclude the paper in Section 6.

2. Preliminaries

In this section, we briefly review some basic concepts used in frequent itemset, closed itemset and association rule mining; several of these concepts are also widely used in rough sets and formal concept analysis. For a uniform expression, we adopt the following set-to-set mappings in a transaction database $A = (U, A)$. Denoting by $P(U)$ and $P(A)$ the power sets of $U$ and $A$, respectively, we have

$$\begin{aligned} \tau &: P(A) \longrightarrow P(U), \quad \tau(A_{1}) = \{u_{i} \in U \mid \forall a_{j} \in A_{1} \subset A,\ u_{i}(a_{j}) = 1\}, \\ \gamma &: P(U) \longrightarrow P(A), \quad \gamma(O) = \{a_{j} \in A \mid \forall u_{i} \in O \subset U,\ u_{i}(a_{j}) = 1\}. \end{aligned}$$

Based on $\tau$ and $\gamma$, we can rewrite the support of an itemset $A_{1}$ and the confidence of an association rule $A_{1} \to A_{2}$ as follows:

$$\begin{aligned} \mathrm{supp}(A_{1}) &= |\tau(A_{1})| \ \left(\text{or } \frac{|\tau(A_{1})|}{|U|}\right), \\ \mathrm{conf}(A_{1} \to A_{2}) &= \frac{|\tau(A_{1} \cup A_{2})|}{|\tau(A_{1})|}. \end{aligned}$$

The mappings $\tau$ and $\gamma$ also allow us to define closed itemsets and their generators concisely: $A_{1} \subset A$ is a closed itemset if and only if $\gamma\tau(A_{1}) = A_{1}$; an itemset $A_{2}$ is a generator of the closed itemset $A_{1}$ if and only if $A_{2} \subset A_{1}$ and $\gamma\tau(A_{2}) = A_{1}$; furthermore, $A_{2}$ is a minimal generator of $A_{1}$ if and only if $\nexists A_{3} \subset A_{2}$ such that $\gamma\tau(A_{3}) = A_{1}$. Formally, $\tau$ and $\gamma$ are also analyzed in rough sets and formal concept analysis [50,54,55,56,57], and the following properties of $\tau$ and $\gamma$ are obvious: for any $A_{1}, A_{2}, A_{3} \subset A$ and $O_{1}, O_{2} \subset U$, (1) if $A_{1} \subseteq A_{2}$, then $\tau(A_{1}) \supseteq \tau(A_{2})$; (2) if $O_{1} \subseteq O_{2}$, then $\gamma(O_{1}) \supseteq \gamma(O_{2})$; (3) $\tau(A_{1} \cup A_{2}) = \tau(A_{1}) \cap \tau(A_{2})$ and $\gamma(O_{1} \cup O_{2}) = \gamma(O_{1}) \cap \gamma(O_{2})$; (4) $\tau(\gamma\tau(A_{1})) = \tau(A_{1})$ and $\gamma(\tau\gamma(O_{1})) = \gamma(O_{1})$. In [52], we used $\tau$ and $\gamma$ to induce a reflexive and transitive relation on the set of attributes of a formal context; the topology for attributes is constructed from this relation, and the formal concepts are then analyzed in the topology for attributes. Here, we provide the same results for a transaction database.
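The mappings $\tau$ and $\gamma$ and the derived notions of closed itemset and (minimal) generator can be sketched in a few lines of Python. The transaction table below is a hypothetical example, not data from this paper, and the function names are our own; the relation $A_{2} \subset A_{1}$ is read here as $\subseteq$, which is an assumption.

```python
from itertools import combinations

# Hypothetical toy database (an assumption for illustration).
TRANSACTIONS = {
    "u1": {"a1", "a5"},
    "u2": {"a1", "a3"},
    "u3": {"a3", "a5"},
    "u4": {"a2", "a3"},
    "u5": {"a1", "a2", "a3", "a4", "a5"},
    "u6": {"a3", "a4", "a5"},
}
ITEMS = frozenset().union(*TRANSACTIONS.values())


def tau(itemset):
    """tau: itemset -> transactions containing all of its items."""
    return frozenset(u for u, items in TRANSACTIONS.items() if set(itemset) <= items)


def gamma(objs):
    """gamma: set of transactions -> items shared by all of them."""
    common = set(ITEMS)  # the empty set of transactions yields all items
    for u in objs:
        common &= TRANSACTIONS[u]
    return frozenset(common)


def closure(itemset):
    """The composition gamma(tau(itemset))."""
    return gamma(tau(itemset))


def is_closed(itemset):
    return closure(itemset) == frozenset(itemset)


def is_generator(candidate, closed):
    candidate, closed = frozenset(candidate), frozenset(closed)
    return candidate <= closed and closure(candidate) == closed


def is_minimal_generator(candidate, closed):
    """A generator none of whose proper subsets has the same closure."""
    if not is_generator(candidate, closed):
        return False
    items = sorted(candidate)
    return not any(
        closure(sub) == frozenset(closed)
        for r in range(len(items))  # proper subsets only
        for sub in combinations(items, r)
    )
```

For instance, `closure({"a2"})` adds `a3` because every transaction of this toy table that contains `a2` also contains `a3`. The exhaustive subset search in `is_minimal_generator` is exponential and is meant only for small examples.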

Definition 1.

For any transactional database $A = (U, A)$, $\tau$ and $\gamma$ determine a point-to-set mapping $C : A \longrightarrow P(A)$ from $A$ to $P(A)$, i.e., for any $a_{i} \in A$,

$$C(a_{i}) = \{a_{j} \in A \mid a_{j} \in \gamma\tau(\{a_{i}\})\}.$$

Intuitively, the point-to-set mapping $C$ represents a co-occurrence relation among items, i.e., the following binary relation on $A$ is determined by the mapping $C$. For any $a_{i}, a_{j} \in A$,

$$R_{A}(a_{i}, a_{j}) = \begin{cases} 1, & \text{if } a_{j} \in C(a_{i}), \\ 0, & \text{if } a_{j} \notin C(a_{i}). \end{cases} \tag{1}$$

In [52], we proved that the binary relation $R_{A}$ on $A$ is reflexive and transitive, and that $(A, R_{A})$ is an approximation space. For any subset $A_{1}$ of $A$, denote $\underline{R_{A}}(A_{1}) = \{a_{i} \in A \mid \forall a_{j}, (R_{A}(a_{i}, a_{j}) = 1) \to (a_{j} \in A_{1})\}$ and $\overline{R_{A}}(A_{1}) = \{a_{i} \in A \mid \exists a_{j} \in A_{1} \land R_{A}(a_{i}, a_{j}) = 1\}$. Then $\overline{R_{A}}(A_{1})$ and $\underline{R_{A}}(A_{1})$ are the generalized upper approximation and lower approximation of $A_{1}$; in particular, if $R_{A}$ is an equivalence relation, $\overline{R_{A}}(A_{1})$ and $\underline{R_{A}}(A_{1})$ are Pawlak’s upper approximation and lower approximation [55]. More importantly, we have the following theorem.

Theorem 1 ([52]).

For any transactional database $A = (U, A)$:

1. $T(R_{A}) = \{\underline{R_{A}}(A_{1}) \mid A_{1} \subseteq A\}$ is a topology for $A$;
2. For any $a_{i} \in A$, $\underline{R_{A}}(C(a_{i})) = C(a_{i})$;
3. $B_{A} = \{C(a_{i}) \mid a_{i} \in A\}$ is a base for the topology $T(R_{A})$.

The theorem means that the topology for $A$ has a simple expression, i.e., $T(R_{A}) = \{\bigcup_{C(a_{i}) \in B_{A}^{'}} C(a_{i}) \mid \forall B_{A}^{'} \subseteq B_{A}\}$. We use the following example to explain the above-mentioned concepts and results; the example was initially used to analyze the concept “to-be-a-fruit” [58].

Example 1.

Let a fruit dataset be $A = (U, A)$, where $U = \{$a chestnut ($u_{1}$), an olive ($u_{2}$), a pepper ($u_{3}$), a strawberry ($u_{4}$), an orange ($u_{5}$), a tomato ($u_{6}$)$\}$ and $A = \{$to-grow-on-trees ($a_{1}$), to-be-sweet ($a_{2}$), to-be-raw-edible ($a_{3}$), to-yield-juice ($a_{4}$), to-have-a-skin ($a_{5}$)$\}$ are used to understand the concept “to-be-a-fruit” (shown in Table 1). According to Definition 1, we have $C(a_{1}) = \{a_{1}\}$, $C(a_{2}) = \{a_{2}, a_{3}\}$, $C(a_{3}) = \{a_{3}\}$, $C(a_{4}) = \{a_{3}, a_{4}, a_{5}\}$ and $C(a_{5}) = \{a_{5}\}$. For instance, $C(a_{2}) = \{a_{2}, a_{3}\}$ means that $a_{2}$ and $a_{3}$ co-occur in Table 1, i.e., if a fruit is “to-be-sweet”, it must be “to-be-raw-edible”. The binary relation $R_{A}$ on $A = \{a_{1}, a_{2}, a_{3}, a_{4}, a_{5}\}$ is shown in Table 2.

Table 1. A transactional database $A = (U, A)$.

Table 2. The binary relation $R_{A}$ on $A$.

Based on Theorem 1 and the mapping $C$, we have $T(R_{A}) = \{\emptyset, \{a_{1}\}, \{a_{3}\}, \{a_{5}\}, \{a_{1}, a_{3}\}, \{a_{1}, a_{5}\}, \{a_{2}, a_{3}\}, \{a_{3}, a_{5}\}, \{a_{1}, a_{2}, a_{3}\}, \{a_{1}, a_{3}, a_{5}\}, \{a_{2}, a_{3}, a_{5}\}, \{a_{3}, a_{4}, a_{5}\}, \{a_{1}, a_{3}, a_{4}, a_{5}\}, \{a_{1}, a_{2}, a_{3}, a_{5}\}, \{a_{2}, a_{3}, a_{4}, a_{5}\}, \{a_{1}, a_{2}, a_{3}, a_{4}, a_{5}\}\}$; e.g., the member $\{a_{2}, a_{3}, a_{5}\}$ is obtained as $\{a_{2}, a_{3}, a_{5}\} = C(a_{2}) \cup C(a_{5})$.

According to Table 2, $R_{A}$ on $A$ is in general not symmetric, e.g., $R_{A}(a_{2}, a_{3}) = 1$ but $R_{A}(a_{3}, a_{2}) = 0$; hence, the members of the topology $T(R_{A})$ for the set of items are in general not Pawlak’s lower approximations. According to Table 1, we can obtain the support of each member of the topology for $A$, e.g., for $\{a_{2}, a_{3}\}$, we have $\mathrm{supp}(\{a_{2}, a_{3}\}) = |\tau(\{a_{2}, a_{3}\})| = |\{u_{4}, u_{5}\}| = 2$ or $\mathrm{supp}(\{a_{2}, a_{3}\}) = \frac{|\{u_{4}, u_{5}\}|}{|U|} = \frac{1}{3}$. If we fix the minimum support at $\frac{1}{3}$, then members such as $\{a_{2}, a_{3}\}$ and $\{a_{1}, a_{3}\}$ are frequent itemsets, which can be used to generate association rules. For example, we can generate the association rules “$a_{1} \to a_{3}$” and “$a_{3} \to a_{1}$”, whose confidences are

$$\begin{aligned} \mathrm{conf}(a_{1} \to a_{3}) &= \frac{|\tau(\{a_{1}, a_{3}\})|}{|\tau(a_{1})|} = \frac{|\{u_{2}, u_{5}\}|}{|\{u_{1}, u_{2}, u_{5}\}|} = \frac{2}{3}, \\ \mathrm{conf}(a_{3} \to a_{1}) &= \frac{|\tau(\{a_{1}, a_{3}\})|}{|\tau(a_{3})|} = \frac{|\{u_{2}, u_{5}\}|}{|\{u_{2}, u_{3}, u_{4}, u_{5}, u_{6}\}|} = 0.4. \end{aligned}$$

In $A$ of the fruit data $A = (U, A)$, “$a_{1} \to a_{3}$” is “to-grow-on-trees implies to-be-raw-edible” and “$a_{3} \to a_{1}$” is “to-be-raw-edible implies to-grow-on-trees”. Considering their confidences, “$a_{1} \to a_{3}$” is more confident than “$a_{3} \to a_{1}$” because $\frac{2}{3} > 0.4$.
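The construction in Example 1 can be replayed with a short Python sketch. Because Table 1 is not reproduced in this text, the incidence data below is a hypothetical reconstruction, chosen only so that it agrees with the values of $C(a_{i})$, $\tau$ and the confidences quoted in the example; the function names are likewise our own.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical reconstruction of Table 1 (an assumption): it reproduces the
# sets C(a_i) and the tau values quoted in Example 1, nothing more.
TRANSACTIONS = {
    "u1": {"a1", "a5"},                    # chestnut
    "u2": {"a1", "a3"},                    # olive
    "u3": {"a3", "a5"},                    # pepper
    "u4": {"a2", "a3"},                    # strawberry
    "u5": {"a1", "a2", "a3", "a4", "a5"},  # orange
    "u6": {"a3", "a4", "a5"},              # tomato
}
ITEMS = sorted(set().union(*TRANSACTIONS.values()))


def tau(itemset):
    """Transactions containing every item of `itemset`."""
    return frozenset(u for u, items in TRANSACTIONS.items() if set(itemset) <= items)


def closure(itemset):
    """gamma(tau(itemset)): items shared by all supporting transactions."""
    common = set(ITEMS)
    for u in tau(itemset):
        common &= TRANSACTIONS[u]
    return frozenset(common)


def base():
    """The base B_A = {C(a_i)}, with C(a_i) = gamma(tau({a_i}))."""
    return {a: closure({a}) for a in ITEMS}


def topology():
    """All unions of sub-families of the base, including the empty union."""
    blocks = list(base().values())
    return {
        frozenset().union(*family)
        for r in range(len(blocks) + 1)
        for family in combinations(blocks, r)
    }


def conf(a1, a2):
    """Confidence |tau(a1 ∪ a2)| / |tau(a1)| as an exact fraction."""
    return Fraction(len(tau(set(a1) | set(a2))), len(tau(a1)))


if __name__ == "__main__":
    for a, block in base().items():
        print(a, sorted(block))
    print(len(topology()), "members in the topology")
```

Enumerating all sub-families of the base is exponential in the number of distinct base sets, so this brute-force enumeration is only suitable for small illustrations such as this one.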

From Example 1, we notice that the members of the topology $T(R_{A})$ for the set of items are itemsets generated by the co-occurrence relation among items. In existing ARM methods, frequent itemsets or closed itemsets are always used to mine association rules. One question is whether the members of the topology $T(R_{A})$ for the set of items can be used to mine all useful and necessary association rules. This question is addressed in the next section.

3. Lattice Structures on the Topology for the Set of Items

In this section, we use set inclusion to construct lattice structures on the topology $T(R_{A})$ for the set of items: one is constructed on the topology $T(R_{A})$ itself, and the other on the quotient set of $T(R_{A})$. We then analyze minimal elements and minimal generators of closed itemsets in these lattice structures, which provide useful information in ARM.

3.1. The Lattice on the Topology

Formally, the topology for the set of items is a proper subset of the power set of $A$. The power set of $A$ naturally forms a lattice; however, a proper subset of it may not be a lattice. Here, we use set inclusion to construct a lattice on the topology for the set of items and analyze the hierarchical structure of the members of the topology.

It is obvious that $T(R_{A})$ is a poset under set inclusion, i.e., for any $T_{1}, T_{2} \in T(R_{A})$, $T_{1} \leq T_{2}$ if and only if $T_{1} \subseteq T_{2}$. On the poset $(T(R_{A}), \leq)$, we define

$$\begin{aligned} T_{1} \land T_{2} &= \Big(\bigcup_{a_{i} \in A,\ C(a_{i}) \subseteq T_{1}} C(a_{i})\Big) \cap \Big(\bigcup_{a_{j} \in A,\ C(a_{j}) \subseteq T_{2}} C(a_{j})\Big), \\ T_{1} \lor T_{2} &= \Big(\bigcup_{a_{i} \in A,\ C(a_{i}) \subseteq T_{1}} C(a_{i})\Big) \cup \Big(\bigcup_{a_{j} \in A,\ C(a_{j}) \subseteq T_{2}} C(a_{j})\Big). \end{aligned}$$

For any $a_{k} \in T_{1} \land T_{2}$, we have $a_{k} \in \bigcup_{a_{i} \in A,\ C(a_{i}) \subseteq T_{1}} C(a_{i})$ and $a_{k} \in \bigcup_{a_{j} \in A,\ C(a_{j}) \subseteq T_{2}} C(a_{j})$; hence, there exist $C(a_{i}) \subseteq T_{1}$ and $C(a_{j}) \subseteq T_{2}$ such that $a_{k} \in C(a_{i})$ and $a_{k} \in C(a_{j})$. According to Theorem 1, $C(a_{k}) \subseteq C(a_{i})$ and $C(a_{k}) \subseteq C(a_{j})$, so we have

$$C(a_{k}) \subseteq \Big(\bigcup_{a_{i} \in A,\ C(a_{i}) \subseteq T_{1}} C(a_{i})\Big) \cap \Big(\bigcup_{a_{j} \in A,\ C(a_{j}) \subseteq T_{2}} C(a_{j})\Big) = \bigcup_{a_{i} \in A,\ C(a_{i}) \subseteq T_{1}}\ \bigcup_{a_{j} \in A,\ C(a_{j}) \subseteq T_{2}} \big(C(a_{i}) \cap C(a_{j})\big) = T_{1} \land T_{2},$$

which means that we can rewrite $T_{1} \land T_{2} = \bigcup_{a_{k} \in T_{1} \cap T_{2}} C(a_{k}) \in T(R_{A})$. Similarly, we can rewrite $T_{1} \lor T_{2} = \bigcup_{a_{k} \in T_{1} \cup T_{2}} C(a_{k}) \in T(R_{A})$, which means that $(T(R_{A}), \land, \lor)$ is a lattice.

Furthermore, for any subset $T \subseteq T(R_{A})$, we denote

$$\begin{aligned} \land T &= \bigwedge_{T_{i} \in T} T_{i} = \bigcap_{T_{i} \in T} \Big(\bigcup_{a_{i} \in A,\ C(a_{i}) \subseteq T_{i}} C(a_{i})\Big), \\ \lor T &= \bigvee_{T_{i} \in T} T_{i} = \bigcup_{T_{i} \in T} \Big(\bigcup_{a_{i} \in A,\ C(a_{i}) \subseteq T_{i}} C(a_{i})\Big). \end{aligned}$$

Then, one can easily prove that $\land T = \bigcup_{a_{k} \in \bigcap_{T_{i} \in T} T_{i}} C(a_{k}) \in T(R_{A})$ and $\lor T = \bigcup_{a_{k} \in \bigcup_{T_{i} \in T} T_{i}} C(a_{k}) \in T(R_{A})$, so $(T(R_{A}), \land, \lor)$ is also a complete lattice. For example, in Example 1, we have $\{a_{1}\} \land \{a_{3}\} = \emptyset$, $\{a_{1}, a_{2}, a_{3}\} \land \{a_{1}, a_{3}, a_{5}\} = C(a_{1}) \cup C(a_{3}) = \{a_{1}, a_{3}\}$ because $\{a_{1}, a_{2}, a_{3}\} \cap \{a_{1}, a_{3}, a_{5}\} = \{a_{1}, a_{3}\}$, and $\{a_{2}, a_{3}, a_{5}\} \lor \{a_{3}, a_{4}, a_{5}\} = C(a_{2}) \cup C(a_{3}) \cup C(a_{4}) \cup C(a_{5}) = \{a_{2}, a_{3}, a_{4}, a_{5}\}$ because $\{a_{2}, a_{3}, a_{5}\} \cup \{a_{3}, a_{4}, a_{5}\} = \{a_{2}, a_{3}, a_{4}, a_{5}\}$. The complete lattice $(T(R_{A}), \land, \lor)$ of Example 1 is shown in Figure 1.

Figure 1. The lattice of the topology $T(R_{A})$ of Example 1.
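The rewritten forms of $\land$ and $\lor$ lend themselves to a direct computation. The following Python sketch uses the sets $C(a_{i})$ quoted in Example 1 as a hard-coded base; the names `BASE`, `meet` and `join` are hypothetical, and both arguments are assumed to be members of $T(R_{A})$.

```python
# The base B_A of Example 1, copied from the values C(a_i) given in the text.
BASE = {
    "a1": frozenset({"a1"}),
    "a2": frozenset({"a2", "a3"}),
    "a3": frozenset({"a3"}),
    "a4": frozenset({"a3", "a4", "a5"}),
    "a5": frozenset({"a5"}),
}


def union_of_blocks(items):
    """Union of C(a_k) over the given items (the empty union is the empty set)."""
    return frozenset().union(*(BASE[a] for a in items))


def meet(t1, t2):
    """T1 ∧ T2 as the union of C(a_k) for a_k in T1 ∩ T2."""
    return union_of_blocks(set(t1) & set(t2))


def join(t1, t2):
    """T1 ∨ T2 as the union of C(a_k) for a_k in T1 ∪ T2."""
    return union_of_blocks(set(t1) | set(t2))


if __name__ == "__main__":
    print(sorted(meet({"a1", "a2", "a3"}, {"a1", "a3", "a5"})))
    print(sorted(join({"a2", "a3", "a5"}, {"a3", "a4", "a5"})))
```

The two calls in the `__main__` block correspond to the meet and join worked out in the text for Example 1.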

From top to bottom, the complete lattice $(T(R_{A}), \land, \lor)$ provides hierarchical information among the members of the topology. Intuitively, hierarchical information means that any member of $T(R_{A})$ is contained in its upper hierarchical members, and that the member of $T(R_{A})$ and its upper hierarchical members share common transactions. Such information can be used to generate association rules: suppose that $T_{2}$ is an upper hierarchical member of $T_{1}$; then we have the association rule

$$\psi = T_{1} \to (T_{2} - T_{1}),$$

in which $\mathrm{supp}(\psi) = |\tau(T_{2})|$ and $\mathrm{conf}(\psi) = \frac{|\tau(T_{2})|}{|\tau(T_{1})|}$. For example, in Figure 1, the itemset $\{a_{3}, a_{5}\}$ is an upper hierarchical member of the itemset $\{a_{5}\}$; according to Table 1, we have $a_{5} \to a_{3}$ with $\mathrm{supp}(a_{5} \to a_{3}) = |\tau(\{a_{3}, a_{5}\})| = 3$ and $\mathrm{conf}(a_{5} \to a_{3}) = \frac{|\tau(\{a_{3}, a_{5}\})|}{|\tau(\{a_{5}\})|} = 0.75$. Similarly, we have $a_{1} \land a_{3} \to a_{5}$ with $\mathrm{supp}(a_{1} \land a_{3} \to a_{5}) = |\tau(\{a_{1}, a_{3}, a_{5}\})| = 1$ and $\mathrm{conf}(a_{1} \land a_{3} \to a_{5}) = \frac{|\tau(\{a_{1}, a_{3}, a_{5}\})|}{|\tau(\{a_{1}, a_{3}\})|} = 0.5$. In $A$ of the fruit data $A = (U, A)$, “$a_{5} \to a_{3}$” is “to-have-a-skin implies to-be-raw-edible” and “$a_{1} \land a_{3} \to a_{5}$” is “to-grow-on-trees and to-be-raw-edible implies to-have-a-skin”; the rule “to-have-a-skin implies to-be-raw-edible” is more confident than “to-grow-on-trees and to-be-raw-edible implies to-have-a-skin” because $0.75 > 0.5$.
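As a rough illustration of how a rule $\psi = T_{1} \to (T_{2} - T_{1})$ is read off a pair of nested members, consider the Python sketch below. Table 1 is not reproduced in this text, so the incidence data is a hypothetical reconstruction that agrees with the supports and confidences quoted above; the helper name `rule` is also an assumption.

```python
from fractions import Fraction

# Hypothetical reconstruction of Table 1 (an assumption), consistent with
# the supports and confidences quoted in the text.
TRANSACTIONS = {
    "u1": {"a1", "a5"},
    "u2": {"a1", "a3"},
    "u3": {"a3", "a5"},
    "u4": {"a2", "a3"},
    "u5": {"a1", "a2", "a3", "a4", "a5"},
    "u6": {"a3", "a4", "a5"},
}


def tau(itemset):
    """Transactions containing every item of `itemset`."""
    return frozenset(u for u, items in TRANSACTIONS.items() if set(itemset) <= items)


def rule(t1, t2):
    """Return (antecedent, consequent, support, confidence) for T1 -> (T2 - T1).

    T1 must be a proper subset of T2 and must occur in at least one
    transaction; otherwise the confidence is undefined.
    """
    t1, t2 = frozenset(t1), frozenset(t2)
    if not t1 < t2:
        raise ValueError("t1 must be a proper subset of t2")
    support = len(tau(t2))
    confidence = Fraction(support, len(tau(t1)))
    return t1, t2 - t1, support, confidence


if __name__ == "__main__":
    print(rule({"a5"}, {"a3", "a5"}))
    print(rule({"a1", "a3"}, {"a1", "a3", "a5"}))
```

Support is reported here as an absolute count, matching the convention $\mathrm{supp}(\psi) = |\tau(T_{2})|$ used in the text.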

3.2. The Lattice on the Quotient Set of the Topology

To quickly mine association rules with high support and confidence from the topology for the set of items, we construct another lattice structure on the topology in this subsection; we then analyze minimal elements and minimal generators of closed itemsets in the lattice, which help us to quickly generate association rules with high support and confidence.

For any $T_{1}, T_{2} \in T(R_{A})$, we define a binary relation on the topology for the set of items as follows:

$$T_{1} \sim_{\tau} T_{2} \ \text{ if and only if } \ \tau(T_{1}) = \tau(T_{2}).$$

It is obvious that $\sim_{\tau}$ is an equivalence relation on the topology $T(R_{A})$, and the quotient set of $T(R_{A})$ is determined by this equivalence relation, i.e., $T(R_{A}) / \sim_{\tau} = \{[T] \mid T \in T(R_{A})\}$; each equivalence class $[T]$ in $T(R_{A}) / \sim_{\tau}$ consists of the members of $T(R_{A})$ with the same set of supporting transactions $\tau(T)$, and hence the same support. By the property of a topology, $T_{1} \cup T_{2}$ is in $T(R_{A})$ if $T_{1}$ and $T_{2}$ are in $T(R_{A})$. On the other hand, by the property of the set-to-set mapping $\tau$, for any $[T] \in T(R_{A}) / \sim_{\tau}$ and $T^{'}, T^{''} \in [T]$, we have $\tau(T^{'} \cup T^{''}) = \tau(T^{'}) \cap \tau(T^{''}) = \tau(T)$, so $T^{'} \cup T^{''} \in [T]$, i.e., each equivalence class $[T]$ in $T(R_{A}) / \sim_{\tau}$ is closed under the $\cup$ operation. Hence, we have the following theorem.

Theorem 2.

For each equivalence class $[T]$ in $T(R_{A}) / \sim_{\tau}$, $[T]$ is a union semi-lattice.

The maximum element of each equivalence class $[T]$ in $T(R_{A}) / \sim_{\tau}$ is important. Formally, we denote the maximum element of $[T]$ by $\cup [T]$; then $\cup [T]$ is a closed itemset and each member $T$ of $[T]$ is a generator of $\cup [T]$. In fact, suppose that $T$ is a generator of a closed itemset $A^{'}$; according to the set-to-set mapping $\tau$ and Definition 1, we have

$$\tau(A^{'}) = \bigcap_{a_{i} \in A^{'}} \tau(a_{i}) = \bigcap_{a_{i} \in A^{'}} \tau(C(a_{i})) = \tau\Big(\bigcup_{a_{i} \in A^{'}} C(a_{i})\Big) = \tau(T),$$

which means that $A^{'}$ is a member of the topology for the set of items and $A^{'} \in [T]$; hence, $A^{'} = \cup [T]$.

Corollary 1.

For any transactional database $A = (U, A)$ and each equivalence class $[T]$ in $T(R_{A}) / \sim_{\tau}$: (1) all closed itemsets of $A$ are in $T(R_{A})$; (2) $\cup [T]$ is a closed itemset; (3) for all $T^{'} \in [T]$ with $T^{'} \neq \cup [T]$, $T^{'}$ is a generator of $\cup [T]$.

Because $T(R_{A})$ is a proper subset of the power set of $A$ and existing methods search for closed itemsets in the power set of $A$, the corollary means that we can restrict the search for closed itemsets to the topology for the set of items. On the other hand, frequent itemsets have the downward-closure property, i.e., any subset of a frequent itemset is still frequent; hence, all frequent itemsets can be obtained from the closed frequent itemsets, since any subset of a closed frequent itemset is a frequent itemset. The following corollaries help us to obtain minimal generators of closed itemsets, which are useful information in ARM.

In each union semi-lattice $[T]$, we denote the set of minimal members of $[T]$ by $\min [T] = \{T^{'} \in [T] \mid \nexists T^{''} \in [T] \land T^{''} \subsetneq T^{'}\}$.

Corollary 2.

For any $T^{'} \in \min [T]$, if there exists no $P \subset T^{'}$ such that $\tau(P) = \tau(T^{'})$, then $T^{'}$ is a minimal generator of any $T^{''} \in [T]$ such that $T^{''} \notin \min [T]$ and $T^{'} \subset T^{''}$.

In Example 1, we can easily check that $[\{a_{1}, a_{2}, a_{3}\}] = \{\{a_{1}, a_{2}, a_{3}\}, \{a_{1}, a_{3}, a_{5}\}, \{a_{2}, a_{3}, a_{5}\}, \{a_{1}, a_{3}, a_{4}, a_{5}\}, \{a_{1}, a_{2}, a_{3}, a_{5}\}, \{a_{2}, a_{3}, a_{4}, a_{5}\}, \{a_{1}, a_{2}, a_{3}, a_{4}, a_{5}\}\}$, in which $\min [\{a_{1}, a_{2}, a_{3}\}] = \{\{a_{1}, a_{2}, a_{3}\}, \{a_{1}, a_{3}, a_{5}\}, \{a_{2}, a_{3}, a_{5}\}\}$. Because $\tau(P) \neq \tau(\{a_{1}, a_{3}, a_{5}\}) = \{u_{5}\}$ for any $P \subset \{a_{1}, a_{3}, a_{5}\}$, $\{a_{1}, a_{3}, a_{5}\}$ is a minimal generator of $\{a_{1}, a_{3}, a_{4}, a_{5}\}$, $\{a_{1}, a_{2}, a_{3}, a_{5}\}$ and $\{a_{1}, a_{2}, a_{3}, a_{4}, a_{5}\}$.

For $T^{'} \in \min [T]$, if there exists $P \subset T^{'}$ such that $\tau(P) = \tau(T^{'})$, we denote $P_{\tau}(T^{'}) = \{P \subset T^{'} \mid T^{'} \in \min [T], \tau(P) = \tau(T^{'})\}$. $P_{\tau}(T^{'})$ is a poset under set inclusion, i.e., for all $P_{1}, P_{2} \in P_{\tau}(T^{'})$, $P_{1} \leq P_{2}$ if and only if $P_{1} \subseteq P_{2}$. The set of minimal elements of $P_{\tau}(T^{'})$ is denoted by $\min P_{\tau}(T^{'}) = \{P^{'} \in P_{\tau}(T^{'}) \mid \nexists P^{''} \in P_{\tau}(T^{'}) \land P^{''} \subsetneq P^{'}\}$.

Corollary 3.

For $P^{'} \in \min P_{\tau}(T^{'})$, $P^{'}$ is a minimal generator of $T^{''} \in [T]$, where $T^{'}$ is a generator of $T^{''}$.

In Example 1, $\{a_{1}, a_{2}, a_{3}\} \in \min [\{a_{1}, a_{2}, a_{3}\}]$. Because $\tau(\{a_{1}, a_{2}\}) = \tau(\{a_{1}, a_{2}, a_{3}\}) = \{u_{5}\}$, we have $\{a_{1}, a_{2}\} \in \min P_{\tau}(\{a_{1}, a_{2}, a_{3}\})$, and $\{a_{1}, a_{2}\}$ is a minimal generator of $\{a_{1}, a_{2}, a_{3}\}$, $\{a_{1}, a_{2}, a_{3}, a_{5}\}$ and $\{a_{1}, a_{2}, a_{3}, a_{4}, a_{5}\}$.
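The quotient set, the maximum element $\cup [T]$, the minimal members $\min [T]$ and the minimal generators discussed above can be checked by brute force. The Python sketch below is illustrative only: Table 1 is not reproduced in this text, so the incidence data is a hypothetical reconstruction consistent with the values quoted in Example 1, and all function names are our own.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical reconstruction of Table 1 (an assumption).
TRANSACTIONS = {
    "u1": {"a1", "a5"},
    "u2": {"a1", "a3"},
    "u3": {"a3", "a5"},
    "u4": {"a2", "a3"},
    "u5": {"a1", "a2", "a3", "a4", "a5"},
    "u6": {"a3", "a4", "a5"},
}
ITEMS = frozenset().union(*TRANSACTIONS.values())


def tau(itemset):
    """Transactions containing every item of `itemset`."""
    return frozenset(u for u, items in TRANSACTIONS.items() if set(itemset) <= items)


def closure(itemset):
    """gamma(tau(itemset))."""
    common = set(ITEMS)
    for u in tau(itemset):
        common &= TRANSACTIONS[u]
    return frozenset(common)


def topology():
    """All unions of sub-families of the base {C(a_i)}."""
    blocks = [closure({a}) for a in sorted(ITEMS)]
    return {
        frozenset().union(*family)
        for r in range(len(blocks) + 1)
        for family in combinations(blocks, r)
    }


def quotient_classes():
    """Group members of the topology by their tau value."""
    classes = defaultdict(set)
    for member in topology():
        classes[tau(member)].add(member)
    return dict(classes)


def maximum(cls):
    """The maximum element of a class: the union of its members."""
    return frozenset().union(*cls)


def minimal_members(cls):
    """Members of the class with no proper subset inside the class."""
    return {t for t in cls if not any(s < t for s in cls)}


def minimal_generators(member):
    """Inclusion-minimal subsets of `member` with the same tau as `member`."""
    target, items = tau(member), sorted(member)
    same = [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
        if tau(c) == target
    ]
    return {p for p in same if not any(q < p for q in same)}
```

On this toy table the sketch yields ten equivalence classes, each of whose maximum element is closed, in line with Corollary 1. The subset enumeration is exponential and is not a substitute for the algorithms proposed in the paper.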

From the algebraic point of view, we can construct a lattice structure on the quotient set of the topology $T(R_{A})$, i.e., for any $[T_{1}], [T_{2}] \in T(R_{A}) / \sim_{\tau}$, we define $[T_{1}] \lor [T_{2}] = [(\cup [T_{1}]) \cap (\cup [T_{2}])]$ and $[T_{1}] \land [T_{2}] = [T_{1} \cup T_{2}]$. One can easily check that the operators $\lor$ and $\land$ on $T(R_{A}) / \sim_{\tau}$ are well defined, and we have the following theorem.

Theorem 3 ([52]).

$(T(R_{A}) / \sim_{\tau}, \land, \lor)$ is a complete lattice, in which the maximum and minimum elements are $[\emptyset]$ and $[A]$, respectively.

In Example 1,

T (R_{A}) / \sim_{τ} = {[\emptyset], [{a_{1}}], [{a_{3}}],

[{a_{5}}], [{a_{1},

a_{3}}],

[{a_{1}, a_{5}}], [{a_{2},

a_{3}}],

[{a_{3}, a_{5}}], [{a_{3}, a_{4}, a_{5}}], [{a_{1}, a_{2}, a_{3}}] (= [A])}

, the lattice

(T (R_{A}) / \sim_{τ}, \land, \lor)

of Example 1 is shown in Figure 2.

Figure 2. The complete lattice

(T (R_{A}) / \sim_{τ}, \land, \lor)

of Example 1.

4. Association Rules Mining from the Quotient Set of the Topology

In summary, we can confirm the following facts about the itemsets of a transactional database

A = (U, A)

based on the above mentioned results:

1.: The topology $T (R_{A})$ for the set of items is a complete lattice and displays a hierarchical structure on some itemsets. It can be generated by the base $B_{A} = {C (a_{i}) | a_{i} \in A}$ ; because each $C (a_{i})$ is in general an itemset rather than a single item, closed itemsets can be generated from the base faster than from the single items used in existing methods;
2.: All closed itemsets are included in the topology $T (R_{A})$ ; moreover, a closed itemset is the maximum element of an equivalent class $[T] \in T (R_{A}) / \sim_{τ}$ ;
3.: All itemsets in $[T]$ have the same support; moreover, generators and minimal generators of a closed itemset can be obtained from $[T]$ ;
4.: The complete lattice $(T (R_{A}) / \sim_{τ}, \land, \lor)$ displays the hierarchical structures on closed itemsets.

Inspired by rough set theory, each equivalent class in

T (R_{A}) / \sim_{τ}

can be understood as a granular knowledge on the set of itemsets of

A = (U, A)

; on the one hand, each granular knowledge can be used to mine association rules because it includes a closed itemset. On the other hand, all granular knowledge can be also used to approximate any itemset

A_{1} = {a_{i_{1}}, a_{i_{2}}, \dots, a_{i_{k}}} \subseteq A

, i.e.,

1.: The lower approximation of $A_{1}$ :

$L (A_{1}) = ⋃_{a_{i} \in A, \; C (a_{i}) \subseteq A_{1}} C (a_{i});$
2.: The upper approximation of $A_{1}$ :

$G (A_{1}) = ⋃_{a_{i} \in A, \; C (a_{i}) \cap A_{1} \neq \emptyset} C (a_{i}) .$

It is obvious that for any itemset

A_{1} \subseteq A

,

L (A_{1})

and

G (A_{1})

are in the topology

T (R_{A})

for the set of items, if

L (A_{1}) = G (A_{1})

, then the itemset

A_{1}

is in

T (R_{A})

; furthermore, supports of

L (A_{1})

,

A_{1}

and

G (A_{1})

are such that

s u p p (L (A_{1})) \geq s u p p (A_{1}) \geq s u p p (G (A_{1})),

such as in Example 1, for itemset

{a_{1}, a_{2}, a_{5}} \subset A

,

L ({a_{1}, a_{2}, a_{5}}) = {a_{1}, a_{5}}

,

G ({a_{1}, a_{2}, a_{5}}) = {a_{1}, a_{2}, a_{3}, a_{5}}

, and

supp ({a_{1}, a_{5}}) = | {u_{1}, u_{5}} | = 2 \geq supp ({a_{1}, a_{2}, a_{5}}) = | {u_{5}} | = 1 \geq supp ({a_{1}, a_{2}, a_{3}, a_{5}}) = | {u_{5}} | = 1

. Formally, the lower approximation and the upper approximation of an itemset provide an alternative method to mine generalized association rules or to reduce association rules. Both uses are discussed in the rest of this section.
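The two approximations can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function names `lower_approximation` and `upper_approximation` are hypothetical, and the base used for illustration is the base $B_{A}$ reported for Example 2 later in this section.

```python
def lower_approximation(base, itemset):
    """L(A1): union of the base members C(a_i) that are contained in A1."""
    target = frozenset(itemset)
    result = set()
    for block in base.values():
        if block <= target:
            result |= block
    return frozenset(result)


def upper_approximation(base, itemset):
    """G(A1): union of the base members C(a_i) that intersect A1."""
    target = frozenset(itemset)
    result = set()
    for block in base.values():
        if block & target:
            result |= block
    return frozenset(result)


# Base B_A as listed for Example 2: item a_i -> C(a_i).
EXAMPLE_2_BASE = {
    "a1": frozenset({"a1", "a5"}),
    "a2": frozenset({"a2", "a5"}),
    "a3": frozenset({"a3"}),
    "a4": frozenset({"a3", "a4"}),
    "a5": frozenset({"a5"}),
}

itemset = {"a1", "a3"}
print(sorted(lower_approximation(EXAMPLE_2_BASE, itemset)))  # ['a3']
print(sorted(upper_approximation(EXAMPLE_2_BASE, itemset)))  # ['a1', 'a3', 'a4', 'a5']
```

For the itemset {a1, a3}, only C(a3) fits inside the itemset, while C(a1), C(a3) and C(a4) all intersect it, so the lower approximation is strictly smaller and the upper approximation strictly larger than the itemset itself.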

4.1. Min-Max Association Rules Mining

In this subsection, we provide a method to mine Min-Max association rules from closed frequent itemsets. Here, an association rule is called a Min-Max association rule if and only if there is no other association rule with the same quality measures (support and confidence) but with a smaller antecedent and a larger consequent. Because closed itemsets are in equivalent classes of

T (R_{A}) / \sim_{τ}

, we propose Algorithm 1 to generate all Min-Max association rules with confidence

c = 1

from equivalent classes of

T (R_{A}) / \sim_{τ}

.

Algorithm 1: Min-Max association rules mining from closed itemsets

Input: A transactional database $A = (U, A)$ .
Output: Min-Max association rules with confidence $c = 1$ .
while the stop condition is not satisfied do
(1) Generate $B_{A} = {C (a_{i}) | a_{i} \in A}$ according to $C (a_{i}) = {a_{j} \in A | a_{j} \in γ τ ({a_{i}})}$ ( $\forall a_{i} \in A$ ).
(2) Generate the topology $T (R_{A}) = {⋃_{C (a_{i}) \in B_{A}^{'}} C (a_{i}) | \forall B_{A}^{'} \subseteq B_{A}}$ according to $B_{A}$ .
(3) Generate $T (R_{A}) / \sim_{τ} = {[T] | T \in T (R_{A}), \forall T_{1}, T_{2} \in [T], τ (T_{1}) = τ (T_{2})}$ according to $T (R_{A})$ and the mapping $τ$ .
(4) Generate $\min [T] = {T^{'} \in [T] | ∄ T^{″} \in [T] \land T^{″} \subset T^{'}}$ according to set inclusion for any $[T] \in T (R_{A}) / \sim_{τ}$ .
(5) Generate $\min P_{τ} (T^{'})$ according to set inclusion for any $T^{'} \in \min [T]$ .
(6) Select $P \in \min P_{τ} (T^{'})$ ( $T^{'} \in \min [T]$ ) and generate the Min-Max association rule

$ϕ \equiv P ⟶ Q,$

in which $Q = \cup [T] - P$ and $conf (ϕ) = 1$ .
(7) Return to (6).
end while
Output: All $ϕ \equiv P ⟶ Q$ with $conf (ϕ) = 1$ .

The pseudocode in Algorithm 1 mines Min-Max association rules from closed itemsets. Step (1) generates the base

B_{A}

, step (2) generates the topology

T (R_{A})

, step (3) generates the quotient set

T (R_{A}) / \sim_{τ}

, step (4) generates the minimal elements of each equivalent class

[T]

, and steps (5) and (6) generate the minimal generators and the corresponding Min-Max association rules.
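The steps of Algorithm 1 can be sketched in Python as follows. This is a brute-force illustration under stated assumptions, not the authors' Matlab implementation: the toy database `DB`, the function names and the `min_support` parameter are hypothetical; the topology is enumerated explicitly, which is only feasible for very small item sets; when no proper non-empty subset of $T^{'}$ has the same $τ$ value, $T^{'}$ itself is assumed to serve as the antecedent; and rules with an empty antecedent or consequent are skipped.

```python
from itertools import combinations

# Hypothetical toy database (not from the paper): transaction id -> items.
DB = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "c"},
    "u3": {"b"},
    "u4": {"c", "d"},
    "u5": {"a"},
}
ITEMS = frozenset().union(*DB.values())


def tau(itemset):
    """Transactions that contain every item of `itemset`."""
    return frozenset(u for u, items in DB.items() if set(itemset) <= items)


def gamma(transactions):
    """Items shared by all given transactions."""
    common = set(ITEMS)
    for u in transactions:
        common &= DB[u]
    return frozenset(common)


def mine_exact_rules(min_support=1):
    """Return (antecedent, consequent, support) triples with confidence 1."""
    # Step (1): base B_A = {C(a_i)} with C(a_i) = gamma(tau({a_i})).
    base = {gamma(tau({a})) for a in ITEMS}
    # Step (2): topology = all unions of base members (exponential in general).
    topology = {frozenset()}
    for block in base:
        topology |= {open_set | block for open_set in topology}
    # Step (3): quotient set, grouping open sets that share the same tau value.
    classes = {}
    for open_set in topology:
        classes.setdefault(tau(open_set), []).append(open_set)

    rules = set()
    for transactions, members in classes.items():
        if len(transactions) < min_support:
            continue
        closed = frozenset().union(*members)  # union of [T], the closed itemset
        # Step (4): minimal members of the class under set inclusion.
        minimal = [t for t in members if not any(s < t for s in members)]
        for t_min in minimal:
            # Step (5): proper non-empty subsets P of T' with tau(P) == tau(T').
            candidates = [
                frozenset(p)
                for r in range(1, len(t_min))
                for p in combinations(sorted(t_min), r)
                if tau(p) == transactions
            ]
            generators = [p for p in candidates if not any(q < p for q in candidates)]
            # Step (6): emit P -> (closed itemset minus P).
            for p in generators or [t_min]:  # assumption: fall back to T' itself
                q = closed - p
                if p and q:
                    rules.add((p, q, len(transactions)))
    return sorted(rules, key=lambda r: (sorted(r[0]), sorted(r[1])))


for antecedent, consequent, support in mine_exact_rules():
    print(sorted(antecedent), "->", sorted(consequent), "support", support)
```

On this toy database the sketch yields three rules, each with support 1: {a, b} → {c}, {b, c} → {a} and {d} → {c}. The last one illustrates the role of $\min P_{τ} (T^{'})$: the open set {c, d} is reduced to the shorter antecedent {d}.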

Example 2.

A transactional database is shown in Table 3. According to Definition 1, we obtain

C (a_{1}) = {a_{1}, a_{5}}

,

C (a_{2}) = {a_{2}, a_{5}}

,

C (a_{3}) = {a_{3}}

,

C (a_{4}) = {a_{3}, a_{4}}

and

C (a_{5}) = {a_{5}}

, i.e., the base for the topology for the set of items is

B_{A} = {{a_{1}, a_{5}}, {a_{2}, a_{5}}, {a_{3}}, {a_{3}, a_{4}}, {a_{5}}}

, the co-occurrence relation

R_{A}

among items decided by Equation (1) is shown in Table 4. The topology

T (R_{A})

for the set of items generated by

B_{A}

and the quotient set

T (R_{A}) / \sim_{τ}

of the topology are

\begin{matrix} T (R_{A}) & = & {\emptyset, {a_{1}, a_{5}}, {a_{2}, a_{5}}, {a_{3}}, {a_{3}, a_{4}}, {a_{5}}, {a_{3}, a_{5}}, {a_{1}, a_{2}, a_{5}}, \\ {a_{1}, a_{3}, a_{5}}, {a_{2}, a_{3}, a_{5}}, {a_{3}, a_{4}, a_{5}}, {a_{1}, a_{2}, a_{3}, a_{5}}, {a_{1}, a_{3}, a_{4}, \\ a_{5}}, {a_{2}, a_{3}, a_{4}, a_{5}}, A (= {a_{1}, a_{2}, a_{3}, a_{4}, a_{5}})}, \\ T (R_{A}) / \sim_{τ} & = & {[\emptyset], [{a_{1}, a_{5}}], [{a_{2}, a_{5}}], [{a_{3}}], [{a_{3}, a_{4}}], [{a_{5}}], [{a_{3}, a_{5}}], \\ [{a_{1}, a_{2}, a_{5}}], [{a_{2}, a_{3}, a_{5}}], [{a_{3}, a_{4}, a_{5}}], [A]} . \end{matrix}
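The topology listed above can be reproduced from the base $B_{A}$ by taking all unions of base members. The following Python sketch (the function name `generate_topology` is hypothetical) does this for the base of Example 2 and recovers the 15 members of $T (R_{A})$ ; the quotient set is not computed here because it requires the transactions of Table 3.

```python
# Base B_A of Example 2, as listed in the text.
EXAMPLE_2_BASE = [
    frozenset({"a1", "a5"}),
    frozenset({"a2", "a5"}),
    frozenset({"a3"}),
    frozenset({"a3", "a4"}),
    frozenset({"a5"}),
]


def generate_topology(base):
    """Return every union of a sub-family of `base` (the empty union is the empty set)."""
    topology = {frozenset()}
    for block in base:
        # Extend each open set found so far by the current base member.
        topology |= {open_set | block for open_set in topology}
    return topology


topology = generate_topology(EXAMPLE_2_BASE)
for open_set in sorted(topology, key=lambda s: (len(s), sorted(s))):
    print(sorted(open_set))
print(len(topology))  # 15
```

Because every base member is added to all previously generated unions, the loop visits each sub-family of the base implicitly; the number of open sets can still grow exponentially with the number of base members.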

Table 3. A transactional database

A = (U, A)

.

Table 4. A binary relation on A.

The complete lattice

(T (R_{A}) / \sim_{τ}, \land, \lor)

is shown in Figure 3, in which, each equivalent class is with its support, e.g.,

([{a_{5}}], 7)

means

s u p p ({a_{5}}) = 7

. Closed itemset and minimal generators of each equivalent class are shown in Table 5; accordingly, Min-Max association rules with confidence

c = 1

generated from each equivalent class are shown in Table 6.

Figure 3. The complete lattice

(T (R_{A}) / \sim_{τ}, \land, \lor)

of Example 2.

Table 5. Closed itemset and minimal generators of each equivalent class.

Table 6. Min-Max association rules with support s and confidence

c = 1

.

In Table 6,

a_{2} \land a_{3} ⟶ a_{5}

is generated by a minimal generator

{a_{2}, a_{3}}

of

{a_{2}, a_{3}, a_{5}}

,

a_{2} \land a_{3} \land a_{5} ⟶ a_{1}

is generated by generator

{a_{2}, a_{3}, a_{5}}

of

{a_{1}, a_{2}, a_{3}, a_{5}}

, which are association rules with confidence

c = 1

; however, they are not Min-Max association rules with confidence

c = 1

; only

a_{2} \land a_{3} ⟶ a_{1} \land a_{5}

is a Min-Max association rule with confidence

c = 1

. Generally, there are many association rules with confidence

c = 1

generated from each equivalent class; however, Min-Max association rules

P ⟶ Q

with confidence

c = 1

must be generated by

P \in m i n P_{τ} (T^{'})

,

T^{'} \in m i n [T]

and

Q = \cup [T] - P

, i.e., the advantage of our method is that the search for minimal generators is restricted to

T^{'} \in \min [T]

rather than ranging over all subsets of

\cup [T]

.

According to the hierarchical structures on closed itemsets displayed in the complete lattice

(T (R_{A}) / \sim_{τ}, \land, \lor)

, we provide Algorithm 2 to generate Min-Max association rules with high support and confidence from a fixed itemset.

Algorithm 2: Min-Max association rules mining from a fixed itemset

Input: An itemset $A_{1} = {a_{i_{1}}, a_{i_{2}}, \dots, a_{i_{k}}} \subseteq A$ with $s u p p (A_{1}) \geq s$ .
Output: Min-Max association rules with confidence $c \in (0, 1)$ .
while The stop condition is not satisfied do
(1) Generate $L (A_{1})$ in $T (R_{A})$ .
(2) Generate the set of equivalent classes such that $A_{1}^{\sim_{τ}} = {[T_{i}] | [T_{i}] \geq_{\sim_{τ}} [L (A_{1})]}$ .
(3) Select $[T_{i}]$ from $A_{1}^{\sim_{τ}}$ such that $\cup [T_{i}] \subseteq A_{1}$ ,
(4) Generate association rule

$ϕ \equiv P ⟶ Q,$

where, $P \in m i n P_{τ} (T^{'})$ , $T^{'} \in m i n [T_{i}]$ , $Q = A_{1} - P$ and $c o n f (ϕ) = \frac{s u p p (A_{1})}{s u p p (T_{i})}$ .
(5) Return to (3).
end while
Output All $ϕ \equiv P ⟶ Q$ with $s u p p (A_{1}) \geq s$ and $c o n f (ϕ) = \frac{s u p p (A_{1})}{s u p p (P)}$ .

The pseudocode in Algorithm 2 mines Min-Max association rules from a fixed itemset. Step (1) generates

L (A_{1})

, step (2) generates the set of equivalent classes, step (3) selects the equivalent classes whose closed itemsets are contained in the fixed itemset, and step (4) generates the Min-Max association rules.

In Example 2, several fixed itemsets and their lower approximations are shown in Table 7; the corresponding Min-Max association rules generated by the itemsets are shown in Table 8.

Table 7. Itemsets and their lower approximations of Example 2.

Table 8. Min-Max association rules generated by the itemsets.

Finally, advantages of Algorithms 1 and 2 can be summarized as follows:

1.: Min-Max association rules are always mined from closed itemsets. In this paper, we prove that closed itemsets are the maximum elements of equivalent classes, i.e., equivalent classes can be used to mine Min-Max association rules with confidence 1;
2.: The shortest antecedents of Min-Max association rules are searched for among the minimal members of equivalent classes, i.e., $P \in \min P_{τ} (T^{'})$ and $T^{'} \in \min [T]$ ; hence, minimal generators are searched in a smaller scope than the set of all subsets of closed itemsets;
3.: Lower approximations and their minimal generators help to quickly mine Min-Max association rules from a fixed itemset.

4.2. Generalized Association Rules Based on the Lower Approximation

Generalized association rules are an important extension of association rules. By using a taxonomy over the items of transactional databases, generalized items are aggregated according to different granularity levels. In practice, generalized itemsets provide a high-level view of the patterns hidden in the analyzed data and a high-level abstraction of the mined knowledge in different application domains [12,30,59,60]. In this paper, each equivalent class

[T]

in

T (R_{A}) / \sim_{τ}

is understood as a granular knowledge on the set of itemsets; it can also be considered as a kind of generalized itemsets. Accordingly, we provide an alternative method to mine generalized association rules based on the lower approximation of any itemset, and analyze the changing confidence of generalized association rules. Formally, for any association rule

ψ \equiv A_{1} ⟶ A_{2}

, we propose the following three kinds of generalized association rules of

ψ

based on the lower approximation:

1.: Generalized antecedent association rule (GAR): $ψ_{G A} \equiv L (A_{1}) ⟶ A_{2}$ ;
2.: Generalized conclusion association rule (GCR): $ψ_{G C} \equiv A_{1} ⟶ L (A_{2})$ ;
3.: Generalized antecedent and conclusion association rule (GACR): $ψ_{G A C} \equiv L (A_{1}) ⟶ L (A_{2})$ .

It is obvious that if

A_{1} \in T (R_{A})

, then

L (A_{1}) = A_{1}

and

ψ = ψ_{G A}

; if

A_{2} \in T (R_{A})

, then

L (A_{2}) = A_{2}

and

ψ = ψ_{G C}

; if

A_{1}, A_{2} \in T (R_{A})

, then

L (A_{1}) = A_{1}

,

L (A_{2}) = A_{2}

and

ψ = ψ_{G A C}

.

Corollary 4.

For any association rule

ψ \equiv A_{1} ⟶ A_{2}

, let

c o n f_{A} (ψ) = c

,

l = | τ (L (A_{1})) - τ (A_{1}) |

and

k = | (τ (L (A_{1})) - τ (A_{1})) \cap τ (A_{2}) |

. Then:

1 .: $c o n f_{A} (ψ) \leq c o n f_{A} (ψ_{G C})$ ;
2 .: $c o n f_{A} (ψ_{G A}) \leq c o n f_{A} (ψ_{G A C})$ ;
3 .: If $k \geq c l$ , then $c o n f_{A} (ψ) \leq c o n f_{A} (ψ_{G A})$ ;
4 .: If $k < c l$ , then $c o n f_{A} (ψ) > c o n f_{A} (ψ_{G A})$ .

Proof.

Based on

τ (A_{1}) \subseteq τ (L (A_{1}))

and

τ (A_{2}) \subseteq τ (L (A_{2}))

, (1) and (2) can be easily proved. For (3) and (4), due to

τ (L (A_{1})) = τ (A_{1}) \cup (τ (L (A_{1})) - τ (A_{1}))

and

τ (A_{1}) \cap (τ (L (A_{1})) - τ (A_{1})) = \emptyset

, we have

\begin{matrix} conf_{A} (ψ_{G A}) & = & \frac{| τ (L (A_{1})) \cap τ (A_{2}) |}{| τ (L (A_{1})) |} = \frac{| (τ (A_{1}) \cup (τ (L (A_{1})) - τ (A_{1}))) \cap τ (A_{2}) |}{| τ (L (A_{1})) |} \\ & = & \frac{| (τ (A_{1}) \cap τ (A_{2})) \cup ((τ (L (A_{1})) - τ (A_{1})) \cap τ (A_{2})) |}{| τ (A_{1}) \cup (τ (L (A_{1})) - τ (A_{1})) |} \\ & = & \frac{k + | τ (A_{1}) \cap τ (A_{2}) |}{l + | τ (A_{1}) |} \end{matrix}

, by

\frac{| τ (A_{1}) \cap τ (A_{2}) |}{| τ (A_{1}) |} = c

, if

k \geq c l

, then

\frac{k + | τ (A_{1}) \cap τ (A_{2}) |}{l + | τ (A_{1}) |} \geq \frac{c l + c | τ (A_{1}) |}{l + | τ (A_{1}) |} = c

. If

k < c l

, then

\frac{k + | τ (A_{1}) \cap τ (A_{2}) |}{l + | τ (A_{1}) |} < \frac{c l + c | τ (A_{1}) |}{l + | τ (A_{1}) |} = c

. □
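The counting argument in this proof can be checked on concrete sets. The Python sketch below uses hypothetical transaction-id sets (they do not come from the paper's examples) for $τ (A_{1})$ , $τ (L (A_{1}))$ and $τ (A_{2})$ , and verifies both the closed form for the confidence of $ψ_{G A}$ and the threshold condition on $k$ and $c l$ .

```python
from fractions import Fraction


def confidence(tau_antecedent, tau_consequent):
    """conf = |tau(antecedent) & tau(consequent)| / |tau(antecedent)|."""
    return Fraction(len(tau_antecedent & tau_consequent), len(tau_antecedent))


# Hypothetical transaction-id sets; tau_L_A1 must be a superset of tau_A1.
tau_A1 = {1, 2, 3, 4}
tau_L_A1 = {1, 2, 3, 4, 5, 6}
tau_A2 = {1, 2, 3, 5, 6, 7}

c = confidence(tau_A1, tau_A2)   # confidence of the original rule
extra = tau_L_A1 - tau_A1        # transactions gained by generalizing A1
l = len(extra)
k = len(extra & tau_A2)

conf_ga = confidence(tau_L_A1, tau_A2)
closed_form = Fraction(k + len(tau_A1 & tau_A2), l + len(tau_A1))

print(c, conf_ga, closed_form)   # 3/4 5/6 5/6
print(k >= c * l)                # True, so the generalized rule is at least as confident
```

Using exact fractions avoids rounding issues when comparing $k$ with $c l$ .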

Corollary 5.

Let

c o n f_{A} (ψ_{G C}) = c

,

l = | τ (L (A_{1})) - τ (A_{1}) |

and

k = | (τ (L (A_{1})) - τ (A_{1})) \cap τ (L (A_{2})) |

. (1) If

k \geq c l

, then

c o n f_{A} (ψ_{G C}) \leq c o n f_{A} (ψ_{G A C})

; (2) If

k < c l

, then

c o n f_{A} (ψ_{G C}) > c o n f_{A} (ψ_{G A C})

.

Based on

P \in m i n P_{τ} (T^{'})

,

T^{'} \in m i n [L (A_{1})]

and

\cup [L (A_{2})]

,

ψ_{G A}

,

ψ_{G C}

and

ψ_{G A C}

of association rule

ψ \equiv A_{1} ⟶ A_{2}

have the following redundant rules with the same confidence and the same support:

1.: Redundant association rule of GAR: $ψ_{G A N} \equiv P ⟶ A_{2}$ ;
2.: Redundant association rule of GCR: $ψ_{G C N} \equiv A_{1} ⟶ \cup [L (A_{2})]$ ;
3.: Redundant association rule of GACR: $ψ_{G A C N} \equiv P ⟶ \cup [L (A_{2})]$ .

Accordingly, we provide Algorithm 3 to mine generalized association rules of any association rule

ψ

and its corresponding redundant association rules.

Algorithm 3: Mining generalized association rules and redundant association rules based on the lower approximation

Input: Association rule $ψ \equiv A_{1} ⟶ A_{2}$ .
Output: Generalized association rules of $ψ$ and corresponding redundant association rules.
while The stop condition is not satisfied do
(1) Generate $L (A_{1})$ and $L (A_{2})$ in $T (R_{A})$ .
(2) Generate $\cup [L (A_{1})]$ and $\cup [L (A_{2})]$ according to set inclusion of $[L (A_{1})]$ and $[L (A_{2})]$ , respectively.
(3) Generate $T^{'} \in m i n [L (A_{1})]$ and $P \in m i n P_{τ} (T^{'})$ according to set inclusion of $[L (A_{1})]$ and the power set of $T^{'}$ , respectively.
(4) Obtain generalized association rules $ψ_{G A} \equiv L (A_{1}) ⟶ A_{2}$ , $ψ_{G C} \equiv A_{1} ⟶ L (A_{2})$ and $ψ_{G A C} \equiv L (A_{1}) ⟶ L (A_{2})$ , respectively.
(5) Generate redundant association rules $ψ_{G A N} \equiv P ⟶ A_{2}$ , $ψ_{G C N} \equiv A_{1} ⟶ \cup [L (A_{2})]$ and $ψ_{G A C N} \equiv P ⟶ \cup [L (A_{2})]$ , respectively.
(6) Return to (5).
end while
Output Generalized association rules and corresponding redundant association rules based on the lower approximation with $(s u p p (ψ_{*}), c o n f (ψ_{*}))$ .

The pseudocode provided in Algorithm 3 is responsible for mining generalized association rules and redundant association rules based on the lower approximation. Step (1) generates

L (A_{1})

and

L (A_{2})

, step (2) generates

\cup [L (A_{1})]

and

\cup [L (A_{2})]

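, step (3) generates the minimal generators, step (4) generates the generalized association rules, and step (5) generates the redundant association rules.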
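Steps (1) and (4) of Algorithm 3 can be illustrated with a short Python sketch. It is a sketch under stated assumptions only: the toy database `DB`, the rule $A_{1} ⟶ A_{2}$ and all function names are hypothetical, the confidence is computed as $| τ (A_{1}) \cap τ (A_{2}) | / | τ (A_{1}) |$ as in the proof of Corollary 4, and antecedents with empty $τ$ are not handled.

```python
from fractions import Fraction

# Hypothetical toy database (not from the paper): transaction id -> items.
DB = {
    "u1": {"a", "b", "c", "d", "e"},
    "u2": {"a", "b"},
    "u3": {"b", "c", "d", "e"},
    "u4": {"b", "d"},
    "u5": {"a", "b", "c"},
    "u6": {"c", "d"},
    "u7": {"c", "d"},
}
ITEMS = frozenset().union(*DB.values())


def tau(itemset):
    """Transactions that contain every item of `itemset`."""
    return frozenset(u for u, items in DB.items() if set(itemset) <= items)


def base_member(item):
    """C(a_i): items shared by all transactions that contain a_i."""
    return frozenset.intersection(*(frozenset(DB[u]) for u in tau({item})))


def lower(itemset):
    """L(A1): union of the base members contained in A1."""
    itemset = frozenset(itemset)
    blocks = [base_member(a) for a in ITEMS]
    return frozenset().union(*(b for b in blocks if b <= itemset))


def conf(antecedent, consequent):
    """Confidence as used in the proof of Corollary 4 (tau(antecedent) must be non-empty)."""
    covered = tau(antecedent)
    return Fraction(len(covered & tau(consequent)), len(covered))


A1, A2 = frozenset({"a", "c"}), frozenset({"d", "e"})
rules = {
    "psi": (A1, A2),
    "GA": (lower(A1), A2),
    "GC": (A1, lower(A2)),
    "GAC": (lower(A1), lower(A2)),
}
confidences = {name: conf(p, q) for name, (p, q) in rules.items()}
for name, value in confidences.items():
    print(name, sorted(rules[name][0]), "->", sorted(rules[name][1]), value)
```

Here $L (A_{1}) = {c}$ and $L (A_{2}) = {d}$ , and the confidences are 1/2 for the original rule, 2/5 for GAR, 1/2 for GCR and 4/5 for GACR, which agrees with items (1), (2) and (4) of Corollary 4 for this toy database.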

4.3. Generalized Association Rules Based on the Upper Approximation

Similarly, for any association rule

ψ \equiv A_{1} ⟶ A_{2}

, we propose the following three kinds of generalized association rules of

ψ

based on the upper approximation:

1.: Generalized antecedent association rule (gar): $ψ_{g a} \equiv G (A_{1}) ⟶ A_{2}$ ;
2.: Generalized conclusion association rule (gcr): $ψ_{g c} \equiv A_{1} ⟶ G (A_{2})$ ;
3.: Generalized antecedent and conclusion association rule (gacr): $ψ_{g a c} \equiv G (A_{1}) ⟶ G (A_{2})$ .

It is obvious that if

A_{1} \in T (R_{A})

, then

ψ = ψ_{g a}

; if

A_{2} \in T (R_{A})

, then

ψ = ψ_{g c}

; if

A_{1}, A_{2} \in T (R_{A})

, then

ψ = ψ_{g a c}

.

Corollary 6.

For

ψ \equiv A_{1} ⟶ A_{2}

,

c o n f_{A} (ψ) = c

,

l = | τ (A_{1}) - τ (G (A_{1})) |

and

k = | (τ (A_{1}) - τ (G (A_{1}))) \cap τ (A_{2}) |

.

1 .: $c o n f_{A} (ψ) \geq c o n f_{A} (ψ_{g c})$ ;
2 .: $c o n f_{A} (ψ_{g a}) \geq c o n f_{A} (ψ_{g a c})$ ;
3 .: If $k > c l$ , then $c o n f_{A} (ψ) > c o n f_{A} (ψ_{g a})$ ;
4 .: If $k \leq c l$ , then $c o n f_{A} (ψ) \leq c o n f_{A} (ψ_{g a})$ .

Proof.

Based on

τ (A_{1}) \supseteq τ (G (A_{1}))

and

τ (A_{2}) \supseteq τ (G (A_{2}))

, (1) and (2) can be easily proved. For (3) and (4), due to

τ (A_{1}) = τ (G (A_{1})) \cup (τ (A_{1}) - τ (G (A_{1})))

and

τ (G (A_{1})) \cap (τ (A_{1}) - τ (G (A_{1}))) = \emptyset

, we have

\begin{matrix} c o n f_{A} (ψ) & = & \frac{| τ (A_{1}) \cap τ (A_{2}) |}{| τ (A_{1}) |} = \frac{| (τ (G (A_{1})) \cup (τ (A_{1}) - τ (G (A_{1})))) \cap τ (A_{2}) |}{| τ (G (A_{1})) \cup (τ (A_{1}) - τ (G (A_{1}))) |} \\ = & \frac{| (τ (G (A_{1})) \cap τ (A_{2})) \cup ((τ (A_{1}) - τ (G (A_{1}))) \cap τ (A_{2})) |}{| τ (G (A_{1})) \cup (τ (A_{1}) - τ (G (A_{1}))) |} \\ = & \frac{| τ (G (A_{1})) \cap τ (A_{2}) | + k}{| τ (G (A_{1})) | + l} = c, \end{matrix}

i.e.,

| τ (G (A_{1})) \cap τ (A_{2}) | - c | τ (G (A_{1})) | = c l - k

. If

k > c l

, then

| τ (G (A_{1})) \cap τ (A_{2}) | - c | τ (G (A_{1})) | < 0

, i.e.,

c o n f_{A} (ψ_{g a}) = \frac{| τ (G (A_{1})) \cap τ (A_{2}) |}{| τ (G (A_{1})) |} < c

. If

k \leq c l

, then

| τ (G (A_{1})) \cap τ (A_{2}) | - c | τ (G (A_{1})) | \geq 0

, i.e.,

c o n f_{A} (ψ_{g a}) = \frac{| τ (G (A_{1})) \cap τ (A_{2}) |}{| τ (G (A_{1})) |} \geq c

. □

Corollary 7.

Let

c o n f_{A} (ψ_{g c}) = c

,

l = | τ (A_{1}) - τ (G (A_{1})) |

and

k = | (τ (A_{1}) - τ (G (A_{1}))) \cap τ (G (A_{2})) |

. (1) If

k > c l

, then

c o n f_{A} (ψ_{g c}) > c o n f_{A} (ψ_{g a c})

; (2) If

k \leq c l

, then

c o n f_{A} (ψ_{g c}) \leq c o n f_{A} (ψ_{g a c})

.

Based on

P \in m i n P_{τ} (T^{'})

,

T^{'} \in m i n [G (A_{1})]

and

\cup [G (A_{2})]

,

ψ_{g a}

,

ψ_{g c}

and

ψ_{g a c}

of association rule

ψ \equiv A_{1} ⟶ A_{2}

have the following redundant rules with the same confidence and support:

1.: Redundant association rule of gar: $ψ_{g a n} \equiv P ⟶ A_{2}$ ;
2.: Redundant association rule of gcr: $ψ_{g c n} \equiv A_{1} ⟶ \cup [G (A_{2})]$ ;
3.: Redundant association rule of gacr: $ψ_{g a c n} \equiv P ⟶ \cup [G (A_{2})]$ .

Accordingly, we provide Algorithm 4 to mine generalized association rules of

ψ

and corresponding redundant association rules.

The pseudocode provided in Algorithm 4 is responsible for mining generalized association rules and redundant association rules based on the upper approximation. Step (1) generates

G (A_{1})

and

G (A_{2})

, step (2) generates

\cup [G (A_{1})]

and

\cup [G (A_{2})]

, step (4) generates association rules, and step (5) generates redundant association rules.

Algorithm 4: Mining generalized association rules and redundant association rules based on the upper approximation

Input: Association rule $ψ \equiv A_{1} ⟶ A_{2}$ .
Output: Generalized association rules of $ψ$ and corresponding redundant association rules.
while The stop condition is not satisfied do
(1) Generate $G (A_{1})$ and $G (A_{2})$ in $T (R_{A})$ .
(2) Generate $\cup [G (A_{1})]$ and $\cup [G (A_{2})]$ according to set inclusion of $[G (A_{1})]$ and $[G (A_{2})]$ , respectively.
(3) Generate $T^{'} \in m i n [G (A_{1})]$ and $P \in m i n P_{τ} (T^{'})$ according to set inclusion of $[G (A_{1})]$ and the power set of $T^{'}$ , respectively.
(4) Obtain generalized association rules $ψ_{g a} \equiv G (A_{1}) ⟶ A_{2}$ , $ψ_{g c} \equiv A_{1} ⟶ G (A_{2})$ and $ψ_{g a c} \equiv G (A_{1}) ⟶ G (A_{2})$ , respectively.
(5) Generate redundant association rules $ψ_{g a n} \equiv P ⟶ A_{2}$ , $ψ_{g c n} \equiv A_{1} ⟶ \cup [G (A_{2})]$ and $ψ_{g a c n} \equiv P ⟶ \cup [G (A_{2})]$ , respectively.
(6) Return to (5).
end while
Output Generalized association rules and corresponding redundant association rules based on the upper approximation with $(s u p p (ψ_{*}), c o n f (ψ_{*}))$ .

5. Example Analysis

Experiments were conducted to compare the execution time, memory usage and number of association rules of the Apriori algorithm [1,61] and our method. They were run on a Thinkpad X1 laptop with an Intel i5 Core Duo (2 × 2.4 GHz), 4 GB of RAM and Windows 10. The algorithms were coded in Matlab 2015b. Four databases from the UCI databases [62] were used for the experiments; their features are shown in Table 9.

Table 9. Dataset characteristics.

5.1. The Execution Time

Experiments were conducted to compare the execution time of Apriori and our algorithm. The minConf was set to 50%. The results on the four databases for various minSup values are shown in Figures 4–7, which show that our algorithm ran faster than Apriori in all cases. For example, given

minSup = 75 %

, for the Chess database, the mining times of Apriori and our algorithm were 3035.53 s and 507.33 s, respectively. The time ratio is

\frac{507.33}{3035.53} \times 100 % = 16.71 %

. Moreover, as minSup decreases, the time ratio decreases as well. For example, for the Connect database with

minSup

set at 98%, 96% and 94%, the time ratios were 91.93%, 7.26% and 1.56%, respectively. These results demonstrate that our algorithm mines rules faster than Apriori.

Figure 4. Execution time of the two algorithms in Chess dataset for various minSup values.

Figure 5. Execution time of the two algorithms in Connect dataset for various minSup values.

Figure 6. Execution time of the two algorithms in Mushroom dataset for various minSup values.

Figure 7. Execution time of the two algorithms in Zoo dataset for various minSup values.

5.2. The Memory Usage

Experiments were conducted with the same databases and parameters as the execution time experiments. The results, shown in Figures 8–11, indicate that Apriori consumed more memory than our algorithm in almost all cases. For example, given

minSup = 75 %

, for the Chess database, the memory usage of Apriori and our algorithm was 672 MB and 136.54 MB, respectively. The memory ratio is

\frac{136.54}{672} \times 100 % = 20.32 %

. Moreover, as

minSup

decreases, the memory ratio decreases as well. For example, for the Connect database with

minSup

set at 98%, 96% and 94%, the memory ratios were 87.55%, 75.84% and 9.18%, respectively.

Figure 8. Memory usage of the two algorithms in Chess dataset for various minSup values.

Figure 9. Memory usage of the two algorithms in Connect dataset for various minSup values.

Figure 10. Memory usage of the two algorithms in Mushroom dataset for various minSup values.

Figure 11. Memory usage of the two algorithms in Zoo dataset for various minSup values.

5.3. Numbers of Rules

Table 10 compares the numbers of association rules generated by Apriori and by our method, with the same databases and parameters as in the execution time experiments. The results show that our method always generates fewer rules than Apriori. For example, given

minSup = 75 %

, for the Chess database, the numbers of rules generated by Apriori and by our method were 2,336,556 and 253,836, respectively. The ratio of the two rule counts is

\frac{253836}{2336556} \times 100 % = 10.86 %

. Because our algorithm generates Min-Max association rules, it produces fewer rules than Apriori.
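The three ratios quoted for the Chess database at minSup = 75% can be recomputed directly from the reported figures. The helper name `ratio_percent` in the Python snippet below is hypothetical; the input numbers are the ones given in the text.

```python
def ratio_percent(ours, apriori):
    """Return ours / apriori as a percentage rounded to two decimals."""
    return round(ours / apriori * 100, 2)


# (our method, Apriori) for the Chess database at minSup = 75%.
reported = {
    "execution time": (507.33, 3035.53),
    "memory usage": (136.54, 672.0),
    "number of rules": (253836, 2336556),
}

for name, (ours, apriori) in reported.items():
    print(f"{name}: {ratio_percent(ours, apriori)}%")
```

A smaller percentage means a larger saving relative to Apriori.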

Table 10. Number of rules of Apriori and Ours.

In this section, some examples are provided to show our method in association rules mining; transactional databases come from [62] and their characteristics are shown in Table 9. The Zoo database is initially used for classification of animals; in this paper, association rules among attributes of animals are considered, in which Min-Max association rules with confidence

c = 1

and confidence

c \in (0, 1)

will be shown, respectively. For mushroom database, connect-4 and chess, the topology, basis for the topology and equivalent classes with thresholds of support are obtained, and Min-Max association rules and generalized association rules are generated. Because the Reliable basis used to retrieve and reduce association rules in [63] is similar to our method, our results are compared with the method based on the Reliable basis limited in the number of Min-Max association rules and redundant rules.

Example 3.

In the Zoo database, there are 101 objects (animals) and 17 attributes (15 boolean, 2 numeric). Here, we discover association rules among 15 boolean attributes of animals, i.e., hair (

h a

), feathers (

f e

), eggs (

e g

), milk (

m i

), airborne (

a i

), aquatic (

a q

), predator (

p r

), toothed (

t o

), backbone (

b a

), breathes (

b r

), venomous (

v e

), fins (

f i

), tail (

t a

), domestic (

d o

) and catsize (

c a

).

According to Definition 1, 15 bases are generated (shown in Table A1 of Appendix A). Members of topology and equivalent classes with

s u p p_{r} (*)

are shown in Table A2 of Appendix A. Equivalent classes used to generate association rules with confidence

c = 1

are shown in Table A3 of Appendix A. According to Table A3, 7 Min-Max association rules with

s u p p_{r} (*) \geq 0.4

and confidence

c = 1

are generated (shown in Table A4 of Appendix A), in which

m i ⟶ b a \land b r

(rule 4) is considered redundant of

m i \land b a ⟶ b r

(rule 5) and

m i \land b r ⟶ b a

(rule 6) due to

{m i} \subset {m i, b a}

,

{m i} \subset {m i, b r}

,

{b r} \subset {b a, b r}

and

{b a} \subset {b a, b r}

; i.e., we use fewer conditions to achieve more results by

m i ⟶ b a \land b r

with the same support and confidence. Furthermore, some Min-Max association rules generated in the quotient lattice of the topology with

s u p p_{r} (*) \geq 0.4

and confidence

c \in (0, 1)

are shown in Table A5 of Appendix A, in which the Min-Max association rules

b r ⟶ b a

and

t a ⟶ b a

can each be considered a GAR of

b r \land t a ⟶ b a

.

Example 4.

The mushroom database consists of 8124 objects (mushrooms) and 22 nominally valued attributes. Here, we convert the 22 nominally valued attributes into 126 boolean attributes and generate 111 bases. The change of bases with different

s u p p_{r} (*)

is shown in Figure A1 of Appendix A. Members of topology, equivalent classes and its generating time by using different bases are shown in Figure A2 of Appendix A.

Table A6 of Appendix A shows Min-Max association rules with

s u p p_{r} (*) \geq 0.8

and

c = 1

. Compared with the results in [63], rules 1–8 and 11 are the same as in [63], whereas rules 9, 10, 12 and 13 are new; rule 8 can be considered a redundant rule of rules 9 and 10, and rule 11 a redundant rule of rules 12 and 13.

Table A7 of Appendix A shows Min-Max association rules with

s u p p_{r} (*) \geq 0.8

and

c \in [0.8, 1)

, in which rules 1–25 are also generated in [63] and rules 26–41 are new rules.

Table A8 of Appendix A shows comparative results, in which

T R X

means the number of all approximate rules in [63], Min-Max (

N, R t

) and Reliable (

N, R t

) mean the number and reduction ratio of the Min-Max basis rules and reliable basis rules generated by the Min-Max approximate basis, Min-Max exact basis, reliable approximate basis and reliable exact basis in [63].

T R

means the number of all rules generated in the quotient lattice

(T (R_{A}) / \sim_{τ}, \land, \lor)

, N-Min-Max (

N, R t

) means the number and reduction ratio of Min-Max association rules generated in

(T (R_{A}) / \sim_{τ}, \land, \lor)

, the reduction ratio is calculated as

R t = \frac{T R X - N}{T R X}

or

\frac{T R - N}{T R}

. In Table A8, it can be noticed that the association rules used to obtain a reduction in the quotient lattice

(T (R_{A}) / \sim_{τ}, \land, \lor)

are less than in [63], such as for

s u p p_{r} (*) \geq 0.4

, 2528 association rules are used to obtain reduction. The lower

s u p p_{r} (*)

and confidence c is corresponding to more generated association rules and higher reduction ratio in the quotient lattice

(T (R_{A}) / \sim_{τ}, \land, \lor)

, such as for

s u p p_{r} (*) \geq 0.4

and confidence

c \geq 0.5

, 1825 rules are generated and the reduction ratio is 0.77.

Example 5.

In the connect-4 database, the numbers of reduced association rules with confidence

c \geq 0.5

, compared with those in [63], are shown in Table A9 of Appendix A. For template

A = {d_{5} - b, d_{6} - b, e_{5} - b, f_{5} - b, f_{6} - b}

and rule

ϕ \equiv a_{6} - b \land e_{4} - b \land f_{5} - b ⟶ e_{6} - b \land f_{6} - b

,

\begin{matrix} L ({d_{5} - b, d_{6} - b, e_{5} - b, f_{5} - b, f_{6} - b}) & = & {d_{5} - b, d_{6} - b, f_{5} - b, f_{6} - b}, \\ m i n [{d_{5} - b, d_{6} - b, f_{5} - b, f_{6} - b}] & = & {d_{5} - b, f_{5} - b}, \\ L ({a_{6} - b, e_{4} - b, f_{5} - b}) & = & {a_{6} - b}, \\ G ({a_{6} - b, e_{4} - b, f_{5} - b}) & = & {a_{6} - b, e_{4} - b, e_{5} - b, f_{5} - b}, \\ L ({e_{6} - b, f_{6} - b}) & = & {e_{6} - b, f_{6} - b}, \\ G ({e_{6} - b, f_{6} - b}) & = & {e_{6} - b, f_{6} - b} . \end{matrix}

Accordingly, Table A10 shows the association rule of the template and generalized association rules in the connect-4 database.

In the chess database, the numbers of reduced association rules with confidence

c \geq 0.5

, compared with those in [63], are shown in Table A11 of Appendix A. For template

A = {\mathit{mulch\_f}, \mathit{skach\_f}, \mathit{wkna8\_f}}

and rule

ϕ \equiv \mathit{qxmsq\_f} \land \mathit{spcop\_f} \land \mathit{wkna8\_f} ⟶ \mathit{bkon8\_f} \land \mathit{hdchk\_f}

,

\begin{matrix} L ({\mathit{mulch\_f}, \mathit{skach\_f}, \mathit{wkna8\_f}}) & = & {\mathit{mulch\_f}, \mathit{skach\_f}}, \\ \min [{\mathit{mulch\_f}, \mathit{skach\_f}}] & = & {\mathit{mulch\_f}, \mathit{skach\_f}}, \\ L ({\mathit{qxmsq\_f}, \mathit{spcop\_f}, \mathit{wkna8\_f}}) & = & {\mathit{qxmsq\_f}, \mathit{spcop\_f}}, \\ G ({\mathit{qxmsq\_f}, \mathit{spcop\_f}, \mathit{wkna8\_f}}) & = & {\mathit{qxmsq\_f}, \mathit{spcop\_f}, \mathit{wkna8\_f}, \mathit{stlmt\_f}}, \\ L ({\mathit{bkon8\_f}, \mathit{hdchk\_f}}) & = & {\mathit{bkon8\_f}, \mathit{hdchk\_f}, \mathit{reskd\_f}}, \\ G ({\mathit{bkon8\_f}, \mathit{hdchk\_f}}) & = & {\mathit{bkon8\_f}, \mathit{hdchk\_f}, \mathit{reskd\_f}, \mathit{thrsk\_f}, \mathit{spcop\_f}} . \end{matrix}

Accordingly, Table A12 shows the association rule of the template and generalized association rules in the chess database.

6. Conclusions

Association rules are often generated from frequent itemsets or closed itemsets of transactional databases, and many methods have been proposed to obtain these itemsets. In this article, to represent more general associative relations among the items of transaction databases, a topology on the itemset of a transactional database is constructed. It has been proved that the topology on the itemset includes frequent itemsets and closed itemsets. Most importantly, the topology on the itemset can be generated from its basis, which is deduced from the transactional database; using the basis of the topology avoids scanning the database many times when extracting association rules. The quotient lattice of the topology displays the hierarchical structures on all itemsets. Because every closed itemset and its generators or minimal generators are confined to a single element of the quotient lattice, valid Min-Max association rules can be easily generated within that element. Because the quotient lattice of the topology provides granular concepts to approximate any template of the itemset, redundant association rules can be easily generated from these granular concepts. The experiments demonstrate that association rules mining using the topology for the itemset is an efficient method.

Author Contributions

Conceptualization, Z.P.; methodology, Z.P. and B.L.; validation, C.Z. and F.H.; formal analysis, Z.P. and B.L.; writing (original draft), B.L.; writing (review and editing), Z.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the Talent Introduction Project of Xihua University (Z202104) and the Opening Project of the Intelligent Policing Key Laboratory of Sichuan Province (Grant Nos. ZNJW2022KFMS004, ZNJW2022KFQN002, ZNJW2023KFQN007).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Basis $B_{A}$ of the Zoo database.

Attributes	$ha$	$fe$	$eg$	$mi$	$ai$
Basis	$\{ha, br\}$	$\{fe, eg, ba, br, ta\}$	$\{eg\}$	$\{mi, ba, br\}$	$\{ai, br\}$
Attributes	$aq$	$pr$	$to$	$ba$	$br$
Basis	$\{aq\}$	$\{pr\}$	$\{to, ba\}$	$\{ba\}$	$\{br\}$
Attributes	$ve$	$fi$	$ta$	$do$	$ca$
Basis	$\{ve\}$	$\{aq, to, ba, fi\}$	$\{ta\}$	$\{do\}$	$\{ca\}$

Table A2. Members of topology and equivalent classes with ${supp}_{r} (*) \geq c$.

c	0	$0.2$	$0.3$	$0.5$
Members of topology	4159	98	54	11
Equivalent classes	237	75	44	11

Table A3. Equivalent classes used to generate association rules with ${supp}_{r} (*) \geq 0.3$ and confidence $c = 1$.

Equivalent Classes	$| [*] |$	$\min [*]$	$P_{τ} (T^{'})$
$[\{ba, br, ca\}]$	2	$\{\{br, ca\}\}$	$\{\{br, ca\}\}$
$[\{ba, br, ta, ca\}]$	2	$\{\{br, ta, ca\}\}$	$\{\{br, ta, ca\}\}$
$[\{ba, br, to\}]$	1	$\{\{ba, br, to\}\}$	$\{\{br, to\}\}$
$[\{ha, br\}]$	1	$\{\{ha, br\}\}$	$\{\{ha\}\}$
$[\{to, ba, br, ta\}]$	1	$\{\{to, ba, br, ta\}\}$	$\{\{br, to, ta\}\}$
$[\{mi, to, ba, br\}]$	1	$\{\{mi, to, ba, br\}\}$	$\{\{mi, to\}, \{mi, to, ba\}, \{mi, to, br\}\}$
$[\{ha, mi, ba, br\}]$	2	$\{\{ha, ba, br\}\}$	$\{\{ha, ba\}, \{ha, ba, br\}\}$
$[\{mi, to, ba, br, ta\}]$	1	$\{\{mi, to, ba, br, ta\}\}$	$\{\{mi, to, ta\}, \{mi, to, ba, ta\}, \{mi, to, br, ta\}\}$
$[\{eg, ba, ta\}]$	2	$\{\{eg, ta\}\}$	$\{\{eg, ta\}\}$
$[\{ha, mi, ba, br, ta\}]$	3	$\{\{ha, br, ta\}\}$	$\{\{ha, ta\}, \{ha, br, ta\}\}$
$[\{ha, mi, to, ba, br, ta\}]$	2	$\{\{ha, to, ba, br, ta\}\}$	$\{\{ha, to, ta\}, \{ha, to, ba, br, ta\}, \{ha, to, br, ta\}, \{ha, to, ba, ta\}\}$
$[\{pr, to, ba, ta\}]$	1	$\{\{pr, to, ba, ta\}\}$	$\{\{pr, to, ta\}\}$
$[\{to, ba\}]$	1	$\{\{to, ba\}\}$	$\{\{to\}\}$
$[\{pr, to, ba\}]$	1	$\{\{pr, to, ba\}\}$	$\{\{pr, to\}\}$
$[\{mi, ba, br\}]$	1	$\{\{mi, ba, br\}\}$	$\{\{mi\}, \{mi, ba\}, \{mi, br\}\}$
$[\{mi, ba, br, ta\}]$	1	$\{\{mi, ba, br, ta\}\}$	$\{\{mi, ta\}, \{mi, ba, ta\}, \{mi, br, ta\}\}$
$[\{to, ba, ca\}]$	1	$\{\{to, ba, ca\}\}$	$\{\{to, ca\}\}$
$[\{mi, ba, br, ca\}]$	1	$\{\{mi, ba, br, ca\}\}$	$\{\{mi, ca\}, \{mi, ba, ca\}, \{mi, br, ca\}\}$
$[\{mi, to, ba, br, ca\}]$	2	$\{\{to, ba, br, ca\}\}$	$\{\{to, ba, ca\}, \{to, ba, br, ca\}\}$
$[\{to, ba, ta\}]$	1	$\{\{to, ba, ta\}\}$	$\{\{to, ta\}\}$
$[\{ha, mi, to, ba, br\}]$	2	$\{\{ha, to, ba, br\}\}$	$\{\{ha, to\}, \{ha, to, ba\}, \{ha, to, br\}, \{ha, to, ba, br\}\}$
$[\{ba, ta, ca\}]$	2	$\{\{ta, ca\}\}$	$\{\{ta, ca\}\}$

Table A4. All Min-Max association rules with ${supp}_{r} (*) \geq 0.4$ and $c = 1$.

Numbers	Rule $ϕ ({supp}_{r} (ϕ), c)$
1	$br \land to ⟶ ba (0.4653, 1)$
2	$ha ⟶ br (0.4257, 1)$
3	$to ⟶ ba (0.604, 1)$
4	$mi ⟶ ba \land br (0.4059, 1)$
5	$mi \land ba ⟶ br (0.4059, 1)$
6	$mi \land br ⟶ ba (0.4059, 1)$
7	$to \land ta ⟶ ba (0.5149, 1)$

Table A5. Some Min-Max association rules with ${supp}_{r} (*) \geq 0.4$ and $0.8 \leq c < 1$.

Numbers	Rule $ϕ ({supp}_{r} (ϕ), c)$
1	$pr \land ba ⟶ ta (0.4059, 0.8723)$
2	$pr \land ta ⟶ ba (0.4059, 0.9762)$
3	$ta ⟶ br (0.604, 0.8133)$
4	$pr ⟶ ba (0.4653, 0.8393)$
5	$ca ⟶ ba (0.4257, 0.9773)$
6	$to ⟶ ba \land ta (0.5149, 0.8525)$
7	$to \land ba ⟶ ta (0.5149, 0.8525)$
8	$ba ⟶ br (0.6832, 0.8313)$
9	$br ⟶ ba (0.6832, 0.8625)$
10	$ba ⟶ ta (0.7327, 0.8916)$
11	$ta ⟶ ba (0.7327, 0.9867)$
12	$ta ⟶ ba \land br (0.5941, 0.8)$
13	$ba \land br ⟶ ta (0.5941, 0.8696)$
14	$ba \land ta ⟶ br (0.5941, 0.8108)$
15	$br \land ta ⟶ ba (0.5941, 0.9836)$
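
The two numbers attached to each rule are linked by the usual definition of confidence, $c (X ⟶ Y) = supp (X \cup Y) / supp (X)$. As a worked check of rules 10 and 11, assume that the Zoo database has 101 transactions (Table 9) and that the tabulated values are rounded to four decimals. The counts 74, 75 and 83 below are inferred from the table under those assumptions; they are not quoted from the source.

```latex
\begin{aligned}
{supp}_{r} (ba \land ta) &= \frac{74}{101} \approx 0.7327, \\
c (ta \longrightarrow ba) &= \frac{supp (ba \land ta)}{supp (ta)} = \frac{74}{75} \approx 0.9867, \\
c (ba \longrightarrow ta) &= \frac{supp (ba \land ta)}{supp (ba)} = \frac{74}{83} \approx 0.8916 .
\end{aligned}
```

Both rules share the same relative support because they are built from the same itemset; only the antecedent count in the denominator changes.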

Figure A1. Change of bases in the mushroom database corresponding to ${supp}_{r} (*)$.

Figure A2. Members of topology, equivalent classes and their generating time in the mushroom database corresponding to bases.

Table A6. All Min-Max association rules in the mushroom database with ${supp}_{r} (*) \geq 0.8$ and $c = 1$.

N	Rule $ϕ ({supp}_{r} (ϕ), c)$
1	$gill\_attachment\_f ⟶ veil\_type\_p (0.9742, 1)$
2	$gill\_spacing\_c ⟶ veil\_type\_p (0.8385, 1)$
3	$veil\_color\_w ⟶ veil\_type\_p (0.9754, 1)$
4	$ring\_number\_o ⟶ veil\_type\_p (0.9217, 1)$
5	$gill\_attachment\_f \land veil\_color\_w ⟶ veil\_type\_p (0.9732, 1)$
6	$gill\_attachment\_f \land ring\_number\_o ⟶ veil\_type\_p (0.8981, 1)$
7	$gill\_spacing\_c \land veil\_color\_w ⟶ veil\_type\_p (0.8149, 1)$
8	$gill\_attachment\_f \land gill\_spacing\_c ⟶ veil\_color\_w \land veil\_type\_p (0.8127, 1)$
9	$gill\_attachment\_f \land gill\_spacing\_c \land veil\_type\_p ⟶ veil\_color\_w (0.8127, 1)$
10	$gill\_attachment\_f \land gill\_spacing\_c \land veil\_color\_w ⟶ veil\_type\_p (0.8127, 1)$
11	$veil\_color\_w \land ring\_number\_o ⟶ gill\_attachment\_f \land veil\_type\_p (0.8971, 1)$
12	$gill\_attachment\_f \land veil\_color\_w \land ring\_number\_o ⟶ veil\_type\_p (0.8971, 1)$
13	$veil\_type\_p \land veil\_color\_w \land ring\_number\_o ⟶ gill\_attachment\_f (0.8971, 1)$

Table A7. Min-Max association rules with ${supp}_{r} (*) \geq 0.8$ and $0.8 \leq c < 1$.

Numbers	Rule $ϕ ({supp}_{r} (ϕ), c)$
1	$veil\_type\_p ⟶ gill\_attachment\_f (0.9742, 0.9742)$
2	$veil\_type\_p ⟶ gill\_spacing\_c (0.8385, 0.8385)$
3	$veil\_type\_p ⟶ veil\_color\_w (0.9754, 0.9754)$
4	$veil\_type\_p ⟶ ring\_number\_o (0.9217, 0.9217)$
5	$veil\_type\_p ⟶ gill\_attachment\_f \land veil\_color\_w (0.9732, 0.9732)$
6	$veil\_type\_p ⟶ gill\_attachment\_f \land ring\_number\_o (0.8981, 0.8981)$
7	$veil\_type\_p ⟶ gill\_spacing\_c \land veil\_color\_w (0.8149, 0.8149)$
8	$veil\_type\_p ⟶ gill\_attachment\_f \land gill\_spacing\_c \land veil\_color\_w (0.8127, 0.8127)$
9	$veil\_type\_p ⟶ gill\_attachment\_f \land veil\_color\_w \land ring\_number\_o (0.8971, 0.8971)$
10	$gill\_attachment\_f ⟶ veil\_type\_p \land veil\_color\_w (0.9732, 0.999)$
11	$gill\_attachment\_f ⟶ veil\_type\_p \land ring\_number\_o (0.8981, 0.9219)$
12	$gill\_attachment\_f ⟶ gill\_spacing\_c \land veil\_color\_w \land veil\_type\_p (0.8127, 0.8342)$
13	$gill\_attachment\_f ⟶ veil\_color\_w \land ring\_number\_o \land veil\_type\_p (0.8971, 0.9209)$
14	$gill\_spacing\_c ⟶ gill\_attachment\_f \land veil\_color\_w \land veil\_type\_p (0.8127, 0.9692)$
15	$gill\_spacing\_c ⟶ veil\_color\_w \land veil\_type\_p (0.8149, 0.9718)$
16	$veil\_color\_w ⟶ gill\_spacing\_c \land veil\_type\_p (0.8149, 0.8354)$
17	$veil\_color\_w ⟶ gill\_attachment\_f \land gill\_spacing\_c \land veil\_type\_p (0.8127, 0.8332)$
18	$veil\_color\_w ⟶ gill\_attachment\_f \land veil\_type\_p (0.9732, 0.9977)$
19	$veil\_color\_w ⟶ gill\_attachment\_f \land ring\_number\_o \land veil\_type\_p (0.8971, 0.9197)$
20	$ring\_number\_o ⟶ gill\_attachment\_f \land veil\_type\_p (0.8981, 0.9744)$
21	$ring\_number\_o ⟶ gill\_attachment\_f \land veil\_color\_w \land veil\_type\_p (0.8971, 0.9733)$
22	$gill\_attachment\_f \land veil\_color\_w ⟶ ring\_number\_o \land veil\_type\_p (0.8971, 0.9218)$
23	$gill\_attachment\_f \land veil\_color\_w ⟶ gill\_spacing\_c \land veil\_type\_p (0.8127, 0.8351)$
24	$gill\_attachment\_f \land ring\_number\_o ⟶ veil\_color\_w \land veil\_type\_p (0.8971, 0.9989)$
25	$gill\_spacing\_c \land veil\_color\_w ⟶ gill\_attachment\_f \land veil\_type\_p (0.8127, 0.9973)$
26	$gill\_attachment\_f \land veil\_type\_p ⟶ veil\_color\_w \land ring\_number\_o (0.8971, 0.9209)$
27	$gill\_attachment\_f \land veil\_type\_p ⟶ ring\_number\_o (0.8981, 0.9219)$
28	$veil\_type\_p \land veil\_color\_w ⟶ gill\_attachment\_f \land ring\_number\_o (0.8971, 0.9197)$
29	$veil\_type\_p \land ring\_number\_o ⟶ gill\_attachment\_f \land veil\_color\_w (0.8971, 0.9733)$
30	$veil\_type\_p \land ring\_number\_o ⟶ gill\_attachment\_f (0.8981, 0.9744)$
31	$gill\_attachment\_f \land veil\_type\_p \land veil\_color\_w ⟶ ring\_number\_o (0.8971, 0.9218)$
32	$gill\_attachment\_f \land veil\_type\_p \land ring\_number\_o ⟶ veil\_color\_w (0.8971, 0.9989)$
33	$gill\_attachment\_f \land veil\_type\_p ⟶ gill\_spacing\_c \land veil\_color\_w (0.8127, 0.8342)$
34	$gill\_spacing\_c \land veil\_type\_p ⟶ gill\_attachment\_f \land veil\_color\_w (0.8127, 0.9692)$
35	$veil\_type\_p \land veil\_color\_w ⟶ gill\_attachment\_f \land gill\_spacing\_c (0.8127, 0.8332)$
36	$gill\_attachment\_f \land veil\_type\_p \land veil\_color\_w ⟶ gill\_spacing\_c (0.8127, 0.8351)$
37	$gill\_spacing\_c \land veil\_type\_p \land veil\_color\_w ⟶ gill\_attachment\_f (0.8127, 0.9973)$
38	$gill\_attachment\_f \land veil\_type\_p ⟶ veil\_color\_w (0.9732, 0.999)$
39	$veil\_type\_p \land veil\_color\_w ⟶ gill\_attachment\_f (0.9732, 0.9977)$
40	$gill\_spacing\_c \land veil\_type\_p ⟶ veil\_color\_w (0.8149, 0.9718)$
41	$veil\_type\_p \land veil\_color\_w ⟶ gill\_spacing\_c (0.8149, 0.8354)$

Table A8. The number of approximate rules and reduction in the mushroom database with confidence $c \geq 0.5$.

${supp}_{r} (*)$	$TRX$	Min-Max ( $N, Rt$ )	Reliable ( $N, Rt$ )	$TR$	N-Min-Max ( $N, Rt$ )
0.4	2528	(465, 0.82)	(361, 0.86)	1825	(420, 0.77)
0.5	835	(175, 0.79)	(135, 0.84)	514	(190, 0.63)
0.6	228	(59, 0.74)	(52, 0.77)	136	(65, 0.52)
0.7	161	(39, 0.74)	(34, 0.79)	90	(41, 0.54)
Average		0.77	0.82		0.62

Table A9. Number of approximate rules and reduction in the connect-4 database with confidence $c \geq 0.5$.

${supp}_{r} (*)$	$TRX$	Min-Max ( $N, Rt$ )	Reliable ( $N, Rt$ )	$TR$	N-Min-Max ( $N, Rt$ )
0.94	199,560	(49,407, 0.75)	(10,220, 0.95)	88,116	(7914, 0.91)
0.95	77,206	(24,794, 0.68)	(5245, 0.93)	39,768	(4731, 0.88)
0.96	26,856	(11,452, 0.57)	(2538, 0.91)	16,356	(2535, 0.85)
0.97	7895	(4439, 0.44)	(1214, 0.85)	5690	(1294, 0.77)
Average		0.61	0.91		0.85
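
The reduction ratios $Rt$ in Tables A8, A9 and A11 appear to follow $Rt = 1 - N / T$, where $T$ is the $TRX$ total for the Min-Max and Reliable columns and the $TR$ total for the N-Min-Max column. This reading is an assumption inferred from the tabulated numbers, not a definition quoted from the text, and the helper name `reduction_ratio` is hypothetical. A minimal Python sketch that recomputes two rows of Table A9 under that assumption:

```python
def reduction_ratio(kept_rules, total_rules):
    """Share of rules removed, assuming Rt = 1 - N / total (inferred, not quoted)."""
    return 1 - kept_rules / total_rules


# (supp_r threshold, TRX, Min-Max N, Reliable N, TR, N-Min-Max N), copied from Table A9.
rows = [
    (0.94, 199_560, 49_407, 10_220, 88_116, 7_914),
    (0.97, 7_895, 4_439, 1_214, 5_690, 1_294),
]

for supp, trx, n_min_max, n_reliable, tr, n_n_min_max in rows:
    print(
        supp,
        round(reduction_ratio(n_min_max, trx), 2),    # Min-Max, relative to TRX
        round(reduction_ratio(n_reliable, trx), 2),   # Reliable, relative to TRX
        round(reduction_ratio(n_n_min_max, tr), 2),   # N-Min-Max, relative to TR
    )
```

Under this reading, a larger $Rt$ means that a representation keeps fewer of the rules counted in the corresponding total.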

Table A10. Templates, association rules and generalized association rules in the connect-4 database.

Template 1	$A = \{d_{5} - b, d_{6} - b, e_{5} - b, f_{5} - b, f_{6} - b\} ({supp}_{r} (A) = 0.9573)$
$ψ_{1}$	$d_{5} - b \land d_{6} - b \land f_{5} - b \land f_{6} - b ⟶ e_{5} - b (0.9573, 0.99)$
Reduction of $ψ_{1}$	$d_{5} - b \land f_{5} - b ⟶ e_{5} - b (0.9573, 0.99)$
Rule 1	$ϕ \equiv a_{6} - b \land e_{4} - b \land f_{5} - b ⟶ e_{6} - b \land f_{6} - b (0.9502, 0.9694)$
	$ϕ_{GA} \equiv a_{6} - b ⟶ e_{6} - b \land f_{6} - b (0.9502, 0.9576)$
	$ϕ_{ga} \equiv a_{6} - b \land e_{4} - b \land e_{5} - b \land f_{5} - b ⟶ e_{6} - b \land f_{6} - b (0.9502, 1)$

Table A11. The number of approximate rules and reduction in the chess database with confidence $c \geq 0.5$.

${supp}_{r} (*)$	$TRX$	Min-Max ( $N, Rt$ )	Reliable ( $N, Rt$ )	$TR$	N-Min-Max ( $N, Rt$ )
0.90	10,614	(8371, 0.21)	(2483, 0.77)	9230	(2034, 0.78)
0.91	5785	(5050, 0.13)	(1571, 0.73)	5354	(1357, 0.75)
0.93	2338	(1948, 0.17)	(688, 0.71)	2110	(648, 0.69)
0.95	468	(459, 0.02)	(196, 0.58)	466	(195, 0.58)
Average		0.13	0.70		0.70

Table A12. Templates, association rules and generalized association rules in the chess database.

Template 1	$A = \{mulch\_f, skach\_f, wkna8\_f\} ({supp}_{r} (A) = 0.9002)$
$ψ_{1}$	$mulch\_f \land skach\_f ⟶ wkna8\_f (0.9002, 0.9492)$
Rule 1	$ϕ \equiv qxmsq\_f \land spcop\_f \land wkna8\_f ⟶ bkon8\_f \land hdchk\_f \land reskd\_f \land thrsk\_f (0.8326, 0.9103)$
	$ϕ_{GA} \equiv qxmsq\_f \land spcop\_f ⟶ bkon8\_f \land hdchk\_f \land reskd\_f \land thrsk\_f (0.8827, 0.9107)$
	$ϕ_{GC} \equiv qxmsq\_f \land spcop\_f \land wkna8\_f ⟶ bkon8\_f \land hdchk\_f \land reskd\_f (0.8698, 0.9136)$
	$ϕ_{GAC} \equiv qxmsq\_f \land spcop\_f ⟶ bkon8\_f \land hdchk\_f \land reskd\_f (0.9215, 0.9507)$
	$ϕ_{ga} \equiv qxmsq\_f \land spcop\_f \land wkna8\_f \land stlmt\_f ⟶ bkon8\_f \land hdchk\_f \land reskd\_f \land thrsk\_f (0.8326, 0.9103)$

References

Agrawal, R.; Imieliński, T.; Swami, A. Mining Association Rules between Sets of Items in Large Databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, DC, USA, 25–28 May 1993; Association for Computing Machinery: New York, NY, USA, 1993; pp. 207–216.
Agrawal, R.; Srikant, R. Fast Algorithms for Mining Association Rules in Large Databases. In Proceedings of the 20th International Conference on Very Large Data Bases, Santiago de Chile, Chile, 12–15 September 1994; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1994; pp. 487–499.
Thamer, M.B.; El-Sappagh, S.; El-Shishtawy, T. A Semantic Approach for Extracting Medical Association Rules. Int. J. Intell. Eng. Syst. 2020, 13, 280–292.
Razzak, M.I.; Imran, M.; Xu, G. Big data analytics for preventive medicine. Neural Comput. Appl. 2020, 32, 4417–4451.
Zhang, H.N.; Dwivedi, A.D. Precise Marketing Data Mining Method of E-Commerce Platform Based on Association Rules. Mob. Netw. Appl. 2022.
Reddy, R.V.; Venkateswara Rao, K.; Kameswara Rao, M.; Deepak Kumar, B.P. A Review on Stock Market Analysis Using Association Rule Mining. In Cybernetics, Cognition and Machine Learning Applications; Gunjan, V.K., Suganthan, P.N., Haase, J., Kumar, A., Eds.; Springer Nature Singapore: Singapore, 2023; pp. 171–183.
Ahn, K.I.; Kim, J.Y. Efficient Mining of Frequent Itemsets and a Measure of Interest for Association Rule Mining. J. Inf. Knowl. Manag. 2004, 3, 245–257.
Brin, S.; Motwani, R.; Ullman, J.D.; Tsur, S. Dynamic itemset counting and implication rules for market basket data. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, Tucson, AZ, USA, 13–15 May 1997; ACM: New York, NY, USA, 1997; pp. 255–264.
Geng, L.; Hamilton, H.J. Interestingness measures for data mining: A survey. ACM Comput. Surv. 2006, 38, 9.
Ghosh, A.; Nath, B. Multi-objective rule mining using genetic algorithms. Inf. Sci. 2004, 163, 123–133.
Silverstein, C.; Brin, S.; Motwani, R. Beyond market baskets: Generalizing association rules to dependence rules. Data Min. Knowl. Discov. 1998, 2, 39–68.
Baralis, E.; Cagliero, L.; Cerquitelli, T.; Garza, P. Generalized association rule mining with constraints. Inf. Sci. 2012, 194, 68–84.
Beiranvand, V.; Mobasher-Kashani, M.; Bakar, A.A. Multi-objective PSO algorithm for mining numerical association rules without a priori discretization. Expert Syst. Appl. 2014, 41, 4259–4273.
Guil, F.; Marín, R. A Theory of Evidence-based method for assessing frequent patterns. Expert Syst. Appl. 2013, 40, 3121–3127.
Guns, T.; Nijssen, S.; De Raedt, L. Itemset Mining: A Constraint Programming Perspective. Artif. Intell. 2011, 175, 1951–1983.
Ji, Y.; Ying, H.; Tran, J.; Dews, P.; Mansour, A.; Massanari, R.M. A Method for Mining Infrequent Causal Associations and Its Application in Finding Adverse Drug Reaction Signal Pairs. IEEE Trans. Knowl. Data Eng. 2013, 25, 721–733.
Kuo, R.J.; Chao, C.M.; Chiu, Y.T. Application of particle swarm optimization to association rule mining. Appl. Soft Comput. 2011, 11, 326–336.
Luna, J.M.; Romero, J.R.; Ventura, S. Grammar-based multi-objective algorithms for mining association rules. Data Knowl. Eng. 2013, 86, 19–37.
Rodríguez, D.M.; Rosete, A.; Alcalá-Fdez, J.; Herrera, F. QAR-CIP-NSGA-II: A new multi-objective evolutionary algorithm to mine quantitative association rules. Inf. Sci. 2014, 258, 1–28.
Martínez-Ballesteros, M.; Martínez-Álvarez, F.; Lora, A.T.; Riquelme, J.C. Selecting the best measures to discover quantitative association rules. Neurocomputing 2014, 126, 3–14.
Pei, Z. Extracting association rules based on intuitionistic fuzzy special sets. In Proceedings of the FUZZ-IEEE, Hong Kong, China, 1–6 June 2008; pp. 873–878.
Shaharanee, I.N.M.; Hadzic, F.; Dillon, T.S. Interestingness measures for association rules based on statistical validity. Knowl. Based Syst. 2011, 24, 386–392.
Kaushik, M.; Sharma, R.; Peious, S.A.; Shahin, M.; Yahia, S.B.; Draheim, D. A Systematic Assessment of Numerical Association Rule Mining Methods. SN Comput. Sci. 2021, 2, 348.
Kuo, R.J.; Gosumolo, M.; Zulvia, F.E. Multi-objective particle swarm optimization algorithm using adaptive archive grid for numerical association rule mining. Neural Comput. Appl. 2019, 31, 3559–3572.
Wang, H.B.; Gao, Y.J. Research on parallelization of Apriori algorithm in association rule mining. Procedia Comput. Sci. 2021, 183, 641–647.
Bazai, S.U.; Jang-Jaccard, J. In-Memory Data Anonymization Using Scalable and High Performance RDD Design. Electronics 2020, 9, 1732.
Bazai, S.U.; Jang-Jaccard, J.; Alavizadeh, H. A Novel Hybrid Approach for Multi-Dimensional Data Anonymization for Apache Spark. ACM Trans. Priv. Secur. 2021, 25, 1–25.
Bazai, S.U.; Jang-Jaccard, J.; Alavizadeh, H. Scalable, High-Performance, and Generalized Subtree Data Anonymization Approach for Apache Spark. Electronics 2021, 10, 589.
Calders, T.; Dexters, N.; Gillis, J.J.M.; Goethals, B. Mining frequent itemsets in a stream. Inf. Syst. 2014, 39, 233–255.
Han, J.; Cheng, H.; Xin, D.; Yan, X. Frequent pattern mining: Current status and future directions. Data Min. Knowl. Discov. 2007, 15, 55–86.
Pei, J.; Han, J.; Mao, R. CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets. In Proceedings of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, Dallas, TX, USA, 14 May 2000; pp. 21–30.
Wang, J.; Han, J.; Pei, J. CLOSET+: Searching for the best strategies for mining frequent closed itemsets. In Proceedings of the KDD, Washington, DC, USA, 24–27 August 2003; Getoor, L., Senator, T.E., Domingos, P.M., Faloutsos, C., Eds.; ACM: New York, NY, USA, 2003; pp. 236–245.
O’Sullivan, D.; Smyth, B.; Wilson, D.C.; McDonald, K.; Smeaton, A. Improving the Quality of the Personalized Electronic Program Guide. User Model. User-Adapt. Interact. 2004, 14, 5–36.
Kryszkiewicz, M.; Rybinski, H.; Gajek, M. Dataless Transitions Between Concise Representations of Frequent Patterns. J. Intell. Inf. Syst. 2004, 22, 41–70.
Pasquier, N.; Bastide, Y.; Taouil, R.; Lakhal, L. Efficient mining of association rules using closed itemset lattices. Inf. Syst. 1999, 24, 25–46.
Zaki, M.J. Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 2000, 12, 372–390.
Zaki, M.J.; Hsiao, C.J. Efficient Algorithms for Mining Closed Itemsets and Their Lattice Structure. IEEE Trans. Knowl. Data Eng. 2005, 17, 462–478.
Hashem, T.; Ahmed, C.F.; Samiullah, M.; Akther, S.; Jeong, B.S.; Jeon, S. An efficient approach for mining cross-level closed itemsets and minimal association rules using closed itemset lattices. Expert Syst. Appl. 2014, 41, 2914–2938.
Liu, H.; Liu, L.; Zhang, H. A fast pruning redundant rule method using Galois connection. Appl. Soft Comput. 2011, 11, 130–137.
Cagliero, L.; Cerquitelli, T.; Garza, P.; Grimaudo, L. Misleading Generalized Itemset discovery. Expert Syst. Appl. 2014, 41, 1400–1410.
Cagliero, L.; Garza, P. Itemset generalization with cardinality-based constraints. Inf. Sci. 2013, 244, 161–174.
Baralis, E.; Cagliero, L.; Cerquitelli, T.; D’Elia, V.; Garza, P. Expressive generalized itemsets. Inf. Sci. 2014, 278, 327–343.
Boulicaut, J.F.; Bykowski, A.; Rigotti, C. Free-sets: A condensed representation of boolean data for the approximation of frequency queries. Data Min. Knowl. Discov. 2003, 7, 5–22.
Bykowski, A.; Rigotti, C. DBC: A condensed representation of frequent patterns for efficient mining. Inf. Syst. 2003, 28, 949–977.
Chiang, D.A.; Wang, Y.F.; Wang, Y.H.; Chen, Z.Y.; Hsu, M.H. Mining disjunctive consequent association rules. Appl. Soft Comput. 2011, 11, 2129–2133.
Calders, T.; Goethals, B. Non-derivable itemset mining. Data Min. Knowl. Discov. 2007, 14, 171–206.
Li, H.; Chen, H. Mining non-derivable frequent itemsets over data stream. Data Knowl. Eng. 2009, 68, 481–498.
Hamrouni, T.; Yahia, S.B.; Nguifo, E.M. Sweeping the disjunctive search space towards mining new exact concise representations of frequent itemsets. Data Knowl. Eng. 2009, 68, 1091–1111.
Barrenechea, E.; Sola, H.B.; Campión, M.J.; Induráin, E.; Knoblauch, V. Topological interpretations of fuzzy subsets. A unified approach for fuzzy thresholding algorithms. Knowl. Based Syst. 2013, 54, 163–171.
Syau, Y.R.; Lin, E.B. Neighborhood systems and covering approximation spaces. Knowl. Based Syst. 2014, 66, 61–67.
Wang, S.; Liu, D. Knowledge representation and reasoning for qualitative spatial change. Knowl. Based Syst. 2012, 30, 161–171.
Pei, Z.; Ruan, D.; Meng, D.; Liu, Z. Formal concept analysis based on the topology for attributes of a formal context. Inf. Sci. 2013, 236, 66–82.
Zhang, Y.; Pei, Z.; Shi, P. Association rule mining based on topology for attributes of multi-valued information systems. Int. J. Innov. Comput. Inf. Control 2013, 9, 1679–1690.
Ganter, B.; Wille, R. Formal Concept Analysis: Mathematical Foundations; Springer: Berlin/Heidelberg, Germany, 1999.
Pawlak, Z.; Skowron, A. Rough sets and Boolean reasoning. Inf. Sci. 2007, 177, 41–73.
Qin, K.; Yang, J.; Pei, Z. Generalized rough sets based on reflexive and transitive relations. Inf. Sci. 2008, 178, 4138–4141.
Zhang, H.P.; Ouyang, Y.; Wang, Z. Note on “Generalized rough sets based on reflexive and transitive relations”. Inf. Sci. 2009, 179, 471–473.
Freund, M. On the notion of concept I. Artif. Intell. 2008, 172, 570–590.
Srikant, R.; Agrawal, R. Mining Generalized Association Rules. In Proceedings of the 21st International Conference on Very Large Databases, Zurich, Switzerland, 11–15 September 1995; pp. 407–419.
Wu, C.M.; Huang, Y.F. Generalized association rule mining using an efficient data structure. Expert Syst. Appl. 2011, 38, 7277–7290.
Apriori Algorithm. Available online: http://www.mathworks.com/matlabcentral/fileexchange/42541-association-rules/ (accessed on 1 September 2022).
UCI Machine Learning Repository. Available online: http://archive.ics.uci.edu/ml/ (accessed on 1 September 2022).
Xu, Y.; Li, Y.; Shaw, G. Reliable representations for association rules. Data Knowl. Eng. 2011, 70, 555–575.

Figure 1. The lattice of the topology $T (R_{A})$ of Example 1.

Figure 2. The complete lattice $(T (R_{A}) / \sim_{τ}, \land, \lor)$ of Example 1.

Figure 3. The complete lattice $(T (R_{A}) / \sim_{τ}, \land, \lor)$ of Example 2.

Figure 4. Execution time of the two algorithms on the Chess dataset for various minSup values.

Figure 5. Execution time of the two algorithms on the Connect dataset for various minSup values.

Figure 6. Execution time of the two algorithms on the Mushroom dataset for various minSup values.

Figure 7. Execution time of the two algorithms on the Zoo dataset for various minSup values.

Figure 8. Memory usage of the two algorithms on the Chess dataset for various minSup values.

Figure 9. Memory usage of the two algorithms on the Connect dataset for various minSup values.

Figure 10. Memory usage of the two algorithms on the Mushroom dataset for various minSup values.

Figure 11. Memory usage of the two algorithms on the Zoo dataset for various minSup values.

Table 1. A transactional database $A = (U, A)$.

$U ∖ A$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$
$u_{1}$	1	0	0	0	1
$u_{2}$	1	0	1	0	0
$u_{3}$	0	0	1	0	1
$u_{4}$	0	1	1	0	0
$u_{5}$	1	1	1	1	1
$u_{6}$	0	0	1	1	1

Table 2. The binary relation $R_{A}$ on A.

$R_{A}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$
$a_{1}$	1	0	0	0	0
$a_{2}$	0	1	1	0	0
$a_{3}$	0	0	1	0	0
$a_{4}$	0	0	1	1	1
$a_{5}$	0	0	0	0	1
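
Table 2 can be reproduced from Table 1 if $R_{A}$ is read as follows: $a_{i}$ is related to $a_{j}$ whenever every transaction that contains $a_{i}$ also contains $a_{j}$. This reading is an assumption that happens to match Tables 1 and 2 (and Tables 3 and 4); the formal definition is the one given in the main text. The function names below are hypothetical. A minimal Python sketch:

```python
ITEMS = ["a1", "a2", "a3", "a4", "a5"]

# Transactions u1..u6 of Table 1, written as item sets.
TABLE_1 = [
    {"a1", "a5"},                    # u1
    {"a1", "a3"},                    # u2
    {"a3", "a5"},                    # u3
    {"a2", "a3"},                    # u4
    {"a1", "a2", "a3", "a4", "a5"},  # u5
    {"a3", "a4", "a5"},              # u6
]


def cover(item, transactions):
    """Indices of the transactions that contain `item`."""
    return {i for i, t in enumerate(transactions) if item in t}


def binary_relation(items, transactions):
    """Assumed reading of R_A: entry (a, b) is 1 iff cover(a) is a subset of cover(b)."""
    covers = {a: cover(a, transactions) for a in items}
    return [[int(covers[a] <= covers[b]) for b in items] for a in items]


relation = binary_relation(ITEMS, TABLE_1)
for item, row in zip(ITEMS, relation):
    print(item, row)
```

Each row of the result lists the items that always co-occur with the row item, so the relation is computed with a single pass over the database per item.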

Table 3. A transactional database $A = (U, A)$.

$U ∖ A$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$
$u_{1}$	1	0	1	0	1
$u_{2}$	0	0	1	1	0
$u_{3}$	0	1	0	0	1
$u_{4}$	1	0	1	0	1
$u_{5}$	0	0	1	1	0
$u_{6}$	1	1	1	0	1
$u_{7}$	1	0	1	1	1
$u_{8}$	1	1	0	0	1
$u_{9}$	1	0	0	0	1
$u_{10}$	0	0	1	1	0

Table 4. A binary relation on A.

$R_{A}$	$a_{1}$	$a_{2}$	$a_{3}$	$a_{4}$	$a_{5}$
$a_{1}$	1	0	0	0	1
$a_{2}$	0	1	0	0	1
$a_{3}$	0	0	1	0	0
$a_{4}$	0	0	1	1	0
$a_{5}$	0	0	0	0	1
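
To illustrate how a basis generates the topology, the sketch below takes each row of Table 4 as one basis element and forms all unions of basis elements. Treating the rows of $R_{A}$ as the basis is an assumption made here (it is consistent with the shape of Table A1), the function names are hypothetical, and the brute-force enumeration is only meant for small examples because it visits every subset of the basis.

```python
from itertools import combinations

ITEMS = ["a1", "a2", "a3", "a4", "a5"]

# Rows of Table 4: entry [i][j] == 1 means item i is related to item j.
TABLE_4 = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
]


def basis_from_relation(items, relation):
    """One basis element per item: the set of items related to it (its row)."""
    return {
        a: frozenset(b for b, flag in zip(items, row) if flag)
        for a, row in zip(items, relation)
    }


def topology_from_basis(basis_sets):
    """All unions of basis elements, plus the empty set (brute force)."""
    basis_sets = list(set(basis_sets))
    open_sets = {frozenset()}
    for size in range(1, len(basis_sets) + 1):
        for combo in combinations(basis_sets, size):
            open_sets.add(frozenset().union(*combo))
    return open_sets


basis = basis_from_relation(ITEMS, TABLE_4)
topology = topology_from_basis(basis.values())
print(len(topology))  # number of open sets, counting the empty set
```

For this relation the enumeration yields 15 open sets when the empty set is counted; for example, $\{a_{3}, a_{5}\}$ is open, while $\{a_{1}\}$ is not because $a_{1}$ always brings $a_{5}$ with it.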

Table 5. Closed itemset and minimal generators of each equivalent class.

Equivalent Class	$\cup [T]$	$\min [T]$	$\min P_{τ} (T^{'})$
$[\{a_{1}, a_{5}\}]$	$\{a_{1}, a_{5}\}$	$\{a_{1}, a_{5}\}$	$\{a_{1}\}$
$[\{a_{2}, a_{5}\}]$	$\{a_{2}, a_{5}\}$	$\{a_{2}, a_{5}\}$	$\{a_{2}\}$
$[\{a_{3}, a_{4}\}]$	$\{a_{3}, a_{4}\}$	$\{a_{3}, a_{4}\}$	$\{a_{4}\}$
$[\{a_{1}, a_{2}, a_{5}\}]$	$\{a_{1}, a_{2}, a_{5}\}$	$\{a_{1}, a_{2}, a_{5}\}$	$\{a_{1}, a_{2}\}$
$[\{a_{3}, a_{5}\}]$	$\{a_{1}, a_{3}, a_{5}\}$	$\{a_{3}, a_{5}\}$	$\{a_{3}, a_{5}\}$
$[\{a_{2}, a_{3}, a_{5}\}]$	$\{a_{1}, a_{2}, a_{3}, a_{5}\}$	$\{a_{2}, a_{3}, a_{5}\}$	$\{a_{2}, a_{3}\}$
$[\{a_{3}, a_{4}, a_{5}\}]$	$\{a_{1}, a_{3}, a_{4}, a_{5}\}$	$\{a_{3}, a_{4}, a_{5}\}$	$\{a_{4}, a_{5}\}$
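
The $\cup [T]$ column can be cross-checked against the standard closure operator of closed-itemset mining, which maps an itemset to the set of items shared by all transactions that contain it. The sketch below applies that operator to the representatives in the first column, using the transactions of Table 3. It is an independent check written for illustration, not the procedure used in the paper, and the function name `closure` is hypothetical.

```python
ITEMS = ["a1", "a2", "a3", "a4", "a5"]

# Transactions u1..u10 of Table 3, written as item sets.
TABLE_3 = [
    {"a1", "a3", "a5"},        # u1
    {"a3", "a4"},              # u2
    {"a2", "a5"},              # u3
    {"a1", "a3", "a5"},        # u4
    {"a3", "a4"},              # u5
    {"a1", "a2", "a3", "a5"},  # u6
    {"a1", "a3", "a4", "a5"},  # u7
    {"a1", "a2", "a5"},        # u8
    {"a1", "a5"},              # u9
    {"a3", "a4"},              # u10
]


def closure(itemset, transactions):
    """Items common to every transaction that contains `itemset`."""
    supporting = [t for t in transactions if itemset <= t]
    if not supporting:
        # Convention chosen here: an unsupported itemset closes to all items.
        return set(ITEMS)
    return set.intersection(*supporting)


for representative in [{"a1", "a5"}, {"a3", "a5"}, {"a2", "a3", "a5"}, {"a3", "a4", "a5"}]:
    print(sorted(representative), "->", sorted(closure(representative, TABLE_3)))
```

An itemset is closed exactly when it equals its own closure, which is why the first four rows of Table 5 repeat the representative in the $\cup [T]$ column while the last three rows gain $a_{1}$.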

Table 6. Min-Max association rules with support s and confidence $c = 1$.

Min-Max Association Rule	(Support, Confidence)
$a_{1} ⟶ a_{5}$	$(6, 1)$
$a_{2} ⟶ a_{5}$	$(3, 1)$
$a_{4} ⟶ a_{3}$	$(4, 1)$
$a_{1} \land a_{2} ⟶ a_{5}$	$(2, 1)$
$a_{3} \land a_{5} ⟶ a_{1}$	$(4, 1)$
$a_{2} \land a_{3} ⟶ a_{5}$	$(1, 1)$
$a_{2} \land a_{3} \land a_{5} ⟶ a_{1}$	$(1, 1)$
$a_{2} \land a_{3} ⟶ a_{1} \land a_{5}$	$(1, 1)$
$a_{4} \land a_{5} ⟶ a_{3}$	$(1, 1)$
$a_{3} \land a_{4} \land a_{5} ⟶ a_{1}$	$(1, 1)$
$a_{4} \land a_{5} ⟶ a_{1} \land a_{3}$	$(1, 1)$

Table 7. Itemsets and their lower approximations of Example 2.

Itemsets	Lower Approximations	The Set of Equivalent Classes
$\{a_{1}, a_{3}\}$	$\{a_{3}\}$	$\{[\{a_{3}\}]\}$
$\{a_{3}, a_{5}\}$	$\{a_{3}, a_{5}\}$	$\{[\{a_{3}\}], [\{a_{5}\}]\}$
$\{a_{1}, a_{3}, a_{4}\}$	$\{a_{3}, a_{4}\}$	$\{[\{a_{3}\}], [\{a_{3}, a_{4}\}]\}$

Table 8. Min-Max association rules generated by the itemsets.

Itemsets	Min-Max Association Rules	(Support, Confidence)
$\{a_{1}, a_{3}\}$	$a_{3} ⟶ a_{1}$	$(4, \frac{4}{7})$
$\{a_{3}, a_{5}\}$	$a_{3} ⟶ a_{5}$	$(4, \frac{4}{7})$
$\{a_{3}, a_{5}\}$	$a_{5} ⟶ a_{3}$	$(4, \frac{4}{7})$
$\{a_{1}, a_{3}, a_{4}\}$	$a_{3} ⟶ a_{1} \land a_{4}$	$(1, \frac{1}{7})$
$\{a_{1}, a_{3}, a_{4}\}$	$a_{4} ⟶ a_{1} \land a_{3}$	$(1, \frac{1}{4})$
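
The (support, confidence) pairs in Tables 6 and 8 can be recomputed directly from Table 3, with support taken as the number of transactions containing all items of the rule and confidence as that count divided by the count of the antecedent. The following Python sketch keeps confidences as exact fractions; the function names are hypothetical and the code assumes the antecedent occurs in at least one transaction.

```python
from fractions import Fraction

# Transactions u1..u10 of Table 3, written as item sets.
TABLE_3 = [
    {"a1", "a3", "a5"},        # u1
    {"a3", "a4"},              # u2
    {"a2", "a5"},              # u3
    {"a1", "a3", "a5"},        # u4
    {"a3", "a4"},              # u5
    {"a1", "a2", "a3", "a5"},  # u6
    {"a1", "a3", "a4", "a5"},  # u7
    {"a1", "a2", "a5"},        # u8
    {"a1", "a5"},              # u9
    {"a3", "a4"},              # u10
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def rule_measures(antecedent, consequent, transactions):
    """(support count, confidence) of the rule antecedent -> consequent."""
    both = support_count(antecedent | consequent, transactions)
    return both, Fraction(both, support_count(antecedent, transactions))


print(rule_measures({"a3"}, {"a1"}, TABLE_3))        # first row of Table 8
print(rule_measures({"a4"}, {"a1", "a3"}, TABLE_3))  # last row of Table 8
print(rule_measures({"a1"}, {"a5"}, TABLE_3))        # first row of Table 6
```

Rules with confidence 1, such as $a_{1} ⟶ a_{5}$, are exactly those whose consequent lies in the closure of the antecedent, which is why Table 6 can be read off the closed itemsets.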

Table 9. Dataset characteristics.

Dataset	Transactions	Original Attributes	Attributes after Conversion
Zoo	101	17	15
Mushroom	8124	23	126
Connect-4	67,557	43	129
Chess	3196	36	108

Table 10. Number of rules generated by Apriori and by the proposed method (Ours).

Dataset	$minSup (\%)$	Number of Rules (Apriori)	Number of Rules (Ours)
Chess	95	472	700
	90	10,742	9482
	85	95,482	43,116
	80	552,564	111,768
	75	2,336,556	253,836
Mushroom	50	1146	172
	45	2704	291
	40	5006	483
	35	14,107	903
	30	45,145	903
Connect	98	1544	380
	96	27,340	1480
	94	201,928	3848
Zoo	4	29,288	8136
	3	35,204	8826
	2	48,578	5110
	1	64,868	8174

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
