A Quick Algorithm for Binary Discernibility Matrix Simplification using Deterministic Finite Automata

Zhang, Nan; Li, Baizhen; Zhang, Zhongxi; Guo, Yanyan

doi:10.3390/info9120314

Open AccessArticle

A Quick Algorithm for Binary Discernibility Matrix Simplification using Deterministic Finite Automata

by

Nan Zhang

^1,*,

Baizhen Li

^1,2,

Zhongxi Zhang

¹ and

Yanyan Guo

¹

School of Computer and Control Engineering, Yantai University, 264005 Yantai, China

²

Department of Computer Science and Technology, Tongji University, 201804 Shanghai, China

^*

Author to whom correspondence should be addressed.

Information 2018, 9(12), 314; https://doi.org/10.3390/info9120314

Submission received: 30 October 2018 / Revised: 3 December 2018 / Accepted: 6 December 2018 / Published: 7 December 2018

(This article belongs to the Special Issue Multiple-Criteria Decision-Making (MCDM) Techniques for Business Processes Information Management)

Download

Browse Figures

Versions Notes

Abstract

:

The binary discernibility matrix, originally introduced by Felix and Ushio, is a binary matrix representation for storing discernible attributes that can distinguish different objects in decision systems. It is an effective approach for feature selection, knowledge representation and uncertainty reasoning. An original binary discernibility matrix usually contains redundant objects and attributes. These redundant objects and attributes may deteriorate the performance of feature selection and knowledge acquisition. To overcome this shortcoming, row relations and column relations in a binary discernibility matrix are defined in this paper. To compare the relationships of different rows (columns) quickly, we construct deterministic finite automata for a binary discernibility matrix. On this basis, a quick algorithm for binary discernibility matrix simplification using deterministic finite automata (BDMSDFA) is proposed. We make a comparison of BDMR (an algorithm of binary discernibility matrix reduction), IBDMR (an improved algorithm of binary discernibility matrix reduction) and BDMSDFA. Finally, theoretical analyses and experimental results indicate that the algorithm of BDMSDFA is effective and efficient.

Keywords:

rough sets; binary discernibility matrices; deterministic finite automata

1. Introduction

Decision making can be considered as the process of choosing the best alternative from the feasible alternatives. With the development of research, decision making is extended from one attribute to multiple attributes. To solve problems in multiple attribute decision making, various theories such as fuzzy sets, rough sets and utility theory, etc. have been used. Many of significant results [1,2,3,4,5,6,7,8] have been achieved in multiple attribute decision making. Researchers in rough set theory [9] are usually concerned with attribute reduction (or feature selection) problems of multiple attribute decision making. The binary discernibility matrix, proposed by Felix and Ushio [10], is a useful tool for attribute reduction and knowledge acquisition. Recently, many algorithms of attribute reduction based on binary discernibility matrices have been developed [11,12,13]. In 2014, Zhang et al. [14] proposed a binary discernibility matrix for an incomplete information system, and designed a novel algorithm of attribute reduction based on the proposed binary discernibility matrix. In the paper [15], Li et al. developed an attribute reduction algorithm in terms of the improved binary discernibility matrix, and applied the algorithm in customer relationship management. Tiwari et al. [16] developed hardware for a binary discernibility matrix which can be used for attribute reduction and rule acquisition in an information system. Considering mathematical properties of a binary discernibility matrix, Zhi and Miao [17] introduced the so-called binary discernibility matrix reduction (BDMR), which was actually an algorithm for binary discernibility matrix simplification. On the basis of BDMR, two algorithms for attribute reduction and reduction judgement were presented. A binary discernibility matrix with a vertical partition [18] was proposed to deal with big data in attribute reduction. Ren et al. [19] constructed an improved binary discernibility matrix which can be used in an inconsistent information system. Ding et al. [20] discussed several problems about a binary discernibility matrix in an incomplete system. Combining the binary discernibility matrix in an incomplete system, an algorithm of incremental attribute reduction was proposed. In the paper [20], a novel method for calculation of incremental core attribute was introduced firstly. On this basis, an algorithm of attribute reduction was proposed. As is well known that core attributes play a crucial role in heuristic attribute reduction algorithms. Core attributes are computationally expensive in attribute reduction. Hu et al. [21] gave a quick algorithm of the core attribute calculation using a binary discernibility matrix. The computational complexity of the algorithm is

O (| C | | U |)

, where

| C |

is the number of condition attributes and

| U |

is the number of objects in the universe.

An original binary discernibility matrix usually contains redundant objects and attributes. These redundant objects and attributes may deteriorate the performance of feature selection (attribute reduction) and knowledge acquisition based on binary discernibility matrices. In other words, storing or processing all objects and attributes in an original binary discernibility matrix could be computationally expensive, especially in dealing with large scale data sets with high dimensions. So far, however, few works about the binary discernibility matrix simplification have been investigated. The existing algorithms regarding binary discernibility matrix simplification are time-consuming. To tackle this problem, our works in this paper concern on how to improve the time efficiency of algorithms of binary discernibility matrix simplification. On this purpose, we construct deterministic finite automata in a binary discernibility matrix to compare the relationships of different rows (or columns) quickly. By using deterministic finite automata, we develop a quick algorithm of binary discernibility matrix simplification. Experimental results show that the proposed algorithm is effective and efficient. The contributions of this paper are summarized as follows: First, we define row and column relations which can be used for constructing deterministic finite automata in a binary discernibility matrix. Second, deterministic finite automata in a binary discernibility matrix are proposed to compare the relationships of different rows (or columns) quickly. Third, based on this method, a quick algorithm for binary discernibility matrix simplification (BDMSDFA) is proposed. The proposed method in this paper is meaningful in practical applications. First, by using BDMSDFA, we obtain the simplified binary discernibility matrices quickly. These simplified binary discernibility matrices can significantly improve the efficiency of attribute reduction (feature selection) in decision systems. Second, a binary discernibility matrix without redundant objects and attributes will have the high performance of learning algorithms, and need less space for data storage.

The rest of this paper is structured as follows. We review basic notions about rough set theory in the next section. In Section 3, we propose a general binary discernibility matrix, and define row relations and column relations in a binary discernibility matrix. In Section 4, we develop a quick algorithm for binary discernibility matrix simplification which is called BDMSDFA. Experimental results in Section 5 show that the algorithm of BDMSDFA is effective and efficient, it can be applicable to simplification of large-scale binary discernibility matrices. Finally, the whole paper is summarized in Section 6.

2. Preliminaries

Basic notions about rough set theory are briefly reviewed in this section. Some further details about rough set theory can be found in the paper [9]. A Pawlak decision system can be regarded as an original information system with decision attributes which give decision classes for objects.

A Pawlak decision system [9] can be denoted by 4-tuple

D S = (U, A T, V, f)

, where universe

U = {x_{1}, x_{2}, \dots, x_{| U |}}

is a finite non-empty set of objects; attribute set

A T = C \cup D

,

C \cap D = \emptyset

, where

C = {a_{1}, a_{2}, \dots, a_{| C |}}

is called a condition attribute set and

D = {d}

is called a decision attribute set in a decision system;

V_{a_{m}}

is the domain of a condition attribute

a_{m} \in A T

,

V = \cup_{a_{m} \in A T} V_{a_{m}}

and

f : U \times A T \to V

is a function such that

f (x_{i}, a_{m}) = a_{m} (x_{i}) \in V_{a_{m} \in A T}

,

f (x_{i}, d) = d (x_{i}) \in V_{d \in A T}

, where

x_{i} \in U

.

Given a Pawlak decision system

D S = (U, C \cup D, V, f)

, for

\forall x_{i}, x_{j} \in U

, an indiscernibility relation regarding attribute set

B \subseteq C

is defined as

I N D (B) = {(x_{i}, x_{j}) : \forall b \in B, f (x_{i}, b) = f (x_{j}, b)}

. Therefore, the discernibility relation regarding attribute set

B \subseteq C

is given by

D I S (B) = {(x_{i}, x_{j}) : \exists b \in B, f (x_{i}, b) \neq f (x_{j}, b)}

. The indiscernibility relation regarding

B \subseteq C

is reflexive, symmetric and transitive. Meanwhile, the discernibility relation is irreflexive, symmetric, but not transitive. A partition of U derived from

I N D (B)

is denoted by

U / I N D (B)

. The equivalence class in

U / I N D (B)

containing object

x_{i}

is defined as

{[x_{i}]}_{I N D (B)} = {[x_{i}]}_{B} = {x_{j} \in U : (x_{i}, x_{j}) \in I N D (B)}

.

For

\forall B \subseteq C

, the relative indiscernibility relation and discernibility relation with respect to decision attribute set [9] are defined by:

I N D (B | D) = {(x_{i}, x_{j}) : x_{i}, x_{j} \in U, (\forall b \in B \to (f (x_{i}, b) = f (x_{j}, b)) \lor (f (x_{i}, d) = f (x_{j}, d))},

D I S (B | D) = {(x_{i}, x_{j}) : x_{i}, x_{j} \in U, (\exists b \in B \to (f (x_{i}, b) \neq f (x_{j}, b)) \land (f (x_{i}, d) \neq f (x_{j}, d))} .

A relative indiscernibility relation

I N D (B | D)

with respect to

B \subseteq C

is reflexive, symmetric, but not transitive. A relative discernibility relation

D I S (B | D)

with respect to

B \subseteq C

is irreflexive, symmetric, but not transitive.

A discernibility matrix, proposed by Skowron and Rauszer [22], suggests a matrix representation for storing condition attribute sets which can discern objects in the universe. Discernibility matrix is an effective method in reduct construction, data representation and rough logic reasoning, and it is also useful mathematical tool in data mining, machine learning, etc. Many extended models of dicernibility matrices have been studied in recent years [23,24,25,26,27,28,29,30]. Considering the classification property

Δ

, Miao et al. [31] constructed a general discernibility matrix

M_{Δ} = (m_{Δ} (x_{i}, x_{j}))

, where

m_{Δ} (x_{i}, x_{j})

is denoted by:

m_{Δ} (x_{i}, x_{j}) = \{\begin{matrix} {a \in C : f (x_{i}, a) \neq f (x_{j}, a)}, & (x_{i}, x_{j}) \in D I S_{Δ} (C | D) \\ \emptyset & otherwise \end{matrix},

where

(x_{i}, x_{j}) \in D I S_{Δ} (C | D)

denotes objects

x_{i}

and

x_{j}

are discernible with respect to the classification property

Δ

in a decision system

D S

. It should be noted that

Δ

is a general definition on classification property. A general discernibility matrix provides a common solution to attribute reduction algorithms based on discernibility matrices. By constructing different discernibility matrices, the relative attribute reducts with different reduction targets can be obtained. Based on the relative discernibility relation

D I S (C | D)

, Miao et al. [31] introduced a relationship preservation discernibility matrix which can be denoted as follows:

Definition 1.

[31] Let

D S = (U, C \cup D, V, f)

be a decision system, for

\forall x_{i}, x_{j} \in U

,

\forall a \in C

,

1 \leq i < j \leq | U |

,

M_{r e l a t i o n s h i p} = (m_{r e l a t i o n s h i p} (x_{i}, x_{j}))

is a relationship preservation discernibility matrix, where

m_{r e l a t i o n s h i p} (x_{i}, x_{j})

is defined by:

m_{r e l a t i o n s h i p} (x_{i}, x_{j}) = \{\begin{matrix} {a \in C : f (x_{i}, a) \neq f (x_{j}, a)} & (x_{i}, x_{j}) \in D I S (C | D) \\ \emptyset & o t h e r w i s e \end{matrix} .

3. Binary Discernibility Matrices and Their Simplifications

The binary discernibility matrix, initiated by Felix and Ushio [10], is a binary presentation of original discernibility matrix. In this section, we suggest a general binary discernibility matrix. Relations of row pairs and column pairs are discussed respectively. Formally, a binary discernibility matrix [10] is introduced as follows:

Definition 2.

[10] Given a decision system

D S = (U, C \cup D, V, f)

, for

\forall x_{i}, x_{j} \in U

and

\forall a_{m} \in C

.

M_{B D M} = (m_{B D M} (x_{i}, x_{j}))

is a binary discernibility matrix, where the element

m_{B D M} (x_{i}, x_{j})

is denoted by:

m_{B D M} (x_{i}, x_{j}) = \{\begin{matrix} 1 & f (x_{i}, a_{m}) \neq f (x_{j}, a_{m}) \land d (x_{i}) \neq d (x_{j}) \\ 0 & o t h e r w i s e \end{matrix} .

Based on a binary discernibility matrix, discernible attributes about

x_{i}

and

x_{j}

can be easily obtained. A binary discernibility matrix brings us an understandable approach for representations of discernible attributes, and can be used for designing reduction algorithms. To satisfy more application requirements, we extend original binary discernibility matrix to general binary discernibility matrix as follows:

Definition 3.

Given a decision system

D S = (U, C \cup D, V, f)

, for

\forall x_{i}, x_{j} \in U

,

\forall a_{m} \in C

,

M_{B D M}^{Δ} = (m_{B D M}^{Δ} (x_{i}, x_{j}))

regarding Δ is a general binary discernibility matrix, in which

m_{B D M}^{Δ} (x_{i}, x_{j})

is defined by:

m_{B D M}^{Δ} (x_{i}, x_{j}) = \{\begin{matrix} 1 & (x_{i}, x_{j}) \in D I S_{Δ} (C | D) \\ 0 & o t h e r w i s e \end{matrix} .

D I S_{Δ} (C | D)

is the discernibility relation regarding classification property Δ. The set of rows in

M_{_{B D M}}^{Δ}

is presented by

R = {r_{1}, r_{2}, \dots, r_{| R |}}

, where

| R | = (| U | \times (| U | - 1)) / 2

. The set of columns in

M_{_{B D M}}^{Δ}

is presented by

C = {a_{1}, a_{2}, \dots, a_{| C |}}

, where

| C |

is the cardinality of attribute sets in a decision system. For convenience, a general binary discernibility matrix

M_{_{B D M}}^{Δ} = (m_{_{B D M}}^{Δ} (x_{i}, x_{j}))

can be also denoted by

M_{_{B D M}}^{Δ} = (m_{_{B D M}}^{Δ} (e_{p}^{m}, e_{q}^{m}))

. For

\forall a_{m} \in C

,

\forall r_{p}, r_{q} \in R

,

e_{_{P}}^{m}

is the matrix element at row

r_{p}

and column

a_{m}

in

M_{_{B D M}}^{Δ}

, and

e_{q}^{m}

is the matrix element at row

r_{q}

and column

a_{m}

in

M_{_{B D M}}^{Δ}

, where

1 \leq p < q \leq (| U | \times (| U | - 1)) / 2

,

1 \leq m \leq | C |

.

Since a general binary discernibility matrix provides a common structure of binary discernibility matrices in rough set theory, one can construct a binary discernibility matrix according to a given classification property. Any binary discernibility matrix can be also regarded as the special case of the general binary discernibility matrix. Therefore, a general definition of binary discernibility matrix is necessary and important. Based on the relative discernibility relation with respect to D, Definition 2 can be also rewritten as follows:

Definition 4.

Given a decision system

D S = (U, C \cup D, V, f)

, for

\forall x_{i}, x_{j} \in U

,

\forall a_{m} \in C

,

D I S (C | D)

is the relative discernibility relation with respect to a condition attribute set C.

M_{B D M} = (m_{B D M} (x_{i}, x_{j}))

is a binary discernibility matrix, in which the element

m_{B D M} (x_{i}, x_{j})

is denoted by:

m_{B D M} (x_{i}, x_{j}) = \{\begin{matrix} 1 & (x_{i}, x_{j}) \in D I S (C | D) \\ 0 & o t h e r w i s e \end{matrix} .

This definition is equivalent to Definition 2 [10]. It is noted that we calculate binary discernibility matrix in this paper by using the relationship preservation discernibility matrix.

Definition 5.

For

\forall a_{m} \in C

,

\forall r_{p}, r_{q} \in R

,

e_{_{P}}^{m}

and

e_{q}^{m}

are elements in a binary discernibility matrix

M_{_{B D M}}^{} = (m_{B D M} (x_{i}, x_{j}))

, a row pair with respect to attribute

a_{m}

is denoted by

< e_{p}^{m}, e_{q}^{m} > \in

{< 0,0 >, < 0,1 >, < 1,0 >, < 1,1 >}, a binary relation between row

r_{p}

and

r_{q}

is defined as

R_{r o w} = {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}

.

Similar to Definition 5, we define a column pair and a binary relation with respect to columns as follows.

Definition 6.

For

\forall a_{m}, a_{n} \in C

,

\forall r_{p} \in R

, elements

e_{_{P}}^{m}

and

e_{p}^{n}

in a binary discernibility matrix

M_{_{B D M}}^{} = (m_{B D M} (x_{i}, x_{j}))

, a column pair with respect to row

r_{p}

is denoted by

< e_{p}^{m}, e_{p}^{n} > \in

{ < 0,0 >, < 0,1 >, < 1,0 >, < 1,1 >}, a binary relation between column

a_{m}

and

a_{n}

is defined as

R_{c o l} = {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}

.

For the matrix element

e_{p}^{m}

and

e_{q}^{m}

in the same column

a_{m}

, we define three row relations in a binary discernibility matrix as follows.

Definition 7.

Given a binary discernibility matrix

M_{_{B D M}}^{} = (m_{B D M} (x_{i}, x_{j}))

,

\forall r_{p}, r_{q} \in R

,

(1): for $\forall a_{m} \in C$ , $\exists a_{n} \in C$ , $r_{p} \supset r_{q}$ if and only if $e_{p}^{m} + e_{q}^{m} = e_{p}^{m}$ and $e_{p}^{n} \neq e_{q}^{n}$ ; for $\forall a_{m} \in C$ , $\exists a_{n} \in C$ , $r_{q} \supset r_{p}$ if and only if $e_{q}^{m} + e_{p}^{m} = e_{q}^{m}$ and $e_{q}^{n} \neq e_{p}^{n}$ ;
(2): for $\forall a_{m} \in C$ , $r_{p} = r_{q}$ if and only if $e_{p}^{m} = e_{q}^{m}$ ;
(3): for $\exists a_{m}, a_{n} \in C$ , $r_{p} \neq r_{q}$ if and only if $e_{p}^{m} + e_{q}^{m} = e_{q}^{m} (e_{p}^{m} \neq e_{q}^{m})$ and $e_{p}^{n} + e_{q}^{n} = e_{p}^{n}$ $(e_{p}^{n} \neq e_{q}^{n})$ .

Analogous to Definition 7, for matrix elements

e_{p}^{m}

and

e_{p}^{n}

in the same row

r_{p}

, we define column relations in a binary discernibility matrix as follows.

Definition 8.

Given a binary discernibility matrix

M_{_{B D M}}^{} = (m_{B D M} (x_{i}, x_{j}))

,

\forall a_{m}, a_{n} \in C

,

(1): for $\forall r_{p} \in R$ , $\exists r_{q} \in R$ , $a_{m} \supset a_{n}$ if and only if $e_{p}^{m} + e_{p}^{n} = e_{p}^{m}$ and $e_{q}^{m} \neq e_{q}^{n}$ ; for $\forall r_{p} \in R$ , $\exists r_{q} \in R$ , $a_{n} \supset a_{m}$ if and only if $e_{p}^{n} + e_{p}^{m} = e_{p}^{n}$ and $e_{q}^{n} \neq e_{q}^{m}$ ;
(2): for $\forall r_{p} \in R$ , $a_{m} = a_{n}$ if and only if $e_{p}^{m} = e_{p}^{n}$ ;
(3): for $\exists r_{p}, r_{q} \in R$ , $a_{m} \neq a_{n}$ if and only if $e_{p}^{m} + e_{p}^{n} = e_{p}^{m} (e_{p}^{m} \neq e_{p}^{n})$ and $e_{q}^{m} + e_{q}^{n} = e_{q}^{n}$ $(e_{q}^{m} \neq e_{q}^{n})$ .

Let

A_{p}

be the elements’ set of a prime implicant in a disjunctive normal form with row

r_{p}

and

A_{q}

be the elements’ set of a prime implicant in a disjunctive normal form with row

r_{q}

, then

r_{p} \supset r_{q}

means that

A_{p}

is the superset of

A_{q}

. For

\forall a_{m}, a_{n} \in C

,

a_{m} \supset a_{n}

indicates attribute

a_{m}

can distinguish more objects in the universe. In a binary discernibility matrix, the row in which all elements are 0s indicates there are no attribute can discern the related objects, and the column in which all elements are 0s indicates that this attribute cannot discern objects in the universe.

In [17], Zhi and Miao first proposed an algorithm of a binary discernibility matrix simplification shown in Algorithm 1. To improve the efficiency of BDMR, Wang et al. [32] introduced an improved algorithm of binary discernibility matrix reduction shown in Algorithm 2.

Algorithm 1: An algorithm of binary discernibility matrix reduction, BDMR.

Input: Original binary discernibility matrix

M_{B D M}

;

Output: Simplified binary discernibility matrix

M_{B D M}^{^{'}}

1:: delete the row in which all elements are 0 s;
2:: for $p = 1$ to $| R |$ do
3:: for $q = 1$ to $| R |$ do
4:: if $r_{p} \supset r_{q}$ then
5:: delete row $r_{p}$
6:: break
7:: end if
8:: end for
9:: end for
10:: delete the column in which all elements are 0 s;
11:: for $m = 1$ to $| C |$ do
12:: for $n = 1$ to $| C |$ do
13:: if $a_{n} \supset a_{m}$ then
14:: delete column $a_{m}$
15:: break
16:: end if
17:: end for
18:: end for
19:: output a simplified binary discernibility matrix $M_{B D M}^{'}$ ;

Algorithm 2: An improved algorithm of binary discernibility matrix reduction, IBDMR.

Input: Original binary discernibility matrix

M_{B D M}

;

Output: Simplified binary discernibility matrix

M_{B D M}^{^{'}}

1:: delete the row in which all elements are 0 s;
2:: sort rows in ascending order by the quantity of the number ‘1’ in each row
3:: for $p = 1$ to $| R |$ do
4:: for $q = 1$ to $| R |$ do
5:: if $r_{p} \supset r_{q}$ then
6:: delete row $r_{p}$
7:: break
8:: end if
9:: end for
10:: end for
11:: delete the column in which all elements are 0 s;
12:: for $m = 1$ to $| C |$ do
13:: for $n = 1$ to $| C |$ do
14:: if $a_{n} \supset a_{m}$ then
15:: delete column $a_{m}$
16:: break
17:: end if
18:: end for
19:: end for
20:: output a simplified binary discernibility matrix $M_{B D M}^{'}$ ;

4. A Quick Algorithm for Binary Discernibility Matrix Simplification

In this section, we investigate two theorems related to row relations and column relations respectively. Based on the two theorems, deterministic finite automata for row and column relations are introduced. Deterministic finite automata can be carried out to obtain the row relations and column relations quickly. By using deterministic finite automata, we propose an algorithm of binary discernibility matrix simplification using deterministic finite automata (BDMSDFA).

Theorem 1.

Let

M_{B D M} = (m_{B D M} (x_{i}, x_{j}))

be a binary discernibility matrix, for

\forall a_{m} \in C

,

\forall r_{p}, r_{q} \in R

,

| R | = (| U | \times (| U | - 1)) / 2

, we have:

(1): if $r_{p} \supset r_{q}$ , then there exists $< 1, 0 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ ;
(2): if $r_{q} \supset r_{p}$ , then there exists $< 0, 1 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ ;
(3): if $r_{p} = r_{q}$ , then there exists $< 0, 0 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ or
$< 1, 1 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ ;
(4): if $r_{p} \neq r_{q}$ , then there exists $< 1, 0 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ and
$< 0, 1 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ .

Proof.

(1): If there does not exist $< 1, 0 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ , then $< 0, 0 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ or $< 0, 1 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ or $< 1, 1 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ . We have seven binary relations as follows: ${< 0, 0 >}$ , ${< 0, 1 >}$ , ${< 1, 1 >}$ , ${< 0, 0 >, < 0, 1 >}$ , ${< 0, 0 >, < 1, 1 >}$ , ${< 0, 1 >, < 1, 1 >}$ and ${< 0, 0 >, < 0, 1 >, < 1, 1 >}$ . From seven binary relations above, if $\forall a_{m} \in C$ , $\exists a_{n} \in C$ , one cannot get $e_{p}^{m} + e_{q}^{m} = e_{p}^{m} (e_{p}^{n} \neq e_{q}^{n})$ . Thus, there exists $< 1, 0 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ in $M_{B D M}$ .
(2): If there does not exist $< 0, 1 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ , then $< 0, 0 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ or $< 1, 0 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ or $< 1, 1 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ . Thus, we can also have seven binary relations as follows: ${< 0, 0 >}$ , ${< 1, 0 >}$ , ${< 1, 1 >}$ , ${< 0, 0 >, < 1, 0 >}$ , ${< 0, 0 >, < 1, 1 >}$ , ${< 1, 0 >, < 1, 1 >}$ and ${< 0, 0 >, < 1, 0 >, < 1, 1 >}$ . From seven binary relations above, if $\forall a_{m} \in C$ and $\exists a_{n} \in C$ , one cannot get $e_{p}^{m} + e_{q}^{m} = e_{q}^{m} (e_{q}^{n} \neq e_{p}^{n})$ . Thus, there exists $< 0, 1 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ in $M_{B D M}$ .
(3): If there does not exist $< 0, 0 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ or $< 1, 1 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ . There must have seven binary relations as follows: ${< 0, 1 >}$ , ${< 1, 0 >}$ , ${< 0, 1 >, < 1, 0 >}$ , ${< 0, 0 >, < 0, 1 >}$ , ${< 0, 0 >, < 1, 0 >}$ , ${< 1, 1 >, < 0, 1 >}$ , ${< 1, 1 >, < 1, 0 >}$ . From seven binary relations above, we cannot have $e_{p}^{m} = e_{q}^{m}$ . Thus, there exists $< 0, 0 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}}$ or $< 1, 1 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}}$ in $M_{B D M}$ .
(4): If there does not exist $< 1, 0 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ and $< 0, 1 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ . We may obtain eleven binary relations as follows: ${< 0, 0 >}$ , ${< 1, 1 >}$ , ${< 0, 1 >}$ , ${< 1, 0 >}$ , ${< 0, 0 >, < 0, 1 >}$ , ${< 0, 0 >, < 1, 0 >}$ , ${< 1, 1 >, < 0, 1 >}$ , ${< 1, 1 >, < 1, 0 >}$ , ${< 0, 0 >, < 1, 1 >}$ , ${< 0, 0 >, < 1, 1 >, < 0, 1 >}$ , ${< 0, 0 >, < 1, 1 >, < 1, 0 >}$ . From eleven binary relations, for $\forall a_{m} \in C$ , we cannot have $e_{p}^{m} \neq e_{q}^{m}$ . Thus, there exists $< 1, 0 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ and $< 0, 1 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ in $M_{B D M}$ .

This completes the proof. ☐

Analogous to Theorem 1, we can easily obtain the following theorem as:

Theorem 2.

Let

M_{B D M} = (m_{B D M} (x_{i}, x_{j}))

be a binary discernibility matrix, for

\forall a_{m}, a_{n} \in C

,

\forall r_{p} \in R

, we can have:

(1): if $a_{m} \supset a_{n}$ , then there exists $< 1, 0 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ ;
(2): if $a_{n} \supset a_{m}$ , then there exists $< 0, 1 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ ;
(3): if $a_{m} = a_{n}$ , then there exists $< 0, 0 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ or
$< 1, 1 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ ;
(4): if $a_{m} \neq a_{n}$ , then there exists $< 1, 0 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ and
$< 0, 1 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ .

Proof.

(1): If there does not exist $< 1, 0 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ , then $< 0, 0 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ or $< 0, 1 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ or $< 1, 1 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ . We have seven binary relations as follows: ${< 0, 0 >}$ , ${< 0, 1 >}$ , ${< 1, 1 >}$ , ${< 0, 0 >, < 0, 1 >}$ , ${< 0, 0 >, < 1, 1 >}$ , ${< 0, 1 >, < 1, 1 >}$ and ${< 0, 0 >, < 0, 1 >, < 1, 1 >}$ . From seven binary relations above, if $\forall r_{p} \in R$ , $\exists r_{q} \in R$ , one cannot get $e_{p}^{m} + e_{p}^{n} = e_{p}^{m} (e_{q}^{m} \neq e_{q}^{n})$ . Thus, there exists $< 1, 0 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ in $M_{B D M}$ .
(2): If there does not exist $< 0, 1 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ , then $< 0, 0 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ or $< 1, 0 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ or $< 1, 1 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ . Thus, we can also have seven binary relations as follows: ${< 0, 0 >}$ , ${< 1, 0 >}$ , ${< 1, 1 >}$ , ${< 0, 0 >, < 1, 0 >}$ , ${< 0, 0 >, < 1, 1 >}$ , ${< 1, 0 >, < 1, 1 >}$ and ${< 0, 0 >, < 1, 0 >, < 1, 1 >}$ . From seven binary relations above, if $\forall r_{p} \in R$ and $\exists r_{q} \in R$ , one cannot get $e_{p}^{m} + e_{p}^{n} = e_{p}^{n} (e_{q}^{m} \neq e_{q}^{n})$ . Thus, there exists $< 0, 1 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ in $M_{B D M}$ .
(3): If there does not exist $< 0, 0 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ or $< 1, 1 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ . There must have seven binary relations as follows: ${< 0, 1 >}$ , ${< 1, 0 >}$ , ${< 0, 1 >, < 1, 0 >}$ , ${< 0, 0 >, < 0, 1 >}$ , ${< 0, 0 >, < 1, 0 >}$ , ${< 1, 1 >, < 0, 1 >}$ , ${< 1, 1 >, < 1, 0 >}$ . From seven binary relations above, we cannot have $e_{p}^{m} = e_{p}^{n}$ . Thus, there must exists $< 0, 0 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ or $< 1, 1 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ in $M_{B D M}$ .
(4): If there does not exist $< 1, 0 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ and $< 0, 1 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ . We may obtain eleven binary relations as follows: ${< 0, 0 >}$ , ${< 1, 1 >}$ , ${< 0, 1 >}$ , ${< 1, 0 >}$ , ${< 0, 0 >, < 0, 1 >}$ , ${< 0, 0 >, < 1, 0 >}$ , ${< 1, 1 >, < 0, 1 >}$ , ${< 1, 1 >, < 1, 0 >}$ , ${< 0, 0 >, < 1, 1 >}$ , ${< 0, 0 >, < 1, 1 >, < 0, 1 >}$ , ${< 0, 0 >, < 1, 1 >, < 1, 0 >}$ . From eleven binary relations, for $\forall r_{p} \in R$ , we cannot have $e_{p}^{m} \neq e_{p}^{n}$ . Thus, there exists $< 1, 0 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ and $< 0, 1 > \in {< e_{p}^{m}, e_{p}^{n} > : 1 \leq m < n \leq | C |}$ in $M_{B D M}$ .

This completes the proof. ☐

Deterministic finite automaton, also called deterministic finite acceptor, is an important concept in theory of computation. A deterministic finite automaton constructs a finite-state machine which can accept or reject symbol strings, and produce a computation of automation for each input string. In what follows, we adopt deterministic finite automata to obtain row relations and column relations in a binary discernibility matrix. Here, we first review the definition of deterministic finite automaton as follows.

Definition 9.

A deterministic finite automaton is a 5-tuple

(Q, \sum, δ, S_{0}, F)

, where Q is a finite nonempty set of states, ∑ is a finite set of input symbols, δ is a transition function,

S_{0} \in Q

is a start state, F is a set of accept states.

Regarding object pair ‘

e_{p}^{m} e_{q}^{m}

’ as the basic granule in input symbols, a deterministic finite automaton for row relations in a binary discernibility matrix is illustrated by the following theorem:

Theorem 3.

A deterministic finite automaton for row relations, denoted by

D F A_{r o w}

, is a 5-tuple

(Q, \sum, δ, S_{0}, F)

, where

Q = {S_{0}, S_{1}, S_{2}, S_{3}, S_{4}}

is a finite set of states,

\sum = {e_{p}^{0} e_{q}^{0} e_{p}^{1} e_{q}^{1} \dots e_{p}^{m} e_{q}^{m} e_{p}^{m + 1} e_{q}^{m + 1} \dots e_{p}^{| C |} e_{q}^{| C |}} (1 < m < | C |, r_{p}, r_{q} \in R)

is an input binary character string, δ is a transition function,

S_{0} \in Q

is a start state,

F = {S_{1}, S_{2}, S_{3}, S_{4}}

is a set of accept states. A deterministic finite automaton for row relations can be illustrated in Figure 1 as follows.

Proof.

In a binary discernibility matrix, relations between

r_{p}

and

r_{q}

can be concluded as

r_{q} \supset r_{p}

,

r_{p} = r_{q}

,

r_{p} \supset r_{q}

and

r_{p} \neq r_{q}

.

We discuss a deterministic finite automaton for row relations from four parts separately, as follows.

(1): According to Definition 5 and Theorem 1, for $\forall a_{m} \in C$ , $\forall r_{p}, r_{q} \in R$ , there must be $< 0, 1 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ . Thus, the regular expression for $r_{q} \supset r_{p}$ can be defined as $[{(00 / 11)}^{*} {(01)}^{+} {(00 / 11)}^{*}]^{+}$ . We can easily have the corresponding deterministic finite automaton in Figure 2 as:
(2): For $r_{p} = r_{q}$ , there must be $< 0, 0 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ or $< 1, 1 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ . The regular expression for $r_{p} = r_{q}$ is denoted by ${(00 / 11)}^{+}$ . So, the corresponding deterministic finite automaton can be illustrated in Figure 3 as:
(3): Analogous to $r_{p} \supset r_{q}$ , for $r_{p}, r_{q} \in R$ , there must be $< 1, 0 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ . Therefore, the regular expression for $r_{p} \supset r_{q}$ can be obtained as ${[{(00 / 11)}^{*} {(10)}^{+} {(00 / 11)}^{*}]}^{+}$ . We can easily have the corresponding deterministic finite automaton in Figure 4 as:
(4): For $r_{p} \neq r_{q}$ , there must be $< 1, 0 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ and $< 0, 1 > \in {< e_{p}^{m}, e_{q}^{m} > : 1 \leq p < q \leq | R |}$ . The regular expression for $r_{p} \neq r_{q}$ is denoted by ${[{(00 / 11)}^{*} {(01)}^{+} {(00 / 11)}^{*} {(10)}^{+} {(00 / 11)}^{*}]}^{+} / [{(00 / 11)}^{*} {(10)}^{+} {(00 / 11)}^{*} {(01)}^{+} {(00 / 11)}^{*}]^{+}$ . Hence, the corresponding deterministic finite automaton can be illustrated in Figure 5 as:

One can construct a deterministic finite automaton for row relations by four deterministic finite automata shown in Figure 1.

This completes the proof. ☐

Similar to the deterministic finite automaton for row relations, we present the deterministic finite automaton for column relations in a binary discernibility matrix as follows.

Theorem 4.

A deterministic finite automaton for column relations

D F A_{c o l}

is a 5-tuple

(Q, \sum, δ, S_{0}, F)

, where

Q = {S_{0}, S_{1}, S_{2}, S_{3}, S_{4}}

is a finite set of states,

\sum = {e_{0}^{m} e_{0}^{n} e_{1}^{m} e_{1}^{n} \dots e_{p}^{m} e_{p}^{n} e_{p + 1}^{m} e_{p + 1}^{n} \dots e_{| R |}^{m} e_{| R |}^{n}} (1 < p < | R |, \forall a_{m}, a_{n} \in C)

is an input binary character string, δ is a transition function,

S_{0} \in Q

is a start state.

F = {S_{1}, S_{2}, S_{3}, S_{4}}

is a set of accept states. A deterministic finite automaton for column relations can be illustrated in Figure 6 as follows:

Proof.

This proof is similar to the proof of Theorem 3. ☐

By means of the proposed deterministic finite automata for row and column relations, we propose a quick algorithm for binary discernibility matrix simplification using deterministic finite automata (BDMSDFA) as follows:

We present the following example to explain Algorithm 3 as follows.

Algorithm 3: A quick algorithm for binary discernibility matrix simplification using deterministic finite automata, BDMSDFA.

Input: Original binary discernibility matrix

M_{B D M}

;

Output: Simplified binary discernibility matrix

M_{B D M}^{^{'}}

1:: delete the row in which all elements are 0 s;
2:: compare the row relation between $r_{p}$ and $r_{q}$ by $D F A_{r o w}$
3:: for $p = 1$ to $| R |$ do
4:: for $q = 1$ to $| R |$ do
5:: if $r_{p} \supset r_{q}$ then
6:: delete row $r_{p}$ from $M_{B D M}$
7:: break
8:: end if
9:: end for
10:: end for
11:: delete the column in which all elements are 0 s;
12:: compare the column relation between $a_{m}$ and $a_{n}$ by $D F A_{c o l}$
13:: for $m = 1$ to $| C |$ do
14:: for $n = 1$ to $| C |$ do
15:: if $a_{n} \supset a_{m}$ then
16:: delete column $a_{m}$ from $M_{B D M}$
17:: break.
18:: end if
19:: end for
20:: end for
21:: output a simplified binary discernibility matrix $M_{B D M}^{'}$ ;

Example 1.

Let

D S = (U, C \cup D, V, f)

be a decision system shown Table 1, where the universe

U = {x_{1}, x_{2}, x_{3}, x_{4}, x_{5}}

, the condition attribute set

C = {a_{1}, a_{2}, a_{3}, a_{4}}

, the decision attribute set

D = {d}

.

For the decision system above, we have the corresponding binary discernibility matrix as follows:

M_{B D M}^{1} = (\begin{matrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \end{matrix}) .

We delete the row in which all elements are 0 s in

M_{B D M}^{1}

, and obtain the binary discernibility matrix

M_{B D M}^{2}

as follows.

M_{B D M}^{2} = (\begin{matrix} 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \end{matrix}) .

In the binary discernibility matrix

M_{B D M}^{2}

,

r_{1} : 0011

,

r_{2} : 0111

,

r_{3} : 1111

,

r_{4} : 0011

,

r_{5} : 0111

,

r_{6} : 1111

,

r_{7} : 0100

,

r_{8} : 1100

,

r_{9} : 1100

. According to the definition of the deterministic finite automaton for row relations, we have

\sum_{12} = 00011111

,

\sum_{13} = 01011111

,

\sum_{14} = 00001111

,

\sum_{15} = 00011111

,

\sum_{16} = 01011111

,

\sum_{17} = 00011010

,

\sum_{18} = 01011010

,

\sum_{19} = 01011010

. By using the deterministic finite automaton for row relations shown in Figure 1, we can get the row relations as follows.

r_{2} \supset r_{1}

,

r_{3} \supset r_{1}

,

r_{4} = r_{1}

,

r_{5} \supset r_{1}

,

r_{6} \supset r_{1}

,

r_{7} \neq r_{1}

,

r_{8} \neq r_{1}

,

r_{9} \neq r_{1}

. Therefore, we delete

r_{2}

,

r_{3}

,

r_{5}

and

r_{6}

. Similarly, we get

r_{8} \supset r_{7}

and

r_{9} \supset r_{7}

,

r_{4} \neq r_{7}

, and then delete

r_{8}

and

r_{9}

. Therefore, we have the following binary discernibility matrix:

M_{B D M}^{3} = (\begin{matrix} 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \end{matrix}) .

We delete the column in which all elements are 0 s in

M_{B D M}^{3}

, and have

M_{B D M}^{4} = (\begin{matrix} 0 & 1 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 0 \end{matrix}) .

In the binary discernibility matrix

M_{B D M}^{4}

,

a_{1} : 001

,

a_{2} : 110

,

a_{3} : 110

. According to the definition of deterministic finite automaton for column relations, we have

\sum_{12} = 010110

,

\sum_{13} = 010110

,

\sum_{23} = 111100

. By using the deterministic finite automaton for column relations shown in Figure 6, we have

a_{1} \neq a_{2}

,

a_{1} \neq a_{3}

,

a_{2} = a_{3}

. Thus, we cannot delete any column in

M_{B D M}^{4}

, and get the following binary discernibility matrix.

M_{B D M}^{5} = (\begin{matrix} 0 & 1 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 0 \end{matrix}) .

A

10 \times 4

matrix

M_{B D M}^{1}

is compressed to a

3 \times 3

matrix

M_{B D M}^{5}

. The simplified binary discernibility matrix with fewer objects or columns will be help in improving the efficiency of attribute reduction.

Assume that

t = | R |

and

s = | C |

, the upper bound of time complexity of BDMR is

3 t s (t + s - 2)

, the lower bound of time complexity of BDMR is

2 t s (t + s - 2)

. The upper bound of time complexity of IBDMR is

3 t s (t + s - 2)

and the lower bound of the worst-case time complexity of IBDMR is

2 t s (t + s - 2)

. By employing deterministic finite automata, the algorithm complexity of BDMSDFA is

t s (t + s - 2)

. Obviously, the time complexity of BDMSDFA is lower than that of BDMR and IBDMR. Therefore, it is concluded that the proposed algorithm BDMSDFA reduces the computational time for binary discernibility matrix simplification in general.

The advantages of the proposed method are expressed as follows. (1) Deterministic finite automata in a binary discernibility matrix are constructed, it can provide an understandable approach to comparing the relationships of different rows (columns) quickly. (2) Based on deterministic finite automata, a high efficiency algorithm of binary discernibility matrix simplification is developed. Theoretical analyses and experimental results indicate that the proposed algorithm is effective and efficient. It should be noted that the proposed method is based on Pawlak decision systems, but not suitable for generalized decision systems, such as incomplete decision systems, interval-valued decision systems and fuzzy decision systems. Deterministic finite automata in generalized decision systems will be investigated in the future.

5. Experimental Results and Analyses

The objective of the following experiments in this section is to demonstrate the high efficiency of the algorithm BDMSDFA. The experiments are divided into two aspects. In one aspect, we employ 10 datasets in Table 2 to verify the performance of time consumption of BDMR, IBDMR and BDMSDFA. In the other aspect, the computational times of algorithms BDMR, IBDMR and BDMSDFA with the increase of the size of attributes (or objects) are calculated respectively. We carry out three algorithms on a personal computer with Windows 8.1 (64 bit) and Inter(R) Core(TM) i5-4200U, 1.6 GHz and 4 GB memory. The software is Microsoft Visual Studio 2017 version 15.9 and C++. Data sets used in the experiments are all downloaded from UCI repository of machine learning data sets (http://archive.ics.uci.edu/ml/datasets.html).

Table 2 indicates the computational time of BDMR, IBDMR and BDMSDFA on the 10 data sets. We can see that the algorithm BDMSDFA is much faster than the algorithms BDMR and IBDMR. The computational times of three algorithms follows this order: BDMR ≥ IBDMR > BDMSDFA. The computational time of BDMSDFA is the minimum among the three algorithms. For the data set Auto in Table 2, the computational times of BDMR and IBDMR are 75 ms and 68 ms, while that of BDMSDFA is 36 ms. For the data set Credit_a, the computational times of BDMR and IBDMR are 113 ms and 105 ms, while that of BDMSDFA is 55 ms. For some data sets in Table 2, the computational time of BDMSDFA can reduce over half the computational time of BDMR or IBDMR. In Table 2, for the data set Breast_w, the computational times of BDMR and IBDMR are 75 ms and 73 ms, while that of BDMSDFA is 29 ms. For the data set Promoters, the computational times of BDMR and IBDMR are 1517 ms and 936 ms, while that of BDMSDFA is only 398 ms. For the date sets such as Lung-cancer, Credit_a, Breast_w, Anneal, the computational time of BDMR is close to that of IBDMR. For the data set Labor_neg, the computational time of BDMR is equivalent to that of IBDMR. For each data set in Table 2, difference between BDMR and IBDMR is relatively smaller than difference between BDMR (IBDMR) and BDMSDFA.

We compare the computational times of BDMR, IBDMR and BDMSDFA with the increase of the size of objects. In Figure 7a–f, the x-coordinate pertains to the size of objects in the universe, while the y-coordinate concerns the time consumption of algorithms. We employ 6 data sets (Dermatlogy, Credit_a, Controceptive_Method_Choice, Letter, Flag and Mushroom) to verify the performance of time consumption of BDMR, IBDMR and BDMSDFA. When dealing with the same UCI data sets, the computational time of BDMSDFA is less than that of BDMR and IBDMR, in other words, BDMSDFA is more efficient than BDMR and IBDMR. Figure 7 shows more detailed change trends of each algorithm with the number of objects increasing. The computational times of three algorithms increase with the increase of the number of objects simultaneously. It is obvious to see that the slope of the curve of BDMSDFA is smaller than the curve of BDMR or IBDMR, and the computational time of BDMSDFA increases slowly. The differences between BDMR (IBDMR) and BDMSDFA become distinctly larger when the size of the objects increases. In Figure 7c, the difference of BDMR (IBDMR) and BDMSDFA is not obviously different at the beginning. The computational time of DBMR (IBDMR) increases distinctly when the number of objects is over 450. The computational time of algorithm BDMR increases by 479 ms when the number of objects rises from 450 to 1473, whereas the computational time of algorithm BDMSDFA increases by only 141 ms. In Figure 7e, the computational time of the algorithm IBDMR increases by 104 ms when the number of objects rises from 20 to 160, whereas the time consumption of algorithm BDMSDFA increases by only 49.

In Figure 8a–f, the x-coordinate pertains to the size of attributes, while the y-coordinate concerns the time consumption of algorithms. We also take 6 data sets (Dermatlogy, Credit_a, Controceptive_Method_Choice, Letter, Flag and Mushroom) to verify the performance of the computational times of BDMR, IBDMR and BDMSDFA. The curve of BDMR is similar to that of IBDMR. The curve of BDMSDFA is under the curves of BDMR and IBDMR. Then, the computational time of BDMSDFA is less than that of BDMR or IBDMR. In Figure 8b, the computational time of algorithms BDMR and IBDMR increase by 164 ms and 123 ms respectively, while the computational time of algorithm BDMSDFA increases by 58 ms. In Figure 8c, the curves of BDMR and IBDMR raise profoundly when the size of the attributes increases. In Figure 8e, the computational time of algorithm IBDMR increases from 4 ms to 105 ms when the number of objects rises from 3 to 24, while the computational time of algorithm BDMSDFA increasedly from 2 ms to 50 ms. For Figure 8a–f, it is concluded that the efficiency of BDMSDFA is higher than that of BDMR or IBDMR with the increase of the number of attributes. Difference between BDMR and IBDMR is relatively smaller than difference between BDMR (IBDMR) and BDMSDFA. The computational times of three algorithms increase with the increase of the number of attributes monotonously. When dealing with the same situation, the computational time of BDMSDFA is the minimum among the three algorithms.

Experimental analyses and results show a high efficiency of the algorithm BDMSDFA. The proposed simplification algorithm using deterministic finite automata can be applied as a preprocessing technique for data compression and attribute reduction in large-scale data sets.

6. Conclusions

Original binary discernibility matrices which are not simplified usually have irrelative objects and attributes. These irrelative objects and attributes may lead to inefficiency in attribute reduction, knowledge acquisition, etc. To tackle this problem, a quick method of comparing the relationships of different rows (columns) are introduced in binary discernibility matrices. By using deterministic finite automata, a quick algorithm for binary discernibility matrix simplification (BDMSDFA) is developed. The experiment results indicate that DBMSDFA can get higher performance in the efficiency of binary discernibility matrix simplification. The contributions of this paper can be summarized as follows.

(1): We define row (or column) relations which are used for constructing deterministic finite automata.
(2): Deterministic finite automata are firstly used for comparing the relationships of different rows (columns) in a binary discernibility matrix.
(3): Based on deterministic finite automata, a quick algorithm for binary discernibility matrix simplification is developed. Experimental results indicate that the relationship between the time consumption of BDMSDFA and the number of objects (attributes) is strictly monotonic. With the increase of the size of objects (attributes), the algorithm BDMSDFA is more efficient than BDMR and IBDMR.

It is noted that the proposed quick simplification algorithm for discernibility matrix is only suitable for completed decision systems. However, in practical applications, there exists many generalized decision systems, such as incomplete decision systems, interval-valued decision systems, etc. Researches on quick simplification algorithms in generalized decision systems will be investigated. Combing the researches on fuzzy sets [33,34,35,36], we will propose the fuzzy binary discernibility matrix. Some applications of the (fuzzy) binary discernibility matrix simplification will also be studied in the future.

Author Contributions

N.Z. and B.L. designed the research. N.Z gave theoretical analysis and wrote the whole paper. N.Z. and B.L. designed and conducted the experiments. Z.Z. and Y.G. provided some suggestions on the paper.

Funding

This research was funded by the National Natural Science Foundation of China (No. 61403329, No. 61572418, No. 61502410, No. 61572419).

Acknowledgments

The authors thank the anonymous referees for the constructive comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

Arora, R.; Garg, H. A robust correlation coefficient measure of dual hesitant fuzzy soft sets and their application in decision making. Eng. Appl. Artif. Intell. 2018, 72, 80–92. [Google Scholar] [CrossRef]
Garg, H.; Nancy. New Logarithmic operational laws and their applications to multiattribute decision making for single-valued neutrosophic numbers. Cogn. Syst. Res. 2018, 52, 931–946. [Google Scholar] [CrossRef]
Garg, H.; Nancy. Linguistic single-valued neutrosophic prioritized aggregation operators and their applications to multiple-attribute group decision-making. J. Ambient Intell. Hum. Comput. 2018, 9, 1975–1997. [Google Scholar] [CrossRef]
Garg, H.; Arora, R. Dual hesitant fuzzy soft aggregation operators and their application in decision making. Cogn. Comput. 2018, 10, 769–789. [Google Scholar] [CrossRef]
Garg, H.; Kumar, K. An advanced study on the similarity measures of intuitionistic fuzzy sets based on the set pair analysis theory and their application in decision making. Soft Comput. 2018, 22, 4959–4970. [Google Scholar] [CrossRef]
Garg, H.; Kumar, K. Distance measures for connection number sets based on set pair analysis and its applications to decision-making process. Appl. Intell. 2018, 48, 3346–3359. [Google Scholar] [CrossRef]
Badi, I.; Ballem, M. Supplier selection using the rough BWM-MAIRCA model: a case study in pharmaceutical supplying in Libya. Decis. Mak. Appl. Manag. Eng. 2018, 1, 16–33. [Google Scholar] [CrossRef]
Chatterjee, K.; Pamucar, D.; Zavadskas, E.K. Evaluating the performance of suppliers based on using the R’AMATEL-MAIRCA method for green supply chain implementation in electronics industry. J. Clean. Prod. 2018, 184, 101–129. [Google Scholar] [CrossRef]
Pawlak, Z. Rough sets. Int. J. Inf. Comput. Sci. 1982, 11, 341–356. [Google Scholar] [CrossRef]
Felix, R.; Ushio, T. Rough sets-based machine learning using a binary discernibility matrix. In Proceedings of the Second International Conference on Intelligent Processing and Manufacturing of Materials, Honolulu, HI, USA, 10–15 July 1999; pp. 299–305. [Google Scholar]
Ding, M.W.; Zhang, T.F.; Ma, F.M. Incremental attribute reduction algorithm based on binary discernibility matrix. Comput. Eng. 2017, 43, 201–206. [Google Scholar]
Qian, W.B.; Xu, Z.Y.; Yang, B.R.; Huang, L.Y. Efficient incremental updating algorithm for computing core of decision table. J. Chin. Comput. Syst. 2010, 31, 739–743. [Google Scholar]
Wang, Y.Q.; Fan, N.B. Improved algorithms for attribute reduction based on simple binary discernibility matrix. Comput. Sci. 2015, 42, 210–215. [Google Scholar]
Zhang, T.F.; Yang, X.X.; Ma, F.M. Algorithm for attribute relative reduction based on generalized binary discernibility matrix. In Proceedings of the 26th Chinese Control and Decision Conference, Changsha, China, 30 May–2 June 2014; pp. 2626–2631. [Google Scholar]
Li, J.; Wang, X.; Fan, X.W. Improved binary discernibility matrix attribute reduction algorithm in customer relationship management. Procedia Eng. 2010, 7, 473–476. [Google Scholar] [CrossRef]
Tiwari, K.S.; Kothari, A.G.; Keskar, A.G. Reduct generation from binary discernibility matrix: an hardware approach. Int. J. Future Comput. Commun. 2012, 1, 270–272. [Google Scholar] [CrossRef]
Zhi, T.Y.; Miao, D.Q. The binary discernibility matrix’s transformation and high Efficiency attributes reduction algorithm’s conformation. Comput. Sci. 2002, 29, 140–142. [Google Scholar]
Yang, C.J.; Ge, H.; Li, L.S. Attribute reduction of vertically partitioned binary discernibility matrix. Control Decis. 2013, 28, 563–568. [Google Scholar]
Ren, Q.; Luo, Y.T.; Yao, G.S. An new method for modifying binary discernibility matrix and computation of core. J. Chin. Comput. Syst. 2013, 34, 1437–1440. [Google Scholar]
Ding, M.W.; Zhang, T.F.; Ma, F.M. Incremental attribute reduction algorithm based on binary discernibility matrix in incomplete information System. Comput. Sci. 2017, 44, 244–250. [Google Scholar]
Hu, S.P.; Zhang, Q.H.; Yao, L.Y. Effective algorithm for computing attribute core based on binary representation. Comput. Sci. 2016, 43, 79–83. [Google Scholar]
Skowron, A.; Rauszer, C. The discernibility matrices and functions in information systems. In Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory; Słowiński, R., Ed.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1992; pp. 331–362. [Google Scholar]
Guan, Y.Y.; Wang, H.K.; Wang, Y.; Yang, F. Attribute reduction and optimal decision rules acquisition for continuous valued information systems. Inf. Sci. 2009, 179, 2974–2984. [Google Scholar] [CrossRef]
Sun, B.Z.; Ma, W.M.; Gong, Z.T. Dominance-based rough set theory over interval-valued information systems. Expert Syst. 2014, 31, 185–197. [Google Scholar] [CrossRef]
Qian, Y.H.; Liang, J.Y.; Dang, C.Y. Interval ordered information systems. Comput. Math. Appl. 2008, 56, 1994–2009. [Google Scholar] [CrossRef] [Green Version]
Qian, Y.H.; Dang, C.Y.; Liang, J.Y.; Tang, D.W. Set-valued ordered information systems. Inf. Sci. 2009, 179, 2809–2832. [Google Scholar] [CrossRef]
Zhang, H.Y.; Leung, Y.; Zhou, L. Variable-precision-dominance-based rough set approach to interval-valued information systems. Inf. Sci. 2013, 244, 75–91. [Google Scholar] [CrossRef]
Kryszkiewicz, M. Rough set approach to incomplete information systems. Inf. Sci. 1998, 112, 39–49. [Google Scholar] [CrossRef]
Leung, Y.; Li, D.Y. Maximal consistent block technique for rule acquisition in incomplete information systems. Inf. Sci. 2003, 153, 85–106. [Google Scholar] [CrossRef]
Guan, Y.Y.; Wang, H.K. Set-valued information systems. Inf. Sci. 2006, 176, 2507–2525. [Google Scholar] [CrossRef]
Miao, D.Q.; Zhao, Y.; Yao, Y.Y.; Li, H.; Xu, F. Relative reducts in consistent and inconsistent decision tables of the Pawlak rough set model. Inf. Sci. 2009, 179, 4140–4150. [Google Scholar] [CrossRef]
Wang, X.; Ma, Y.; Wang, L. Research on space complexity of binary discernibility matrix. J. Tianjin Univ. Sci. Technol. 2006, 21, 50–53. [Google Scholar] [CrossRef]
Kaur, G.; Garg, H. Generalized cubic intuitionistic fuzzy aggregation operators using t-norm operations and their applications to group decision-making process. Arab. J. Sci. Eng. 2018. [Google Scholar] [CrossRef]
Garg, H. New logarithmic operational laws and their aggregation operators for Pythagorean fuzzy set and their applications. Int. J. Intell. Syst. 2019, 34, 82–106. [Google Scholar] [CrossRef]
Garg, H.; Singh, S. A novel triangular interval type-2 intuitionistic fuzzy sets and their aggregation operators. Iran. J. Fuzzy Syst. 2018, 15, 69–93. [Google Scholar]
Garg, H.; Kumar, K. Improved possibility degree method for ranking intuitionistic fuzzy numbers and their application in multiattribute decision making. Granul. Comput. 2018. [Google Scholar] [CrossRef]

Figure 1. A Deterministic Finite Automaton for Row Relations.

Figure 2. A Deterministic Finite Automaton for

r_{q} \supset r_{p}

.

Figure 2. A Deterministic Finite Automaton for

r_{q} \supset r_{p}

.

Figure 3. A Deterministic Finite Automaton for

r_{p} = r_{q}

.

Figure 3. A Deterministic Finite Automaton for

r_{p} = r_{q}

.

Figure 4. A Deterministic Finite Automaton for

r_{p} \supset r_{q}

.

Figure 4. A Deterministic Finite Automaton for

r_{p} \supset r_{q}

.

Figure 5. A Deterministic Finite Automaton for

r_{p} \neq r_{q}

.

Figure 5. A Deterministic Finite Automaton for

r_{p} \neq r_{q}

.

Figure 6. A Deterministic Finite Automaton for Column Relations.

Figure 7. (a) Dermatlogy; (b) Credit_a; (c) Controceptive_Method_Choice; (d) Letter; (e) Flag; (f) Mushroom.

Figure 8. (a) Dermatlogy; (b) Credit_a; (c) Controceptive_Method_Choice; (d) Letter; (e) Flag; (f) Mushroom.

Table 1. A decision system.

	a₁	a₂	a₃	a₄	d
x₁	1	2	0	0	1
x₂	1	2	0	0	1
x₃	1	2	1	1	2
x₄	1	3	1	1	3
x₅	2	4	1	1	4

Table 2. Time consumption of BDMR, IBDMR and BDMSDFA.

Data Sets	Num. of Objects	Num. of Attributes	Num. of Rows	Num. of Columns	Time of BDMR (ms)	Time of IBDMR (ms)	Time of BDMSDFA (ms)
Labor_neg	40	15	52	14	2	2	1
Lung-cancer	32	57	206	53	11	9	3
Heart_statlog	270	14	96	13	32	27	14
Autos	250	26	36	19	75	68	36
Credit_a	690	16	27	12	113	105	55
Breast_w	699	10	20	9	75	73	29
Anneal	898	39	9	9	206	194	134
Promoters	106	58	2761	57	1517	936	398
Dermatology	366	35	1347	31	3239	2638	1318
Connect_4	67,557	43	697	42	2,444,328	2,433,863	1,759,917

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhang, N.; Li, B.; Zhang, Z.; Guo, Y. A Quick Algorithm for Binary Discernibility Matrix Simplification using Deterministic Finite Automata. Information 2018, 9, 314. https://doi.org/10.3390/info9120314

AMA Style

Zhang N, Li B, Zhang Z, Guo Y. A Quick Algorithm for Binary Discernibility Matrix Simplification using Deterministic Finite Automata. Information. 2018; 9(12):314. https://doi.org/10.3390/info9120314

Chicago/Turabian Style

Zhang, Nan, Baizhen Li, Zhongxi Zhang, and Yanyan Guo. 2018. "A Quick Algorithm for Binary Discernibility Matrix Simplification using Deterministic Finite Automata" Information 9, no. 12: 314. https://doi.org/10.3390/info9120314

APA Style

Zhang, N., Li, B., Zhang, Z., & Guo, Y. (2018). A Quick Algorithm for Binary Discernibility Matrix Simplification using Deterministic Finite Automata. Information, 9(12), 314. https://doi.org/10.3390/info9120314

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Quick Algorithm for Binary Discernibility Matrix Simplification using Deterministic Finite Automata

Abstract

1. Introduction

2. Preliminaries

3. Binary Discernibility Matrices and Their Simplifications

4. A Quick Algorithm for Binary Discernibility Matrix Simplification

5. Experimental Results and Analyses

6. Conclusions

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI