Article

Matrix Method for the Optimal Scale Selection of Multi-Scale Information Decision Systems

1 Fujian Province University Key Laboratory of Computational Science, School of Mathematics Sciences, Huaqiao University, Quanzhou 362021, China
2 School of Mathematics Sciences and Statistics, Minnan Normal University, Zhangzhou 363000, China
* Author to whom correspondence should be addressed.
Mathematics 2019, 7(3), 290; https://doi.org/10.3390/math7030290
Submission received: 22 January 2019 / Revised: 13 March 2019 / Accepted: 17 March 2019 / Published: 21 March 2019

Abstract

In multi-scale information systems, information is often characterized at multiple scales and levels. To facilitate computation in such systems, this study employs the matrix method to represent multi-scale information systems and to select the optimal scale combination of multi-scale decision information systems. To this end, we first describe some important concepts and properties of information systems using relational matrices. The relational matrix is then introduced into multi-scale information systems and used to describe their main concepts, including the lower and upper approximation sets and the consistency of the systems. Furthermore, from the viewpoint of the relational matrix, the scale significance is defined to describe the global optimal scale and the local optimal scale of multi-scale information systems. Finally, the relational matrix is used to compute the scale significance and to construct the optimal scale selection algorithms. The efficiency of these algorithms is examined by several practical examples and experiments.

1. Introduction

Granular computing [1,2], which originated from fuzzy information granulation, is a mathematical method for knowledge representation and data mining. Its purpose is to solve complex problems by dividing massive information into relatively simple blocks according to their characteristics and behavior. Since the concept of granular computing was put forward, it has become a hot research topic and has been widely used in many practical applications [3,4,5,6,7,8,9,10,11,12,13].
The theory of rough sets plays an important role in the promotion and development of granular computing [14,15,16,17,18,19,20]. Pawlak [14] used information systems to study granular computing; when a decision is required, one usually works with an information system equipped with a decision attribute.
People process information at different levels and scales. Based on this viewpoint, multi-scale information systems have been developed and widely studied [21,22,23,24,25,26,27]. Ferone et al. used feature granulation to study feature selection [12,13]. Wu, Leung et al. [21,22] proposed a multi-scale decision information system model in which the objects are granulated from fine to coarse, and investigated the optimal scale selection of multi-scale decision tables. Gu and Wu [23] presented a formal method for knowledge acquisition measured at different granularities. Wu and Qian [24] measured the uncertainty in incomplete multi-scale information tables with the Dempster-Shafer evidence theory [28]. She et al. [25] employed a local method to induce decision rules in multi-scale decision tables. Li and Hu [26] introduced a step-wise method for the optimal scale selection of multi-scale decision tables.
Because of the massive amount of information in multi-scale information systems, computing the concepts of such systems is time consuming. This article therefore employs Boolean matrices and matrix computation to facilitate knowledge description and optimal scale selection in multi-scale decision tables. Indeed, the matrix method has shown a computational advantage in many settings [29,30,31,32,33,34,35,36,37], such as classical rough sets and information systems [29,30,31], static and dynamic information systems [32,35,36], covering information systems [33,34], and so on. For multi-scale information systems, it is worthwhile to use the relational matrix to obtain the optimal scale selection and to develop matrix methods that facilitate the computational process.
The paper is structured as follows. In Section 2, we introduce a relational matrix to represent the relevant concepts in information systems and examine some properties of information systems based on the Boolean matrix. In Section 3, we introduce multi-scale information systems and their Boolean matrix representation. In Section 4, we use the relational matrix to define the scale significance and to investigate the optimal scale selection of consistent multi-scale decision systems. In Section 5, we seek the local optimal scale of inconsistent multi-scale decision systems by means of the scale significance based on the relational matrix. Section 6 concludes the paper with a summary, and in Section 7 the effectiveness of the matrix method is illustrated by experiments.

2. Preliminary Background

2.1. Rough Set and Information Systems

In this section, we introduce some basic concepts of rough sets and information systems.
Definition 1
([14]). A Pawlak approximation space is a pair $(U, R)$, where $U = \{x_1, x_2, \ldots, x_n\}$ is a non-empty finite set called the universe and $R$ is an equivalence relation on $U$. The lower and upper approximation sets of $X \subseteq U$ are defined as $\underline{R}(X)$ and $\overline{R}(X)$, respectively:
$$\underline{R}(X) = \{x \in U \mid [x]_R \subseteq X\}, \qquad \overline{R}(X) = \{x \in U \mid [x]_R \cap X \neq \emptyset\},$$
where $[x]_R = \{y \in U \mid (x, y) \in R\}$ is the equivalence class containing x.
Definition 2
([14]). Let $S = (U, A)$ be an information system, where $U = \{x_1, x_2, \ldots, x_n\}$ is a non-empty finite set called the universe and $A = \{a_1, a_2, \ldots, a_m\}$ is a non-empty finite set of attributes. Each $a \in A$ is a surjective map $a: U \to V_a$, i.e., $a(x) \in V_a$ for all $x \in U$, where $V_a = \{a(x) \mid x \in U\}$ is called the domain of attribute a. For any $B \subseteq A$, the equivalence relation $R_B$ on $U$ is determined as
$$R_B = \{(x, y) \in U \times U \mid a(x) = a(y), \; \forall a \in B\}.$$
The equivalence class of $x \in U$ in terms of $R_B$ is denoted as $[x]_B = \{y \in U \mid (x, y) \in R_B\}$.
The lower and upper approximation sets of $X \subseteq U$ corresponding to the attribute subset $B$ are represented as $\underline{R}_B(X)$ and $\overline{R}_B(X)$, respectively:
$$\underline{R}_B(X) = \{x \in U \mid [x]_B \subseteq X\}, \qquad \overline{R}_B(X) = \{x \in U \mid [x]_B \cap X \neq \emptyset\}.$$
Definition 3
([14]). Let $S = (U, A \cup \{d\})$ be a decision information system, where $(U, A)$ is an information system, $d \notin A$ is called the decision attribute, and $d: U \to V_d$ is a surjective map, where $V_d = \{d(x) \mid x \in U\}$ is called the domain of d.
Similarly, the equivalence relation on $U$ induced by $d$ is given by
$$R_d = \{(x, y) \in U \times U \mid d(x) = d(y)\}.$$
The equivalence class of $x \in U$ is denoted as $[x]_d = \{y \in U \mid (x, y) \in R_d\}$.
Let $S = (U, A \cup \{d\})$ be a decision information system. We distinguish the following two kinds of consistency:
(1) If $R_A \subseteq R_d$, then S is globally consistent;
(2) For $x \in U$, if $[x]_A \subseteq [x]_d$, then S is locally consistent for x.

2.2. Boolean Matrix Characterization of Decision Information Systems

Definition 4.
A matrix $M = (r_{ij})_{n \times n}$ with $r_{ij} \in \{0, 1\}$ is called a Boolean matrix. Let $A = (a_{ij})_{n \times m}$, $B = (b_{ij})_{n \times m}$ and $C = (c_{ij})_{m \times l}$ be Boolean matrices, and define the following operations:
(1) Order relation: $A \le B \Leftrightarrow a_{ij} \le b_{ij}$ for all i, j;
(2) Meet: $A \wedge B = (a_{ij} \wedge b_{ij})_{n \times m}$;
(3) Boolean product: $A \circ C = (d_{ij})_{n \times l}$, where $d_{ij} = \bigvee_{k=1}^{m} (a_{ik} \wedge c_{kj})$;
(4) Complement: $\sim A = (1 - a_{ij})_{n \times m}$;
where $a_{ik} \vee c_{kj} = \max\{a_{ik}, c_{kj}\}$ and $a_{ij} \wedge b_{ij} = \min\{a_{ij}, b_{ij}\}$.
It can be seen that $a_{ij} \wedge b_{ij} = a_{ij} \times b_{ij}$ and $A \le B \Leftrightarrow A \wedge (\sim B) = 0$.
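To make these operations concrete, the following sketch (our own illustrative code, not part of the original formulation) implements the four operations of Definition 4 on NumPy 0-1 arrays; all function names are ours.

```python
import numpy as np

def leq(A, B):
    """Order relation: A <= B iff a_ij <= b_ij for every entry."""
    return bool(np.all(A <= B))

def meet(A, B):
    """Meet A /\\ B: entrywise minimum of two Boolean matrices."""
    return np.minimum(A, B)

def boolean_product(A, C):
    """Boolean product A o C: d_ij = max_k min(a_ik, c_kj)."""
    return np.max(np.minimum(A[:, :, None], C[None, :, :]), axis=1)

def complement(A):
    """Complement ~A: flip 0 <-> 1 entrywise."""
    return 1 - A
```

With these helpers, the closing remark of Definition 4 reads: leq(A, B) holds exactly when meet(A, complement(B)) contains no 1s.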
Definition 5
([33]). Let $U = \{x_1, x_2, \ldots, x_n\}$ and $X \subseteq U$. The characteristic function of $X$ is defined as the row vector $f(X) = (f(x_1), f(x_2), \ldots, f(x_n))$, where
$$f(x_i) = \begin{cases} 1, & x_i \in X, \\ 0, & x_i \notin X. \end{cases}$$
It can be seen that $X \subseteq Y \Leftrightarrow f(X) \le f(Y) \Leftrightarrow f(X) \wedge (\sim f(Y)) = 0$.
Definition 6
([33]). Let $(U, R)$ be an approximation space with $U = \{x_1, x_2, \ldots, x_n\}$. The relation matrix of $R$ is defined as $M_R = (m_{ij})_{n \times n}$, where
$$m_{ij} = \begin{cases} 1, & (x_i, x_j) \in R, \\ 0, & (x_i, x_j) \notin R. \end{cases}$$
Obviously, by Definitions 5 and 6, the rows of $M_R$ are the characteristic vectors of the equivalence classes:
$$M_R = \begin{pmatrix} f([x_1]_R) \\ f([x_2]_R) \\ \vdots \\ f([x_n]_R) \end{pmatrix}.$$
Theorem 1
([30]). Let $(U, R)$ be an approximation space, $U = \{x_1, x_2, \ldots, x_n\}$, and $X \subseteq U$. Then
$$f(\underline{R}(X)) = \sim \big( (\sim f(X)) \circ M_R \big), \qquad f(\overline{R}(X)) = f(X) \circ M_R.$$
Proof. 
Let $f(\underline{R}(X)) = (f_1, f_2, \ldots, f_n)$, $(\sim f(X)) \circ M_R = (g_1, g_2, \ldots, g_n)$ and $\sim\big((\sim f(X)) \circ M_R\big) = (h_1, h_2, \ldots, h_n)$.
According to the symmetry of the matrix $M_R$ and Definition 6, we have $M_R = M_R^T = (f([x_1]_R)^T, f([x_2]_R)^T, \ldots, f([x_n]_R)^T)$ and $g_i = (\sim f(X)) \circ f([x_i]_R)^T$. Therefore $f_i = 1 \Leftrightarrow x_i \in \underline{R}(X) \Leftrightarrow [x_i]_R \subseteq X \Leftrightarrow [x_i]_R \cap (\sim X) = \emptyset \Leftrightarrow (\sim f(X)) \circ f([x_i]_R)^T = 0 \Leftrightarrow g_i = 0 \Leftrightarrow h_i = 1$. Similarly, $f_i = 0 \Leftrightarrow h_i = 0$. Therefore $f(\underline{R}(X)) = \sim\big((\sim f(X)) \circ M_R\big)$.
By $\overline{R}(X) = \sim \underline{R}(\sim X)$, we have $f(\overline{R}(X)) = \sim f(\underline{R}(\sim X)) = \sim\sim\big((\sim\sim f(X)) \circ M_R\big) = f(X) \circ M_R$.  □
Example 1.
Let $U = \{x_1, x_2, x_3, x_4, x_5\}$, $U/R = \{\{x_1\}, \{x_2\}, \{x_3, x_4\}, \{x_5\}\}$ and $X = \{x_1, x_2, x_3\}$. Then
$$f(\underline{R}(X)) = \sim\big((\sim f(X)) \circ M_R\big) = \sim\left( (0\;0\;0\;1\;1) \circ \begin{pmatrix} 1&0&0&0&0 \\ 0&1&0&0&0 \\ 0&0&1&1&0 \\ 0&0&1&1&0 \\ 0&0&0&0&1 \end{pmatrix} \right) = (1\;1\;0\;0\;0),$$
$$f(\overline{R}(X)) = f(X) \circ M_R = (1\;1\;1\;0\;0) \circ \begin{pmatrix} 1&0&0&0&0 \\ 0&1&0&0&0 \\ 0&0&1&1&0 \\ 0&0&1&1&0 \\ 0&0&0&0&1 \end{pmatrix} = (1\;1\;1\;1\;0).$$
Therefore, $\underline{R}(X) = \{x_1, x_2\}$ and $\overline{R}(X) = \{x_1, x_2, x_3, x_4\}$.
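Example 1 can be reproduced mechanically with the helper functions sketched after Definition 4 (again illustrative code under our assumed names):

```python
# Relation matrix of U/R = {{x1}, {x2}, {x3, x4}, {x5}} and X = {x1, x2, x3}
M_R = np.array([[1, 0, 0, 0, 0],
                [0, 1, 0, 0, 0],
                [0, 0, 1, 1, 0],
                [0, 0, 1, 1, 0],
                [0, 0, 0, 0, 1]])
f_X = np.array([[1, 1, 1, 0, 0]])     # characteristic row vector of X

f_lower = complement(boolean_product(complement(f_X), M_R))
f_upper = boolean_product(f_X, M_R)
print(f_lower)    # [[1 1 0 0 0]], i.e., the lower approximation {x1, x2}
print(f_upper)    # [[1 1 1 1 0]], i.e., the upper approximation {x1, x2, x3, x4}
```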
Theorem 2
([29]). Let U be a universe and $R_1, R_2$ equivalence relations on U. Then the following conclusions hold:
(1) $M_{R_1 \cap R_2} = M_{R_1} \wedge M_{R_2}$;
(2) If $R_1 \subseteq R_2$, then $M_{R_1} \le M_{R_2}$.
Let $S = (U, A \cup \{d\})$ be a decision information system and $B \subseteq A$. The relation matrix of the equivalence relation $R_B$ is denoted as $M_B$, and the relation matrix corresponding to $R_d$ is denoted as $M_d$.
Theorem 3.
Let $S = (U, A \cup \{d\})$ be a decision information system and $B \subseteq A$. Then $M_B = (m_{ij})_{n \times n}$, where
$$m_{ij} = \begin{cases} 1, & a(x_i) = a(x_j), \; \forall a \in B, \\ 0, & \text{else}, \end{cases}$$
and $M_d = (r_{ij})_{n \times n}$, where
$$r_{ij} = \begin{cases} 1, & d(x_i) = d(x_j), \\ 0, & \text{else}. \end{cases}$$
Proof. 
By Definition 2, $(x_i, x_j) \in R_B \Leftrightarrow a(x_i) = a(x_j)$ for all $a \in B$; hence, by Definition 6, $m_{ij} = 1 \Leftrightarrow a(x_i) = a(x_j)$ for all $a \in B$. Similarly, $r_{ij} = 1 \Leftrightarrow d(x_i) = d(x_j)$.  □
Theorem 4.
Let $S = (U, A \cup \{d\})$ be a decision information system. Then the following conclusions hold:
(1) S is locally consistent for $x \in U$ if and only if $f([x]_A) \le f([x]_d)$;
(2) S is globally consistent if and only if $M_A \le M_d$.
Proof. 
By Definition 5, S is locally consistent for $x \in U \Leftrightarrow [x]_A \subseteq [x]_d \Leftrightarrow f([x]_A) \le f([x]_d)$.
By Theorem 2, S is globally consistent $\Leftrightarrow R_A \subseteq R_d \Leftrightarrow M_A \le M_d$.  □
Example 2.
Determine whether the decision system in Table 1 is globally consistent.
Let $M_A$ be the relational matrix corresponding to A and $M_d$ the relational matrix corresponding to d. According to Theorem 3, by calculation,
$$M_A = \begin{pmatrix} 1&0&1&0&0&0&0 \\ 0&1&0&0&0&0&1 \\ 1&0&1&0&0&0&0 \\ 0&0&0&1&0&0&0 \\ 0&0&0&0&1&0&0 \\ 0&0&0&0&0&1&0 \\ 0&1&0&0&0&0&1 \end{pmatrix}, \qquad M_d = \begin{pmatrix} 1&0&1&0&0&1&0 \\ 0&1&0&0&0&0&1 \\ 1&0&1&0&0&1&0 \\ 0&0&0&1&0&0&0 \\ 0&0&0&0&1&0&0 \\ 1&0&1&0&0&1&0 \\ 0&1&0&0&0&0&1 \end{pmatrix}.$$
Obviously $M_A \le M_d$; by Theorem 4, $(U, A \cup \{d\})$ is globally consistent.
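Theorem 3 makes $M_A$ and $M_d$ directly computable from the data table. A minimal sketch (our own illustrative code) builds both matrices for Table 1 and tests global consistency via Theorem 4:

```python
import numpy as np

def relation_matrix(values):
    """m_ij = 1 iff objects i and j agree on every listed attribute."""
    V = np.asarray(values)
    if V.ndim == 1:
        V = V[:, None]
    return np.all(V[:, None, :] == V[None, :, :], axis=2).astype(int)

A = [[2, 1, 1, 1], [3, 2, 2, 2], [2, 1, 1, 1], [2, 2, 1, 1],
     [1, 1, 4, 2], [1, 1, 3, 2], [3, 2, 2, 2]]   # Table 1, attributes a1..a4
d = [1, 2, 1, 3, 5, 1, 2]                         # decision column of Table 1

M_A, M_d = relation_matrix(A), relation_matrix(d)
print(bool(np.all(M_A <= M_d)))   # True: the system is globally consistent
```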

3. Relational Matrix in Generalized Multi-Scale Information Systems

The theoretical model of multi-scale decision information systems was first proposed by Wu and Leung [21], and many scholars have contributed to this line of research [21,22,23,24,25,26,27,38,39,40,41]; Li and Hu generalized the model in [27]. Among these studies, optimal scale selection is an important subject. Wu, Li et al. [29,38,40] studied the optimal scale selection of multi-scale decision information systems. Gu et al. [39] studied the optimal scale selection of incomplete multi-scale decision information systems. Li and Hu [27] studied the optimal scale selection for generalized multi-scale decision information systems, and Li et al. [26] gave an attribute significance measure for generalized multi-scale decision information systems to search for the optimal scale. Decision information system models involve a large number of set operations, for which the matrix is a powerful tool. Using matrices to study multi-scale information systems helps to improve algorithm efficiency and to further develop the theory of multi-scale information tables. In this section, we introduce the relational matrix into multi-scale information systems in preparation for optimal scale selection.
Definition 7
([21,27]). A multi-scale information system is a tuple $S = (U, A)$, where $U = \{x_1, x_2, \ldots, x_n\}$ is a nonempty and finite set of objects called the universe, $A = \{a_1, a_2, \ldots, a_m\}$ is a nonempty and finite set of attributes, and each $a_j$ has $I_j$ $(j = 1, 2, \ldots, m)$ scales. Then a multi-scale information system $S = (U, A)$ can be represented as
$$S = (U, \{a_j^k \mid k = 1, 2, \ldots, I_j, \; j = 1, 2, \ldots, m\}),$$
where $a_j^k: U \to V_j^k$ is a surjective function and $V_j^k$ is the domain of $a_j$ at the kth scale. Furthermore, for any $j = 1, 2, \ldots, m$ and $1 \le k \le I_j - 1$, there exists a surjective map $g_j^{k, k+1}: V_j^k \to V_j^{k+1}$ such that $a_j^{k+1} = g_j^{k, k+1} \circ a_j^k$, i.e., $a_j^{k+1}(x) = g_j^{k, k+1}(a_j^k(x))$ for all $x \in U$; $g_j^{k, k+1}$ is called the information granularity transformation function.
When $I_1 = I_2 = \cdots = I_m = I$, the information system defined above is the multi-scale information system proposed by Wu and Leung [21].
The equivalence relation determined by attribute $a_j^{l_j}$ $(1 \le l_j \le I_j)$ is
$$R_{a_j^{l_j}} = \{(x, y) \in U \times U \mid a_j^{l_j}(x) = a_j^{l_j}(y)\}.$$
By the existence of the surjective maps $g_j^{k, k+1}: V_j^k \to V_j^{k+1}$, for any $a_j \in A$ we have
$$R_{a_j^1} \subseteq R_{a_j^2} \subseteq \cdots \subseteq R_{a_j^{I_j}}.$$
For a multi-scale information system $S = (U, A)$, if each attribute $a_j \in A$ is restricted to its $l_j$th scale, we call $K = (l_1, l_2, \ldots, l_m)$ a scale combination of the system $S = (U, A)$. The set of all scale combinations of $S = (U, A)$ is denoted as ∑, and ∑ forms a partially ordered lattice [27].
The information system corresponding to the scale combination $K = (l_1, l_2, \ldots, l_m)$ is denoted as
$$S^K = (U, A^K) = (U, \{a_1^{l_1}, a_2^{l_2}, \ldots, a_m^{l_m}\}).$$
The equivalence relation induced by $A^K$ is denoted as $R_{A^K}$, and the relation matrix of $R_{A^K}$ is denoted as $M_{A^K}$.
Definition 8
([27]). Let $S = (U, A)$ be a multi-scale information system and $K, L \in \Sigma$ with $K = (l_1, l_2, \ldots, l_m)$ and $L = (h_1, h_2, \ldots, h_m)$. If $l_i \le h_i$ $(i = 1, 2, \ldots, m)$, we say that K is finer than L, denoted as $K \le L$; if, further, $K \le L$ and $l_i < h_i$ for at least one $i \in \{1, 2, \ldots, m\}$, we say that K is strictly finer than L, denoted as $K < L$.
Example 3.
The following example gives a comprehensive evaluation of three courses for eight students, with scores divided into four criteria, as shown in Table 2.
We use attribute $a_1$ to evaluate the results of the first course, which is divided into four scale levels, attribute $a_2$ for the second course, which is divided into three scale levels, and attribute $a_3$ for the third course, which is divided into two scale levels. Attribute d is the decision attribute, where 1 means qualified and 0 means unqualified. According to the scores of each student, the multi-scale decision information system shown in Table 3 is obtained. To simplify the notation below, we denote excellent by "E", good by "G", fair by "F", pass by "P", bad by "B", super by "S", middle by "M", low by "L", yes by "Y" and no by "N".
There are 24 different scale combinations in the multi-scale information system in Table 3. According to the scale relation in Definition 8, we obtain a partially ordered lattice [27], as shown in Figure 1.
We now discuss some properties of multi-scale information systems using the relation matrix.
Let $S = (U, A)$ be a multi-scale information system and ∑ the set of all its scale combinations. For $K = (l_1, l_2, \ldots, l_m) \in \Sigma$, the decision information system corresponding to the scale combination K is $S^K = (U, A^K \cup \{d\})$, and the relation matrix of $R_{A^K}$ is denoted as $M_{A^K}$.
Theorem 5.
Let $S = (U, A)$ be a multi-scale information system and $K = (l_1, l_2, \ldots, l_m) \in \Sigma$. Then $M_{A^K} = (m_{ij})_{n \times n}$, where
$$m_{ij} = \begin{cases} 1, & a_k^{l_k}(x_i) = a_k^{l_k}(x_j), \; k = 1, 2, \ldots, m, \\ 0, & \text{else}. \end{cases}$$
Proof. 
This conclusion follows from Theorem 3.  □
Theorem 6.
Let $S = (U, A)$ be a multi-scale information system, ∑ the set of all its scale combinations, $K, L \in \Sigma$ with $K < L$, and $X \subseteq U$. Then the following conclusions hold:
(1) For any attribute $a_j \in A$ $(j = 1, 2, \ldots, m)$, $M_{a_j^1} \le M_{a_j^2} \le \cdots \le M_{a_j^{I_j}}$;
(2) $M_{A^K} \le M_{A^L}$;
(3) $M_{A^K} = \bigwedge_{j=1}^{m} M_{a_j^{l_j}}$;
(4) $f(\underline{R_{A^K}}(X)) = \sim\big((\sim f(X)) \circ M_{A^K}\big)$;
(5) $f(\overline{R_{A^K}}(X)) = f(X) \circ M_{A^K}$;
(6) $f(\underline{R_{A^L}}(X)) \le f(\underline{R_{A^K}}(X))$;
(7) $f(\overline{R_{A^K}}(X)) \le f(\overline{R_{A^L}}(X))$.
Proof. 
(1) By Definition 7, $R_{a_j^1} \subseteq R_{a_j^2} \subseteq \cdots \subseteq R_{a_j^{I_j}}$; hence, by Theorem 2, $M_{a_j^1} \le M_{a_j^2} \le \cdots \le M_{a_j^{I_j}}$;
(2) $K \le L \Rightarrow R_{A^K} \subseteq R_{A^L} \Rightarrow M_{A^K} \le M_{A^L}$;
(3) Suppose $K = (l_1, l_2, \ldots, l_m)$; then $A^K = \{a_1^{l_1}, a_2^{l_2}, \ldots, a_m^{l_m}\}$ and $R_{A^K} = R_{a_1^{l_1}} \cap R_{a_2^{l_2}} \cap \cdots \cap R_{a_m^{l_m}}$, so by Theorem 2, $M_{A^K} = M_{a_1^{l_1}} \wedge M_{a_2^{l_2}} \wedge \cdots \wedge M_{a_m^{l_m}} = \bigwedge_{j=1}^{m} M_{a_j^{l_j}}$;
(4) and (5) follow directly from Theorem 1;
(6) $K < L \Rightarrow R_{A^K} \subseteq R_{A^L} \Rightarrow \underline{R_{A^L}}(X) \subseteq \underline{R_{A^K}}(X) \Rightarrow f(\underline{R_{A^L}}(X)) \le f(\underline{R_{A^K}}(X))$;
(7) $K < L \Rightarrow R_{A^K} \subseteq R_{A^L} \Rightarrow \overline{R_{A^K}}(X) \subseteq \overline{R_{A^L}}(X) \Rightarrow f(\overline{R_{A^K}}(X)) \le f(\overline{R_{A^L}}(X))$.  □
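Theorem 6(3) suggests a simple way to assemble $M_{A^K}$ in code: compute one relation matrix per attribute at its selected scale and take the entrywise meet. A sketch reusing the relation_matrix helper introduced after Example 2 (the data layout is our own assumption):

```python
def scale_relation_matrix(columns, K):
    """columns[j][k] is the value column of attribute a_{j+1} at scale k+1;
    K = (l_1, ..., l_m) picks one scale per attribute (Theorem 6(3))."""
    M = None
    for j, l in enumerate(K):
        M_j = relation_matrix(columns[j][l - 1])
        M = M_j if M is None else np.minimum(M, M_j)   # entrywise meet
    return M
```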

4. Optimal Scale Selection for Consistent Multi-Scale Decision Information Systems

Definition 9
([21,27]). Let $S = (U, A \cup \{d\})$ be a multi-scale decision information system, where $(U, A)$ is a multi-scale information system, ∑ is the collection of all scale combinations, $d \notin A$ is a decision attribute and $d: U \to V_d$ is a surjective map, where $V_d = \{d(x) \mid x \in U\}$ is called the domain of d.

4.1. Global Optimal Scale

The decision information system corresponding to the scale combination $K = (l_1, l_2, \ldots, l_m)$ is denoted as
$$S^K = (U, A^K \cup \{d\}) = (U, \{a_1^{l_1}, a_2^{l_2}, \ldots, a_m^{l_m}\} \cup \{d\}).$$
Obviously, $S^K = (U, A^K \cup \{d\})$ is globally consistent if and only if $R_{A^K} \subseteq R_d$.
The relation matrix corresponding to the equivalence relation $R_{A^K}$ is $M_{A^K}$, and the relation matrix corresponding to $R_d$ is $M_d$.
By Theorem 2, $S^K = (U, A^K \cup \{d\})$ is globally consistent if and only if $M_{A^K} \le M_d$.
Definition 10
([21]). Let $S = (U, A \cup \{d\})$ be a multi-scale decision information system and ∑ the collection of all scale combinations. For the finest scale combination $K_0 = (1, 1, \ldots, 1)$, if $R_{A^{K_0}} \subseteq R_d$, then S is called consistent. If there is a $K \in \Sigma$ such that $R_{A^K} \subseteq R_d$, but $R_{A^H} \subseteq R_d$ does not hold for any $H \in \Sigma$ with $K < H$, we call K a global optimal scale combination.
Theorem 7.
Let $S = (U, A \cup \{d\})$ be a multi-scale decision information system and ∑ the collection of all scale combinations. Then the following conclusions hold:
(1) S is globally consistent if and only if $M_{A^{K_0}} \le M_d$;
(2) S is globally consistent if and only if $M_{A^{K_0}} \wedge (\sim M_d) = 0$;
(3) A scale combination K is a global optimal scale combination if and only if $M_{A^K} \le M_d$ holds and, for any $H \in \Sigma$ with $K < H$, $M_{A^H} \le M_d$ does not hold;
(4) Suppose $K = (l_1, l_2, \ldots, l_m)$ and let $K_j = (l_1, \ldots, l_j - 1, \ldots, l_m)$; then $M_{A^{K_j}} = M_{A^K} \wedge M_{a_j^{l_j - 1}}$.
Proof. 
(1) S is globally consistent $\Leftrightarrow R_{A^{K_0}} \subseteq R_d \Leftrightarrow M_{A^{K_0}} \le M_d$;
(2) The result follows from (1) and Definition 4;
(3) This follows directly from the definition of the global optimal scale combination;
(4) Since $M_{a_j^{l_j - 1}} \le M_{a_j^{l_j}}$, we have $M_{a_j^{l_j - 1}} \wedge M_{a_j^{l_j}} = M_{a_j^{l_j - 1}}$. Therefore $M_{A^{K_j}} = M_{a_1^{l_1}} \wedge \cdots \wedge M_{a_j^{l_j - 1}} \wedge \cdots \wedge M_{a_m^{l_m}} = M_{A^K} \wedge M_{a_j^{l_j - 1}}$.  □
Definition 11.
Let $S = (U, A \cup \{d\})$ be a multi-scale decision information system, ∑ the collection of all scale combinations, and $K = (l_1, l_2, \ldots, l_m) \in \Sigma$. Define the significance of the scale combination K as
$$sig(K) = \| M_{A^K} \wedge (\sim M_d) \|,$$
where $\|A\|$ denotes the number of 1s in the matrix A.
Let $S = (U, A \cup \{d\})$ be a multi-scale decision information system and ∑ the collection of all scale combinations. By Definition 7, the decision information system corresponding to the scale combination K is $S^K = (U, A^K \cup \{d\})$. Obviously, $S^K = (U, A^K \cup \{d\})$ is globally consistent if and only if $sig(K) = 0$.
For a scale combination $K = (l_1, l_2, \ldots, l_m)$, if the decision information system $S^K = (U, A^K \cup \{d\})$ is globally inconsistent, then $M_{A^K} \le M_d$ is not true, i.e., $sig(K) \neq 0$. The direct finer successors of the scale combination K are denoted as $K_j = (l_1, \ldots, l_j - 1, \ldots, l_m)$ $(j = 1, 2, \ldots, m)$. To compare the global consistency of the decision information systems $S^{K_j}$, the following significance is calculated:
$$sig(K_j) = \| M_{A^{K_j}} \wedge (\sim M_d) \| = \| M_{A^K} \wedge M_{a_j^{l_j - 1}} \wedge (\sim M_d) \|.$$
It can be seen that the smaller $sig(K_j)$ is, the more significant the scale combination $K_j = (l_1, \ldots, l_j - 1, \ldots, l_m)$: it is selected first because the information system corresponding to it is closer to being consistent. Based on this idea, a global optimal scale combination selection algorithm for a multi-scale decision system can be described as Algorithm 1.
As Algorithm 1 shows, the algorithm initializes the scale combination variable optK with the coarsest scale combination and the significance variable optSig with the maximum value, puts K into a queue and starts an iteration. In each iteration, the current scale combination K is taken from the queue and sig(K) is calculated. If the current sig(K) is not larger than optSig, then optK and optSig are replaced by the current K and sig(K). If optSig becomes zero during the iteration, the iteration terminates and the optimal scale combination is returned. When the queue is empty and optSig ≠ 0, the next finer scale combinations are constructed from optK and put into the queue, and another iteration starts.
Algorithm 1: Selecting the global optimal scale combination of a multi-scale decision system
Input: A multi-scale decision system S = (U, A ∪ {d});
Output: The global optimal scale combination.
 Calculate M_d
 Let Queue = NULL
 Let optK = (I_1, I_2, …, I_m)              //set the coarsest scale combination initially
 Let optSig = Integer.MAX_VALUE             //set maximum value to optSig initially
 Queue.put(optK)
 While (true)
      While (not Queue.empty)
          Let K = Queue.get()               //pick a scale combination K from Queue
          Let sig(K) = ‖M_{A^K} ∧ (∼M_d)‖   //calculate sig(K)
          If (sig(K) ≤ optSig)              //choose the minimum sig(K) and its K
               Let optSig = sig(K)
               Let optK = K
          EndIf
          If (optSig == 0)                  //if the optimal scale combination is found, return
               return optK
          EndIf
      EndWhile                              //loop until Queue is empty
      Queue.clear()                         //empty the Queue and construct the next finer scale combinations
      Let (l_1, l_2, …, l_m) = optK
      For each j ∈ {1, 2, …, m}             //set a finer scale for each attribute, respectively
          If (l_j > 1)
               Let K = (l_1, l_2, …, l_{j−1}, l_j − 1, l_{j+1}, …, l_m)
               Queue.put(K)
          EndIf
      EndFor                                //next, search finer scale combinations
 EndWhile
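A compact Python rendering of Algorithm 1 might look as follows. It is a sketch built on the helpers introduced earlier (relation_matrix, scale_relation_matrix); the data layout and function names are our own assumptions rather than the paper's notation.

```python
from collections import deque
import numpy as np

def sig(M_AK, M_d_comp):
    """sig(K): number of 1s in M_{A^K} /\\ (~M_d) (Definition 11)."""
    return int(np.sum(np.minimum(M_AK, M_d_comp)))

def global_optimal_scale(columns, I, M_d):
    """Coarse-to-fine search of Algorithm 1; I = (I_1, ..., I_m)."""
    M_d_comp = 1 - M_d
    queue = deque([tuple(I)])                  # start from the coarsest combination
    while queue:
        opt_K, opt_sig = None, None
        while queue:                           # scan the current layer
            K = queue.popleft()
            s = sig(scale_relation_matrix(columns, K), M_d_comp)
            if opt_sig is None or s <= opt_sig:
                opt_K, opt_sig = K, s
            if opt_sig == 0:
                return opt_K                   # globally consistent: optimal found
        for j, l in enumerate(opt_K):          # refine the most significant K
            if l > 1:
                queue.append(opt_K[:j] + (l - 1,) + opt_K[j + 1:])
    return None                                # only reached if S is inconsistent
```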
Example 4.
Continuing Example 3, we seek the global optimal scale combination of the system corresponding to Table 3.
By Definition 11, a necessary and sufficient condition for the decision information system $S^K = (U, A^K \cup \{d\})$ to be globally consistent is $sig(K) = \| M_{A^K} \wedge (\sim M_d) \| = 0$, where $K = (l_1, l_2, \ldots, l_m)$.
We use the scale combination lattice in Figure 1 to illustrate the calculation steps of optimal scale selection for Table 3. We first judge whether the system is consistent, and then, following the definition of scale significance, search the lattice in Figure 1 for the optimal scale combination from top to bottom.
(1) For the finest scale combination $K_0 = (1, 1, 1)$, according to Theorems 3 and 5, by calculation,
$$M_{A^{K_0}} = I_{8 \times 8}, \qquad \sim M_d = \begin{pmatrix} 0&1&0&0&1&1&0&1 \\ 1&0&1&1&0&0&1&0 \\ 0&1&0&0&1&1&0&1 \\ 0&1&0&0&1&1&0&1 \\ 1&0&1&1&0&0&1&0 \\ 1&0&1&1&0&0&1&0 \\ 0&1&0&0&1&1&0&1 \\ 1&0&1&1&0&0&1&0 \end{pmatrix},$$
where $I_{8 \times 8}$ denotes the $8 \times 8$ identity matrix. By Definition 11, $sig(K_0) = \| M_{A^{K_0}} \wedge (\sim M_d) \| = 0$, so the decision information system $S^{K_0} = (U, A^{K_0} \cup \{d\})$ is globally consistent; by Definition 10, $S = (U, A \cup \{d\})$ is consistent;
(2) For the coarsest scale combination $K = (4, 3, 2)$, by a similar calculation, $sig(4, 3, 2) = 8$;
(3) For the scale combinations below (4,3,2) in the second layer, by calculation, $sig(3, 3, 2) = 8$, $sig(4, 2, 2) = 2$ and $sig(4, 3, 1) = 4$; therefore the scale combination $K = (4, 2, 2)$ is the most significant, and we continue to search for the optimal scale along this branch;
(4) For the scale combinations below (4,2,2) in the third layer, by calculation, $sig(3, 2, 2) = 2$, $sig(4, 1, 2) = 2$ and $sig(4, 2, 1) = 0$; therefore the decision information system $S^{(4,2,1)}$ is globally consistent and the scale combination $(4, 2, 1)$ is the global optimal scale combination.
This means that the evaluations of the eight students in Table 3 fully determine the decision, and (4, 2, 1) is an optimal scale combination that maintains the decision ability.

4.2. Local Optimal Scale

Definition 12.
Let $S = (U, A \cup \{d\})$ be a consistent multi-scale decision system and ∑ the collection of all scale combinations. For $x \in U$ and a scale combination $K \in \Sigma$, if $[x]_{A^K} \subseteq [x]_d$, but $[x]_{A^H} \subseteq [x]_d$ does not hold for any $H \in \Sigma$ with $K < H$, we call K a local optimal scale combination of S for x.
Theorem 8.
Let $S = (U, A \cup \{d\})$ be a consistent multi-scale decision system and ∑ the collection of all scale combinations. Given a scale combination $K \in \Sigma$, the following conclusion holds:
$$[x]_{A^K} \subseteq [x]_d \Leftrightarrow f([x]_{A^K}) \le f([x]_d).$$
Definition 13.
Let $S = (U, A \cup \{d\})$ be a multi-scale decision information system, ∑ the collection of all scale combinations, and $K \in \Sigma$. The significance of the scale combination K for $x_i$ is defined as
$$sig(x_i, K) = \| f([x_i]_{A^K}) \wedge (\sim f([x_i]_d)) \|,$$
where $\| f([x_i]_{A^K}) \wedge (\sim f([x_i]_d)) \|$ denotes the number of 1s in the Boolean vector $f([x_i]_{A^K}) \wedge (\sim f([x_i]_d))$.
If $sig(x_i, K) = 0$, then $f([x_i]_{A^K}) \le f([x_i]_d)$, and hence the decision information system $S^K = (U, A^K \cup \{d\})$ is locally consistent for $x_i$. Based on this idea, a local optimal scale combination selection algorithm for a multi-scale decision system can be described as Algorithm 2.
The computation process of Algorithm 2 is similar to that of Algorithm 1, except that sig(K) is replaced by sig(x, K) for the given object x.
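In code, the only change relative to Algorithm 1 is the significance measure. Since row i of $M_{A^K}$ is exactly $f([x_i]_{A^K})$ (see Definition 6), a sketch of the local measure under the earlier helper names could be:

```python
def sig_local(M_AK, M_d, i):
    """sig(x_i, K): number of 1s in f([x_i]_{A^K}) /\\ (~f([x_i]_d));
    rows i of M_{A^K} and M_d are the two characteristic vectors."""
    return int(np.sum(np.minimum(M_AK[i], 1 - M_d[i])))
```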
Example 5.
Look for the local optimal scale combination for $x_1$ in Table 3.
We use the scale combination lattice in Figure 1 to illustrate the calculation steps of local optimal scale selection for $x_1$; by Definition 13, $sig(x_i, K) = \| f([x_i]_{A^K}) \wedge (\sim f([x_i]_d)) \|$.
(1) For the finest scale combination $K_0 = (1, 1, 1)$, by calculation, $sig(x_1, K_0) = 0$, so the system $S^{K_0}$ is locally consistent for $x_1$;
(2) For the coarsest scale combination $K = (4, 3, 2)$, by calculation, $sig(x_1, K) = 1$; hence $S^K = (U, A^K \cup \{d\})$ is locally inconsistent for $x_1$;
(3) For the scale combinations below (4,3,2) in the second layer, by calculation, $sig(x_1, (3, 3, 2)) = 1$, $sig(x_1, (4, 2, 2)) = 0$ and $sig(x_1, (4, 3, 1)) = 1$; hence the scale combination $(4, 2, 2)$ is the local optimal scale combination for $x_1$.
Algorithm 2: Selecting the local optimal scale combination of a multi-scale decision system
Input: A consistent multi-scale decision system S = (U, A ∪ {d}) and an object x ∈ U.
Output: A local optimal scale combination for x.
 Calculate f([x]_d)
 Let Queue = NULL
 Let optK = (I_1, I_2, …, I_m)              //set the coarsest scale combination initially
 Let optSig = Integer.MAX_VALUE             //set maximum value to optSig initially
 Queue.put(optK)
 While (true)
      While (not Queue.empty)
          Let K = Queue.get()               //pick a scale combination K from Queue
          Let sig(x, K) = ‖f([x]_{A^K}) ∧ (∼f([x]_d))‖   //calculate sig(x, K)
          If (sig(x, K) ≤ optSig)           //choose the minimum sig(x, K) and its K
               Let optSig = sig(x, K)
               Let optK = K
          EndIf
          If (optSig == 0)                  //if the optimal scale combination is found, return
               return optK
          EndIf
      EndWhile                              //loop until Queue is empty
      Queue.clear()                         //empty the Queue and construct the next finer scale combinations
      Let (l_1, l_2, …, l_m) = optK
      For each j ∈ {1, 2, …, m}             //set a finer scale for each attribute, respectively
          If (l_j > 1)
               Let K = (l_1, l_2, …, l_{j−1}, l_j − 1, l_{j+1}, …, l_m)
               Queue.put(K)
          EndIf
      EndFor                                //next, search finer scale combinations
 EndWhile

5. Local Optimal Scale Selection for Inconsistent Generalized Decision Information Systems

Let $S = (U, A \cup \{d\})$ be a decision information system as defined in Definition 3, where $U = \{x_1, x_2, \ldots, x_n\}$ is the universe, $A = \{a_1, a_2, \ldots, a_m\}$ is the set of conditional attributes and $d \notin A$ is the decision attribute. If the decision information system S is locally consistent for $x \in U$, then $d([x]_A)$ is a singleton; if the decision information system is globally consistent, $d([x]_A)$ is a singleton for every $x \in U$. For an inconsistent decision information system, there exists $x \in U$ such that $d([x]_A)$ contains more than one value. For this reason, Wu and Leung put forward the definition of the generalized decision in [21].
Definition 14
([21]). Let $S = (U, A \cup \{d\})$ be a decision information system. For $B \subseteq A$, the generalized decision of an object $x \in U$ is defined as
$$\partial_B(x) = d([x]_B) = \{d(y) \mid y \in [x]_B\}.$$
Definition 15.
Let $A = (a_{ij})_{m \times n}$ and $B = (b_{ij})_{m \times n}$ be $m \times n$ matrices. Define $A \odot B = (a_{ij} \times b_{ij})_{m \times n}$.
The following theorem is easily obtained from Definitions 14 and 15.
Theorem 9.
Let $S = (U, A \cup \{d\})$ be a decision information system, and let $d_j = d(x_j)$ for $x_j \in U$ and $K_A(i) = f([x_i]_A) \odot (d_1, d_2, \ldots, d_n)$. Let $H_A(i)$ denote the set of all nonzero components of the vector $K_A(i)$; for convenience, we write $H_A(i) = \{ d \neq 0 \mid d \in K_A(i) \}$. If $d_j \neq 0$ $(j = 1, 2, \ldots, n)$, then
$$\partial_A(x_i) = H_A(i).$$
It should be pointed out that if there exists $x \in U$ with $d(x) = 0$, the above theorem does not apply. In that case, we may assign another value to the attribute d for x, which does not affect the classification and decision making.
Example 6.
Find the generalized decision of each object in the inconsistent decision information system of Table 4.
By calculation, $d = (d(x_1), d(x_2), \ldots, d(x_6)) = (4, 2, 1, 3, 2, 2)$ and $f([x_1]_A) = (1, 1, 0, 0, 0, 0)$. According to Theorem 9, $K_A(1) = f([x_1]_A) \odot d = (4, 2, 0, 0, 0, 0)$; therefore $\partial_A(x_1) = H_A(1) = \{2, 4\}$. By the same method we obtain $\partial_A(x_2) = \{2, 4\}$, $\partial_A(x_3) = \{1, 2\}$, $\partial_A(x_4) = \{3\}$, $\partial_A(x_5) = \{2\}$ and $\partial_A(x_6) = \{1, 2\}$.
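The computation of Example 6 is a single masked multiplication. The sketch below (illustrative code reusing relation_matrix from Section 2) reproduces $\partial_A(x_1)$ for Table 4:

```python
import numpy as np

A = [[0, 'E', 'S'], [0, 'E', 'S'], [1, 'F', 'L'],
     [2, 'B', 'M'], [0, 'F', 'S'], [1, 'F', 'L']]   # Table 4
d = np.array([4, 2, 1, 3, 2, 2])                    # decision column

M_A = relation_matrix(A)           # row i is f([x_i]_A)
K_A = M_A * d                      # row i is f([x_i]_A) ⊙ (d_1, ..., d_n)
partials = [set(int(v) for v in row if v != 0) for row in K_A]
print(partials[0])                 # {2, 4}, i.e., ∂_A(x1)
```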
Definition 16.
Let $S = (U, A \cup \{d\})$ be an inconsistent multi-scale decision information system, ∑ the collection of all scale combinations, and $S^K = (U, A^K \cup \{d\})$ the decision information system corresponding to the scale combination $K \in \Sigma$. If $\partial_{A^K}(x) = \partial_{A^{K_0}}(x)$ holds, we call $S^K$ locally generalized consistent for x. If $S^K$ is locally generalized consistent for x, but $S^L = (U, A^L \cup \{d\})$ is locally generalized inconsistent for x for any $L \in \Sigma$ with $K < L$, we call K the local generalized optimal scale combination for x.
Theorem 10.
$S^K$ is locally generalized consistent for $x_i$ if and only if $H_{A^K}(i) = H_{A^{K_0}}(i)$ holds.
Proof. 
By Theorem 9, $S^K$ is locally generalized consistent for $x_i \Leftrightarrow \partial_{A^K}(x_i) = \partial_{A^{K_0}}(x_i) \Leftrightarrow H_{A^K}(i) = H_{A^{K_0}}(i)$.  □
Theorem 11.
Let $S = (U, A \cup \{d\})$ be an inconsistent multi-scale decision system, ∑ the collection of all scale combinations, and $K \in \Sigma$. Then K is the local generalized optimal scale combination for $x_i$ if and only if $H_{A^K}(i) = H_{A^{K_0}}(i)$ holds and, for any $L \in \Sigma$ with $K < L$, $H_{A^L}(i) \neq H_{A^{K_0}}(i)$.
Definition 17.
Let $S = (U, A \cup \{d\})$ be an inconsistent multi-scale decision system, ∑ the collection of all scale combinations, and $K = (l_1, \ldots, l_j, \ldots, l_m) \in \Sigma$. The significance of the scale combination K for $x_i$ is defined as
$$SIG(x_i, K) = | H_{A^K}(i) \setminus H_{A^{K_0}}(i) |,$$
where $| H_{A^K}(i) \setminus H_{A^{K_0}}(i) |$ denotes the cardinality of the set difference $H_{A^K}(i) \setminus H_{A^{K_0}}(i)$.
If $SIG(x_i, K) = 0$, then $H_{A^K}(i) \subseteq H_{A^{K_0}}(i)$; since $H_{A^{K_0}}(i) \subseteq H_{A^K}(i)$ always holds (the classes at $K_0$ are the finest), the two sets are equal, and hence the decision information system $S^K = (U, A^K \cup \{d\})$ is locally generalized consistent for $x_i$. The computation process of Algorithm 3 is similar to that of Algorithm 2.
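Computationally, $SIG(x_i, K)$ is just a set difference between the two generalized decisions; a sketch:

```python
def SIG(H_K, H_K0):
    """|H_{A^K}(i) \\ H_{A^{K_0}}(i)|: zero means S^K is locally
    generalized consistent for x_i (Definition 17)."""
    return len(set(H_K) - set(H_K0))
```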
Algorithm 3: Selecting the local generalized optimal scale combination of a multi-scale decision system
Input: An inconsistent multi-scale decision system S = (U, A ∪ {d}) and an object x ∈ U.
Output: A local generalized optimal scale combination for x.
 Let K_0 = (1, 1, …, 1)                     //set the finest scale combination
 Calculate H_{A^{K_0}}(x)        //H_{A^{K_0}}(x) corresponds to H_A(i) with K_0 and x_i in Theorem 9
 Let Queue = NULL
 Let optK = (I_1, I_2, …, I_m)              //set the coarsest scale combination initially
 Let optSig = Integer.MAX_VALUE             //set maximum value to optSig initially
 Queue.put(optK)
 While (true)
      While (not Queue.empty)
          Let K = Queue.get()               //pick a scale combination K from Queue
          Calculate H_{A^K}(x)              //H_{A^K}(x) corresponds to H_A(i) with K and x_i in Theorem 9
          Let SIG(x, K) = |H_{A^K}(x) \ H_{A^{K_0}}(x)|   //calculate SIG(x, K)
          If (SIG(x, K) ≤ optSig)           //choose the minimum SIG(x, K) and its K
               Let optSig = SIG(x, K)
               Let optK = K
          EndIf
          If (optSig == 0)                  //if the optimal scale combination is found, return
               return optK
          EndIf
      EndWhile                              //loop until Queue is empty
      Queue.clear()                         //empty the Queue and construct the next finer scale combinations
      Let (l_1, l_2, …, l_m) = optK
      For each j ∈ {1, 2, …, m}             //set a finer scale for each attribute, respectively
          If (l_j > 1)
               Let K = (l_1, l_2, …, l_{j−1}, l_j − 1, l_{j+1}, …, l_m)
               Queue.put(K)
          EndIf
      EndFor                                //next, search finer scale combinations
 EndWhile
Example 7.
Search for the local generalized optimal scale combination for $x_1$ in the system of Table 5.
(1) $d = (d(x_1), d(x_2), d(x_3), d(x_4), d(x_5), d(x_6)) = (1, 2, 4, 4, 4, 3)$. For the finest scale combination $K_0 = (1, 1, 1)$, $f([x_1]_{A^{K_0}}) \odot d = (1, 2, 0, 0, 0, 0)$, so $H_{A^{K_0}}(1) = \{1, 2\}$;
(2) For the coarsest scale combination $K = (3, 3, 2)$, $f([x_1]_{A^K}) \odot d = (1, 2, 4, 0, 0, 3)$, so $H_{A^K}(1) = \{1, 2, 3, 4\}$ and $SIG(x_1, K) = 2$; therefore $K = (3, 3, 2)$ is not the local generalized optimal scale combination for $x_1$;
(3) For the scale combinations below (3,3,2) in the second layer, by calculation, $SIG(x_1, (2, 3, 2)) = 1$, $SIG(x_1, (3, 2, 2)) = 1$ and $SIG(x_1, (3, 3, 1)) = 0$; hence $(3, 3, 1)$ is the local generalized optimal scale combination for $x_1$.

6. Conclusions

This paper introduces relational matrices and matrix calculations into multi-scale decision information systems. Some properties of multi-scale decision information systems are discussed on the basis of matrix methods. Using the relational matrix, the significance of a scale combination is introduced, a method of optimal scale combination selection is studied, and the related algorithms are designed. The effectiveness of the method is illustrated by experiments. In the future, we will use the matrix method to study classification and decision-making methods for multi-scale information systems.

7. Experiments and Analysis

In order to verify whether Algorithms 1 and 2 can be applied in practice to choose the optimal scale combination in multi-scale decision information systems, several University of California Irvine (UCI) datasets are used in the experiments. Table 6 describes these datasets.
The datasets in Table 6 are single-scale decision datasets; to extend them from single-scale to multi-scale, the methods of [27] are adopted, as described below.
Firstly, the finest scale level of each attribute is the original data, that is, $a^1(x)$ is the value of $x \in U$ at attribute a in the original dataset.
Secondly, for each attribute a, with the standard deviation and minimum denoted as std(a) and min(a), the second scale level of $x \in U$ at attribute a in the multi-scale dataset is defined as
$$a^2(x) = \left\lfloor \frac{a(x) - min(a)}{std(a)} \right\rfloor,$$
where $\lfloor y \rfloor$ denotes the largest integer v satisfying $v \le y$.
Thirdly, based on the previous scale level, the next scale of attribute a is obtained by merging some equivalence classes of the previous level. For example, suppose the equivalence classes of the previous scale level are $S_k = \{a^k(x) \mid x \in U\} = \{\{5\}, \{6\}, \{8\}, \{12\}, \{20\}\}$; then at the next scale level the classes $\{5\}$ and $\{6\}$ are merged into $\{5, 6\}$, which is relabeled $\{6\}$. That is, if $a^k(x)$ belongs to $\{5\}$ or $\{6\}$, then $a^{k+1}(x)$ is 6, and the resulting equivalence classes are $S_{k+1} = \{\{6\}, \{8\}, \{12\}, \{20\}\}$.
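For reference, the two construction steps can be sketched as follows (our own illustrative code; the merge order of the coarsening step follows the {5},{6} → {6} example above and is otherwise an assumption):

```python
import numpy as np

def second_scale(col):
    """a^2(x) = floor((a(x) - min(a)) / std(a))."""
    col = np.asarray(col, dtype=float)
    return np.floor((col - col.min()) / col.std()).astype(int)

def coarsen(col):
    """Build the next scale level by merging the two smallest classes
    of the previous level, relabeling both with the larger value."""
    values = np.asarray(col).copy()
    classes = np.unique(values)
    if len(classes) >= 2:
        values[values == classes[0]] = classes[1]   # {5},{6} -> {6}
    return values
```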
Table 7 shows the results of the experiments on the multi-scale datasets with the optimal scale combinations selected by Algorithm 1. It reports the Support Vector Machine (SVM) classification accuracy on both the raw datasets and the refined datasets at the optimal scale combination, and demonstrates that Algorithm 1 works well.
Table 7 also reports the rate of optimal, that is, the percentage of attributes with scale level ≥2 in the refined dataset. For example, there are seven attributes in the Auto_MPG raw dataset, and the optimal scale combination of the refined dataset obtained in the experiment is [1, 1, 1, 5, 5, 3, 1], which indicates that three of the seven attributes are at a coarsened optimal scale, so the rate of optimal is 3/7 = 42.9%. Similarly, the average level of scale (LoS) in the refined Auto_MPG dataset is 17/7 = 2.43, whereas the LoS of the raw Auto_MPG dataset is 1.
To compare the local multi-scale optimal method of Algorithm 2 with the global multi-scale optimal method of Algorithm 1, at most 5% of the instances of each dataset are randomly selected, and the final optimal scale combination is the intersection of the optimal scale combinations obtained for the individual instances. For example, two instances are randomly selected in Auto_MPG, and their optimal scale combinations are [1,3,4,5,5,7,4] and [3,2,4,5,5,6,1], respectively; their intersected scale combination is [1,3,4,5,5,7,4] ∧ [3,2,4,5,5,6,1] = [1,2,4,5,5,6,1].
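The intersection used here is the componentwise minimum, i.e., the meet of the two scale combinations in the lattice of Definition 8; e.g.:

```python
K1 = [1, 3, 4, 5, 5, 7, 4]
K2 = [3, 2, 4, 5, 5, 6, 1]
print([min(a, b) for a, b in zip(K1, K2)])   # [1, 2, 4, 5, 5, 6, 1]
```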
Table 8 shows the SVM accuracy, rate of optimal, average LoS and time cost of Algorithm 2 relative to Algorithm 1. It demonstrates that the optimal performance of the global multi-scale optimal algorithm can be matched by choosing a small number of instances in the local multi-scale optimal algorithm.

Author Contributions

All authors contributed equally to this work.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 11701258 and 11871259, the Program for Innovative Research Team in Science and Technology in Fujian Province University, and the Quanzhou High-Level Talents Support Plan under Grant 2017ZT012.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zadeh, L.A. Fuzzy sets and information granularity. Adv. Fuzzy Set Theory Appl. 1979, 11, 3–18. [Google Scholar]
  2. Pedrycz, W. Granular Computing: An Introduction; Physica-Verlag: Heidelberg, Germany, 2000; pp. 309–328. [Google Scholar] [CrossRef]
  3. Yao, Y.Y.; Miao, D.Q.; Xu, F.F. Granular Structures and Approximations in Rough Sets and Knowledge Spaces. In Rough Set Theory: A True Landmark in Data Analysis; Springer: Berlin/Heidelberg, Germany, 2009; pp. 71–84. [Google Scholar] [CrossRef]
  4. Li, J.; Ren, Y.; Mei, C.; Qian, Y.; Yang, X. A comparative study of multigranulation rough sets and concept lattices via rule acquisition. Knowl.-Based Syst. 2016, 91, 152–164. [Google Scholar] [CrossRef]
  5. Pal, S.K.; Ray, S.S.; Ganivada, A. Introduction to Granular Computing, Pattern Recognition and Data Mining. In Granular Neural Networks, Pattern Recognition and Bioinformatics; Springer International Publishing: Cham, Switzerland, 2017; pp. 1–37. [Google Scholar] [CrossRef]
  6. Han, S.E. Roughness measures of locally finite covering rough sets. Int. J. Approx. Reason. 2019, 105, 368–385. [Google Scholar] [CrossRef]
  7. Liu, H.; Cocea, M. Granular computing-based approach for classification towards reduction of bias in ensemble learning. Granul. Comput. 2017, 2, 131–139. [Google Scholar] [CrossRef]
  8. Wang, G.; Yang, J.; Xu, J. Granular computing: From granularity optimization to multi-granularity joint problem solving. Granul. Comput. 2017, 2, 105–120. [Google Scholar] [CrossRef]
  9. Sun, B.Z.; Ma, W.M.; Qian, Y.H. Multigranulation fuzzy rough set over two universes and its application to decision making. Knowl.-Based Syst. 2017, 123, 61–74. [Google Scholar] [CrossRef]
  10. Yao, Y. Three-way decision and granular computing. Int. J. Approx. Reason. 2018, 103, 107–123. [Google Scholar] [CrossRef]
  11. Mo, J.; Huang, H.L. (T, S)-Based Single-Valued Neutrosophic Number Equivalence Matrix and Clustering Method. Mathematics 2019, 7, 36. [Google Scholar] [CrossRef]
  12. Petrosino, A.; Ferone, A. Feature Discovery through Hierarchies of Rough Fuzzy Sets. In Granular Computing and Intelligent Systems: Design with Information Granules of Higher Order and Higher Type; Pedrycz, W., Chen, S.M., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 57–73. [Google Scholar] [CrossRef]
  13. Ferone, A. Feature selection based on composition of rough sets induced by feature granulation. Int. J. Approx. Reason. 2018, 101, 276–292. [Google Scholar] [CrossRef]
  14. Pawlak, Z. Imprecise Categories, Approximations and Rough Sets. In Rough Sets: Theoretical Aspects of Reasoning about Data; Springer: Dordrecht, The Netherlands, 1991; pp. 9–32. [Google Scholar] [CrossRef]
  15. Zhu, P.F.; Hu, Q.H.; Zuo, W.M.; Yang, M. Multi-granularity distance metric learning via neighborhood granule margin maximization. Inf. Sci. 2014, 282, 321–331. [Google Scholar] [CrossRef]
  16. Zhu, P.F.; Hu, Q.H. Adaptive neighborhood granularity selection and combination based on margin distribution optimization. Inf. Sci. 2013, 249, 1–12. [Google Scholar] [CrossRef]
  17. Tan, A.H.; Wu, W.Z.; Li, J.J.; Lin, G.P. Evidence-theory-based numerical characterization of multigranulation rough sets in incomplete information systems. Fuzzy Sets Syst. 2016, 294, 18–35. [Google Scholar] [CrossRef]
  18. Yao, Y.; She, Y. Rough set models in multigranulation spaces. Inf. Sci. 2016, 327, 40–56. [Google Scholar] [CrossRef]
  19. Lin, G.P.; Liang, J.Y.; Qian, Y.H. An information fusion approach by combining multigranulation rough sets and evidence theory. Inf. Sci. 2015, 314, 184–199. [Google Scholar] [CrossRef]
  20. Yang, X.B.; Song, X.N.; Chen, Z.H.; Yang, J.Y. On multigranulation rough sets in incomplete information system. Int. J. Mach. Learn. Cybern. 2012, 3, 223–232. [Google Scholar] [CrossRef]
  21. Wu, W.Z.; Leung, Y. Theory and applications of granular labelled partitions in multi-scale decision tables. Inf. Sci. 2011, 181, 3878–3897. [Google Scholar] [CrossRef]
  22. Wu, W.Z.; Leung, Y. Optimal scale selection for multi-scale decision tables. Int. J. Approx. Reason. 2013, 54, 1107–1129. [Google Scholar] [CrossRef]
  23. Gu, S.M.; Wu, W.Z. On knowledge acquisition in multi-scale decision systems. Int. J. Mach. Learn. Cybern. 2013, 4, 477–486. [Google Scholar] [CrossRef]
  24. Wu, W.Z.; Qian, Y.; Li, T.J.; Gu, S.M. On rule acquisition in incomplete multi-scale decision tables. Inf. Sci. 2017, 378, 282–302. [Google Scholar] [CrossRef]
  25. She, Y.H.; Li, J.H.; Yang, H.L. A local approach to rule induction in multi-scale decision tables. Knowl.-Based Syst. 2015, 89, 398–410. [Google Scholar] [CrossRef]
  26. Li, F.; Hu, B.Q.; Wang, J. Stepwise optimal scale selection for multi-scale decision tables via attribute significance. Knowl.-Based Syst. 2017, 129, 4–16. [Google Scholar] [CrossRef]
  27. Li, F.; Hu, B.Q. A new approach of optimal scale selection to multi-scale decision tables. Inf. Sci. 2017, 381, 193–208. [Google Scholar] [CrossRef]
  28. Shafer, G. A Mathematical Theory of Evidence; Princeton University Press: Princeton, NJ, USA, 1976. [Google Scholar]
  29. Guan, J.W.; Bell, D.A.; Guan, Z. Matrix computation for information systems. Inf. Sci. 2001, 131, 129–156. [Google Scholar] [CrossRef]
  30. Liu, G.L. Rough Sets over the Boolean Algebras. In Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing; Ślęzak, D., Wang, G., Szczuka, M., Düntsch, I., Yao, Y., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; pp. 124–131. [Google Scholar]
  31. Wang, L.; Li, T.R. Matrix-Based Computational Method for Upper and Lower Approximations of Rough Sets. Pattern Recognit. Artif. Intell. 2011, 24, 756–762. [Google Scholar] [CrossRef]
  32. Huang, Y.Y.; Li, T.R.; Luo, C.; Horng, S.J. Matrix-Based Rough Set Approach for Dynamic Probabilistic Set-Valued Information Systems. In Rough Sets; Springer International Publishing: Cham, Switzerland, 2016; pp. 197–206. [Google Scholar]
  33. Tan, A.H.; Li, J.J.; Lin, Y.J.; Lin, G.P. Matrix-based set approximations and reductions in covering decision information systems. Int. J. Approx. Reason. 2015, 59, 68–80. [Google Scholar] [CrossRef]
  34. Tan, A.H.; Li, J.J.; Lin, G.P.; Lin, Y.J. Fast approach to knowledge acquisition in covering information systems using matrix operations. Knowl.-Based Syst. 2015, 79, 90–98. [Google Scholar] [CrossRef]
  35. Luo, C.; Li, T.R.; Yi, Z.; Fujita, H. Matrix approach to decision-theoretic rough sets for evolving data. Knowl.-Based Syst. 2016, 99, 123–134. [Google Scholar] [CrossRef]
  36. Hu, C.X.; Liu, S.X.; Liu, G.X. Matrix-based approaches for dynamic updating approximations in multigranulation rough sets. Knowl.-Based Syst. 2017, 122, 51–63. [Google Scholar] [CrossRef]
  37. Karczmarek, P.; Kiersztyn, A.; Pedrycz, W. An Application of Graphic Tools and Analytic Hierarchy Process to the Description of Biometric Features. In Artificial Intelligence and Soft Computing; Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 137–147. [Google Scholar]
  38. Xu, Y.H.; Wu, W.Z.; Tan, A.H. Optimal Scale Selections in Consistent Generalized Multi-scale Decision Tables. Rough Sets 2017, 10313, 185–198. [Google Scholar] [CrossRef]
  39. Gu, S.; Gu, J.; Wu, W.; Li, T.; Chen, C. Local optimal granularity selections in incomplete multi-granular decision systems. J. Comput. Res. Dev. 2017, 54, 1500–1509. [Google Scholar] [CrossRef]
  40. Wu, W.Z.; Chen, C.J.; Li, T.J.; Xu, Y.H. Comparative Study on Optimal Granularities in Inconsistent Multi-granular Labeled Decision Systems. Pattern Recognit. Artif. Intell. 2016, 29, 1095–1103. [Google Scholar] [CrossRef]
  41. Wu, W.Z.; Yang, L.; Tan, A.H.; Xu, Y.H. Granularity Selections in Generalized Incomplete Multi-Granular Labeled Decision Systems. J. Comput. Res. Dev. 2018, 55, 1263–1272. [Google Scholar] [CrossRef]
Figure 1. Partially ordered lattice of scale combinations.
Table 1. A decision system.

U    a1  a2  a3  a4  d
x1   2   1   1   1   1
x2   3   2   2   2   2
x3   2   1   1   1   1
x4   2   2   1   1   3
x5   1   1   4   2   5
x6   1   1   3   2   1
x7   3   2   2   2   2
Table 2. Evaluation criteria.

Score      Criteria 1  Criteria 2  Criteria 3  Criteria 4
(90, 100]  1           excellent   super       Y
[80, 89)   2           good        super       Y
[70, 79)   3           fair        middle      Y
[60, 69)   4           pass        middle      Y
[50, 59)   5           bad         low         N
[0, 49)    6           bad         low         N
Table 3. A generalized multi-scale information system.

U    Subject 1  a1^1  a1^2  a1^3  a1^4  Subject 2  a2^1  a2^2  a2^3  Subject 3  a3^1  a3^2  d
x1   78         3     F     M     Y     94         1     E     Y     75         3     Y     1
x2   61         4     P     M     Y     62         4     P     Y     60         4     Y     0
x3   69         4     P     M     Y     78         3     F     Y     68         4     Y     1
x4   98         1     E     S     Y     58         5     B     N     97         1     Y     1
x5   36         6     B     L     N     72         3     F     Y     92         1     Y     0
x6   63         4     P     M     Y     76         3     F     Y     47         5     N     0
x7   58         5     B     L     N     95         1     E     Y     92         1     Y     1
x8   35         6     B     L     N     90         1     E     Y     73         3     Y     0
Table 4. An inconsistent decision system.

U    a1  a2  a3  d
x1   0   E   S   4
x2   0   E   S   2
x3   1   F   L   1
x4   2   B   M   3
x5   0   F   S   2
x6   1   F   L   2
Table 5. A generalized inconsistent multi-scale information system.

U    a1^1  a1^2  a1^3  a2^1  a2^2  a2^3  a3^1  a3^2  d
x1   G     S     Y     2     S     Y     1     S     1
x2   G     S     Y     2     S     Y     1     S     2
x3   F     M     Y     4     M     Y     2     S     4
x4   B     L     N     5     M     Y     8     L     4
x5   B     L     N     7     L     N     6     M     4
x6   E     S     Y     3     S     Y     2     S     3
Table 6. The UCI dataset description.

Datasets       Instances  Attributes  Classes
Auto_MPG       392        7           3
CarEvaluation  1728       6           3
Wine           178        13          3
German         1000       20          2
WDBC           569        30          2
Ionosphere     351        34          2
Table 7. Experimental results of Algorithm 1.

Datasets       Original Dataset (SVM)  Optimal MSC-Dataset (SVM)  Rate of Optimal  Average of LoS
Auto_MPG       0.6611                  0.6949                     42.9%            2.43
CarEvaluation  0.9267                  0.9499                     50.0%            1.50
Wine           0.3518                  0.8703                     92.3%            5.15
German         0.7271                  0.7233                     80.0%            2.65
WDBC           0.6141                  0.9592                     93.3%            4.43
Ionosphere     0.9152                  0.9339                     91.2%            3.76
Table 8. The experimental results of Algorithm 2 compared to Algorithm 1.

Datasets       Percent of Instances  Relative Accuracy (SVM)  Relative Rate of Optimal  Relative Average of LoS  Relative Time Cost
Auto_MPG       0.5%                  108.5%                   200.0%                    158.8%                   40.2%
CarEvaluation  0.8%                  98.2%                    133.3%                    111.1%                   203.1%
Wine           2.2%                  104.3%                   108.3%                    100.0%                   78.3%
German         1.0%                  95.4%                    125.0%                    128.3%                   14.8%
WDBC           0.4%                  87.8%                    103.6%                    151.1%                   70.6%
Ionosphere     4.8%                  99.0%                    100.0%                    97.7%                    60.9%
