Matrix Method for the Optimal Scale Selection of Multi-Scale Information Decision Systems

In multi-scale information systems, information is often characterized at multiple scales and levels. To facilitate computation in such systems, this study employs the matrix method to represent multi-scale information systems and to select the optimal scale combination of multi-scale decision information systems. To this end, we first describe some important concepts and properties of information systems using relational matrices. The relational matrix is then introduced into multi-scale information systems and used to describe their main concepts, including the lower and upper approximation sets and the consistency of the systems. Furthermore, from the viewpoint of the relational matrix, the scale significance is defined to describe the global optimal scale and the local optimal scale of multi-scale information systems. Finally, the relational matrix is used to compute the scale significance and to construct optimal scale selection algorithms. The efficiency of these algorithms is examined through several practical examples and experiments.


Introduction
Granular computing [1,2], which originated from fuzzy information granulation, is a mathematical method for knowledge representation and data mining. Its purpose is to solve complex problems by dividing massive information into relatively simple blocks according to their respective characteristics and behavior. Since the concept of granular computing was put forward, it has become a hot research topic and has been widely used in many practical applications [3][4][5][6][7][8][9][10][11][12][13].
The theory of rough sets plays an important role in the promotion and development of granular computing [14][15][16][17][18][19][20]. Pawlak [14] used an information system to study granular computing. If a decision is required, the system is usually an information system with decision attributes.
People can process large amounts of information at different levels and scales. Based on this viewpoint, multi-scale information systems have been developed and widely studied [21][22][23][24][25][26][27]. Ferone et al. used feature granulation to study feature selection [12,13]. In papers [21,22], Wu, Leung, et al. proposed a multi-scale decision information system model, in which the objects are granulated from fine to coarse; the optimal scale selection of multi-scale decision tables was also investigated. Gu and Wu [23] presented a formal method for knowledge acquisition measured at different granularities. Wu and Qian [24] measured the uncertainty in incomplete multi-scale information tables with the Dempster-Shafer evidence theory [28]. Shen et al. [25] employed a local method to induce decision rules in multi-scale decision tables. Li and Hu [26] introduced a step-wise method for the optimal scale selection of multi-scale decision tables.
The massive information provided in multi-scale information systems makes the computation of concepts in these systems time-consuming. This article therefore employs Boolean matrices and matrix computation to facilitate knowledge description and optimal scale selection in multi-scale decision tables. Indeed, the matrix method has shown computational advantages in many settings [29][30][31][32][33][34][35][36][37], such as classical rough sets and information systems [29][30][31], static and dynamic information systems [32,35,36], covering information systems [33,34], and so on. For multi-scale information systems, it is worthwhile to use the relational matrix for optimal scale selection and to develop matrix methods that facilitate the computational process.
The paper is structured as follows. In Section 2, we introduce a relational matrix to represent the relevant concepts in information systems and examine some properties of information systems based on the Boolean matrix. In Section 3, we introduce multi-scale information systems and their Boolean matrix representation. In Section 4, we use the relational matrix to define the scale significance and to investigate the optimal scale selection of multi-scale information systems. In Section 5, we seek the optimal scale of multi-scale information systems by using the scale significance based on the relational matrix. Section 6 concludes the paper with a summary, and Section 7 illustrates the effectiveness of the matrix method through experiments.

Rough Set and Information Systems
In this section, we introduce some basic concepts of rough set theory and information systems.

Definition 1 ([1]). The Pawlak approximation space is a pair (U, R), where U = {x_1, x_2, ..., x_n} is a non-empty finite set called the universe and R is an equivalence relation on U. The lower and upper approximation sets of X ⊆ U are defined, respectively, as

R(X) = {x ∈ U | [x]_R ⊆ X},  R̄(X) = {x ∈ U | [x]_R ∩ X ≠ ∅},

where [x]_R = {y ∈ U | (x, y) ∈ R} is the equivalence class containing x.

Definition 2 ([14]). Let S = (U, A) be an information system, where U = {x_1, x_2, ..., x_n} is a non-empty finite set called the universe and A = {a_1, a_2, ..., a_m} is a non-empty finite set of attributes. For any a ∈ A there is a surjective map a : U → V_a, i.e., a(x) ∈ V_a for x ∈ U, where V_a = {a(x) | x ∈ U} is called the domain of attribute a. For any B ⊆ A, the equivalence relation R_B on U is determined as

R_B = {(x, y) ∈ U × U | a(x) = a(y), ∀a ∈ B}.

The lower and upper approximation sets of X ⊆ U with respect to the attribute subset B are denoted by R_B(X) and R̄_B(X), respectively.

Definition 3 ([14]). Let S = (U, A ∪ d) be a decision information system, where S = (U, A) is an information system and d ∉ A is called the decision attribute. Similarly, the equivalence relation on U induced by d is given by

R_d = {(x, y) ∈ U × U | d(x) = d(y)}.

Let S = (U, A ∪ d) be a decision information system. We distinguish the following two kinds of consistency: S is said to be locally consistent for x ∈ U if [x]_{R_A} ⊆ [x]_{R_d}, and globally consistent if R_A ⊆ R_d, i.e., if S is locally consistent for every x ∈ U.

Boolean Matrix Characterization of Decision Information Systems
By Definitions 5 and 6 and the symmetry of the matrix M_R, we have the following results.

Theorem 2 ([29]). Let U be a universe and let R_1, R_2 be equivalence relations on U. Then R_1 ⊆ R_2 if and only if M_{R_1} ⪯ M_{R_2}, and M_{R_1 ∩ R_2} = M_{R_1} ∧ M_{R_2}.

Let S = (U, A ∪ d) be a decision information system and B ⊆ A. The relation matrix of the equivalence relation R_B is denoted by M_B, and the relation matrix corresponding to R_d is denoted by M_d.

Theorem 3. Let S = (U, A ∪ d) be a decision information system and B ⊆ A. Then M_B = (m_ij)_{n×n}, where m_ij = 1 if a(x_i) = a(x_j) for all a ∈ B and m_ij = 0 otherwise, and M_d = (r_ij)_{n×n}, where r_ij = 1 if d(x_i) = d(x_j) and r_ij = 0 otherwise.

Proof. By Definition 2, (x_i, x_j) ∈ R_B ⇔ a(x_i) = a(x_j) for all a ∈ B; the claim then follows from Definition 6. □

Theorem 4. Let S = (U, A ∪ d) be a decision information system. Then the following conclusions hold: (1) S is locally consistent for x_i ∈ U if and only if the i-th row of M_A ∧ (∼M_d) contains no 1; (2) S is globally consistent if and only if M_A ⪯ M_d.
Example 2. Determine the global consistency of the decision information system in Table 1.
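As a concrete illustration of Theorems 3 and 4, the following Python sketch (using numpy; the toy decision table and all variable names are our own, not from the paper) builds the relation matrices M_A and M_d from attribute columns and tests global consistency via the entrywise order M_A ⪯ M_d:

```python
import numpy as np

def relation_matrix(rows):
    """Theorem 3: m_ij = 1 iff objects i and j agree on every listed attribute."""
    t = np.asarray(rows)
    return (t[:, None, :] == t[None, :, :]).all(axis=2).astype(int)

# Hypothetical toy decision table: 4 objects, 2 condition attributes, 1 decision.
A = [[1, 0], [1, 0], [2, 1], [2, 0]]
d = [[0], [0], [1], [1]]

M_A = relation_matrix(A)
M_d = relation_matrix(d)

# Theorem 4(2): globally consistent iff M_A <= M_d entrywise,
# i.e. M_A AND (NOT M_d) is the zero matrix.
globally_consistent = not np.any(M_A * (1 - M_d))
print(globally_consistent)  # -> True
```

Since every equivalence class of R_A here lies inside a class of R_d, the check succeeds; flipping a single decision value makes it fail.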

Relational Matrix in Generalized Multi-Scale Information Systems
The theoretical model of multi-scale decision information systems was first proposed by Wu and Leung [21], and several scholars have contributed to this line of research [21][22][23][24][25][26][27][38][39][40][41]; Li and Hu generalized the model in paper [27]. Among these studies, optimal scale selection is an important subject. Wu, Li, et al. [29,38,40] studied the optimal scale selection of multi-scale decision information systems. Gu et al. [39] studied the optimal scale selection of incomplete multi-scale decision information systems. Li and Hu [27] studied optimal scale selection for generalized multi-scale decision information systems. Li et al. [26] introduced an attribute significance for generalized multi-scale decision information systems to search for the optimal scale. Decision information system models involve a large number of set operations, and the matrix is itself a powerful tool; using matrices to study multi-scale information systems helps to improve algorithm efficiency and to further develop the theory of multi-scale information tables. In this part, we introduce the relation matrix into multi-scale information systems in preparation for optimal scale selection.

Definition 7 ([21,27]). A multi-scale information system is a tuple S = (U, A), where U = {x_1, x_2, ..., x_n} is a non-empty finite set of objects called the universe and A = {a_1, a_2, ..., a_m} is a non-empty finite set of attributes, each a_j having I_j (j = 1, 2, ..., m) scales. A multi-scale information system S = (U, A) can then be represented as S = (U, {a_j^k | k = 1, 2, ..., I_j; j = 1, 2, ..., m}), where a_j^k : U → V_j^k is a surjective function and V_j^k is the domain of a_j at the k-th scale. Furthermore, for any 1 ≤ k ≤ I_j − 1 there exists a surjective map g_j^{k,k+1} : V_j^k → V_j^{k+1} such that a_j^{k+1}(x) = g_j^{k,k+1}(a_j^k(x)) for all x ∈ U; g_j^{k,k+1} is called the information granularity transformation function.
When I_1 = I_2 = ⋯ = I_m, the information system defined above is the multi-scale information system proposed by Wu and Leung in paper [21].

The equivalence relation determined by attribute a_j^k is R_j^k = {(x, y) ∈ U × U | a_j^k(x) = a_j^k(y)}. By the existence of the surjective map g_j^{k,k+1}, for any a_j ∈ A we have R_j^1 ⊆ R_j^2 ⊆ ⋯ ⊆ R_j^{I_j}. For a multi-scale information system S = (U, A), if each attribute a_j ∈ A is restricted to its l_j-th scale, we call K = (l_1, l_2, ..., l_m) a scale combination of the system S = (U, A). The set of all scale combinations of S = (U, A) is denoted by ∑, and ∑ forms a partially ordered lattice [27].

The information system corresponding to the scale combination K = (l_1, l_2, ..., l_m) is denoted by S^K = (U, A^K), where A^K = {a_1^{l_1}, a_2^{l_2}, ..., a_m^{l_m}}. The equivalence relation induced by A^K is denoted by R_{A^K}, and the relation matrix of R_{A^K} by M_{A^K}.

Definition 8. Let K = (l_1, l_2, ..., l_m) and L = (h_1, h_2, ..., h_m) be scale combinations. If l_i ≤ h_i for all i ∈ {1, 2, ..., m}, we say that K is finer than L, denoted K ⪯ L; if, moreover, l_i < h_i for at least one i ∈ {1, 2, ..., m}, we say that K is strictly finer than L, denoted K < L.
Example 3. The following example gives a comprehensive evaluation of three courses for eight students, with scores divided into four criteria, as shown in Table 2. Attribute a_1 evaluates the results of the first course and is divided into four levels, attribute a_2 the second course with three levels, and attribute a_3 the third course with two levels. Attribute d is the decision attribute: 1 means qualified and 0 means unqualified. From the scores of each student, the multi-scale decision information system in Table 3 is obtained. To simplify the notation below, we denote excellent by "E", good by "G", fair by "F", pass by "P", bad by "B", super by "S", middle by "M", low by "L", yes by "Y", and no by "N". Table 3. A generalized multi-scale information system.
There are 24 different scale combinations in the multi-scale information system of Table 3. According to the scale relation in Definition 8, they form a partially ordered lattice [27], as shown in Figure 1. We now discuss some properties of multi-scale information systems using the relation matrix. Let S = (U, A) be a multi-scale information system, ∑ the set of all scale combinations of the system, and K = (l_1, l_2, ..., l_m) ∈ ∑; the information system corresponding to the scale combination K is S^K = (U, A^K).

Theorem 5. M_{A^K} = (m_ij)_{n×n}, where m_ij = 1 if a_j^{l_j}(x_i) = a_j^{l_j}(x_j) for every j = 1, 2, ..., m, and m_ij = 0 otherwise.

Proof. This conclusion can be obtained from Theorem 3. □

Theorem 6. Let S = (U, A) be a multi-scale information system, ∑ the set of all scale combinations, K, L ∈ ∑ with K < L, and X ⊆ U. Then the following conclusions hold: (1) M_{A^K} ⪯ M_{A^L}; (2) R_{A^L}(X) ⊆ R_{A^K}(X) and R̄_{A^K}(X) ⊆ R̄_{A^L}(X).

Proof. (1) For any attribute a_j ∈ A we have R_j^{l_j} ⊆ R_j^{h_j}; according to Theorem 2, M_{a_j^{l_j}} ⪯ M_{a_j^{h_j}}, and hence M_{A^K} ⪯ M_{A^L}. (2) can be obtained directly from Theorem 1. □
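The monotonicity in Theorem 6(1) can be checked numerically. The sketch below (numpy; the two-scale attribute and the map g^{1,2} are hypothetical, chosen to mimic the grade labels of Example 3) builds the relation matrix of one attribute at two scales and verifies that coarsening only adds 1s:

```python
import numpy as np

def relation_matrix(col):
    """One-attribute relation matrix: m_ij = 1 iff the values of i and j coincide."""
    c = np.asarray(col)
    return (c[:, None] == c[None, :]).astype(int)

# Hypothetical two-scale attribute: scale 2 coarsens scale 1 via a surjection g^{1,2}.
a1 = ['E', 'G', 'F', 'P', 'G']                  # finest-scale values
g12 = {'E': 'Y', 'G': 'Y', 'F': 'N', 'P': 'N'}  # granularity transformation function
a2 = [g12[v] for v in a1]                       # coarser-scale values

M1, M2 = relation_matrix(a1), relation_matrix(a2)
print(bool(np.all(M1 <= M2)))  # -> True: coarsening only merges equivalence classes
```

Because a2 factors through a1, every pair equal at scale 1 stays equal at scale 2, which is exactly the entrywise order M1 ⪯ M2.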

Optimal Scale Selection for Consistent Multi-Scale Decision Information Systems
Definition 9 ([21,27]). Let S = (U, A ∪ d) be a multi-scale decision information system, where S = (U, A) is a multi-scale information system, ∑ is the collection of all scale combinations, d ∉ A is a decision attribute, and d : U → V_d is a surjective map, where V_d = {d(x) | x ∈ U} is called the domain of d.

Global Optimal Scale
The decision information system corresponding to the scale combination K = (l_1, l_2, ..., l_m) is denoted by S^K = (U, A^K ∪ d). Obviously, the relation matrix corresponding to the equivalence relation R_{A^K} is M_{A^K}.

Definition 10 ([21]). Let S = (U, A ∪ d) be a multi-scale decision information system and ∑ the collection of all scale combinations. A scale combination K ∈ ∑ is called a global optimal scale combination if S^K = (U, A^K ∪ d) is globally consistent, but for any coarser L ∈ ∑ with K < L, S^L = (U, A^L ∪ d) is not globally consistent.

Definition 11. Let S = (U, A ∪ d) be a multi-scale decision information system, ∑ the collection of all scale combinations, and K = (l_1, l_2, ..., l_m) ∈ ∑. The significance of the scale combination K is defined as

sig(K) = |M_{A^K} ∧ (∼M_d)|,

where |A| denotes the number of 1s in the matrix A.
Let S = (U, A ∪ d) be a multi-scale decision information system and ∑ the collection of all scale combinations. By Definition 7, the decision information system corresponding to a scale combination K is S^K = (U, A^K ∪ d). For a scale combination K_j = (l_1, ..., l_j − 1, ..., l_m), obtained from K by refining the j-th attribute, the global consistency of the decision information system S^{K_j} is assessed by calculating the significance sig(K_j). The smaller sig(K_j) is, the more significant the scale combination K_j = (l_1, ..., l_j − 1, ..., l_m) becomes. This means that this scale combination is selected first; in other words, the information system corresponding to this scale combination is closer to being consistent. Based on this idea, a global optimal scale combination selection algorithm for a multi-scale decision system can be described as Algorithm 1.
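The significance sig(K) of Definition 11 is a straightforward matrix count. A minimal Python sketch (numpy; the tiny table is hypothetical) computes it as the number of 1s in M_{A^K} ∧ (∼M_d):

```python
import numpy as np

def relation_matrix(rows):
    t = np.asarray(rows)
    return (t[:, None, :] == t[None, :, :]).all(axis=2).astype(int)

def sig(M_AK, M_d):
    """Definition 11: sig(K) = number of 1s in M_{A^K} AND (NOT M_d);
    sig(K) == 0 iff the system at scale combination K is globally consistent."""
    return int(np.sum(M_AK * (1 - M_d)))

# Hypothetical table: x1 and x2 agree under A^K but take different decisions.
A_K = [[1], [1], [2]]
d   = [[0], [1], [1]]
print(sig(relation_matrix(A_K), relation_matrix(d)))  # -> 2 (positions (1,2) and (2,1))
```

The two offending 1s come from the symmetric pair (x_1, x_2), which R_{A^K} identifies but R_d separates.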
As Algorithm 1 shows, the scale combination variable optK and the significance variable optSig are initialized with the coarsest scale combination and the maximum value, respectively; the scale combination K is then put into a queue and an iteration starts. In each iteration, the current scale combination K is taken out of the queue and sig(K) is calculated. If the current sig(K) is smaller than optSig, then optSig and optK are replaced by sig(K) and the current scale combination K. If the value of optSig becomes zero during the iteration, the iteration terminates and the optimal scale combination is output. When the queue is empty and optSig ≠ 0, successively finer scale combinations are constructed from optK and put into the queue, and another iteration starts.

Algorithm 1: Selecting the global optimal scale combination of a multi-scale decision system
Input: A multi-scale decision system S = (U, A ∪ d).
Output: The global optimal scale combination.
Calculate M_d; let optK = (I_1, I_2, ..., I_m) be the coarsest scale combination, let optSig = +∞, and put optK into a queue.
While the queue is not empty:
  take a scale combination K = (l_1, l_2, ..., l_m) out of the queue and calculate sig(K);
  if sig(K) < optSig, let optSig = sig(K) and optK = K;
  if optSig = 0, return optK; // the optimal scale combination is found
  for j = 1 to m: // set a finer scale for each attribute, respectively
    if (l_j > 1) put (l_1, ..., l_j − 1, ..., l_m) into the queue; // next, search finer scale combinations
EndWhile

Example 4. As in Example 2, we seek the global optimal scale combination of the system corresponding to Table 3.
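The coarse-to-fine search of Algorithm 1 can be sketched in Python as follows. This is a sketch, not the paper's exact procedure: the data layout (`scales[j][k]` as the value column of attribute j at its (k+1)-th scale, index 0 being the finest) and all names are our own assumptions, and for simplicity the queue explores every not-yet-seen finer combination rather than refining only from optK:

```python
import numpy as np
from collections import deque

def relation_matrix(rows):
    """m_ij = 1 iff objects i and j agree on every listed attribute value."""
    t = np.asarray(rows)
    return (t[:, None, :] == t[None, :, :]).all(axis=2).astype(int)

def sig(M_AK, M_d):
    """Definition 11: number of 1s in M_{A^K} AND (NOT M_d)."""
    return int(np.sum(M_AK * (1 - M_d)))

def global_optimal_scale(scales, d):
    """Searches from the coarsest scale combination toward finer ones
    until a combination K with sig(K) == 0 is found."""
    I = [len(cols) for cols in scales]      # number of scales per attribute
    M_d = relation_matrix([[v] for v in d])
    start = tuple(I)                        # coarsest scale combination
    best_K, best_sig = start, None
    queue, seen = deque([start]), {start}
    while queue:
        K = queue.popleft()
        cols = [scales[j][K[j] - 1] for j in range(len(K))]
        s = sig(relation_matrix(list(zip(*cols))), M_d)
        if best_sig is None or s < best_sig:
            best_K, best_sig = K, s
        if best_sig == 0:
            return best_K                   # consistent: optimal combination found
        for j in range(len(K)):             # set a finer scale for each attribute
            if K[j] > 1:
                L = K[:j] + (K[j] - 1,) + K[j + 1:]
                if L not in seen:
                    seen.add(L)
                    queue.append(L)
    return best_K                           # no consistent combination exists

# One attribute with two scales; only the finest scale separates the decision classes.
scales = [[[1, 2, 3, 4], [1, 1, 2, 2]]]
print(global_optimal_scale(scales, [0, 1, 1, 1]))  # -> (1,)
```

With d = [0, 0, 1, 1] the coarse scale is already consistent and the search stops at (2,) immediately.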

Local Optimal Scale
Definition 12. Let S = (U, A ∪ d) be a consistent multi-scale decision system and ∑ the collection of all scale combinations. For x ∈ U and a scale combination K ∈ ∑, if S^K = (U, A^K ∪ d) is locally consistent for x, but for any L ∈ ∑ with K < L, S^L = (U, A^L ∪ d) is not locally consistent for x, we call K a local optimal scale combination of S for x.

Theorem 8. Let S = (U, A ∪ d) be a consistent multi-scale decision system and ∑ the collection of all scale combinations. Given a scale combination K ∈ ∑, S^K is locally consistent for x_i if and only if the i-th row of M_{A^K} ∧ (∼M_d) contains no 1.

Definition 13. Let S = (U, A ∪ d) be a multi-scale decision information system, ∑ the collection of all scale combinations, and K ∈ ∑. The significance of the scale combination K for x_i is defined as the number of 1s in the i-th row of M_{A^K} ∧ (∼M_d), denoted sig(x_i, K). If sig(x_i, K) = 0, then the decision information system S^K = (U, A^K ∪ d) is locally consistent for x_i. Based on this idea, a local optimal scale combination selection algorithm for a multi-scale decision system can be described as Algorithm 2.
The computation process of Algorithm 2 is similar to that of Algorithm 1, except that the significance sig(x, K) relevant to the given object x is used in place of sig(K).
Example 5. We look for the local optimal scale combination for x_1 in Table 3.
Algorithm 2: Selecting the local optimal scale combination of a multi-scale decision system Input: A consistent multi-scale decision system S = (U, A ∪ d) and an object x ∈ U. Output: A local optimal scale combination for x.
Calculate M_d; let optK = (I_1, I_2, ..., I_m), let optSig = +∞, and put optK into a queue.
While the queue is not empty:
  take K = (l_1, l_2, ..., l_m) out of the queue and calculate sig(x, K); // choose the minimum sig(x, K) and its K
  if sig(x, K) < optSig, let optSig = sig(x, K) and optK = K;
  if optSig = 0, return optK;
  for j = 1 to m: // set a finer scale for each attribute, respectively
    if (l_j > 1) put (l_1, ..., l_j − 1, ..., l_m) into the queue; // next, search finer scale combinations
EndWhile
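The per-object significance used here is simply the row-wise restriction of sig(K). A small sketch (numpy; the reading of Definition 13 as a row count is our assumption, and the tiny table is hypothetical):

```python
import numpy as np

def relation_matrix(rows):
    t = np.asarray(rows)
    return (t[:, None, :] == t[None, :, :]).all(axis=2).astype(int)

def sig_local(i, M_AK, M_d):
    """Row-wise analogue of sig(K): counts the offending 1s in row i of
    M_{A^K} AND (NOT M_d). sig_local == 0 iff [x_i]_{A^K} is contained in [x_i]_d."""
    return int(np.sum(M_AK[i] * (1 - M_d[i])))

# Hypothetical tiny table: x1 and x2 agree on A^K but disagree on d.
A_K = [[1], [1], [2]]
d   = [[0], [1], [1]]
M_AK, M_d = relation_matrix(A_K), relation_matrix(d)
print(sig_local(0, M_AK, M_d), sig_local(2, M_AK, M_d))  # -> 1 0
```

So the system is locally inconsistent for x_1 (its class contains an object with a different decision) but locally consistent for x_3.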

Local Optimal Scale Selection for Inconsistent Generalized Decision Information Systems
Let S = (U, A ∪ d) be a decision information system as defined in Definition 3. For an inconsistent decision information system there exists x ∈ U such that d([x]_A) is not unique. For this reason, Wu and Leung put forward a definition of the generalized decision in their paper [21].

Definition 14 ([21]). Let S = (U, A ∪ d) be a decision information system. For B ⊆ A, the generalized decision of an object x ∈ U is defined as ∂_B(x) = d([x]_B) = {d(y) | y ∈ [x]_B}. The following theorem is easily obtained from Definitions 14 and 15.
Theorem 9. Let S = (U, A ∪ d) be a decision information system with M_A = (m_ij)_{n×n}, and let d_j = d(x_j). Then ∂_A(x_i) = {m_ij d_j | j = 1, 2, ..., n} \ {0}. It should be pointed out that if there exists x ∈ U with d(x) = 0, the above theorem is not applicable; in this case we may assign another value of the attribute d to x, which does not affect the classification and decision making. Example 6. Find the generalized decision of each object in the inconsistent decision information system of Table 4.
Table 4.An inconsistent decision system.
The generalized decisions {1, 2} of the remaining objects can be obtained by the same method.

Definition 16. Let S = (U, A ∪ d) be a multi-scale inconsistent decision information system, ∑ the collection of all scale combinations, and S^K = (U, A^K ∪ d) the decision information system corresponding to the scale combination K. If ∂_{A^K}(x) = ∂_{A^{K_0}}(x) holds, where K_0 denotes the finest scale combination, we call S^K locally generalized consistent for x. If S^K is locally generalized consistent for x, but for any L = (l_1, l_2, ..., l_m) with K < L, S^L = (U, A^L ∪ d) is not locally generalized consistent for x, we call K the local generalized optimal scale combination for x.

Theorem 10. S^K is locally generalized consistent for x_i if and only if H_{A^K}(i) = H_{A^{K_0}}(i) holds.
Proof. By Theorem 9, S^K is locally generalized consistent for x_i if and only if ∂_{A^K}(x_i) = ∂_{A^{K_0}}(x_i), i.e., H_{A^K}(i) = H_{A^{K_0}}(i). □

Theorem 11. Let S = (U, A ∪ d) be an inconsistent multi-scale decision system, ∑ the collection of all scale combinations, and K ∈ ∑. Then K is the local generalized optimal scale combination for x_i if and only if H_{A^K}(i) = H_{A^{K_0}}(i) and, for any L with K < L, H_{A^L}(i) ≠ H_{A^{K_0}}(i).

Definition 17. Let S = (U, A ∪ d) be an inconsistent multi-scale decision system, ∑ the collection of all scale combinations, and K = (l_1, ..., l_j, ..., l_m) ∈ ∑. The significance of the scale combination K for x_i is defined as sig(x_i, K) = |H_{A^K}(i)| − |H_{A^{K_0}}(i)|. If sig(x_i, K) = 0, the decision information system S^K = (U, A^K ∪ d) is locally generalized consistent for x_i. The computation process of Algorithm 3 is similar to that of Algorithm 2.
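Generalized decisions can be read directly off the rows of M_A. The sketch below (numpy; the toy table is hypothetical) uses a set-based variant of Theorem 9, collecting the decision values of the objects related to x_i; working with sets rather than the product form m_ij · d_j sidesteps the d(x) = 0 caveat noted after Theorem 9:

```python
import numpy as np

def relation_matrix(rows):
    t = np.asarray(rows)
    return (t[:, None, :] == t[None, :, :]).all(axis=2).astype(int)

def generalized_decisions(M_A, d):
    """partial_A(x_i) = { d(x_j) : m_ij = 1 }: decision values occurring in [x_i]_A."""
    n = len(d)
    return [sorted({d[j] for j in range(n) if M_A[i, j]}) for i in range(n)]

# Hypothetical inconsistent table: x1 and x2 share an A-class but differ on d.
A = [[1], [1], [2]]
d = [1, 2, 2]
print(generalized_decisions(relation_matrix(A), d))  # -> [[1, 2], [1, 2], [2]]
```

An object is then locally generalized consistent at scale K exactly when its set here agrees with the one computed at the finest scale K_0.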

Algorithm 3: Selecting the local generalized optimal scale combination of a multi-scale decision system
Input: An inconsistent multi-scale decision system S = (U, A ∪ d) and an object x ∈ U. Output: A local generalized optimal scale combination for x.
// next, search finer scale combinations
EndWhile

Example 7. Search for the local generalized optimal scale combination for x_i in the system of Table 5. Table 5. A generalized inconsistent multi-scale information system.

Conclusions
This paper introduces relational matrices and matrix calculations into multi-scale decision information systems. Some properties of multi-scale decision information systems are discussed based on matrix methods. Using the relation matrix, the significance of a scale combination is introduced, the method of optimal scale combination selection is studied, and the related algorithms are designed. The effectiveness of the method is illustrated by experiments. In the future, we will use the matrix method to study classification and decision-making methods for multi-scale information systems.

Experiments and Analysis
In order to verify whether Algorithms 1 and 2 can be practically applied to choose the optimal scale combination in multi-scale decision information systems, several University of California Irvine (UCI) datasets are used in the experiments. Table 6 describes these datasets. The datasets in Table 6 are single-scale decision datasets; to extend them from single-scale to multi-scale, some methods from [27] are adopted, as described below.
Firstly, at the finest level of scale of each attribute, a^1(x) is the value of x ∈ U at attribute a in the original dataset.
Secondly, for each attribute a, let std(a) and min(a) denote the standard deviation and minimum of its values; the second level of scale of x ∈ U at attribute a in the multi-scale dataset is then denoted a²(x). Thirdly, based on the previous level of scale, the next scale of attribute a is obtained by merging some equivalence classes of the previous level. For example, suppose the equivalence classes at the previous level of scale are S^k = {a^k(x) | x ∈ U} = {{5}, {6}, {8}, {12}, {20}}; then at the next level of scale the classes {5} and {6} are merged into {5, 6}, which is denoted by {6}. That is, if a^k(x) belongs to {5} or {6}, then a^{k+1}(x) is 6, and the resulting equivalence classes are S^{k+1} = {{6}, {8}, {12}, {20}}.
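The two scale-construction steps above can be sketched as follows (numpy; function names are our own, and numpy's population standard deviation is assumed since the paper does not specify which estimator std(a) uses):

```python
import numpy as np

def second_scale(col):
    """Second scale of a numeric attribute as described above:
    a^2(x) = floor((a(x) - min(a)) / std(a))."""
    c = np.asarray(col, dtype=float)
    return np.floor((c - c.min()) / c.std()).astype(int)

def coarsen(col, merge):
    """Next scale by merging equivalence classes,
    e.g. merge = {5: 6} sends the class {5} into {6}."""
    return [merge.get(v, v) for v in col]

vals = [5, 6, 8, 12, 20]
print(list(second_scale(vals)))  # -> [0, 0, 0, 1, 2]
print(coarsen(vals, {5: 6}))     # -> [6, 6, 8, 12, 20]
```

The `coarsen` call reproduces the worked example in the text: the classes {5} and {6} collapse into a single class denoted by 6.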
Table 7 shows the results of experiments on the optimal combined multi-scale datasets obtained with Algorithm 1. It presents the Support Vector Machine (SVM) classification accuracy both on the raw datasets and on the refined datasets at the optimal combined scale, and it demonstrates that Algorithm 1 works well.
Table 8 shows the SVM accuracy, the rate of optimal, the average of LoS, and the time cost of Algorithm 2 compared to Algorithm 1. It demonstrates that the same optimal performance as the global multi-scale optimal algorithm can be achieved by choosing a small number of instances in the local multi-scale optimal algorithm.

Definition 4. A matrix M = (r_ij)_{n×n} with r_ij ∈ {0, 1} is called a Boolean matrix. Let A = (a_ij)_{n×m}, B = (b_ij)_{n×m}, and C = (c_ij)_{m×l} be Boolean matrices. Define the following operations: (1) order relation: A ⪯ B if and only if a_ij ≤ b_ij for all i, j; (2) meet: A ∧ B = (a_ij ∧ b_ij)_{n×m}; (3) complement: ∼A = (1 − a_ij)_{n×m}; (4) Boolean product: A ⊙ C = (∨_{k=1}^{m} (a_ik ∧ c_kj))_{n×l}.
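These operations map directly onto 0/1 numpy arrays. A minimal sketch (the operations beyond the order relation are assumed from their later use in the paper, since the original list is truncated):

```python
import numpy as np

def leq(A, B):
    """Order relation: A <= B entrywise."""
    return bool(np.all(A <= B))

def meet(A, B):
    """Entrywise conjunction A AND B."""
    return A & B

def complement(A):
    """Entrywise negation, written ~A in the text."""
    return 1 - A

A = np.array([[1, 0], [0, 1]])
B = np.array([[1, 1], [0, 1]])
print(leq(A, B))              # -> True
print(meet(A, B).tolist())    # -> [[1, 0], [0, 1]]
print(complement(A).tolist()) # -> [[0, 1], [1, 0]]
```

With these three operations, the consistency tests of Theorems 4 and 7 become one-liners such as `leq(M_A, M_d)` or `not meet(M_A, complement(M_d)).any()`.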

a²(x) = ⌊(a(x) − min(a)) / std(a)⌋, where ⌊y⌋ denotes the largest integer v satisfying v ≤ y.

Table 2. Evaluation Criteria 1 to 4.
If S^K = (U, A^K ∪ d) is globally consistent, but for any L with K < L the system S^L is not globally consistent, we call K a global optimal scale combination. Let S = (U, A ∪ d) be a multi-scale decision information system and ∑ the collection of all scale combinations. Then the following conclusions hold: (1) S is globally consistent if and only if M_{A^{K_0}} ⪯ M_d; (2) S is globally consistent if and only if M_{A^{K_0}} ∧ (∼M_d) = 0; (3) a scale combination K ∈ ∑ is the global optimal scale combination if and only if M_{A^K} ⪯ M_d holds and, for any L with K < L, M_{A^L} ⪯ M_d does not hold. Indeed, if S^{K_0} is globally consistent then, by Definition 10, S = (U, A ∪ d) is globally consistent. In Example 4, for the coarsest scale combination K = (4, 3, 2), calculation gives sig(4, 3, 2) = 8; once a combination with sig = 0 is reached, the optimal scale combination is found and the algorithm returns.
Let K_0 = (1, 1, ..., 1); // set the finest scale combination
Calculate H_{A^{K_0}}(x); // H_{A^{K_0}}(x) corresponds to H_A(i) with K_0 and x_i in Theorem 9
Let Queue = NULL; let optK = (I_1, I_2, ..., I_m);
Calculate H_{A^K}(x); // H_{A^K}(x) corresponds to H_A(i) with K and x_i in Theorem 9
Let sig(x, K) = H_{A^K}(x) − H_{A^{K_0}}(x)

Table 6. The UCI dataset description.

Table 7. Experimental results of Algorithm 1.

Table 8. Experimental results of Algorithm 2 compared to Algorithm 1.