Article

Homogenous Granulation and Its Epsilon Variant †

by Krzysztof Ropiak and Piotr Artiemjew *,‡
Faculty of Mathematics and Computer Science, University of Warmia and Mazury in Olsztyn, 10-710 Olsztyn, Poland
*
Author to whom correspondence should be addressed.
† Extended version of the paper “A Study in Granular Computing: homogenous granulation” presented at the 24th International Conference on Information and Software Technologies (ICIST 2018), Vilnius, Lithuania, 5–6 October 2018.
‡ These authors contributed equally to this work.
Computers 2019, 8(2), 36; https://doi.org/10.3390/computers8020036
Submission received: 14 February 2019 / Revised: 28 April 2019 / Accepted: 7 May 2019 / Published: 10 May 2019

Abstract

In the era of Big Data, there is still room for techniques which reduce the data size while maintaining its internal knowledge. This problem is the main subject of research of a family of granulation techniques proposed by Polkowski. In our recent works, we have developed new, effective and simple techniques for decision approximation: homogenous granulation and epsilon homogenous granulation. The real problem in this family of methods was the choice of an effective approximation parameter for arbitrary datasets. It was resolved by the homogenous techniques. There is no need to estimate the optimal approximation parameters for these methods, because they are set in a dynamic way according to the internal indiscernibility level of the data. This work is an extension of the paper presented at the ICIST 2018 conference. We present results for homogenous and epsilon homogenous granulation together with a comparison of their effectiveness.

1. Introduction

Granular rough computing is one of the techniques used for decision system approximation. The method relies on knowledge granules, which are formed from objects with selected, similar features. The main goal is to reduce the amount of data used for classification or regression while maintaining the internal knowledge of the decision system. In the era of processing large datasets, these techniques can play a significant role. Basic granulation methods were proposed by Polkowski [1,2]. In the works of Artiemjew [3,4], Polkowski [1,2,5,6,7,8], and Polkowski and Artiemjew [9,10,11,12,13,14], standard, concept-dependent and layered granulation were presented in the context of data reduction, missing value absorption and usage in the classification process.
Our motivation for this research was the idea of determining an effective indiscernibility ratio for decision system approximation without estimating it. The approximation ratio influences the degree of reduction of the original data size. In our previous methods, we had to estimate this parameter by reviewing the set of radii from 0 to 1. In the methods proposed in this work, we do not have to perform this operation. The ratio for a particular central object is chosen automatically, by extending it until the collected set of objects is homogenous in the sense of belonging to one decision class. Instead of performing granulation several times, depending on the number of attributes of the object, this process is performed only once, solving the problem of optimal radius search. Our results show a reduction of the training dataset size by up to 50 percent while maintaining the internal knowledge at a satisfying level, which was measured by the efficiency of the classification process. The method is simple and has quadratic time complexity: |U|² main operations times the scalar |A|, where U is the set of objects of the decision system and A is the set of conditional attributes.
In this work, we have described the results of our previous research, presented in detail in [15,16]. The results are prepared for nominal data (homogenous granulation) and numerical data (epsilon homogenous granulation). It is worth mentioning that our new methods have been implemented in an effective new ensemble model; see [17].
The paper is organized as follows. In Section 2 we give the theoretical background. In Section 3 and Section 4 we present a description of our granulation techniques. In Section 5 we describe the classifier used in the experimental part. In Section 6 we report the results of our experiments, and the conclusions are presented in Section 7.
There are three basic steps in the granulation process. The granules are computed for each training object; then the training dataset is covered using a selected strategy; and in the last step, majority voting is used to obtain the granular reflection of the training system.
In the next section, we describe the first step of the mentioned procedure.

2. Granular Rough Inclusions

More theory about rough inclusions can be found in Polkowski [1,6,7,18,19]; a detailed discussion may be found in Polkowski [8].
For given objects u and v from the training decision system (U, A, d), where U is the universe of objects, A is the set of conditional attributes, and d is the decision attribute, the standard rough inclusion μ is defined as
μ(v, u, r) if and only if |IND(u, v)|/|A| ≥ r,
where
IND(u, v) = {a ∈ A : a(u) = a(v)}.
The parameter r is the granulation radius, taken from the set {0, 1/|A|, 2/|A|, …, 1}.

2.1. ε-Modification of the Standard Rough Inclusion

Given a parameter ε valued in the unit interval [0, 1], we define the set
IND_ε(u, v) = {a ∈ A : dist(a(u), a(v)) ≤ ε},
and we set
μ_ε(v, u, r) if and only if |IND_ε(u, v)|/|A| ≥ r.
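As a quick illustration, both inclusions can be sketched in a few lines of Python. The names (`ind`, `ind_eps`, `mu`) are ours, not taken from the paper, and `spans` stands for the per-attribute ranges max_a − min_a used to normalize numeric distances:

```python
from typing import Sequence

def ind(u: Sequence, v: Sequence) -> int:
    """|IND(u, v)|: number of attributes on which u and v agree exactly."""
    return sum(1 for a_u, a_v in zip(u, v) if a_u == a_v)

def ind_eps(u, v, spans, eps):
    """|IND_eps(u, v)|: attributes whose span-normalized distance is <= eps."""
    return sum(1 for a_u, a_v, s in zip(u, v, spans) if abs(a_u - a_v) / s <= eps)

def mu(v, u, r):
    """Standard rough inclusion: holds when |IND(u, v)| / |A| >= r."""
    return ind(u, v) / len(u) >= r
```

For example, two objects agreeing on 2 of 4 attributes satisfy μ at radius 0.5 but not at 0.75.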

2.2. Covering of Universe of Training Objects

During the covering process, the objects of the training system are covered based on a chosen strategy. Simple random choice was used in this experiment, because it is the most effective method among those studied; see [14].
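A minimal sketch of such a random covering, under the assumption that a granule is kept only when it contributes at least one not-yet-covered object (one reasonable reading of the strategy; the function name and data layout are ours):

```python
import random

def random_covering(granules):
    """granules: dict mapping a central object's index to the set of indices
    of its granule's members. Granules are drawn in random order and kept
    while some training object is still uncovered."""
    uncovered = set().union(*granules.values())
    order = list(granules)
    random.shuffle(order)
    cover = []
    for centre in order:
        if granules[centre] & uncovered:  # contributes a new object
            cover.append(centre)
            uncovered -= granules[centre]
        if not uncovered:
            break
    return cover
```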
The last step of the granulation process is shown in the next section.

2.3. Granular Reflections

In this step, the granular reflections of the original training system are formed based on the granules from the found coverage. Each granule g ∈ COV(U, μ, r) from the coverage is finally represented by a single object whose attributes are chosen using the Majority Voting (MV) strategy,
{MV({a(u) : u ∈ g}) : a ∈ A ∪ {d}}.
The granular reflection of the decision system D = (U, A, d) is the decision system built over COV(U, μ, r), the set of objects formed from granules. The concept-dependent granule about u is defined as
v ∈ g_r^{cd}(u) if and only if μ(v, u, r) and d(u) = d(v),
for a given rough (weak) inclusion μ.
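The majority-voting step can be sketched as follows (a hypothetical helper, not the authors' code): each granule, given as rows of attribute values with the decision appended, collapses column-wise to its most frequent value.

```python
from collections import Counter

def mv_reflection(members):
    """members: the rows of one granule, each row being the attribute values
    followed by the decision. Returns a single representative row whose every
    column is the most frequent value in that column (Majority Voting)."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*members)]
```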
Detailed information about our new method of granulation is presented in the next section.

3. Homogenous Granulation

In this section we give a formal definition of the homogenous granulation process. In plain words, given the set of samples from a decision system, we can try to lower the size of the system by searching for groups of objects similar to a fixed degree. Having those sets (the granules), we can cover the original system, searching for granules which represent all the knowledge of the original decision system. In this particular method, we form groups of objects which belong to the same decision class and have the lowest possible indiscernibility ratio; that is, the similarity requirement is relaxed as far as possible while all objects in the granule remain in the same class. The granule formed under this assumption is denoted g_{r_u}^{homogenous}; see the equations below.
The granules are formed as follows:
g_{r_u}^{homogenous}(u) = {v ∈ U : |g_{r_u}^{cd}(u)| − |g_{r_u}(u)| = 0, for the minimal r_u fulfilling this equation},
where
g_{r_u}^{cd}(u) = {v ∈ U : |IND(u, v)|/|A| ≥ r_u AND d(u) = d(v)}
and
g_{r_u}(u) = {v ∈ U : |IND(u, v)|/|A| ≥ r_u},
r_u ∈ {0/|A|, 1/|A|, …, |A|/|A|}.
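The definition above can be turned into a short procedure: starting from full similarity, the radius r_u is lowered step by step, and the last radius at which the granule is still pure (contains only the central object's class) is kept. This is a sketch under our own naming, not the authors' implementation:

```python
def homogenous_granule(u_idx, rows, decisions):
    """Return (r_u, member indices) of the homogenous granule about rows[u_idx].
    rows: list of attribute-value lists; decisions: parallel list of classes."""
    n_attr = len(rows[u_idx])
    u = rows[u_idx]
    # |IND(u, v)| for every training object v
    sims = [sum(a == b for a, b in zip(u, v)) for v in rows]
    for matches in range(n_attr, -1, -1):          # r_u = matches / n_attr
        members = [i for i, s in enumerate(sims) if s >= matches]
        if any(decisions[i] != decisions[u_idx] for i in members):
            matches += 1                           # step back to last pure radius
            members = [i for i, s in enumerate(sims) if s >= matches]
            break
    return matches / n_attr, members
```

For instance, with rows [[0, 0], [0, 1], [1, 1]] and decisions [1, 1, 2], the granule about the first object stops at r_u = 1/2 with the first two objects as members, because admitting the third object would break homogeneity.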

3.1. Simple Example of Homogenous Granulation

In Table 1, we give an exemplary training decision system on which we base the computation of the homogenous granules defined in the previous section. The decision system (U_trn, B, d) is a set of resolved problems, useful in modelling the automatic decision process. U_trn is the set of objects u_1 to u_24, B is the set of conditional attributes (descriptions of samples) b_1 to b_13, and d is the decision attribute, which contains the expert decision used for creating the classification model. In our case two classes exist: d ∈ D = {1, 2}. Let us explain the process of granule formation. For the given object u_1, which belongs to class 1, we look for objects from the same class, starting from identical objects (similar in degree 1) and lowering the threshold until the objects are indiscernible in the smallest possible degree (are the least similar to u_1) while all of them are still in class 1. If we lowered the indiscernibility ratio any further, the r-indiscernible objects would no longer point to the decision class unambiguously. In our example, the ratio 0.385 = 5/13 for the granule g_0.385(u_1) means that the set contains objects which are identical with the central one (u_1) on at least 5 positions. For instance, objects u_1 and u_6 have the following common descriptors: b_2 = 0, b_6 = 0, b_7 = 2, b_9 = 1 and b_13 = 3. In the covering part, we look for a set of granules which represents each object from U_trn at least once.
Considering the training decision system from Table 1, the homogenous granules are formed as follows:
  • g 0.385 ( u 1 ) = ( u 1 , u 6 , u 10 , u 11 , u 12 , u 18 , u 20 ) ,
  • g 0.462 ( u 2 ) = ( u 2 , u 3 , u 4 , u 5 , u 9 , u 23 ) ,
  • g 0.539 ( u 3 ) = ( u 2 , u 3 , u 5 ) ,
  • g 0.615 ( u 4 ) = ( u 4 ) ,
  • g 0.539 ( u 5 ) = ( u 3 , u 5 , u 21 , u 23 ) ,
  • g 0.462 ( u 6 ) = ( u 4 , u 6 , u 16 , u 20 , u 21 ) ,
  • g 0.539 ( u 7 ) = ( u 7 , u 15 , u 17 ) ,
  • g 0.462 ( u 8 ) = ( u 7 , u 8 , u 13 ) ,
  • g 0.462 ( u 9 ) = ( u 2 , u 4 , u 9 ) ,
  • g 0.615 ( u 10 ) = ( u 10 ) ,
  • g 0.385 ( u 11 ) = ( u 1 , u 6 , u 11 , u 12 , u 20 ) ,
  • g 0.385 ( u 12 ) = ( u 1 , u 11 , u 12 , u 18 , u 20 ) ,
  • g 0.615 ( u 13 ) = ( u 13 ) ,
  • g 0.385 ( u 14 ) = ( u 14 , u 15 , u 24 ) ,
  • g 0.615 ( u 15 ) = ( u 15 ) ,
  • g 0.539 ( u 16 ) = ( u 16 ) ,
  • g 0.539 ( u 17 ) = ( u 7 , u 15 , u 17 ) ,
  • g 0.385 ( u 18 ) = ( u 1 , u 2 , u 6 , u 10 , u 12 , u 18 , u 20 , u 21 , u 23 ) ,
  • g 0.615 ( u 19 ) = ( u 19 ) ,
  • g 0.462 ( u 20 ) = ( u 1 , u 6 , u 11 , u 12 , u 18 , u 20 ) ,
  • g 0.462 ( u 21 ) = ( u 3 , u 5 , u 6 , u 21 , u 23 ) ,
  • g 0.615 ( u 22 ) = ( u 22 ) ,
  • g 0.462 ( u 23 ) = ( u 2 , u 3 , u 5 , u 21 , u 23 ) ,
  • g 0.462 ( u 24 ) = ( u 7 , u 15 , u 24 ) ,
We cover the universe of objects by random choice:
  • g 0.462 ( u 2 ) = ( u 2 , u 3 , u 4 , u 5 , u 9 , u 23 ) ,
  • g 0.539 ( u 3 ) = ( u 2 , u 3 , u 5 ) ,
  • g 0.462 ( u 6 ) = ( u 4 , u 6 , u 16 , u 20 , u 21 ) ,
  • g 0.462 ( u 8 ) = ( u 7 , u 8 , u 13 ) ,
  • g 0.385 ( u 12 ) = ( u 1 , u 11 , u 12 , u 18 , u 20 ) ,
  • g 0.385 ( u 14 ) = ( u 14 , u 15 , u 24 ) ,
  • g 0.539 ( u 17 ) = ( u 7 , u 15 , u 17 ) ,
  • g 0.385 ( u 18 ) = ( u 1 , u 2 , u 6 , u 10 , u 12 , u 18 , u 20 , u 21 , u 23 ) ,
  • g 0.615 ( u 19 ) = ( u 19 ) ,
  • g 0.462 ( u 21 ) = ( u 3 , u 5 , u 6 , u 21 , u 23 ) ,
  • g 0.615 ( u 22 ) = ( u 22 ) ,
The final granular system is given in Table 2.
An exemplary visualization of the granulation process is presented in Figure 1.

4. Epsilon Variant of Homogenous Granulation

The only difference from the homogenous granulation described in Section 3 is the addition of the parameter ε, which allows us to use floating-point values in the granulation process. The rest of the technique is the same.
The method is defined in the following way:
g_{r_u}^{ε,homogenous}(u) = {v ∈ U : |g_{r_u}^{ε,cd}(u)| − |g_{r_u}^{ε}(u)| = 0, for the minimal r_u fulfilling this equation},
where
g_{r_u}^{ε,cd}(u) = {v ∈ U : |IND_ε(u, v)|/|A| ≥ r_u AND d(u) = d(v)}
and
g_{r_u}^{ε}(u) = {v ∈ U : |IND_ε(u, v)|/|A| ≥ r_u},
r_u ∈ {0/|A|, 1/|A|, …, |A|/|A|},
IND_ε(u, v) = {a ∈ A : |a(u) − a(v)|/(max_a − min_a) ≤ ε},
where max_a and min_a are the maximal and minimal values of attribute a ∈ A in the original dataset.
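The epsilon variant differs from the plain homogenous procedure only in how descriptor agreement is counted. A self-contained sketch (our naming, with a guard for zero-range attributes):

```python
def eps_homogenous_granule(u_idx, rows, decisions, eps):
    """Epsilon homogenous granule about rows[u_idx]: two descriptors count as
    indiscernible when they differ by at most eps of the attribute's range."""
    n_attr = len(rows[u_idx])
    spans = [(max(col) - min(col)) or 1.0 for col in zip(*rows)]  # avoid /0
    u = rows[u_idx]
    sims = [sum(abs(a - b) / s <= eps for a, b, s in zip(u, v, spans))
            for v in rows]
    for matches in range(n_attr, -1, -1):          # r_u = matches / n_attr
        members = [i for i, s in enumerate(sims) if s >= matches]
        if any(decisions[i] != decisions[u_idx] for i in members):
            matches += 1                           # last radius that kept purity
            members = [i for i, s in enumerate(sims) if s >= matches]
            break
    return matches / n_attr, members
```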
The metrics for epsilon granulation and classification are defined in Equations (9) and (10), respectively: the Hamming metric for symbolic data in Equation (9), and the ε-normalized Hamming metric, its modification for numerical values with a given ε, in Equation (10).
d_H(u, v) = |{a ∈ A : a(u) ≠ a(v)}|.   (9)
d_{H,ε}(u, v) = |{a ∈ A : |a(u) − a(v)|/(max_a − min_a) > ε}|.   (10)
Considering the training decision system from Table 3, a hand-worked example of ε homogenous granulation is as follows.
The granules are computed below:
  • g 0.571429 ( u 1 ) = ( u 1 ) ,
  • g 0.5 ( u 2 ) = ( u 2 , u 4 , u 15 , u 21 ) ,
  • g 0.571429 ( u 3 ) = ( u 3 , u 9 , u 19 , u 20 ) ,
  • g 0.5 ( u 4 ) = ( u 1 , u 2 , u 4 , u 6 , u 21 ) ,
  • g 0.5 ( u 5 ) = ( u 5 , u 10 , u 19 , u 24 ) ,
  • g 0.5 ( u 6 ) = ( u 1 , u 4 , u 6 ) ,
  • g 0.5 ( u 7 ) = ( u 7 ) ,
  • g 0.5 ( u 8 ) = ( u 8 , u 9 , u 11 , u 17 ) ,
  • g 0.642857 ( u 9 ) = ( u 9 , u 10 , u 11 , u 17 , u 19 , u 20 ) ,
  • g 0.642857 ( u 10 ) = ( u 9 , u 10 , u 19 ) ,
  • g 0.642857 ( u 11 ) = ( u 9 , u 11 , u 17 , u 19 , u 20 ) ,
  • g 0.642857 ( u 12 ) = ( u 12 ) ,
  • g 0.571429 ( u 13 ) = ( u 13 ) ,
  • g 0.428571 ( u 14 ) = ( u 2 , u 14 , u 16 , u 21 ) ,
  • g 0.5 ( u 15 ) = ( u 2 , u 12 , u 15 , u 21 ) ,
  • g 0.5 ( u 16 ) = ( u 1 , u 14 , u 16 ) ,
  • g 0.642857 ( u 17 ) = ( u 9 , u 11 , u 17 , u 20 ) ,
  • g 0.642857 ( u 18 ) = ( u 18 ) ,
  • g 0.571429 ( u 19 ) = ( u 3 , u 9 , u 10 , u 11 , u 17 , u 19 , u 20 , u 24 ) ,
  • g 0.642857 ( u 20 ) = ( u 9 , u 11 , u 17 , u 19 , u 20 ) ,
  • g 0.5 ( u 21 ) = ( u 2 , u 4 , u 14 , u 15 , u 21 ) ,
  • g 0.642857 ( u 22 ) = ( u 22 ) ,
  • g 0.642857 ( u 23 ) = ( u 23 ) ,
  • g 0.642857 ( u 24 ) = ( u 24 ) ,
The granules covering the training system, selected by random choice:
  • g 0.5 ( u 2 ) = ( u 2 , u 4 , u 15 , u 21 ) ,
  • g 0.571429 ( u 3 ) = ( u 3 , u 9 , u 19 , u 20 ) ,
  • g 0.5 ( u 5 ) = ( u 5 , u 10 , u 19 , u 24 ) ,
  • g 0.5 ( u 6 ) = ( u 1 , u 4 , u 6 ) ,
  • g 0.5 ( u 7 ) = ( u 7 ) ,
  • g 0.5 ( u 8 ) = ( u 8 , u 9 , u 11 , u 17 ) ,
  • g 0.642857 ( u 12 ) = ( u 12 ) ,
  • g 0.571429 ( u 13 ) = ( u 13 ) ,
  • g 0.5 ( u 16 ) = ( u 1 , u 14 , u 16 ) ,
  • g 0.642857 ( u 18 ) = ( u 18 ) ,
  • g 0.642857 ( u 20 ) = ( u 9 , u 11 , u 17 , u 19 , u 20 ) ,
  • g 0.5 ( u 21 ) = ( u 2 , u 4 , u 14 , u 15 , u 21 ) ,
  • g 0.642857 ( u 22 ) = ( u 22 ) ,
  • g 0.642857 ( u 23 ) = ( u 23 ) ,
The final approximation of the training decision system is given in Table 4.
Figure 2 presents a simple visualization of the granulation process.

5. Description of Classifier Used for Evaluation of the Granulation

A kNN classifier has been used in the experiments to verify the effectiveness of the approximation. The procedure is as follows.
Step 1.
The granular training decision system (G_trn, A, d) and the test decision system (U_tst, A, d) are given, where A is the set of conditional attributes, d is the decision attribute, and r_gran is the granulation radius.
Step 2.
Classification of test objects, by means of granules of training objects, is performed as follows.
For all conditional attributes a ∈ A, training objects v ∈ G_trn, and test objects u ∈ U_tst, we compute weights w(u, v) based on the Hamming metric.
In the voting procedure of the kNN classifier, we use the optimal k estimated by CV5; details of the procedure are given in the next section.
If the cardinality of the smallest training decision class is less than k, we set k = |smallest training decision class|.
The test object u is classified by means of weights computed for all training objects v. Within each class, the weights are sorted in ascending order:
w_1^{c_1}(u, v_1^{c_1}) ≤ w_2^{c_1}(u, v_2^{c_1}) ≤ … ≤ w_{|C_1|}^{c_1}(u, v_{|C_1|}^{c_1});
w_1^{c_2}(u, v_1^{c_2}) ≤ w_2^{c_2}(u, v_2^{c_2}) ≤ … ≤ w_{|C_2|}^{c_2}(u, v_{|C_2|}^{c_2});
…
w_1^{c_m}(u, v_1^{c_m}) ≤ w_2^{c_m}(u, v_2^{c_m}) ≤ … ≤ w_{|C_m|}^{c_m}(u, v_{|C_m|}^{c_m}),
where C_1, C_2, …, C_m are all decision classes in the training set.
Based on the computed and sorted weights, the training decision classes vote by means of the following parameter, where c runs over the decision classes in the training set:
Concept_weight_c(u) = Σ_{i=1}^{k} w_i^c(u, v_i^c).
Finally, the test object u is classified into the class c with the minimal value of Concept_weight_c(u).
After all test objects u are classified, the accuracy (acc) quality parameter is computed according to the formula
acc = (number of correctly classified objects) / (number of classified objects).
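Steps 1–2 can be condensed into a short sketch (our naming; the Hamming distance plays the role of the weight w(u, v)):

```python
from collections import defaultdict

def hamming(u, v):
    """Number of attributes on which two objects differ."""
    return sum(a != b for a, b in zip(u, v))

def knn_classify(u, train_rows, train_dec, k):
    """Per-class kNN voting: sort each class's weights ascending, sum the k
    smallest (with k capped by the smallest class's cardinality), and pick
    the class with the minimal Concept_weight."""
    by_class = defaultdict(list)
    for v, d in zip(train_rows, train_dec):
        by_class[d].append(hamming(u, v))
    k_eff = min(k, min(len(w) for w in by_class.values()))
    scores = {c: sum(sorted(w)[:k_eff]) for c, w in by_class.items()}
    return min(scores, key=scores.get)
```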

Parameter Estimation in the kNN Classifier

In our experiments, we use the classical version of the kNN classifier based on the Hamming metric. In the first step, we estimate the optimal k based on 5×CV5 cross-validation on part of the dataset. In the next step, we use the estimated value of k to find the k nearest objects for each decision class, and then voting is performed to select the decision. If the value of k is larger than the cardinality of the smallest training decision class, then k is set equal to the cardinality of this class.
Table 5 shows the estimated values of k for all tested datasets. These values were chosen as optimal based on experiments with various values of k, with results estimated by multiple CV5 runs.

6. The Results of Experiments

To show the effectiveness of the new method, we carried out a series of experiments with real data from the UCI Machine Learning Repository (see [20]). The reference classifier is kNN in the Cross Validation 5 (CV5) model. The data used for the experiments are listed in Table 6. The parameter k was evaluated in our previous works [14]; the list of optimal values of k is shown in Table 5. A single test consists of splitting the data into training and test sets, where the training samples are granulated using our homogenous method. The results of the experiments are presented in Table 7. We have shown the comparable effectiveness of the new method with respect to our best concept-dependent granulation method; see Table 8. The new technique differs significantly from existing methods: dynamic tuning of the radius during granulation yields granules directed at the decisions of their central objects. The radius is selected automatically during the granulation process, so there is no need to estimate an optimal granulation radius. The approximation level depends on the indiscernibility ratio of objects in the particular decision classes. The epsilon variant (see Table 9) is fully comparable to the homogenous method and works more precisely for numerical data.

7. Conclusions

In this work, we have presented the results of experiments for our new granulation techniques: homogenous and epsilon homogenous granulation. The main advantage of these methods is that there is no need for parameter estimation during approximation. The parameters are tuned automatically by lowering the indiscernibility ratio until the granule contains only objects from the same decision class. The reduction of the size of the original decision systems reaches up to 50 percent. In future work, we plan to identify the best classification methods for our new approximation algorithms. Additionally, we wonder whether tolerating a fixed percentage of objects from other classes in the granule could improve the quality of classification.

Author Contributions

Conceptualization, P.A. and K.R.; Methodology, P.A. and K.R.; Software, P.A. and K.R.; Validation, P.A. and K.R.; Formal Analysis, P.A. and K.R.; Investigation, P.A. and K.R.; Resources, P.A. and K.R.; Writing—Original Draft Preparation, P.A. and K.R.; Writing—Review and Editing, P.A. and K.R.; Visualization, P.A. and K.R.; Project Administration, P.A. and K.R.; Funding Acquisition, P.A. and K.R.

Funding

This work has been fully supported by a grant from the Ministry of Science and Higher Education of the Republic of Poland under project number 23.610.007-300.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Polkowski, L. Formal granular calculi based on rough inclusions. In Proceedings of the IEEE 2005 Conference on Granular Computing GrC05, Beijing, China, 25–27 July 2005; pp. 57–62. [Google Scholar]
  2. Polkowski, L. A model of granular computing with applications. In Proceedings of the IEEE 2006 Conference on Granular Computing GrC06, Atlanta, GA, USA, 10–12 May 2006; pp. 9–16. [Google Scholar]
  3. Artiemjew, P. Classifiers from Granulated Data Sets: Concept Dependent and Layered Granulation. In Proceedings of the RSKD’07. The Workshops at ECML/PKDD’07, Warsaw, Poland, 21 September 2007; pp. 1–9. [Google Scholar]
  4. Artiemjew, P. A Review of the Knowledge Granulation Methods: Discrete vs. Continuous Algorithms. In Rough Sets and Intelligent Systems. ISRL 43; Skowron, A., Suraj, Z., Eds.; Springer: Berlin, Germany, 2013; pp. 41–59. [Google Scholar]
  5. Polkowski, L. The paradigm of granular rough computing. In Proceedings of the ICCI’07, Lake Tahoe, NV, USA, 6–8 August 2007; pp. 145–163. [Google Scholar]
  6. Polkowski, L. Granulation of knowledge in decision systems: The approach based on rough inclusions. The method and its applications. In Proceedings of the RSEISP 07, Warsaw, Poland, 28–30 June 2007; Lecture Notes in Artificial Intelligence. Springer: Berlin, Germany, 2007; Volume 4585, pp. 271–279. [Google Scholar]
  7. Polkowski, L. A unified approach to granulation of knowledge and granular computing based on rough mereology: A survey. In Handbook of Granular Computing; Pedrycz, W., Skowron, A., Kreinovich, V., Eds.; John Wiley and Sons Ltd.: Chichester, UK, 2008; pp. 375–400. [Google Scholar]
  8. Polkowski, L. Approximate Reasoning by Parts. An Introduction to Rough Mereology; Springer: Berlin, Germany, 2011. [Google Scholar]
  9. Polkowski, L.; Artiemjew, P. Granular computing: Granular classifiers and missing values. In Proceedings of the ICCI’07, Lake Tahoe, NV, USA, 6–8 August 2007; pp. 186–194. [Google Scholar]
  10. Polkowski, L.; Artiemjew, P. On granular rough computing with missing values. In Proceedings of the RSEISP 07, Warsaw, Poland, 28–30 June 2007; Lecture Notes in Artificial Intelligence. Springer: Berlin, Germany, 2007; Volume 4585, pp. 271–279. [Google Scholar]
  11. Polkowski, L.; Artiemjew, P. Towards Granular Computing: Classifiers Induced from Granular Structures. In Proceedings of the RSKD’07. The Workshops at ECML/PKDD’07, Warsaw, Poland, 21 September 2007; pp. 43–53. [Google Scholar]
  12. Polkowski, L.; Artiemjew, P. On granular rough computing: Factoring classifiers through granulated decision systems. In Proceedings of the RSEISP 07, Warsaw, Poland, 28–30 June 2007; Lecture Notes in Artificial Intelligence. Springer: Berlin, Germany, 2007; Volume 4585, pp. 280–290. [Google Scholar]
  13. Polkowski, L.; Artiemjew, P. Classifiers based on granular structures from rough inclusions. In Proceedings of the 12th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems IPMU’08, Torremolinos (Malaga), Spain, 22–27 June 2008; pp. 1786–1794. [Google Scholar]
  14. Polkowski, L.; Artiemjew, P. Granular Computing in Decision Approximation—An Application of Rough Mereology. In Intelligent Systems Reference Library 77; Springer: Berlin, Germany, 2015; pp. 1–422. ISBN 978-3-319-12879-5. [Google Scholar]
  15. Ropiak, K.; Artiemjew, P. On Granular Rough Computing: epsilon homogenous granulation. In Proceedings of the International Joint Conference on Rough Sets, IJCRS’18, Quy Nhon, Vietnam, 20–24 August 2018; Lecture Notes in Computer Science (LNCS). Springer: Heidelberg, Germany, 2018. [Google Scholar]
  16. Ropiak, K.; Artiemjew, P. A Study in Granular Computing: homogenous granulation. In Information and Software Technologies. ICIST 2018. Communications in Computer and Information Science; Dregvaite, G., Damasevicius, R., Eds.; Springer: Berlin, Germany, 2018. [Google Scholar]
  17. Artiemjew, P.; Ropiak, K. A Novel Ensemble Model—The Random Granular Reflections. In Proceedings of the 27th International Workshop on Concurrency, Specification and Programming, Berlin, Germany, 24–26 September 2018. [Google Scholar]
  18. Polkowski, L. Rough Sets. Mathematical Foundations; Physica Verlag: Heidelberg, Germany, 2002. [Google Scholar]
  19. Polkowski, L. A rough set paradigm for unifying rough set theory and fuzzy set theory. In Proceedings of the RSFDGrC 2003: Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, Chongqing, China, 26–29 May 2003. [Google Scholar]
  20. Dua, D.; Graff, C. UCI Machine Learning Repository; University of California, School of Information and Computer Science: Irvine, CA, USA, 2019; Available online: https://archive.ics.uci.edu/ml/index.php (accessed on 14 February 2019).
Figure 1. Simple demonstration of granulation for objects represented by pairs of attributes. In the picture we have objects of two classes, circles and triangles. Granulating the decision system in the homogenous way, we can obtain g_0.5(ob_1) = {ob_1, ob_5}, g_1(ob_2) = {ob_2}, g_0.5(ob_3) = {ob_3}, g_1(ob_4) = {ob_4}, g_0.5(ob_5) = {ob_5, ob_1}. The set of possible radii is {0/2, 1/2, 2/2}.
Figure 2. Exemplary toy demonstration for objects represented as pairs of attributes. We have two decision concepts: circles and rectangles. The epsilon homogenous granules can be g^ε_0.5(ob_1) = {ob_1, ob_5}, g^ε_1(ob_2) = {ob_2}, g^ε_0.5(ob_3) = {ob_3}, g^ε_1(ob_4) = {ob_4}, g^ε_0.5(ob_5) = {ob_5, ob_1}. The set of possible radii is {0/2, 1/2, 2/2}. Descriptors can be shifted within the range determined by ε and still be treated as indiscernible.
Table 1. Example of the decision system (U_trn, B, d).
b 1 b 2 b 3 b 4 b 5 b 6 b 7 b 8 b 9 b 10 b 11 b 12 b 13 d
u 1 74.0 0.0 2.0 120.0 269.0 0.0 2.0 121.0 1.0 0.2 1.0 1.0 3.0 1
u 2 65.0 1.0 4.0 120.0 177.0 0.0 0.0 140.0 0.0 0.4 1.0 0.0 7.0 1
u 3 59.0 1.0 4.0 135.0 234.0 0.0 0.0 161.0 0.0 0.5 2.0 0.0 7.0 1
u 4 53.0 1.0 4.0 142.0 226.0 0.0 2.0 111.0 1.0 0.0 1.0 0.0 7.0 1
u 5 43.0 1.0 4.0 115.0 303.0 0.0 0.0 181.0 0.0 1.2 2.0 0.0 3.0 1
u 6 46.0 0.0 4.0 138.0 243.0 0.0 2.0 152.0 1.0 0.0 2.0 0.0 3.0 1
u 7 60.0 1.0 4.0 140.0 293.0 0.0 2.0 170.0 0.0 1.2 2.0 2.0 7.0 2
u 8 63.0 0.0 4.0 150.0 407.0 0.0 2.0 154.0 0.0 4.0 2.0 3.0 7.0 2
u 9 40.0 1.0 1.0 140.0 199.0 0.0 0.0 178.0 1.0 1.4 1.0 0.0 7.0 1
u 10 48.0 1.0 2.0 130.0 245.0 0.0 2.0 180.0 0.0 0.2 2.0 0.0 3.0 1
u 11 54.0 0.0 2.0 132.0 288.0 1.0 2.0 159.0 1.0 0.0 1.0 1.0 3.0 1
u 12 71.0 0.0 3.0 110.0 265.0 1.0 2.0 130.0 0.0 0.0 1.0 1.0 3.0 1
u 13 70.0 1.0 4.0 130.0 322.0 0.0 2.0 109.0 0.0 2.4 2.0 3.0 3.0 2
u 14 56.0 1.0 3.0 130.0 256.0 1.0 2.0 142.0 1.0 0.6 2.0 1.0 6.0 2
u 15 59.0 1.0 4.0 110.0 239.0 0.0 2.0 142.0 1.0 1.2 2.0 1.0 7.0 2
u 16 64.0 1.0 1.0 110.0 211.0 0.0 2.0 144.0 1.0 1.8 2.0 0.0 3.0 1
u 17 67.0 1.0 4.0 120.0 229.0 0.0 2.0 129.0 1.0 2.6 2.0 2.0 7.0 2
u 18 51.0 0.0 3.0 120.0 295.0 0.0 2.0 157.0 0.0 0.6 1.0 0.0 3.0 1
u 19 64.0 1.0 4.0 128.0 263.0 0.0 0.0 105.0 1.0 0.2 2.0 1.0 7.0 1
u 20 57.0 0.0 4.0 128.0 303.0 0.0 2.0 159.0 0.0 0.0 1.0 1.0 3.0 1
u 21 71.0 0.0 4.0 112.0 149.0 0.0 0.0 125.0 0.0 1.6 2.0 0.0 3.0 1
u 22 53.0 1.0 4.0 140.0 203.0 1.0 2.0 155.0 1.0 3.1 3.0 0.0 7.0 2
u 23 47.0 1.0 4.0 112.0 204.0 0.0 0.0 143.0 0.0 0.1 1.0 0.0 3.0 1
u 24 58.0 1.0 3.0 112.0 230.0 0.0 2.0 165.0 0.0 2.5 2.0 1.0 7.0 2
Table 2. Granular decision system formed from the covering granules.
b 1 b 2 b 3 b 4 b 5 b 6 b 7 b 8 b 9 b 10 b 11 b 12 b 13 d
g 0.462 ( u 2 ) 65.0 1.0 4.0 120.0 177.0 0.0 0.0 140.0 0.0 0.4 1.0 0.0 7.0 1
g 0.539 ( u 3 ) 65.0 1.0 4.0 120.0 177.0 0.0 0.0 140.0 0.0 0.4 2.0 0.0 7.0 1
g 0.462 ( u 6 ) 53.0 0.0 4.0 142.0 226.0 0.0 2.0 111.0 1.0 0.0 2.0 0.0 3.0 1
g 0.462 ( u 8 ) 60.0 1.0 4.0 140.0 293.0 0.0 2.0 170.0 0.0 1.2 2.0 3.0 7.0 2
g 0.385 ( u 12 ) 74.0 0.0 2.0 120.0 269.0 0.0 2.0 159.0 0.0 0.0 1.0 1.0 3.0 1
g 0.385 ( u 14 ) 56.0 1.0 3.0 130.0 256.0 0.0 2.0 142.0 1.0 0.6 2.0 1.0 7.0 2
g 0.539 ( u 17 ) 60.0 1.0 4.0 140.0 293.0 0.0 2.0 170.0 1.0 1.2 2.0 2.0 7.0 2
g 0.385 ( u 18 ) 71.0 0.0 4.0 120.0 269.0 0.0 2.0 121.0 0.0 0.0 1.0 0.0 3.0 1
g 0.615 ( u 19 ) 64.0 1.0 4.0 128.0 263.0 0.0 0.0 105.0 1.0 0.2 2.0 1.0 7.0 1
g 0.462 ( u 21 ) 59.0 1.0 4.0 112.0 234.0 0.0 0.0 161.0 0.0 0.5 2.0 0.0 3.0 1
g 0.615 ( u 22 ) 53.0 1.0 4.0 140.0 203.0 1.0 2.0 155.0 1.0 3.1 3.0 0.0 7.0 2
Table 3. Training decision system (U_trn, A, d) (a sample from the Australian credit dataset), for ε = 0.05.
b 1 b 2 b 3 b 4 b 5 b 6 b 7 b 8 b 9 b 10 b 11 b 12 b 13 d
u 1 1 20.17 8.17 264 1.96 111402601591
u 2 1 34.92 52148 7.5 11612010011
u 3 1 58.58 2.71 284 2.415 0001232010
u 4 1 29.58 4.5 294 7.5 1121233011
u 5 0 19.17 0.58 164 0.585 1001216010
u 6 1 23.08 2.5 284 1.085 1111126021851
u 7 0 21.67 11.5 1530111112011
u 8 1 27.83 11283000021765380
u 9 1 41.17 1.33 224 0.165 0000216810
u 10 1 41.58 1.75 244 0.21 1000216010
u 11 1 22.5 0.12 144 0.125 00002200710
u 12 1 33.17 3.04 188 2.04 11112180180281
u 13 1.234 22.08 11.46 244 1.585 0001210012130
u 14 0 58.67 4.46 2118 3.04 11602435611
u 15 1 33.5 1.75 2148 4.5 114122538581
u 16 0 18.92 9264 0.75 11202885921
u 17 120 1.25 144 0.125 0000214050
u 18 1 19.5 9.58 264 0.79 00002803510
u 19 0 22.67 3.8 284 0.165 0000216010
u 20 1 17.42 6.5 234 0.125 00002601010
u 21 1 41.42 5211851161247011
u 22 1 20.67 1.25 188 1.375 113121402110
u 23 1 48.08 6.04 244 0.04 00002026911
u 24 0 28.17 0.58 264 0.04 0000226010050
Table 4. Granular decision system formed from the covering granules.
b_1 b_2 b_3 b_4 b_5 b_6 b_7 b_8 b_9 b_10 b_11 b_12 b_13 d
g_0.5(u_2) 1 34.92 52148 7.5 11612010011
g_0.571429(u_3) 1 58.58 2.71 284 0.165 0000232010
g_0.5(u_5) 0 19.17 0.58 264 0.21 1000216010
g_0.5(u_6) 1 20.17 8.17 264 1.96 111412601591
g_0.5(u_7) 0 21.67 11.5 1530111112011
g_0.5(u_8) 1 27.83 1.33 124 0.165 0000217610
g_0.642857(u_12) 1 33.17 3.04 188 2.04 1111218018,0281
g_0.571429(u_13) 1.234 22.08 11.46 244 1.585 0001210012130
g_0.5(u_16) 0 20.17 8.17 264 1.96 111402605611
g_0.642857(u_18) 1 19.5 9.58 264 0.79 00002803510
g_0.642857(u_20) 1 22.5 1.33 244 0.165 0000216810
g_0.5(u_21) 1 34.92 52148 7.5 11612010011
g_0.642857(u_22) 1 20.67 1.25 188 1.375 113121402110
g_0.642857(u_23) 1 48.08 6.04 244 0.04 00002026911
Table 5. Estimated parameter k for the kNN classifier based on 5×CV5 cross-validation; data from the UCI Repository [20].
Name              Optimal k
Australian-credit   5
Car Evaluation      8
Diabetes            3
German-credit      18
Heartdisease       19
Hepatitis           3
Nursery             4
SPECTF Heart       14
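The optimal k values of Table 5 come from repeated cross-validation. The following is a minimal sketch of such a search on a synthetic stand-in dataset (the real experiments used the UCI datasets of Table 6; the data, the candidate range of k, and all names here are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class numeric dataset standing in for a UCI decision table.
X = np.vstack([rng.normal(0, 1, (60, 4)), rng.normal(2, 1, (60, 4))])
y = np.array([0] * 60 + [1] * 60)

def knn_predict(Xtr, ytr, Xte, k):
    # majority vote among the k nearest training objects (Euclidean metric)
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (ytr[nearest].mean(axis=1) > 0.5).astype(int)

def cv5x5_accuracy(X, y, k, repeats=5, folds=5):
    # 5 x CV5: five random 5-fold splits, accuracy averaged over all 25 runs
    accs = []
    n = len(y)
    for r in range(repeats):
        idx = np.random.default_rng(r).permutation(n)
        for f in range(folds):
            te = idx[f::folds]
            tr = np.setdiff1d(idx, te)
            accs.append((knn_predict(X[tr], y[tr], X[te], k) == y[te]).mean())
    return float(np.mean(accs))

# pick the k with the best averaged cross-validated accuracy
best_k = max(range(1, 21), key=lambda k: cv5x5_accuracy(X, y, k))
```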
Table 6. Basic information about the datasets [20].
Name              Attr. type                  Attr. no.  Obj. no.  Class no.
Australian-credit categorical, integer, real  15         690       2
Car Evaluation    categorical                 7          1728      4
Diabetes          categorical, integer        9          768       2
German-credit     categorical, integer        21         1000      2
Heartdisease      categorical, real           14         270       2
Hepatitis         categorical, integer, real  20         155       2
Nursery           categorical                 9          12,960    5
SPECTF Heart      integer                     45         267       2
Table 7. Results for dynamic (homogenous) granulation; 5×CV5 method with the kNN classifier; acc = average accuracy over 5×CV5, GS_size = granular decision system size, TRN_size = training set size, TRN_reduction = reduction in object number in the training set, radii_range = spectrum of radii.
Name              acc    GS_size  TRN_size  TRN_reduction  radii_range
Australian-credit 0.835  286.52   552       48.1%          r_u ≤ 0.5
Car Evaluation    0.797  728.5    1382      47.3%          r_u ≤ 0.667
Diabetes          0.653  488.9    614       20.4%          r_u ≤ 0.25
German-credit     0.725  513.3    800       35.8%          r_u ≤ 0.6
Heartdisease      0.833  120.5    216       44.2%          r_u ≤ 0.461
Hepatitis         0.88   46.16    124       62.8%          r_u ≤ 0.579
Nursery           0.607  9009.1   10,368    13.1%          r_u ≤ 0.875
SPECTF Heart      0.763  138.75   214       35.2%          r_u ≤ 0.068
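The TRN_reduction column follows directly from GS_size and TRN_size: it is the percentage of training objects removed by granulation. A quick check of this interpretation against the Australian-credit row of Table 7:

```python
# Percentage reduction of the training set achieved by granulation
# (values copied from the Australian-credit row of Table 7).
gs_size, trn_size = 286.52, 552
reduction = (1 - gs_size / trn_size) * 100
print(round(reduction, 1))  # 48.1, matching the reported TRN_reduction
```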
Table 8. Summary of results for the kNN classifier, granular and non-granular case; acc = accuracy of classification, red = percentage reduction in object number, r = granulation radius, nil.acc = accuracy in the non-granular case.
Name              acc, red, r          nil.acc
Australian-credit 0.851, 71.86, 0.571  0.855
Car Evaluation    0.865, 73.23, 0.833  0.944
Diabetes          0.616, 74.74, 0.25   0.631
German-credit     0.724, 59.85, 0.65   0.73
Heartdisease      0.83, 67.69, 0.538   0.837
Hepatitis         0.884, 60, 0.632     0.89
Nursery           0.696, 77.09, 0.875  0.578
SPECTF Heart      0.802, 60.3, 0.114   0.779
Table 9. Results for homogenous granulation (HG) and epsilon homogenous granulation (εHG); 5×CV5; HG_acc = average accuracy for HG, εHG_acc = average accuracy for εHG, HGS_size = HG decision system size, εHGS_size = εHG decision system size, TRN_size = training set size, HGTRN_red = reduction in object number in the training set for HG, εHGTRN_red = reduction in object number in the training set for εHG, HG_r_range = spectrum of radii for HG, εHG_r_range = spectrum of radii for εHG; Data1 = Australian-credit, Data2 = German-credit, Data3 = Heartdisease, Data4 = Hepatitis.
            Data1       Data2      Data3       Data4
HG_acc      0.835       0.725      0.833       0.88
εHG_acc     0.842       0.725      0.831       0.87
HGS_size    286.52      513.3      120.5       46.16
εHGS_size   274.52      503        109.4       46.2
TRN_size    552         800        216         124
HGTRN_red   48.1%       35.8%      44.2%       62.8%
εHGTRN_red  50.3%       37.1%      49.4%       62.7%
HG_r_range  r_u ≤ 0.5   r_u ≤ 0.6  r_u ≤ 0.461 r_u ≤ 0.579
εHG_r_range r_u ≤ 0.571 r_u ≤ 0.65 r_u ≤ 0.615 r_u ≤ 0.579
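The per-object radii behind Tables 7 and 9 can be sketched as follows. This is a minimal illustration of the homogenous scheme under our reading of it, not the authors' implementation: for each object the granulation radius is lowered stepwise, in multiples of 1/|A|, for as long as the resulting granule stays pure in the object's decision class, and setting eps > 0 gives the epsilon variant, in which numeric values within eps of each other count as indiscernible. The toy data and all names are illustrative.

```python
from fractions import Fraction

def granule(U, u, r, eps=0.0):
    # g_r(u): objects whose fraction of eps-indiscernible attributes is >= r
    n_attr = len(u["attrs"])
    out = []
    for v in U:
        agree = sum(abs(a - b) <= eps for a, b in zip(u["attrs"], v["attrs"]))
        if Fraction(agree, n_attr) >= r:
            out.append(v)
    return out

def homogenous_radius(U, u, eps=0.0):
    # lower the radius from 1 in steps of 1/|A| and keep the smallest radius
    # whose granule is still pure in u's decision class
    n_attr = len(u["attrs"])
    best = Fraction(1)
    for i in range(n_attr, -1, -1):
        r = Fraction(i, n_attr)
        if all(v["d"] == u["d"] for v in granule(U, u, r, eps)):
            best = r
        else:
            break
    return best

# three objects, three binary attributes, two decision classes
U = [
    {"attrs": (1, 0, 1), "d": 0},
    {"attrs": (1, 0, 0), "d": 0},
    {"attrs": (0, 1, 1), "d": 1},
]
r0 = homogenous_radius(U, U[0])  # 2/3: at r = 1/3 an object of class 1 enters
```

The spread of these per-object radii over a training set is what the radii_range rows of Tables 7 and 9 summarise.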

Ropiak, K.; Artiemjew, P. Homogenous Granulation and Its Epsilon Variant. Computers 2019, 8, 36. https://doi.org/10.3390/computers8020036
