About Granular Rough Computing—Overview of Decision System Approximation Techniques and Future Perspectives

Granular computing is a broad discipline whose basic principle is to operate on groups of similar objects, formed according to a fixed similarity measure. The first references to granular computing can be found in the works of Zadeh on fuzzy set theory. Granular computing allows for a very natural modelling of the world; it is quite plausible that the human brain, while solving problems, performs granular computations on data collected from the senses. Researchers of this paradigm have demonstrated its wide-ranging possibilities: among other things, granular methods are used in classification, regression, missing value handling, feature selection, and data approximation. It is impossible to cite all methods based on granular computing, so we discuss only a selected group of techniques. In this article, we present a review of recently developed granulation techniques belonging to the family of approximation algorithms founded by Polkowski within the framework of rough set theory. Starting from Polkowski's basic standard granulation, we describe the concept-dependent, layered, and epsilon variants that we developed subsequently, as well as our recent homogeneous granulation. We present simple numerical examples and samples of research results, showing the effectiveness of these methods in terms of decision system size reduction and retention of the internal knowledge of the original data. The reduction in the number of objects achieved by our techniques, while maintaining classification efficiency, reaches 90 percent for standard granulation with a kNN classifier (we achieve similar efficiency for the concept-dependent technique with the Naive Bayes classifier). The largest reduction in the exhaustive set of rules, at an efficiency level comparable to the original data, is 99 percent, obtained for concept-dependent granulation.
In the homogeneous variants, the reduction is less than 60 percent, but the advantage of these techniques is that there is no need to search for optimal granulation parameters, which are selected dynamically. We also describe potential directions for the development of granular computing techniques through the prism of the described methods.


Introduction
Granular computing is dedicated to working on data in the form of grouped, similar information vectors. The idea was introduced by Lotfi Zadeh [1,2]. Granulation is an integral part of fuzzy set theory by the very definition of a fuzzy set, where inverse values of fuzzy membership functions are the basic forms of granules. Shortly after Lotfi Zadeh proposed the idea of granular computing, granules were introduced in terms of rough set theory [3] by T.Y. Lin, L. Polkowski, and A. Skowron. In this theory, granules are defined as classes of indiscernibility relations. Interesting research on more flexible granules based on blocks was conducted by Grzymala-Busse (see the LEM2 algorithm), and on templates, used in classification processes, by H.S. Nguyen. Granules based on rough inclusions were introduced by Polkowski and Skowron [4], and granules based on tolerance or similarity relations, and, more generally, on binary relations, by T.Y. Lin [5] and Y.Y. Yao [6-8]. Granulation in the context of rough mereology was proposed by L. Polkowski and A. Skowron, in approximation spaces by A. Skowron and J. Stepaniuk [9,10], and finally in logic for approximate reasoning by L. Polkowski and M. Semeniuk-Polkowska [11], and Qing Liu [12]. Of course, many other authors are conducting research on groups of similar objects, which is simply the most natural way of modeling problems; it is impossible to name them all. Let us quote a few very interesting works on various granular computing research topics from recent years [13-18]. Additionally, interesting research in the field of granular computing with the use of neural network techniques can be found in [19-21].
We have developed our methods within the granular rough computing paradigm, an internal part of rough set theory [3]. The computations are based on granules, groups of objects collected together by a fixed similarity measure or metric. The theoretical background and framework of the discussed methods, the idea of data approximation using rough inclusions, were proposed by Polkowski in [22-24]. The basic idea is to create r-indiscernible groups of objects (objects indiscernible to a fixed degree) around each training sample, cover the original training decision system using selected granules, and, in the final step, create the granular reflection of the training data using the granules from the covering. This particular technique is called standard granulation and was proposed in [24]. The initial work was later extended in many variants and contexts; see [25,26], Polkowski [27,28], and Polkowski and Artiemjew [29,30]. These methods, among others, have found application in classification processes [31], data approximation [30], missing value absorption [26,29], and, in recent work, as a key component of a new ensemble model; see [32].
In this review, we focus on reducing decision system size while maintaining the internal knowledge at the same time. Although granulation of decision systems has, in the pessimistic case, square complexity, classical techniques for scaling methods to big data can be applied for this purpose. In the article, we describe standard granulation [24], concept-dependent [25], layered [25], and homogeneous granulation [33], designed for symbolic data, as well as exemplary variants developed for numerical data with a descriptor indiscernibility ratio: epsilon granulation [33,34].
The rest of the paper is organized as follows. Section 2 gives a detailed description of the granulation techniques with toy examples. Section 3 presents the experimental part for the kNN classifier. Section 4 contains additional results for the SVM and Naive Bayes classifiers. Section 5 discusses possible future developments of these techniques, and Section 6 concludes the paper.

Granulation Techniques
Our methods are based on rough inclusions. An introduction to rough inclusions in the framework of rough mereology is available in Polkowski [22,35]; a detailed, extensive discussion can be found in Polkowski [23]. We refer the reader there for a precise theoretical introduction; in this paper, we include only the details necessary for understanding its content. In Polkowski's granulation procedure, we can distinguish three basic steps.

First Step-Granulation
We begin with computation of granules around each training object using a selected method.

Second Step-The Process of Covering
The training decision system is covered by selected granules.

Third Step-Building the Granular Reflections
The granular reflection of the original training decision system is derived from the granules selected in Step 2.
We start with a detailed description of the basic method; see [24].

Standard Granulation
Let us consider the decision system (U, A, d), where U is the universe of objects, A is the set of conditional attributes, d ∉ A is the decision attribute, and r_gran is the granulation radius taken from the set {0, 1/|A|, 2/|A|, ..., 1}. The standard rough inclusion µ, for u, v ∈ U and a selected r_gran, is defined as

µ(v, u, r_gran) if and only if |IND(u, v)| / |A| ≥ r_gran, where IND(u, v) = {a ∈ A : a(u) = a(v)}.

For each object u ∈ U and the selected r_gran, we compute the standard granule g_{r_gran}(u) as

g_{r_gran}(u) = {v ∈ U : µ(v, u, r_gran)}.

In the next step, we use a selected strategy to cover the training decision system U by the computed granules; random choice is the simplest among the most effective strategies studied in [30]. All studied covering strategies are described in [30] (pages 105-220).
In the last step, the granular reflection of the training decision set is computed with the use of the majority voting procedure; ties are resolved randomly. In the next section, we show a toy example of the method. To present the toy examples, we use the same system from Table 1. In the case of r_gran = 0, each single granule is equal to U, because all objects are treated as indiscernible even if they are completely different; we then expect only one object as the granular reflection of the training data.
The second boundary case is r_gran = 1; each granule contains only its central object and its duplicates, because only identical objects are indiscernible. Now, let us show how standard granulation works for the radius r_gran = 0.5.
In the next step, we have chosen random granules to cover the universe of training objects from Table 1. Our choice is the following set.
The universe U is covered when each object of U appears at least once in the set of chosen granules. The granular reflection of the system from Table 1 for the radius 0.5 is in Table 3. The random coverage of the training system is as follows. The granular reflection is created by applying majority voting inside the selected granules; ties are resolved randomly.
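The three steps (granulation, covering, granular reflection) can be sketched in Python. This is a minimal illustration, not the authors' implementation: the data set is the standard Quinlan weather data, which we assume matches Table 1, and all function names (`mu`, `standard_granule`, `random_cover`, `majority_vote`) are ours.

```python
import random

# Quinlan's weather data, assumed to correspond to Table 1:
# (Outlook, Temperature, Humidity, Wind, PlayGolf).
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
N_ATTR = len(DATA[0]) - 1

def mu(i, j, r):
    """Standard rough inclusion: the fraction of conditional attributes
    on which objects i and j agree is at least r."""
    matches = sum(x == y for x, y in zip(DATA[i][:-1], DATA[j][:-1]))
    return matches / N_ATTR >= r

def standard_granule(i, r):
    """Step 1: g_r(u_i), all objects r-indiscernible from u_i."""
    return [j for j in range(len(DATA)) if mu(i, j, r)]

def random_cover(granules):
    """Step 2: add randomly chosen granules until every object is covered."""
    uncovered, cover = set(range(len(DATA))), []
    for i in random.sample(range(len(DATA)), len(DATA)):
        if uncovered & set(granules[i]):
            cover.append(i)
            uncovered -= set(granules[i])
    return cover

def majority_vote(granule):
    """Step 3: collapse a granule into one granular object by taking,
    per attribute, the most frequent value (ties resolved arbitrarily)."""
    cols = zip(*(DATA[j] for j in granule))
    return tuple(max(set(col), key=col.count) for col in cols)

granules = {i: standard_granule(i, 0.5) for i in range(len(DATA))}
cover = random_cover(granules)
reflection = [majority_vote(granules[i]) for i in cover]
print(len(reflection), "granular objects instead of", len(DATA))
```

For r_gran = 0.5 this compresses the 14 training objects to a handful of granular ones, mirroring the kind of reduction shown in Table 3; the exact covering depends on the random choice.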

Concept Dependent Granulation
A concept-dependent (cd) granule g^cd_{r_gran}(u) of the radius r_gran about u is defined as follows: v ∈ g^cd_{r_gran}(u) if and only if µ(v, u, r_gran) holds and d(v) = d(u); that is, the granule collects only those r_gran-indiscernible objects that belong to the decision class of u.
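The extra decision condition can be seen in a small, self-contained sketch on a fragment of the weather data; the names `mu` and `cd_granule` are ours, and `mu` repeats the standard rough inclusion so the snippet runs on its own.

```python
# A fragment of the weather data: (Outlook, Temperature, Humidity, Wind, PlayGolf).
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
]

def mu(i, j, r):
    """Standard rough inclusion on the conditional attributes."""
    a_i, a_j = DATA[i][:-1], DATA[j][:-1]
    return sum(x == y for x, y in zip(a_i, a_j)) / len(a_i) >= r

def cd_granule(i, r):
    """Concept-dependent granule: r-indiscernible objects that also
    share the decision of the central object u_i."""
    return [j for j in range(len(DATA))
            if mu(i, j, r) and DATA[j][-1] == DATA[i][-1]]

# The third object (Overcast, ..., Yes) is 0.75-indiscernible from the
# first one but carries a different decision, so it drops out of the
# concept-dependent granule of the first object:
print(cd_granule(0, 0.5))  # -> [0, 1, 3]
```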

Toy Example
For the decision system from Table 1, we have found the concept-dependent granules. For the granulation radius r_gran = 0.25, the granular concept-dependent indiscernibility matrix (gcdm) is shown in Table 4. Table 4. Triangular indiscernibility matrix for concept-dependent granule generation (i < j), derived from Table 1.
[Table 4 rows and columns: u_1, ..., u_14.] Hence, the granules in this case follow from the definition g_{r_gran}(u_i) = {u_j ∈ U_trn : |{a ∈ A : a(u_i) = a(u_j)}| / |A| ≥ r_gran}, where U_trn is the universe of training objects and |X| denotes the cardinality of the set X. The sample concept-dependent granules with the 0.25 radius are derived from the decision system from Table 1, and the concept-dependent granular reflection of this decision system is shown in Table 5.

Homogeneous Granulation
The homogeneous granules are defined based on the standard and concept-dependent granules defined previously, for the minimal r_gran fulfilling the equation |g^cd_{r_gran}(u)| − |g_{r_gran}(u)| = 0; in other words, each granule is taken at the smallest radius for which the standard granule around u contains only objects from the decision class of u.
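Under this reading of the definition (the radius grows in steps of 1/|A| until the standard granule agrees with the concept-dependent one), homogeneous granulation can be sketched as follows; the data set is the standard Quinlan weather data, which we assume matches Table 1, and the function names are ours.

```python
# Quinlan's weather data, assumed to match Table 1.
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
N_ATTR = len(DATA[0]) - 1

def standard_granule(i, r):
    """Objects agreeing with u_i on at least a fraction r of attributes."""
    return [j for j in range(len(DATA))
            if sum(x == y for x, y in zip(DATA[i][:-1], DATA[j][:-1]))
               / N_ATTR >= r]

def homogeneous_granule(i):
    """Return (r, granule) for the minimal radius r at which the standard
    granule around u_i is homogeneous in u_i's decision class."""
    for k in range(N_ATTR + 1):
        r = k / N_ATTR
        g = standard_granule(i, r)
        if all(DATA[j][-1] == DATA[i][-1] for j in g):
            return r, g
    return 1.0, [i]  # unreachable when all rows are pairwise distinct

print(homogeneous_granule(12))  # u_13 in 1-based numbering: -> (0.75, [2, 12])
```

No global radius is needed: each object selects its own minimal homogeneous radius, which is exactly the advantage of this variant stressed in the text.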

Toy Example
Consider the training decision system from Table 1.
Homogeneous granules are computed for all training objects, and the covering granules are selected randomly; the granular decision system formed from these granules is shown in Table 6. Table 6. Homogeneous granular decision system formed from covering granules.

[Table 6 columns: Day, Outlook, Temperature, Humidity, Wind, Play Golf.]

Layered Granulation
Layered granulation leads to a sequence of granular reflections of decreasing size, which stabilizes after a finite number of steps; usually, about five steps are sufficient. Another development that may be stressed here is the heuristic rule for finding the optimal granulation radius, i.e., the one giving the highest accuracy: the optimal granulation radius is located around the value which yields the maximal decrease in the size of the granular reflection between the first and the second granulation layers; see [30].
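The layering and the radius heuristic can be sketched as follows. This is our illustration on the weather data assumed to match Table 1, using an order-preserving covering to keep each pass deterministic; `granulate_once`, `layered`, and `heuristic_radius` are names of ours.

```python
# Quinlan's weather data, assumed to match Table 1.
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def granulate_once(data, r):
    """One layer: standard granules -> order-preserving cover ->
    majority-voted granular reflection."""
    n, n_attr = len(data), len(data[0]) - 1

    def granule(i):
        return [j for j in range(n)
                if sum(x == y for x, y in zip(data[i][:-1], data[j][:-1]))
                   / n_attr >= r]

    uncovered, cover = set(range(n)), []
    for i in range(n):                     # order-preserving covering
        if uncovered & set(granule(i)):
            cover.append(i)
            uncovered -= set(granule(i))
    reflection = []
    for i in cover:                        # majority voting per attribute
        cols = zip(*(data[j] for j in granule(i)))
        reflection.append(tuple(max(set(c), key=c.count) for c in cols))
    return reflection

def layered(data, r, max_layers=10):
    """Granulate repeatedly until the reflection size stabilizes;
    returns the sequence of layer sizes."""
    sizes = [len(data)]
    while len(sizes) <= max_layers:
        data = granulate_once(data, r)
        sizes.append(len(data))
        if sizes[-1] == sizes[-2]:
            break
    return sizes

def heuristic_radius(data):
    """Pick the radius with the maximal size drop between layers 1 and 2."""
    n_attr = len(data[0]) - 1

    def drop(r):
        s = layered(data, r, max_layers=2)
        return s[1] - s[2] if len(s) > 2 else 0

    return max((k / n_attr for k in range(n_attr + 1)), key=drop)

print(layered(DATA, 0.5))
```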

Toy Example
An exemplary multiple granulation of Quinlan's data set [36] (see Table 1), for the granulation radius of 0.5 and layers l_0, l_1, ..., runs as follows.
The covering process of U_{l_0} with the use of the order-preserving strategy yields the covering below. The granular reflection of (U_{l_0}, A, d), based on the granules from U_{l_0,Cover}, with the use of majority voting, where ties are resolved according to the ordering of granules, is shown in Table 7.

[Table 7 columns: Day, Outlook, Temperature, Humidity, Wind, Play Golf.]
An exemplary granular reflection formation based on majority voting looks as follows. In the case, e.g., of the granule g^cd_{0.5,l_1}(u_1), we have

= (Sunny, Hot, High, Weak)
Treating all other granules in the same way, we obtain the granular reflection (U l 1 , A, d) shown in Table 7.

Epsilon Variants
These methods are designed for numerical data. We can use, for instance, the ε-normalized Hamming metric, which, for a given ε, is defined as

h_ε(u, v) = |{a ∈ A : abs(a(u) − a(v)) ≥ ε}| / |A|,

where abs is the absolute value. The methods work analogously to the variants for symbolic data; thus, we show only exemplary definitions without toy examples.

ε-Modification of the Standard Rough Inclusion
Given a parameter ε valued in the unit interval [0, 1], we define the set

IND_ε(u, v) = {a ∈ A : abs(a(u) − a(v)) ≤ ε},

and we set

µ_ε(v, u, r_gran) if and only if |IND_ε(u, v)| / |A| ≥ r_gran.
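A minimal sketch of this ε-modified inclusion on made-up numerical data already scaled to [0, 1]; all names and sample values are ours.

```python
# Made-up numerical data scaled to [0, 1]; the last element is the decision.
DATA = [
    (0.10, 0.80, 0.30, "A"),
    (0.12, 0.79, 0.35, "A"),
    (0.90, 0.20, 0.70, "B"),
]

def mu_eps(i, j, r, eps):
    """Epsilon-modified rough inclusion: descriptors within eps of each
    other count as indiscernible; their fraction must reach r."""
    a_i, a_j = DATA[i][:-1], DATA[j][:-1]
    ind = sum(abs(x - y) <= eps for x, y in zip(a_i, a_j))
    return ind / len(a_i) >= r

def eps_granule(i, r, eps):
    """Epsilon analogue of the standard granule."""
    return [j for j in range(len(DATA)) if mu_eps(i, j, r, eps)]

print(eps_granule(0, 0.5, 0.05))  # -> [0, 1]
```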

Epsilon Homogeneous Granulation
The method works analogously to homogeneous granulation, with the ε-modified rough inclusion in place of the standard one; the indiscernibility tolerance for each attribute a ∈ A is additionally scaled by the attribute range max_a − min_a, where max_a and min_a are the maximal and minimal attribute values for a ∈ A in the original data set.
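Under this reading (a per-attribute tolerance of eps * (max_a − min_a), with the radius grown until the granule is decision-homogeneous), the method can be sketched as follows on made-up raw, unscaled data; all names and values are ours.

```python
# Made-up raw numerical data; the last element is the decision.
DATA = [
    (1.0, 10.0, "A"),
    (1.2, 11.0, "A"),
    (1.1, 30.0, "B"),
    (5.0, 29.0, "B"),
]

def attr_ranges(data):
    """(min_a, max_a) for every conditional attribute a."""
    cols = zip(*(row[:-1] for row in data))
    return [(min(c), max(c)) for c in cols]

def eps_granule(i, r, eps, data, ranges):
    """Epsilon granule with the tolerance scaled by the attribute range."""
    n_attr = len(data[0]) - 1
    out = []
    for j in range(len(data)):
        pairs = zip(data[i][:-1], data[j][:-1])
        ind = sum(abs(x - y) <= eps * (hi - lo)
                  for (x, y), (lo, hi) in zip(pairs, ranges))
        if ind / n_attr >= r:
            out.append(j)
    return out

def eps_homogeneous_granule(i, eps, data):
    """Minimal radius at which the epsilon granule around u_i contains
    only objects from u_i's decision class."""
    ranges = attr_ranges(data)
    n_attr = len(data[0]) - 1
    for k in range(n_attr + 1):
        r = k / n_attr
        g = eps_granule(i, r, eps, data, ranges)
        if all(data[j][-1] == data[i][-1] for j in g):
            return r, g
    return 1.0, [i]

print(eps_homogeneous_granule(0, 0.1, DATA))  # -> (1.0, [0, 1])
```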

A Sample of the Experimental Work Results
In this section, we show exemplary results for selected techniques, to demonstrate their effectiveness in reducing training data size. For the sake of simplicity, we have chosen the kNN classifier as a base. We carried out experiments on selected data sets from the UCI repository [37]; see Table 10. Tables 11-20 and Figure 1 show the results for the 5-fold cross-validation (CV-5) method.
Let us move on to the discussion of selected detailed results, starting with the Australian Credit data set. The results for standard (SG) and concept-dependent granulation (CDG) are in Table 11. In the case of SG for the radius 0.5, we have a reduction in training size of around 90 percent while preserving a classification accuracy of about 84.7 percent. For the CDG variant, we have a reduction in training size of about 99.5 percent for the radius 0.071, where the exhaustive rule set is reduced by 99.9 percent and the accuracy of classification is around 77 percent. The results are comparable, but the concept-dependent variant shows a more stable classification as the radius increases. In the case of homogeneous granulation, see Table 15, we have an accuracy of 0.835 with a 48 percent reduction of training size. A sample of results for an exemplary epsilon variant, ε homogeneous granulation, is in Table 20, where we have a reduction in training size of about 50 percent, with an accuracy of 0.842. The layered granulation process is visible in Table 16, where the basic method is concept-dependent granulation, and the result is similar to the single concept-dependent variant. In the case of the Car data set, see Table 12, the concept-dependent variant works best, giving an accuracy of 0.864 with a reduction in training size of around 73 percent. For the Hepatitis data set, concept-dependent granulation also works best; for the radius 0.474, the accuracy is 0.875, with a 90 percent reduction in training size. Finally, a spectacular result is obtained for the Heart Disease data set, where, with a 99 percent reduction in training size, we obtained an accuracy of 0.8 for both concept-dependent and standard granulation. The results for the homogeneous variants are shown in Tables 15 and 20; the best result achieved on the tested data is a reduction of 62 percent in the number of objects with full classification efficiency.
Allow us to summarize the results obtained in this section. The internal knowledge of the original training decision systems, measured by classification ability, seems to be preserved in each mentioned case (the accuracy of classification is fully comparable with the nil case, i.e., without reduction). Both standard and concept-dependent granulation prove to be comparable. In the concept-dependent variant, we observe higher classification stability with an increasing radius. Another advantage of the concept-dependent variant is that the granular reflections it creates contain patterns from all decision classes even for the smallest radii. The multiple (layered) variant does not produce spectacular results, but, according to our previous research [30], it allows us to look for the optimal granulation radii. Our research shows that the radius with the greatest reduction of objects between the first and second layer is close to the optimal one in most tested systems. In this way, the optimal granulation radius can be estimated without classification tests. The last group of tested techniques are the recently developed homogeneous methods, which work dynamically on any data and do not require estimation of optimal parameters. Obviously, the effectiveness of our methods depends to a large extent on the data under investigation.
We do not attempt an overview of the effectiveness of the whole range of classification techniques, because our aim is to present examples of the effectiveness of approximation methods for decision systems. Let us move on to additional test results for selected previously used classifiers. Table 20. Exemplary result for epsilon homogeneous granulation (ε-HGS); 5 × CV-5; kNN classifier; D1 = Australian Credit, D3 = Heart Disease, D4 = Hepatitis; Acc = average accuracy of classification; HGS_size = granular decision system size; TRN_size = training set size; HGS_TRN_red = reduction in the number of objects in the training set; HG_r_range = spectrum of radii.

Application of Selected Other Classifiers on Granular Data
In our previous research, we checked the performance of tens of classifiers; each examined variant matched well with the granular data. Some of the most interesting results were obtained for the Naive Bayes classifier (see the results in Chapter 7 of [30]), the SVM technique [38], and deep learning [39]. Examples of the results are presented in this section.
In Figure 2, we show the accuracy of classification of the granular data using the SVM method with an RBF kernel. We use the ε concept-dependent granulation; see Section 2.5. It is the result for the Wisconsin Diagnostic Breast Cancer data set (see [37]), with 569 objects and 32 attributes. Analyzing Figures 2 and 3, we see that the level of classification accuracy remains reasonable at a considerable percentage of size reduction of the granular systems. We also considered four variants of classification for the Naive Bayes classifier, differing in the parameters that determine the classification. The results showing the effectiveness of the Naive Bayes classifier can be found in Tables 21-24 (the details can be found in [30]). The most spectacular approximation is for the radius 0.428571, where, for the Australian Credit data set, the accuracy of classification is 0.852 and the average number of objects is reduced by about 94 percent. Table 21. 5 × CV-5; the result of experiments for four variants of the Naive Bayes classifier; data set Australian Credit; concept-dependent granulation; r_gran = granulation radius; nil = result for the original data without granulation; Acc = accuracy of classification; GranSize = the size of the data set after granulation for the fixed r.

[Table 21 columns: Acc, GranSize.] Table 22. 5 × CV-5; the result of experiments for four variants of the Naive Bayes classifier; data set Car Evaluation; concept-dependent granulation; r_gran = granulation radius; nil = result for the original data without granulation; Acc = accuracy of classification; GranSize = the size of the data set after granulation for the fixed r. Table 23. 5 × CV-5; the result of experiments for four variants of the Naive Bayes classifier; data set Heart Disease; concept-dependent granulation; r_gran = granulation radius; nil = result for the original data without granulation; Acc = accuracy of classification; GranSize = the size of the data set after granulation for the fixed r. Table 24. 5 × CV-5; the result of experiments for four variants of the Naive Bayes classifier; data set Hepatitis; concept-dependent granulation; r_gran = granulation radius; nil = result for the original data without granulation; Acc = accuracy of classification; GranSize = the size of the data set after granulation for the fixed r. In Table 25, we have presented an example of the result of a deep neural network on the granulated data; see [39]. It turns out that the network learns the internal knowledge of the decision systems and maintains a high level of classification effectiveness. In Table 25 and Figure 4, we show the result for the Australian Credit data set, for the radius 0.66, with a reduction of 40 percent and a classification efficiency of around 84 percent. The additional experimental results presented here show that our granular techniques are compatible with various classification methods. In the next section, we discuss the potential directions of development of granular computing methods, through the prism of the possibilities of our own methods.

Future Directions in Granular Computing Paradigm
Granular computing techniques will undoubtedly play a key role in building artificial intelligence, because intelligent handling of data is based on analyzing its similarity and abstracting from the vast amount of information available in the environment. One of the problems to be solved is the ability to use real-time granular computing techniques on large data. The only barrier to using these methods is the scalability problem. To deal with possible scalability problems, the following approaches can be considered: data sampling and the creation of models based on samples; decomposition methods, to run the algorithms on split data and work on the parts separately; stream computing methods with incremental data processing; massively parallel computing on computer clusters, using classic parallel frameworks such as MPI (Message Passing Interface); and massively parallel computing methods based on future technologies such as quantum computing. Without a doubt, deep neural networks are one of the promising fields for applying granular computing. New methods of preprocessing data before feeding it into deep neural networks can be expected to emerge; in particular, we mean the use of granular computing in the convolutional and pooling parts of convolutional neural networks. The granular structures of the granular computing paradigm can intuitively be used to build such new network architectures at a time when there is no clear limit on the design of neural network structures. Modeling the world using granular computing is a very natural process for us, and it will undoubtedly play a crucial role in the development of future technologies.

Conclusions
In this work, we offer a review of selected recently developed granular computing techniques dedicated to the approximation of decision systems (from the family of methods proposed by Polkowski in [22,24]); that is, techniques which, among other things, aim at reducing the size of data while maintaining their classification efficiency. Such techniques are particularly important for speeding up decision-making processes. Our approximation techniques significantly reduce the size of decision systems while maintaining the internal knowledge, which has been demonstrated in many experimental works. In our research, the main problem for the standard, concept-dependent, and layered methods is the need to estimate the optimal granulation radius by searching among all possible ones. The problem has been partially solved for these methods: in previous works, we developed heuristics for finding optimal parameters by a double granulation technique (see [30]). In our latest technique, homogeneous granulation, this problem does not apply, because the parameters are set automatically in the process of approximation. This method seems to be an important development, as it is immediately applicable, without the need to estimate parameters, and it turns out to work very well in all the contexts we have studied. Particularly noteworthy is its application in a new boosting technique, the Ensemble of Random Granular Reflections [32]. To sum up, the presented granulation techniques allow for reducing the exhaustive set of rules by up to 99 percent while maintaining classification efficiency at the level obtained on the original, unreduced data; such efficiency was obtained, for example, for the concept-dependent technique with the kNN classifier. On the other hand, our methods achieve a reduction in the number of objects of more than 90 percent while maintaining the classification efficiency of the original data.
We achieved such results, for example, for standard granulation with the kNN classifier and for concept-dependent granulation with the Naive Bayes classifier. As the closest directions of research on the development of our knowledge granulation methods, we can point to work on hybrids with deep neural network learning and the Random Forest technique. Another direction is the application of granulation in the convolution and pooling processes of convolutional neural networks, and the development of our proposed ensemble model based on random granular reflections of decision systems. In conclusion of this review, we may add that, without any doubt, real-time granular computing methods will play an important role in creating artificial intelligence. Therefore, it is worthwhile to develop methods for the approximation of decision systems and to invest in research into this prospective paradigm of knowledge.