Classification with Fuzzification Optimization Combining Fuzzy Information Systems and Type-2 Fuzzy Inference

Abstract: In this research, we introduce a classification procedure based on rule induction and fuzzy reasoning. The classifier generalizes attribute information to handle uncertainty, which often occurs in real data. To induce fuzzy rules, we define the corresponding fuzzy information system. A transformation of the derived rules into interval type-2 fuzzy rules is provided as well. The fuzzification applied is optimized with respect to the footprint of uncertainty of the corresponding type-2 fuzzy sets. The classification process is related to a Mamdani-type fuzzy inference. The proposed method was evaluated with the F-score measure on benchmark data.


Introduction
Nowadays, machine learning and its applications are growing rapidly. Classification is one of the major machine learning problems. As there are various kinds of data, the need for robust techniques has become essential, especially in real-world applications [1,2]. Moreover, the data are often not well-defined, vague, or imbalanced, which is an additional obstacle. Fuzzy techniques are therefore a natural solution. We may divide these techniques into two primary groups: those related to the classical type-1 fuzzy sets, introduced by Zadeh [3], and those related to newer concepts concerning type-2 fuzzy sets [4,5]. In practice, interval type-2 fuzzy sets [6], a particular case of type-2 fuzzy sets, are commonly used for their reduced computational cost and easy implementation. Moreover, researchers have shown that interval type-2 fuzzy concepts handle uncertainties better than type-1 fuzzy approaches [6][7][8][9][10][11]. Interval type-2 fuzzy sets are very effective, as they provide better generalisations when it is difficult to determine the exact membership functions of the fuzzy sets applied.
On the other hand, to deal with information issues, data discovery techniques, understood as the computational process of discovering patterns in data, are widely used to induce knowledge. Information systems and rough sets, introduced by Pawlak [12][13][14], are applied to represent knowledge. Therefore, the combination of fuzzy techniques and data discovery seems very appropriate for the analysis of complex data such as medical data [15,16].
That is how the concept of fuzzy information systems came about. If each attribute of an information system is related to a fuzzy set, a fuzzy information system is defined [17]. Fuzzy information systems find their applications in the field of decision-making [18,19]. In recent research, a novel multi-criteria approach with application in investments was proposed [20]. Many other applications can also be found concerning rule extraction [19,21], feature selection, and classification [22][23][24][25]. Mathematical properties of fuzzy information systems are under consideration as well [17,26]. The concept of rough sets was extended by analyzing the properties of lower and upper approximations of fuzzy sets [27][28][29][30][31] with

Methods
This section explains the preliminaries of the type-2 Mamdani fuzzy model [4,40] and the rule induction procedure for information systems [41,42]. Next, we introduce our proposal of combining information systems with type-2 fuzzy inference to define a classification procedure.

Fuzzy Sets and Interval Type-2 Fuzzy Sets
A fuzzy set, or type-1 fuzzy set, F consists of a domain X of real numbers together with a function µ_F : X → [0, 1] [3], i.e.:

$F = \int_X \mu_F(x)/x$

Here, the integral denotes the collection of all points x ∈ X with associated membership grade µ_F(x) ∈ [0, 1]. The function µ_F is also known as the membership function of the fuzzy set F, as its value represents the grade of membership of the elements of X to the fuzzy set F. The idea is to use membership functions to describe imprecise or vague information.
Appl. Sci. 2021, 11, 3484

By expanding this concept with the assumption that membership function values can be fuzzified themselves, the idea of type-2 fuzzy sets was introduced. A type-2 fuzzy set, denoted as F̃, is defined as follows [4]:

$\tilde{F} = \int_{x \in X} \int_{u \in J_x} \mu_{\tilde{F}}(x, u)/(x, u), \quad J_x \subseteq [0, 1],$

where the double integral denotes union over all admissible x and u. Interval type-2 (IT2) fuzzy sets [4], a special case of type-2 fuzzy sets, are the most widely used because of their acceptable computational complexity and easy interpretation. Uncertainty about F̃ is conveyed by the so-called footprint of uncertainty (FOU) of F̃:

$FOU(\tilde{F}) = \bigcup_{x \in X} J_x.$

The size of an FOU (the corresponding surface) is directly related to the uncertainty conveyed by an interval type-2 fuzzy set; consequently, an FOU with more area is more uncertain than one with less area. The upper membership function and lower membership function of F̃ are two type-1 membership functions $\overline{\mu}_{\tilde{F}}$ and $\underline{\mu}_{\tilde{F}}$ that bound the FOU, which might be used to describe J_x, i.e.:

$J_x = [\underline{\mu}_{\tilde{F}}(x), \overline{\mu}_{\tilde{F}}(x)],$

which leads to the following:

$FOU(\tilde{F}) = \bigcup_{x \in X} [\underline{\mu}_{\tilde{F}}(x), \overline{\mu}_{\tilde{F}}(x)].$
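The bounded-FOU construction above can be sketched in code. The following is a minimal illustration, not the paper's implementation: it assumes, as in the later sections of this work, that the FOU of an IT2 set is generated by perturbing the standard deviation of a Gaussian primary membership function; the names `gaussian` and `it2_membership_interval` are ours.

```python
import math

def gaussian(x, mean, sigma):
    """Type-1 Gaussian membership grade of x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))

def it2_membership_interval(x, mean, sigma, sigma_offset):
    """Membership interval J_x = [lower, upper] of an IT2 fuzzy set whose
    FOU is obtained by perturbing the standard deviation of a Gaussian
    primary membership function by +/- sigma_offset."""
    a = gaussian(x, mean, sigma - sigma_offset)
    b = gaussian(x, mean, sigma + sigma_offset)
    return min(a, b), max(a, b)
```

At x equal to the mean, both bounds equal 1 and the FOU collapses to a point; away from the mean, the interval widens with sigma_offset.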

Mamdani Type-2 Fuzzy System
In Figure 2, we show the information flow within an IT2 fuzzy system. It is very similar to its type-1 analogue. The major difference is that, because of the IT2 fuzzy sets used in the rule base, the outputs of the inference engine are IT2 fuzzy sets too. Therefore, a type reducer [6,43] must be applied to convert them into type-1 fuzzy sets to enable the defuzzification procedure.

Figure 2. Information flow in a type-2 fuzzy system.

Below, we give a brief description of the basic steps of the computations in an IT2 fuzzy system [40]. Let us consider the rule base of an IT2 fuzzy system consisting of N rules taking the following form:

$R^n$: IF $x_1$ is $X_1^n$ o $x_2$ is $X_2^n$ o ... o $x_I$ is $X_I^n$ THEN $y_j$ is $Y_j^n$,

where $X_i^n$ (i = 1, ..., I; n = 1, ..., N) are IT2 fuzzy sets defined over corresponding domains, and $Y_j^n$ (j = 1, ..., M) is an IT2 fuzzy set, assuming a Mamdani type-2 system, which represents the corresponding rule conclusion. The logical operator 'o' is defined as a fuzzy conjunction or disjunction, i.e., o ∈ {⊗, ⊕}, where ⊗ and ⊕ are binary operators (t-norm and s-norm, respectively) defined over [0, 1] (⊗, ⊕: [0, 1]² → [0, 1]). The t-norm operator provides the characterization of the AND fuzzy operator, while the s-norm provides the characterization of the OR fuzzy operator [44]. In our research, we applied Zadeh's t- and s-norms, which correspond to the min and max operators, respectively.
Assuming an input vector x = (x_1, x_2, ..., x_I), typical computations of an IT2 fuzzy system consist of the following steps: (1) Compute the membership intervals of $x_i$ for each $X_i^n$: $[\underline{\mu}_{X_i^n}(x_i), \overline{\mu}_{X_i^n}(x_i)]$, i = 1, ..., I; n = 1, ..., N. (2) Compute the firing interval of the n-th rule, $F^n(x) = [\underline{f}^n(x), \overline{f}^n(x)]$, where, for a rule with conjunctive premises, $\underline{f}^n(x) = \underline{\mu}_{X_1^n}(x_1) \otimes \ldots \otimes \underline{\mu}_{X_I^n}(x_I)$ and $\overline{f}^n(x) = \overline{\mu}_{X_1^n}(x_1) \otimes \ldots \otimes \overline{\mu}_{X_I^n}(x_I)$. (3) Apply type reduction to combine $F^n(x)$ with the corresponding rule consequents.
There are several methods of type reduction [6,45,46], but the most commonly used one is the center-of-set (COS) type reducer [6] using the Karnik-Mendel algorithms [6,43] or their variants [47,48]. As the whole procedure concerning the Mamdani type-2 fuzzy system is well described in the literature, we omit further details. For more clarity, in Figures 3 and 4, we show the idea of the introduced calculations: the combined FOU for two sample rules with two interval type-2 (IT2) sets in the conclusions. The final system output is defined as $(y_l + y_r)/2$, where $y_l$ and $y_r$ are calculated with respect to the L and R values, which can be determined using the Karnik-Mendel algorithm.
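Step (2) above, the firing interval under Zadeh's min t-norm, can be sketched as follows; the helper names and the (lower, upper) tuple convention are illustrative assumptions, not the paper's code.

```python
def rule_firing_interval(x, antecedents):
    """Firing interval F^n(x) of one IT2 rule under the min t-norm.
    `antecedents` maps each input x_i to its membership interval
    (lower, upper) in the corresponding IT2 set X_i^n."""
    lowers = [mf(xi)[0] for xi, mf in zip(x, antecedents)]
    uppers = [mf(xi)[1] for xi, mf in zip(x, antecedents)]
    # The min t-norm is applied separately to the lower and upper bounds.
    return min(lowers), min(uppers)
```

The resulting intervals are then passed to the type reducer together with the rule consequents.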


Rule Induction with Information Systems
An information system [12] is defined by the tuple (U, A, V, f), where U is a universe, A is a set of attributes, V represents the attribute domains: $V = \bigcup_{a \in A} V_a$, with nonempty domain $V_a$ of attribute a (a ∈ A), and f is the so-called information function $f: U \times A \to V$, with f(u, a) ∈ V_a for all u ∈ U, a ∈ A. An important role in information systems is played by the indiscernibility binary relation, an equivalence relation defined over U for B ⊆ A:

$IND(B) = \{(u, v) \in U \times U : f(u, a) = f(v, a) \ \forall a \in B\}.$

Denoting by $[u]_B$ the equivalence class of u, the lower and upper approximations of a set X ⊆ U are $\underline{B}X = \{u \in U : [u]_B \subseteq X\}$ and $\overline{B}X = \{u \in U : [u]_B \cap X \neq \emptyset\}$. The above approximations define a rough set [13,14]. We can consider an information system as a decision table if a decision attribute is introduced. With this assumption, a decision-making approach was introduced by A. Skowron and Z. Suraj [41,42], presenting a rule induction process with respect to all considered decisions. The procedure consists of the following steps:

1. Introduce an information system with a decision attribute,
2. Eliminate information system inconsistency, i.e., objects with the same information function values but different decision values, applying lower or upper approximation precision analysis,
3. Apply rule induction and define rules for all considered decisions, which cover the decision problem.
Below, we explain the main algorithms applied in the procedure above with examples. The reduct generation is omitted, as any other attribute selection method can be applied.
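The indiscernibility relation underlying these steps partitions U into equivalence classes of objects with identical attribute values. A minimal sketch (our own naming; the information function is modeled as a dictionary keyed by (object, attribute) pairs):

```python
from collections import defaultdict

def ind_partition(objects, attributes, f):
    """Equivalence classes of IND(B): two objects are indiscernible iff the
    information function f agrees on every attribute in `attributes`."""
    classes = defaultdict(list)
    for u in objects:
        # The attribute-value signature identifies the equivalence class.
        classes[tuple(f[(u, a)] for a in attributes)].append(u)
    return list(classes.values())
```

The lower and upper approximations of any X ⊆ U follow directly by testing each class for inclusion in, or intersection with, X.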

Inconsistency Elimination
Let us consider the following information system with a decision attribute: (U, A ∪ {a*}, V ∪ V_{a*}, f). If inconsistency is found, then Algorithm 1, which uses the lower approximation precision as a decisive factor, should be applied. The lower approximation precision is defined as the ratio of the cardinalities of the lower and upper approximations of the corresponding decision class [13]:

$\alpha_B(X) = |\underline{B}X| / |\overline{B}X|.$

Algorithm 1 (inconsistency elimination algorithm):
Input: Inconsistent information system
Output: Consistent information system
Let x_i and x_j be the objects causing the inconsistency (x_i, x_j ∈ U) and let f(x_i, a*) = d_1, f(x_j, a*) = d_2; then the inconsistency is resolved with respect to the lower approximation precision defined above, which serves as the decisive factor.

Example 1. Let U = {x_1, x_2, x_3, x_4} with information function values represented in the matrix below.
In the above matrix, we can discover an inconsistency, as objects x_2 and x_4 have the same information function values with respect to B but different decision values. Therefore, we can apply Algorithm 1. Of course, it is only our assumption that different decision values for the same attribute values cause inconsistency. In the decision-making process introduced, such decision values could instead be considered as a set of acceptable values. If that were the case, then the d_B^{a*} sets defined below would not be singletons.
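Detecting the inconsistency exploited by Algorithm 1 amounts to grouping objects by their attribute signatures and checking the decision values within each group. A hedged sketch (names and data layout are ours, not the paper's):

```python
def find_inconsistencies(objects, attributes, f, decision):
    """Groups of objects with identical attribute values but more than one
    decision value -- the situation Algorithm 1 eliminates."""
    groups = {}
    for u in objects:
        groups.setdefault(tuple(f[(u, a)] for a in attributes), []).append(u)
    # A group is inconsistent when its objects carry distinct decisions.
    return [g for g in groups.values() if len({decision[u] for u in g}) > 1]
```

Each returned group is then resolved by the lower approximation precision analysis, as described above.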

Rule Induction
The aim of this step is to induce rules for each decision regarding Algorithm 2 below.

Algorithm 2 (rule induction algorithm):
Input: Information system with a decision attribute
Output: Corresponding decision rules
Step 1: Generate the matrices $M_k$, k = 1, 2, ..., |U|, using the so-called discernibility matrix of the information system, denoted below as M(IS) and defined as:

$M(IS) = [c_{ij}]$, where $c_{ij} = \{a \in A : f(x_i, a) \neq f(x_j, a)\}$, i, j = 1, ..., |U|.

Step 2: Define the object implicants.
Step 3: Define the set of decision rules.
Step 1: Let $c_{ij}$ be the elements of M(IS), $\hat{c}_{ij}$ the elements of $M_k$ (k = 1, ..., n; n = |U|), and a* the decision attribute; then, for each k = 1, ..., n, $\hat{c}_{ij}$ is defined from $c_{ij}$ with respect to object $x_k$ and the decision attribute a*. For simplicity, let us consider the example below, which illustrates the generation of the matrix $M_1$.

Example 2. Let us assume an information system with a decision attribute extended by the corresponding decision values. The corresponding discernibility matrix M(IS) takes the form given below; it is a symmetric matrix. Next, we can apply Algorithm 2, step 1, and therefore define the matrix $M_1$ (k = 1).

Step 2: The object implicants indicate which attributes and objects are strongly related. The first implicant is obtained from $M_1$ by applying the corresponding Boolean algebra reduction rules: Implicant_1(M_1): x_1 ⇒ a. The rest of the implicants are presented below.

Step 3: Finally, using the above implicants, we can generate the target set of decision rules regarding the decision attribute values. Each rule represents one decision, and it is obtained as a sum of the object implicants related to that decision; i.e., concerning decision attribute value '0', we have f(x_1, a*) = f(x_3, a*) = f(x_5, a*) = 0, and the corresponding rule follows, with the remaining rules obtained by analogy.

The main disadvantage of the above decision-making approach, in terms of vague or imprecise data, is that the induced rules are crisp; i.e., Rule 2 has the following interpretation: if an object of the x_2 type has an information function value for the attribute 'c' exactly equal to '0', then make the decision '1'. We solve this disadvantage by extending information systems to our assumption of a type-1 fuzzy information system.
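The discernibility matrix M(IS) used in step 1 can be computed directly from its definition: cell (i, j) collects the attributes on which x_i and x_j differ. A small sketch with our own naming conventions:

```python
def discernibility_matrix(objects, attributes, f):
    """M(IS): cell (i, j) holds the set of attributes on which objects
    x_i and x_j have different information function values."""
    n = len(objects)
    return [[{a for a in attributes
              if f[(objects[i], a)] != f[(objects[j], a)]}
             for j in range(n)]
            for i in range(n)]
```

By construction the matrix is symmetric with empty sets on the diagonal, matching the symmetry noted in the example.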

Type-1 Fuzzy Information System
The major assumption of our research is the interpretation of a fuzzy information system. First, we define the values of the information function as type-1 fuzzy sets. This assumption solves the problem of crisp values by providing a generalisation. Therefore, we propose better data granulation and thus more robustness to vague or imprecise data. Next, by assuming a Gaussian distribution for each attribute, we are able to define any information function value as one of the very basic fuzzy sets: low, medium, and high. Therefore, the value of any object-attribute pair is generalised by a fuzzy set. Below, in Table 2, we give an illustration of a type-1 fuzzy information system with a decision attribute, according to our assumptions, where:

$\mu_{medium}(a) = \exp(-(a - a_0)^2 / (2\sigma^2))$, a ∈ V_a, with expected value $a_0$ and standard deviation σ.  (11)

The fuzzy sets for low and high attribute values are easy to define as well (see Figure 5). If an object_i-attribute_j pair has a value defined as low, for example, it means that: µ_low(f(object_i, attribute_j)) ≥ max{µ_medium(f(object_i, attribute_j)), µ_high(f(object_i, attribute_j))}.
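The low/medium/high labeling described above can be sketched as follows. Note that this is an illustrative reading of Figure 5: the medium set follows the Gaussian of Equation (11) with the attribute's mean and standard deviation, while modeling low and high as Gaussians shifted by two standard deviations is our assumption, not the paper's exact definition.

```python
import math

def fuzzify(value, mean, sigma):
    """Label `value` as 'low', 'medium', or 'high' by the largest membership
    grade. medium is the Gaussian of Equation (11); low and high are modeled
    here as Gaussians shifted by +/- 2*sigma (an illustrative choice)."""
    centers = {"low": mean - 2 * sigma, "medium": mean, "high": mean + 2 * sigma}
    grades = {label: math.exp(-((value - c) ** 2) / (2 * sigma ** 2))
              for label, c in centers.items()}
    # The winning label is the fuzzy set with the maximal membership grade.
    return max(grades, key=grades.get)
```

Applying `fuzzify` to every cell of a decision table yields the type-1 fuzzy information system illustrated in Table 2.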
It is important to note that our assumption was also the direct application of the original Skowron and Suraj algorithm [41,42] for rule induction. As a result, we did not have to modify the indiscernibility relation (IND) to define fuzzy rough sets or modify the quality of the approximation introduced by Equation (8). We directly used the fuzzified values of the information function as if they were crisp values. This means that object_i is related to object_j with respect to IND, in accordance with our model, if the values of the corresponding attributes define the same fuzzy sets. For example, below we show the partition (P) of the set of objects {Object_1, ..., Object_5} implied by IND for a sample information system given in Table 3. The corresponding partition with respect to IND has three equivalence classes. The above assumption made it possible to use the classical algorithms introduced in information systems theory while, at the same time, generalizing the information given by the attributes. The use of membership degrees inferred from the corresponding membership functions is performed after the induction of rules and their transformation into fuzzy rules, in accordance with the assumptions of the fuzzy model used. Of course, such an assumption may be too strong a generalization. We solve this problem by introducing various possibilities of fuzzification: defining additional fuzzy sets and transforming them into interval type-2 fuzzy sets with the corresponding optimization, depending on the examined dataset (see Table 4 below).
Next, we define the decision attribute values as fuzzy sets, which affect the degree of decision. Any induced rules from such an information system can be easily transformed into fuzzy rules for a type-1 Mamdani fuzzy system. For example, if we have pairs: (object 1 , attribute 1 ): low, (object 2 , attribute 1 ): high, (object 3 , attribute 2 ): medium and a rule which defines decision D as: (f (object 1 , attribute 1 ) ∧ f (object 2 , attribute 1 )) ∨ f (object 3 , attribute 2 ), then we can transform it into the fuzzy rule: If ((f (object 1 , attribute 1 ) is low) ⊗ (f (object 2 , attribute 1 ) is high)) ⊕ (f (object 3 , attribute 2 ) is medium) Then D.
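The fuzzy rule in the example evaluates, under Zadeh's norms (⊗ = min, ⊕ = max), to a single firing degree for decision D. A one-line sketch with hypothetical membership values as inputs:

```python
def rule_degree(mu_low_o1a1, mu_high_o2a1, mu_medium_o3a2):
    """Firing degree of the example rule under Zadeh's norms:
    (low(f(o1,a1)) AND high(f(o2,a1))) OR medium(f(o3,a2)),
    with AND = min (t-norm) and OR = max (s-norm)."""
    return max(min(mu_low_o1a1, mu_high_o2a1), mu_medium_o3a2)
```

The resulting degree weights the conclusion D in the Mamdani inference.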

Transformation into Type-2 Fuzzy Sets
To apply type-2 fuzzy sets in the fuzzy rule induction procedure described, we propose a simple extension. After the induction of type-1 fuzzy rules, we modify the membership functions applied by changing the standard deviation of the 'medium' fuzzy set. By doing so, we can define the bounds of the FOU for the corresponding type-2 fuzzy set medium. This also makes it possible to define low and high type-2 fuzzy sets as well. The idea is given in Figure 5. The above extension gives us the possibility to apply the type-2 Mamdani fuzzy inference procedure. In our research, we have applied the proposed procedure to solve classification problems.


Fuzzy Rules Optimization
In our experiments, the basic assumption was to analyse the influence of the shape and the FOU of the type-2 fuzzy sets used in the rule premises, derived from the corresponding fuzzy information system, on the classification accuracy. For this purpose, we have used the type-2 fuzzy sets shown in Table 4. Each of these type-2 membership functions was derived for every attribute of all the benchmark data analyzed. Below, we explain the descriptions used in our research:

1. The number of Gaussian functions for fuzzification: {3, 5, 7, 9, 11}. The fuzzification procedure is as follows: generate the medium membership function and, next, the rest of the membership functions based on it. All Gaussians are transformed into type-2 fuzzy sets by changing the standard deviation value (called the sigma_offset parameter). The sigma_offset defines the FOU of the corresponding IT2 fuzzy sets. The same value was applied to each Gaussian function.

2. Whether the standard deviations applied to the Gaussians are the same or not: {equal, progressive}. Progressive assumes the concentration of membership functions around the mean or central values. For 3 Gaussians, equal and progressive give the same fuzzification result.

3. Whether the mean of the medium basic function is derived directly from the corresponding data set or a fixed value is used: {mean, center}. Center assumes a Gaussian distribution with a fixed mean of 0.5.
For example, <3 Gausses, Equal, Mean> denotes 3 Gaussian membership functions with equal standard deviation values used to define 3 type-2 fuzzy sets, where the medium function is defined by its corresponding mean value derived from the considered data. <5 Gausses, Equal, Center> denotes 5 Gaussian membership functions where the medium function is defined by a fixed mean value of 0.5. <7 Gausses, Progressive, Mean> denotes 7 Gaussian membership functions with different standard deviation values, assuming concentration of membership functions around the mean value of the medium function. For the type-2 sets used in the conclusions, we define s-shaped functions representing each class in the classification problem considered. We optimize the FOU of each function by changing the cross-points.
We called the corresponding parameter under optimization center_offset (see Figure 6). The final classification decision value was fixed at 0.5, assuming the normalized domain [0, 1]. Changes of the classification threshold did not show a positive effect in the experiments. For each benchmark data set, we optimized the sigma_offset and center_offset parameters for the type-2 fuzzy sets considered, using grid search applied in a k-fold cross-validation process with evaluation on a held-out validation set. The sigma_offset takes values in the range [0.01, 0.03] with step 0.005, and the center_offset takes values in the range [0.01, 0.21] with step 0.02. The k value was chosen for each data set assuming a fair number of samples in the corresponding validation sets, as presented in Table 5. Additionally, for any data partition, we used stratified sampling to ensure balanced validation and test sets. For better clarity, we show the experiment information flow in Figure 7.
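The grid search over the two offsets can be sketched as below; `evaluate` stands for a hypothetical function returning the mean validation F-score for a parameter pair (the cross-validation and classifier internals are omitted, and the function name is ours).

```python
import itertools

def grid_search(evaluate):
    """Exhaustive search over the two fuzzification parameters; `evaluate`
    is assumed to return the mean validation F-score for a parameter pair."""
    sigma_offsets = [round(0.01 + 0.005 * i, 3) for i in range(5)]    # [0.01, 0.03], step 0.005
    center_offsets = [round(0.01 + 0.02 * i, 2) for i in range(11)]   # [0.01, 0.21], step 0.02
    # Return the pair with the best validation score.
    return max(itertools.product(sigma_offsets, center_offsets),
               key=lambda p: evaluate(*p))
```

With 5 x 11 = 55 candidate pairs per data set, the exhaustive search remains cheap relative to rule induction.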


Results
In Table 6, we present the results achieved in our experiments. The number of rules generated for each data set is constant and equal to the cardinality of the decision attribute domain. In the case of binary classification, two fuzzy rules will be generated, but with very extensive and complex rule premises. We achieved quite high classification results for seven data sets, which demonstrates the usefulness of the methodology applied.
Below, in Table 7, we show a comparison with high-performing classifiers for some of the data sets with respect to the F-score measure. As can be seen, we achieved quite good results for the banknote authentication and HTRU 2 data sets, despite the very general fuzzy approach proposed.
A classification accuracy comparison with fuzzy classification techniques for the chosen data sets is shown in Table 8 as well.

Discussion
We distinguish several important conclusions and advantages of our research. First of all, it is possible to aggregate information by interpreting the attributes describing objects as fuzzy sets, using the advantages of fuzzy logic for flexible modelling of real data. This enables a large generalization of information, which is beneficial for imbalanced or inaccurately defined data. Additionally, the introduced interpretation of fuzzy information systems allows direct work on fuzzy sets and the induction of rules that can easily be changed into type-2 fuzzy rules. By appropriate fuzzification, the values of the information function can be derived directly from the data. The removal of inconsistencies, which may appear because of the information generalization applied, is performed by the appropriate use of rough sets, i.e., by analyzing the quality of the corresponding approximations. The fuzzification applied is very general. It allows even better adjustment to special cases through optimization of the fuzzy sets in the premises and the conclusions of the rules applied. In this research, we have used only one of the many possible fuzzifications of the data presented. By applying a fuzzification adapted to a specific problem, after an in-depth analysis, we should be able to adjust the fuzzy sets in the premises and in the conclusions of the rules to achieve the best possible result. The methodology introduced in our work is a general approach, aiming to show the advantage of fuzzy inference using a simple and homogeneous model. Although our algorithm was not strictly adjusted to any data set, we managed to obtain good results for some of them. This shows practical potential and creates the basis for generating robust classifiers.

Conclusions
In this research, we proposed a new fuzzy classification method. Our concept assumes the induction of fuzzy rules from the corresponding fuzzy information system. The applied fuzzification of the data is done in a very simple manner, using Gaussian-type membership functions. Moreover, we are able to extend the induced rules into interval type-2 fuzzy rules. The experiments performed on benchmark data demonstrated the method's usefulness. Additionally, by further optimization of the interval type-2 fuzzy sets applied, we were able to improve our classification results. A more suitable fuzzification, adjusted to a specific data set, is possible as well. Our concept should be considered a general approach for imbalanced or inaccurately defined data.