3.1. Rough Set Attribute Reduction
The central notion of RSAR (rough set attribute reduction) is indiscernibility. Assume there is an information system $I = (U, A)$, where $U$ is a non-empty finite set of objects (the universe) and $A$ is a non-empty finite set of attributes such that $a: U \to V_a$ for each $a \in A$, where $V_a$ is the value set for attribute $a$. Here $A = C \cup D$, where $C$ is the set of condition attributes and $D$ is the set of decision attributes.
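Definition 1 (Indiscernibility). For any $P \subseteq A$, there is an associated equivalence relation $IND(P) = \{(x, y) \in U^2 \mid \forall a \in P,\ a(x) = a(y)\}$. If $(x, y) \in IND(P)$, then $x$ and $y$ are indiscernible by the attributes of $P$. The equivalence classes of the $P$-indiscernibility relation are denoted as $[x]_P$.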
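Definition 2 (Lower Approximation). The lower approximation of $X \subseteq U$ is defined as $\underline{P}X = \{x \in U \mid [x]_P \subseteq X\}$.
Definition 3 (Positive Region). If $P$ and $Q$ are equivalence relations over $U$, the positive region is defined as $POS_P(Q) = \bigcup_{X \in U/Q} \underline{P}X$. The positive region contains the objects of $U$ that can be classified into the classes of $U/Q$ using the knowledge in the attributes $P$.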
Definition 4 (Dependency). An important task in data analysis is finding the dependencies between attributes. If all values of a set of attributes $Q$ depend uniquely on another set of attributes $P$, that is, if there is a functional dependency between the values of $P$ and $Q$, then $Q$ depends totally on $P$, denoted $P \Rightarrow Q$. Dependency can be defined as follows: for $P, Q \subseteq A$, $Q$ depends on $P$ with degree $k$ ($0 \le k \le 1$), defined as $k = \gamma_P(Q) = |POS_P(Q)| / |U|$. If $k = 1$, $Q$ depends totally on $P$. If $0 < k < 1$, $Q$ depends partially on $P$ with degree $k$. If $k = 0$, $Q$ does not depend on $P$.
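Definitions 1 to 4 can be illustrated with a minimal Python sketch. The function names and the four-object table below are hypothetical and chosen only for illustration; objects are represented as dictionaries mapping attribute names to discrete values.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Equivalence classes of IND(attrs), as sets of object indices."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def positive_region(rows, P, Q):
    """Union of the P-lower approximations of every class in U/Q."""
    q_classes = partition(rows, Q)
    pos = set()
    for block in partition(rows, P):
        # A P-class belongs to a lower approximation when it fits inside one Q-class.
        if any(block <= X for X in q_classes):
            pos |= block
    return pos


def dependency(rows, P, Q):
    """Degree of dependency gamma_P(Q) = |POS_P(Q)| / |U|."""
    return len(positive_region(rows, P, Q)) / len(rows)


# Hypothetical toy table: condition attributes "a", "b"; decision attribute "d".
rows = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
    {"a": 1, "b": 0, "d": "yes"},
]

gamma_ab = dependency(rows, ["a", "b"], ["d"])
gamma_b = dependency(rows, ["b"], ["d"])
```

In this toy table the last two objects are indiscernible on "a" and "b" but carry different decisions, so they fall outside the positive region and the decision depends only partially on the condition attributes.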
A basic idea is to calculate the dependencies of all the possible subsets of $C$; any subset $X$ with $\gamma_X(D) = 1$ is a reduct, and the smallest such subset is the minimum reduct. However, this exhaustive search is infeasible for large datasets. Algorithm 1, the QUICKREDUCT algorithm [20], avoids evaluating all possible subsets. It starts with an empty set and repeatedly adds the attribute that yields the largest increase in dependency, until the dependency reaches its maximum possible value (usually equal to 1). Note that the algorithm does not necessarily find a minimum reduct. In the worst case, every round evaluates all remaining attributes, so the complexity reaches $O((n^2 + n)/2)$ dependency evaluations for an attribute dimensionality of $n$.
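Algorithm 1: RSAR QUICKREDUCT |
Input: $R$ {empty set}, $C$ {set of all features}, $D$ {set of decision features} |
Output: $R$ |
1: do |
2: $T \leftarrow R$ |
3: for each $x \in (C - R)$ |
4: if $\gamma_{R \cup \{x\}}(D) > \gamma_T(D)$ |
5: $T \leftarrow R \cup \{x\}$ |
6: $R \leftarrow T$ |
7: until $\gamma_R(D) = \gamma_C(D)$ |
8: return R |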
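The greedy search of QUICKREDUCT can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the helper `dependency`, the toy table, and the early-exit guard (which stops the loop when no single attribute increases the dependency) are assumptions added here.

```python
from collections import defaultdict


def dependency(rows, P, Q):
    """Crisp rough-set dependency gamma_P(Q) = |POS_P(Q)| / |U|."""
    def blocks(attrs):
        groups = defaultdict(set)
        for i, row in enumerate(rows):
            groups[tuple(row[a] for a in attrs)].add(i)
        return list(groups.values())

    q_blocks = blocks(Q)
    pos_size = sum(len(b) for b in blocks(P) if any(b <= X for X in q_blocks))
    return pos_size / len(rows)


def quickreduct(rows, C, D):
    """Greedy forward selection: add the attribute with the best dependency gain."""
    R = []
    target = dependency(rows, C, D)
    while dependency(rows, R, D) < target:
        T = R
        for x in C:
            if x in R:
                continue
            if dependency(rows, R + [x], D) > dependency(rows, T, D):
                T = R + [x]
        if T == R:
            # Assumed guard: no single attribute helps, so stop instead of looping forever.
            break
        R = T
    return R


# Hypothetical toy table: the decision "d" is determined by "a" and "b" together.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 0},
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 0, "d": 2},
]

reduct = quickreduct(rows, ["a", "b", "c"], ["d"])
```

On this table the first round picks "a" (dependency 0.5) and the second round adds "b", which raises the dependency to 1. Ties are broken by attribute order, which is one reason the result need not be a minimum reduct.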
3.2. Membership Function
The American cybernetician L.A. Zadeh founded fuzzy set theory with a groundbreaking paper in 1965 [16]. The fuzzy set is an extension of the classical set that covers more general and varied mathematical concepts, and it gave rise to the new discipline of fuzzy mathematics.
Assume $U$ is the universe. A classical subset $A$ of $U$ can be represented by its characteristic function, the map $\chi_A: U \to \{0, 1\}$. For a fuzzy subset $A$ of $U$, an element $x \in U$ neither absolutely belongs to $A$ nor absolutely does not belong to $A$. The degree to which $x$ belongs to $A$ can be represented by a value in $[0, 1]$.
Definition 5 (Membership Function). Assume $U$ is the universe. The map $\mu_A: U \to [0, 1]$ is called a fuzzy subset of $U$, or F set for short. The map $\mu_A$ is called the membership function of the F set $A$, and $\mu_A(x)$ is called the degree of membership of $x$ in $A$.
Finding a proper fuzzy membership function is fundamentally important. A basic method of constructing membership functions is the reference function: start from a commonly used family of functions and choose its parameters to obtain the membership function needed. Commonly used families include the triangular membership function:
Trapezoidal membership function:
Gaussian membership function:
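The three reference families named above can be sketched in Python using their standard textbook parameterizations. The parameter names (`a`, `b`, `c`, `d`, `mean`, `sigma`) are assumptions made for this sketch and may differ from the notation used elsewhere.

```python
import math


def triangular(x, a, b, c):
    """Rises linearly on [a, b] and falls linearly on [b, c]; assumes a < b < c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Rises on [a, b], equals 1 on [b, c], falls on [c, d]; assumes a < b <= c < d."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def gaussian(x, mean, sigma):
    """Bell curve exp(-(x - mean)^2 / (2 sigma^2)) that peaks at 1 when x == mean."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))


half_up = triangular(2.5, 0.0, 5.0, 10.0)          # halfway up the rising edge
plateau = trapezoidal(3.0, 0.0, 2.0, 4.0, 6.0)     # on the flat top
peak = gaussian(0.0, 0.0, 1.0)                     # at the center of the bell
```

All three return values in $[0, 1]$, so each is a valid membership function in the sense of Definition 5; the choice between them mainly trades smoothness against the number of parameters to tune.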
Membership functions can also be generated from available data. Many methods can be used, such as the histogram method, the transformation of a probability distribution into a possibility distribution, clustering, and neural networks [32,33,34,35,36]. To make full use of fuzzy theory, an effective membership function generation mechanism is needed, and it should have the following properties [37]:
Accuracy. The membership function should accurately reflect the knowledge contained in the data.
Flexibility. The method should provide a broad family of membership functions.
Computability. The method should be computationally feasible so that it has practical application value. The literature [38] emphasizes the importance of easy optimization and adjustment for membership functions.
Ease of use. Once the membership function is determined, for any given $x$, the corresponding $\mu(x)$ can be found easily.
In this paper, we use clustering to find membership functions. Specifically, the fuzzy c-means (FCM) clustering method is used, which generates the fuzzy membership function as part of the clustering process itself.
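How FCM yields membership degrees can be sketched with the standard FCM update rules on one-dimensional data. This is a minimal illustration under assumptions: the fuzzifier `m = 2`, the fixed iteration count, the initial centers, and the sample points are all hypothetical, and the paper's actual FCM configuration is not specified here.

```python
def fcm_memberships(points, centers, m=2.0):
    """Standard FCM rule: u[k][i] = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1))."""
    exponent = 2.0 / (m - 1.0)
    table = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: assign crisp membership there.
            hits = dists.count(0.0)
            table.append([1.0 / hits if d == 0.0 else 0.0 for d in dists])
            continue
        table.append(
            [1.0 / sum((di / dj) ** exponent for dj in dists) for di in dists]
        )
    return table


def fcm_1d(points, centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    for _ in range(iterations):
        u = fcm_memberships(points, centers, m)
        centers = [
            sum(u[k][i] ** m * x for k, x in enumerate(points))
            / sum(u[k][i] ** m for k in range(len(points)))
            for i in range(len(centers))
        ]
    return centers, fcm_memberships(points, centers, m)


# Hypothetical data: two loose groups around 1 and 9.
points = [0.8, 1.0, 1.3, 8.7, 9.0, 9.4]
centers, memberships = fcm_1d(points, centers=[0.0, 10.0])
```

Each row of `memberships` sums to 1, and each column can be read as the membership function of one cluster sampled at the data points.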
3.3. Fuzzy Rough Feature Selection
The RSAR described above applies only to discrete datasets, but real-world datasets usually contain real-valued attributes and noise. Fuzzy set theory allows us to deal with this more complex situation. Fuzzy mathematics is the mathematical theory that studies fuzzy phenomena. Its core is the fuzzy set, which differs from the classical set: the elements of a fuzzy set are not sharply determined, and the set can only be characterized by its membership function.
The intersection, union, and complement operations of fuzzy sets are similar to those of classical sets, but in some cases a general-purpose operator may fail, so the choice of operator should be analyzed case by case. The t-norm and s-norm can be thought of as generalized intersection and union operations, although whether they fully play this role remains an open question.
Definition 6 (t-norm). A triangular norm, or t-norm for short, reflects the nature of the logical operator AND. A t-norm is a binary function $T: [0, 1]^2 \to [0, 1]$ that satisfies commutativity, associativity, monotonicity, and the boundary condition, that is, $\forall x \in [0, 1]: T(x, 1) = x$.
There are some frequently used t-norms:
The standard min operator $T_M(x, y) = \min(x, y)$
The algebraic product $T_P(x, y) = xy$
Lukasiewicz t-norm $T_L(x, y) = \max(x + y - 1, 0)$.
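The three t-norms can be written directly as Python functions, together with a small check of the commutativity and boundary conditions of Definition 6. The function names and the test grid are illustrative choices; the grid uses quarter steps so that the floating-point comparisons are exact.

```python
def t_min(x, y):
    """Standard min operator."""
    return min(x, y)


def t_product(x, y):
    """Algebraic product."""
    return x * y


def t_lukasiewicz(x, y):
    """Lukasiewicz t-norm."""
    return max(x + y - 1.0, 0.0)


GRID = [0.0, 0.25, 0.5, 0.75, 1.0]


def satisfies_axioms(t):
    """Check commutativity and the boundary condition T(x, 1) = x on the grid."""
    commutative = all(t(x, y) == t(y, x) for x in GRID for y in GRID)
    boundary = all(t(x, 1.0) == x for x in GRID)
    return commutative and boundary


checks = {f.__name__: satisfies_axioms(f) for f in (t_min, t_product, t_lukasiewicz)}
```

For any pair of arguments the three operators are ordered as Lukasiewicz, then product, then min, which is why the min operator is the most permissive choice of conjunction.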
Definition 7 (s-norm). An s-norm is also called a triangular conorm, or t-conorm for short. An s-norm is a binary function $S: [0, 1]^2 \to [0, 1]$ that satisfies commutativity, associativity, monotonicity, and the boundary condition, that is, $\forall x \in [0, 1]: S(x, 0) = x$.
Three well-known s-norms are:
The standard max operator $S_M(x, y) = \max(x, y)$
The probabilistic sum $S_P(x, y) = x + y - xy$
The bounded sum $S_L(x, y) = \min(x + y, 1)$.
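The three s-norms can be sketched the same way. The check below also verifies the well-known De Morgan duality $S(x, y) = 1 - T(1 - x, 1 - y)$ between each s-norm and its companion t-norm; the function names and the quarter-step grid are illustrative choices.

```python
def s_max(x, y):
    """Standard max operator."""
    return max(x, y)


def s_probabilistic(x, y):
    """Probabilistic sum."""
    return x + y - x * y


def s_bounded(x, y):
    """Bounded sum."""
    return min(x + y, 1.0)


# Companion t-norms, paired with the s-norms above in the same order.
T_NORMS = [
    lambda x, y: min(x, y),
    lambda x, y: x * y,
    lambda x, y: max(x + y - 1.0, 0.0),
]
S_NORMS = [s_max, s_probabilistic, s_bounded]

GRID = [0.0, 0.25, 0.5, 0.75, 1.0]


def is_dual(s, t, tol=1e-12):
    """Check S(x, y) = 1 - T(1 - x, 1 - y) on the grid."""
    return all(
        abs(s(x, y) - (1.0 - t(1.0 - x, 1.0 - y))) <= tol
        for x in GRID
        for y in GRID
    )


duality = [is_dual(s, t) for s, t in zip(S_NORMS, T_NORMS)]
boundary = [all(s(x, 0.0) == x for x in GRID) for s in S_NORMS]
```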
A fuzzy rule can be expressed as “if $x$ is $A$ then $y$ is $B$”, or $A \to B$ for short, where $A$ and $B$ are fuzzy sets whose truth degrees are expressed as $\mu_A(x)$ and $\mu_B(y)$. The truth degree of $A \to B$ is expressed as $\mu_{A \to B}(x, y)$; it depends on the truth degrees of the antecedent and the consequent.
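Definition 8 (Fuzzy Implicator). A fuzzy implicator is a binary function $I: [0, 1]^2 \to [0, 1]$ chosen to represent fuzzy rules properly. It is left monotone (decreasing in its first argument) or right monotone (increasing in its second argument) and satisfies the boundary conditions $I(1, 0) = 0$ and $I(0, 0) = I(0, 1) = I(1, 1) = 1$.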
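The frequently used implicators are:
Mamdani implicator: $I(x, y) = \min(x, y)$
Zadeh implicator: $I(x, y) = \max(1 - x, \min(x, y))$
Kleene–Dienes implicator: $I(x, y) = \max(1 - x, y)$
Lukasiewicz implicator: $I(x, y) = \min(1, 1 - x + y)$.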
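The four implicators in their standard forms can be compared with a short Python sketch. The function names are illustrative. The check at the end tests whether each operator reproduces classical implication on the crisp truth values 0 and 1; the Mamdani operator does not, since it returns 0 for a false antecedent.

```python
def mamdani(x, y):
    """Mamdani operator: min(x, y)."""
    return min(x, y)


def zadeh(x, y):
    """Zadeh implicator: max(1 - x, min(x, y))."""
    return max(1.0 - x, min(x, y))


def kleene_dienes(x, y):
    """Kleene-Dienes implicator: max(1 - x, y)."""
    return max(1.0 - x, y)


def lukasiewicz(x, y):
    """Lukasiewicz implicator: min(1, 1 - x + y)."""
    return min(1.0, 1.0 - x + y)


def matches_classical_implication(imp):
    """True when the operator agrees with crisp implication on {0, 1}."""
    return (
        imp(1.0, 0.0) == 0.0
        and imp(0.0, 0.0) == 1.0
        and imp(0.0, 1.0) == 1.0
        and imp(1.0, 1.0) == 1.0
    )


IMPLICATORS = (mamdani, zadeh, kleene_dienes, lukasiewicz)
classical = {f.__name__: matches_classical_implication(f) for f in IMPLICATORS}
values = {f.__name__: f(0.75, 0.5) for f in IMPLICATORS}
```

For the antecedent degree 0.75 and consequent degree 0.5, only the Lukasiewicz implicator returns a value above 0.5, which illustrates how strongly the choice of implicator affects the resulting lower approximation.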
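Fuzzy upper and lower approximations are defined with the fuzzy partition of the input [39] or with the following definition:
$$\mu_{\underline{R_P}X}(x) = \inf_{y \in U} I\big(\mu_{R_P}(x, y), \mu_X(y)\big), \qquad \mu_{\overline{R_P}X}(x) = \sup_{y \in U} T\big(\mu_{R_P}(x, y), \mu_X(y)\big)$$
where $I$ is the fuzzy implicator, $T$ is a t-norm, and $R_P$ represents the fuzzy similarity relation induced by the feature subset $P$:
$$\mu_{R_P}(x, y) = T_{a \in P}\{\mu_{R_a}(x, y)\}$$
where $\mu_{R_a}(x, y)$ represents the degree of similarity of $x$ and $y$ with respect to feature $a$.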
The fuzzy positive region and dependency are defined in the same way as in rough set theory [40].
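A fuzzy-rough dependency computation can be sketched in Python under several stated assumptions: the per-feature similarity is taken as $1 - |a(x) - a(y)| / (a_{\max} - a_{\min})$, the relations are combined with the min t-norm, the lower approximation uses the Lukasiewicz implicator, and the decision classes are crisp. The positive region membership of an object is taken as the largest lower-approximation membership over the decision classes, and the dependency is the mean of these memberships. These choices and the four-object dataset are hypothetical and may differ from the setup used in the example that follows.

```python
def similarity(values):
    """Assumed similarity: 1 - |v_i - v_j| / (max - min) for one feature."""
    span = (max(values) - min(values)) or 1.0
    n = len(values)
    return [[1.0 - abs(values[i] - values[j]) / span for j in range(n)] for i in range(n)]


def fuzzy_dependency(columns, labels, features):
    """Mean fuzzy positive-region membership for the given feature subset."""
    n = len(labels)
    sims = [similarity(columns[f]) for f in features]
    # R_P(x, y): combine the per-feature relations with the min t-norm.
    rel = [[min(s[i][j] for s in sims) for j in range(n)] for i in range(n)]
    total = 0.0
    for i in range(n):
        best = 0.0
        for cls in set(labels):
            # Lower approximation: inf over y of I(R_P(x, y), X(y)), Lukasiewicz I.
            lower = min(
                min(1.0, 1.0 - rel[i][j] + (1.0 if labels[j] == cls else 0.0))
                for j in range(n)
            )
            best = max(best, lower)
        total += best
    return total / n


# Hypothetical data: "a" separates the classes, "b" does not, "c" does so partially.
columns = {
    "a": [0.0, 0.0, 1.0, 1.0],
    "b": [0.0, 1.0, 0.0, 1.0],
    "c": [0.0, 0.5, 0.5, 1.0],
}
labels = [0, 0, 1, 1]

gamma = {f: fuzzy_dependency(columns, labels, [f]) for f in columns}
```

A greedy selection in the style of Algorithm 1 would pick "a" first here, because it already reaches the maximum dependency on its own.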
An example with the dataset in
Table 1 follows. There are six objects, features (condition attribute)
and
, label (decision attribute)
. Assuming
, using Equation (6), we can obtain the fuzzy similarity matrices as follows:
Others are the same as object 3.
Thus, the positive regions of every object are:
The dependency of feature
:
Using the same procedure, we can calculate , , ; in the first iteration we choose feature . Then, in the second iteration we obtain , and in the end, we choose the feature subset .