Abstract
Attribute reduction is a critical topic in the field of rough set theory. Currently, to further enhance the stability of the derived reduct, various attribute selectors have been designed based on the framework of ensemble selectors. Nevertheless, it must be pointed out that some limitations are concealed in these selectors: (1) they rely heavily on the distribution of samples; (2) they rely heavily on the optimal attribute. To generate reducts with higher stability, a novel beam-influenced selector (BIS) is designed based on the strategies of random partition and beam. The scientific novelty of our selector lies in two aspects: (1) samples are randomly partitioned without considering their distribution; (2) beam-based selection of attributes frees the selector from dependency on the optimal attribute. Comprehensive experiments over 16 UCI data sets show the following: (1) the stability of the derived reducts may be significantly enhanced by using our selector; (2) the reducts generated based on the proposed selector provide competent performance in classification tasks.
1. Introduction
In the era of big data and artificial intelligence, the scale of data is growing massively [1]. In particular, high dimensionality has become one of the crucial characteristics of modern data [2]. For data analyses related to practical applications [3,4], such a characteristic brings huge challenges in two respects: (1) in the process of data production, redundant or irrelevant dimensions may lead to low data quality; (2) the storage and processing of data face tremendous difficulties because of the interference caused by the numerous dimensions. Therefore, how to reasonably reduce the dimensionality of data has become an urgent problem to be addressed.
As one of the effective technologies for realizing dimensionality reduction, feature selection [5,6,7,8] has been widely explored. Different from other dimensionality reduction technologies, such as feature extraction, the aim of feature selection is to determine some informative features and then select them from the original features. By using feature selection, it is expected that the selected features will provide better readability and interpretability [9,10] for the learning models. Presently, various feature selection approaches have been developed with respect to diverse requirements. Among the popular results, it is noteworthy that rough-set-based feature selection has attracted much attention. The reasons can be attributed to the following superiorities: (1) the clear semantic explanation of such a technique is useful in terminating the process of feature selection; (2) the extensibility of rough set theory is extremely important because it offers rich measurements for evaluating the significance of features.
Concerning rough-set-based feature selection, the output is frequently referred to as the reduct, because such feature selection was first named attribute reduction by Pawlak [11] in rough set theory [11,12,13,14]. Through reviewing numerous results with respect to attribute reduction, it can be observed that most of the research is motivated by either the time efficiency of seeking the reduct or the generalization ability of the obtained reduct [15,16,17,18]. Nevertheless, to the best of our knowledge, few studies take data perturbation into account. In other words, the stability of the obtained reduct [19,20] is seldom considered.
The stability of the reduct is defined as the “sensitivity of reducts generated by an algorithm to the differences of training sets drawn from the same generation distribution”. In simple terms, the reduct stability reflects how much the reducts vary when data perturbation happens. A reduct with higher stability will bring stable learning results, and it can enhance the confidence of domain experts when experimentally validating the selected attributes to interpret important discoveries. Obviously, stability is one important metric for evaluating the performance of derived reducts. Therefore, the main research problem of this study is how to generate a reduct with higher stability when data perturbation happens.
In view of searching for a stable reduct, Yang and Yao [21] first designed a naive ensemble selector. Though the framework of such a selector is rough, the use of multiple perspectives [22,23] for determining attributes with better adaptability was preliminarily reported there. As a result, the reduct generated by using an ensemble selector may be more robust than those obtained in previous research.
However, through a critical review of the results related to ensemble selectors, some limitations are easily revealed: (1) such a selector relies heavily on the distribution of samples; (2) such a selector also relies heavily on the optimal attribute of each perspective. The former indicates that the multiple perspectives in the ensemble selector are constructed based on the distribution of samples; from this point of view, the critical requirement of following the raw distribution may greatly limit the application of ensemble selectors to complex data. The latter implies that one and only one best attribute is determined in each iteration of an ensemble selector; therefore, a locally optimal solution instead of a globally optimal one may be derived.
By considering the above limitations and the requirement of deriving a stable reduct, a beam-influenced attribute selector is designed in this paper. Such a selector rests on two key ideas: (1) randomly partitioning the samples [24], which can be regarded as the source of multiple beams; (2) adding two or more potential attributes into each beam [25]. Therefore, different from the conventional ensemble selector, our selector is neither required to preserve the raw distribution of samples, because a random partition over samples is employed, nor asked to determine one and only one satisfactory attribute in each iteration, because the beam is used to record multiple candidate attributes. It follows that our beam-influenced attribute selector offers two different perspectives of the ensemble: one from the perspective of samples and the other from the perspective of attributes. Therefore, more bases for realizing ensemble selection can be obtained, and then attributes with stronger adaptability can be determined. This is the inherent reason why our beam-influenced attribute selector may output a stable reduct.
The contribution of this study can be summarized in the following three aspects. Firstly, through analyzing the framework of the ensemble selector, some limitations of this framework are revealed; these limitations can be summarized into two aspects: (1) it relies heavily on the distribution of samples; (2) it relies heavily on the selection of the optimal attribute. Secondly, to overcome these limitations of the ensemble selector, a novel beam-influenced selector (BIS) is developed. Finally, through the observation and analysis of extensive experimental results, it is verified that our selector can be effectively used to generate stable reducts, and that these reducts generally possess competent classification performance.
The remainder of this paper is organized as follows. In Section 2, some basic concepts with respect to attribute reduction and measurement of stability are introduced briefly. In Section 3, ensemble selector-based attribute reduction and beam-influenced selector-based attribute reduction are presented elaborately. In Section 4, experimental results and corresponding analyses are reported clearly. Finally, some conclusions and suggestions for future work are offered in Section 5.
2. Preliminaries
2.1. Attribute Reduction
Currently, the relationship between attributes and labels not only provides guidance for constructing various attribute reduction-related constraints [11,26,27,28] in the field of attribute reduction [29,30,31] but also suggests positive heuristic information for generating a reduct based on appropriate attributes.
Up to now, it is well-known that many relationships and the corresponding constraints have been thoroughly explored and that the forms of attribute reduction vary considerably [32,33,34,35]. Different reducts possess different semantic explanations, which are closely determined by the used relationships.
However, though the forms of relationships and constraints are rich in previous research, it must be noted that the essence of attribute reduction can be further abstracted. Such an abstraction can not only reveal the clear framework of attribute reduction but also provide a broad space for introducing popular techniques into the study of attribute reduction-related topics. The following Definition 1 shows us a detailed abstraction [15].
Definition 1.
[36] Given a decision system $DS = \langle U, AT \cup \{d\} \rangle$, in which $AT$ and $U$ are nonempty finite sets with respect to condition attributes and all samples, respectively, and $d$ is the decision attribute. Suppose that $C_{\rho}$ is a constraint associated with a pre-defined measure $\rho$ and $A \subseteq AT$; $A$ is considered as a $\rho$-reduct if and only if the following conditions are satisfied:
- (1) $A$ holds the constraint $C_{\rho}$;
- (2) $\forall B \subset A$, $B$ does not hold the constraint $C_{\rho}$.
In Definition 1, $\rho$ can be considered as a function that maps $2^{AT}$ to the set of real numbers $\mathbb{R}$, where $2^{AT}$ denotes the power set of $AT$.
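To make Definition 1 concrete, the following is a minimal Python sketch (with hypothetical names, not code from the paper) that checks both conditions for a candidate subset $A$; the `constraint` predicate stands for $C_{\rho}$ and is assumed to be built from the measure $\rho$, e.g., requiring that $A$ preserves the approximation quality of $AT$:

```python
from itertools import combinations

def is_rho_reduct(A, constraint):
    """Check Definition 1: A satisfies the rho-constraint, while no proper
    subset of A does (i.e., A contains no redundancy)."""
    A = frozenset(A)
    if not constraint(A):          # condition (1)
        return False
    # Condition (2): every proper subset must violate the constraint.
    # This brute-force check is exponential; heuristic searchers such as
    # Algorithm 1 below only test subsets obtained by dropping one attribute.
    return not any(constraint(frozenset(B))
                   for r in range(len(A))
                   for B in combinations(A, r))

# toy usage: the constraint "the subset covers attributes 1 and 2"
print(is_rho_reduct({1, 2}, lambda B: {1, 2} <= B))      # True
print(is_rho_reduct({1, 2, 3}, lambda B: {1, 2} <= B))   # False: 3 is redundant
```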
Immediately, the natural problem is how to generate a satisfactory reduct. Because of its low time complexity, the heuristic greedy searching strategy has attracted much attention. Most greedy searchings [30,37,38] can be grouped into the following three categories.
- (1) Forward greedy searching. In each iteration, one or more appropriate attributes are selected based on the evaluations of the candidate attributes. Then, after sufficiently many iterations, a satisfactory reduct can be generated.
- (2) Backward greedy searching. In each iteration, one or more inferior attributes are removed from the set of raw attributes based on the evaluations. Then, after sufficiently many iterations, a justifiable reduct can also be obtained.
- (3) Forward-backward greedy searching. In such a strategy, both the forward and backward strategies are employed to seek a reasonable reduct. For example, a potential reduct is first generated by forward searching; then, the redundant attributes in such a potential reduct are further removed by backward searching.
Among the above strategies, forward-backward greedy searching is the most widely used. The reasons can be summarized as follows: such a strategy not only guarantees that informative attributes are preferentially added into the potential reduct but also ensures that there are no redundant attributes in the obtained reduct. The details of forward-backward greedy searching are presented in Algorithm 1.
In Algorithm 1, for each candidate attribute $a \in AT \setminus A$, the fitness value is calculated by using the fitness function $\varphi$ (a map from each attribute $a$ to the set of real numbers $\mathbb{R}$) that quantitatively characterizes the significance of each candidate attribute. It must be pointed out that the fitness function is generally associated with the measure $\rho$ [36].
| Algorithm 1: Forward-backward greedy searching for attribute reduction (FBGSAR). |
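The pseudocode box above is rendered as an image in the original; the following is a minimal Python sketch of the forward-backward process described in this subsection (the `fitness` and `constraint` interfaces are our assumptions, not the authors' exact signatures):

```python
def fbgsar(AT, fitness, constraint):
    """Forward-backward greedy searching (sketch of Algorithm 1).

    AT         -- iterable of condition attributes
    fitness    -- fitness(a, A): significance of candidate a given current A
    constraint -- predicate over attribute subsets (the rho-constraint)
    """
    A = set()
    # Forward phase: repeatedly add the candidate with the best fitness
    # until the potential reduct satisfies the constraint.
    while not constraint(A):
        best = max((a for a in AT if a not in A), key=lambda a: fitness(a, A))
        A.add(best)
    # Backward phase: remove attributes that are redundant, i.e., whose
    # removal does not break the constraint.
    for a in list(A):
        if constraint(A - {a}):
            A.remove(a)
    return A
```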
2.2. Measurement of Stability
Following the above discussions, testing the performance of the derived reduct is an inevitable problem. In addition to the generalization performance related to classifiers, the stability of the derived reduct is another property that deserves attention [21].
For instance, suppose that data perturbation results in significant variations of the reducts; this means that the generated reducts are unstable. Obviously, unstable reducts will lead to unstable learning results.
In this paper, the stability represents the degree of reduct perturbation when data perturbation happens. From this point of view, the reduct stability [39,40] is defined in Definition 2.
Definition 2.
[21] Given a decision system $DS$, if $U$ is split into $U_1, U_2, \ldots, U_k$ ($\forall i \neq j$, $U_i \cap U_j = \emptyset$ and $\bigcup_{i=1}^{k} U_i = U$), then the stability of the reduct is

$$\mathrm{Stab} = \frac{2}{k(k-1)} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \frac{|A_i \cap A_j|}{|A_i \cup A_j|}, \quad (1)$$

where $A_i$ is the reduct obtained over $U \setminus U_i$.
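A minimal sketch of Equation (1), assuming each reduct is given as a set of attribute indices:

```python
from itertools import combinations

def reduct_stability(reducts):
    """Average pairwise Jaccard similarity of the reducts obtained over the
    k training sets (Equation (1)); 1.0 means all reducts are identical."""
    pairs = list(combinations([set(r) for r in reducts], 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

# e.g., one of five folds selects attribute 3 instead of attribute 2:
print(reduct_stability([{1, 2}, {1, 2}, {1, 2}, {1, 3}, {1, 2}]))  # ≈ 0.73
```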
3. Beam-Influenced Selector for Attribute Reduction
3.1. Ensemble Selector for Attribute Reduction
As shown in Algorithm 1, each candidate attribute is evaluated based on a single fitness function. Nevertheless, it is well-known that a single fitness function will bring the following challenges [41].
- (1) A single fitness function may lead to poorer adaptability. For example, a reduct generated based on a single granularity may fail to qualify as a reduct over other granularities, such as the slightly finer or coarser granularity [41] generated by a slight data perturbation.
- (2) A single fitness function may result in poorer learning performance. For instance, samples in different classes possess distinct characteristics [31] that tend to optimize class-specific measurements. Nevertheless, revealing the differences among these characteristics using merely one fitness function is quite challenging.
To overcome the limitations mentioned above, a representative ensemble selector-based attribute reduction was designed by Yang and Yao [21]. Different from Algorithm 1, a set of fitness functions is used in the ensemble selector to evaluate each candidate attribute. The detailed process of searching for a reduct based on the ensemble selector is shown in Algorithm 2.
Actually, the fitness function is designed to measure the significance of each candidate attribute. Therefore, different perspectives can be constructed by using different fitness functions. For such a reason, the fitness function sets used in Algorithm 2 can be grouped into the following two broad categories.
- (1) Homogeneous fitness function set: the set of fitness functions $\{\varphi_1, \varphi_2, \ldots, \varphi_m\}$ is constructed using the same evaluation criterion. For instance, the fitness function set can be defined based on the approximation qualities of different rough set models, and then the generated reduct can better adapt to different models.
- (2) Heterogeneous fitness function set: the set of fitness functions $\{\varphi_1, \varphi_2, \ldots, \varphi_m\}$ is built using different evaluation criteria. For example, the fitness function set can be defined based on different measures of a rough set model, such as approximation quality, entropy, etc., and then the derived reduct will better adapt to different constraints.
| Algorithm 2: Ensemble selector-based attribute reduction (ESAR). |
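As with Algorithm 1, the box is presented as an image; a minimal sketch of the ensemble voting mechanism described above follows (the interfaces are assumptions, and ties in the vote are broken arbitrarily):

```python
from collections import Counter

def esar(AT, fitness_set, constraint):
    """Ensemble selector (sketch of Algorithm 2): each fitness function
    nominates its best candidate; the attribute nominated most often wins."""
    A = set()
    while not constraint(A):
        candidates = [a for a in AT if a not in A]
        votes = Counter(
            max(candidates, key=lambda a: f(a, A)) for f in fitness_set
        )
        A.add(votes.most_common(1)[0][0])
    # backward pruning of redundant attributes, as in Algorithm 1
    for a in list(A):
        if constraint(A - {a}):
            A.remove(a)
    return A
```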
Algorithm 2 substitutes a set of fitness functions for the single fitness function. Therefore, compared with the conventional approach, the ensemble selector-based strategy may effectively improve the stability of the obtained reducts. The crucial reason is that the candidate attributes can be evaluated from multiple perspectives, so the selected attributes may be equipped with more adaptability and generality.
Following the basic principle of Algorithm 2, some different forms of ensemble selectors [21,41,42] have also been investigated. However, it must be pointed out that most of these results suffer from the following two limitations.
- (1) They rely heavily on the distribution of samples. Take the classical ensemble selector proposed by Yang and Yao [21] as an example: each fitness function is constructed based on the samples with the same label. Therefore, the performance of the used fitness functions will be degraded if the sample distribution is seriously imbalanced or the number of sample categories is small.
- (2) They rely heavily on the selection of the optimal attribute. Take the selector proposed by Jiang et al. [42] as an example: only the optimal attribute is selected based on each fitness function. Therefore, some candidate attributes with potential importance are ignored, which means that some attributes with strong adaptability will be difficult to determine.
3.2. Beam-Influenced Selector-Based Attribute Reduction
Considering what has been pointed out in the above subsection, those limitations may lead to poor adaptability of the selected attributes. Motivated by this, two strategies, called random partition [24] and beam [25], are employed in our attribute selector. The detailed structure of the beam-influenced selector (BIS) is shown in Figure 1.
Figure 1.
The framework of beam-influenced selector (BIS).
The details of our beam-influenced selector (BIS) shown in Figure 1 can be elaborated as follows:
- (1) Randomly divide the set of raw data into $n$ groups in terms of the samples;
- (2) Construct $n$ different fitness functions based on the $n$ different groups of samples;
- (3) Evaluate each candidate attribute by the $n$ different fitness functions, and then add the top-$w$ attributes with respect to the evaluation results over each fitness function into the multiset $T$;
- (4) Select the attribute $b$ with the maximal frequency of occurrence in the multiset $T$.
Following the above discussions, our searching strategy may be equipped with the following superiorities. Firstly, it will not be influenced by the distribution of samples, mainly because the whole universe is randomly partitioned into $n$ different groups, so the distribution of each local data set is no longer the key concern. Secondly, more important attributes will be considered and added into the multiset $T$; those attributes are determined not only by the various local-data-based fitness functions but also by the beam-based top-$w$ selection related to each local data set. Consequently, an attribute with stronger adaptability may be selected in each iteration.
The following Algorithm 3 shows the specific process of deriving a reduct based on our beam-influenced selector.
| Algorithm 3: Beam-influenced selector-based attribute reduction (BISAR). |
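Since the box above is also presented as an image, the following sketch assembles steps (1)-(4) of Figure 1 into one possible implementation; the `build_fitness` interface is hypothetical, and the final backward pruning is assumed by analogy with Algorithm 1:

```python
import random
from collections import Counter

def bisar(AT, U, build_fitness, constraint, n=20, w=20, seed=None):
    """Beam-influenced selector (sketch of Algorithm 3).

    build_fitness(group) -- returns a fitness function f(a, A) evaluated
                            over one randomly drawn group of samples
    n -- number of random groups, w -- beam width (top-w attributes)
    """
    rng = random.Random(seed)
    samples = list(U)
    rng.shuffle(samples)                         # random partition, ignoring
    groups = [samples[i::n] for i in range(n)]   # the label distribution
    fitness_set = [build_fitness(g) for g in groups]

    A = set()
    while not constraint(A):
        candidates = [a for a in AT if a not in A]
        T = Counter()                            # the multiset T
        for f in fitness_set:
            ranked = sorted(candidates, key=lambda a: f(a, A), reverse=True)
            T.update(ranked[:w])                 # beam: keep the top-w, not 1
        A.add(T.most_common(1)[0][0])            # attribute b with max frequency
    for a in list(A):                            # backward pruning
        if constraint(A - {a}):
            A.remove(a)
    return A
```

With w = 1, the beam degenerates into the single-best selection used by conventional ensemble selectors; a larger w is what lets potentially important candidates survive into $T$.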
In Algorithm 3, a set of homogeneous fitness functions is used, principally because: (1) the different fitness functions correspond to the local data derived from the random partition over the raw data; (2) although the local data are different, the evaluation mechanisms over those local data are the same.
Compared with previous studies, it is obvious that the number of ensemble members can be adjusted freely by both the number of local data sets/fitness functions and the number of beam-based top attributes. Put another way, each candidate attribute can be fully evaluated from more views. As a result, it is expected that the reduct generated by using Algorithm 3 may have higher stability.
4. Experimental Analysis
4.1. Data Sets and Configuration
To verify the validity and superiority of the proposed strategy, experimental comparisons are conducted in this section. All experiments were carried out on a personal computer with Windows 10, an AMD 3750 CPU (2.60 GHz) and 4.00 GB memory. The programming language is Matlab R2018b, and the classifier packages (fitcecoc, fitcknn) used in the experiments come with Matlab. The total run time of all experiments was about one month. Furthermore, Table 1 summarizes the details of the 16 UCI data sets used in our experiments.
Table 1.
Data sets description.
4.2. Experimental Setup
The model used for deriving reducts in our experiments is the neighborhood rough set [26,43]. Note that the results obtained using the neighborhood rough set model depend closely on the given radius. For such a reason, to verify the universality of BISAR, 20 radii were selected for the experiments: 0.02, 0.04, …, 0.40 [15,24,44].
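For illustration, below is a sketch of one measure commonly used as a fitness/constraint basis in this model, the approximation quality: the fraction of samples whose δ-neighborhood (Euclidean distance over the selected attributes) is pure with respect to the label. The paper does not spell out its concrete measure at this point, so treat this as an assumed instantiation:

```python
import numpy as np

def approximation_quality(X, y, attrs, delta):
    """Fraction of samples whose delta-neighborhood, measured by Euclidean
    distance over the attributes in `attrs`, contains only samples sharing
    the same label (X: n-by-m array in [0, 1], y: length-n label array)."""
    Z = X[:, list(attrs)]
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    consistent = sum((y[dist[i] <= delta] == y[i]).all() for i in range(len(Z)))
    return consistent / len(Z)
```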
Furthermore, 5-fold cross-validation was applied for searching reducts. The strategy of 5-fold cross-validation divides the whole set of samples into 5 disjoint groups, i.e., $U_1, U_2, \ldots, U_5$. For the first round of computation, $U_2 \cup U_3 \cup U_4 \cup U_5$ is considered as the set of training samples for searching for the reduct, and $U_1$ is considered as the set of testing samples for evaluating the performance of the generated reduct; …; for the fifth round, $U_1 \cup U_2 \cup U_3 \cup U_4$ is considered as the set of training samples, and $U_5$ is considered as the set of testing samples [44]. Therefore, the experimental studies reported in this paper provide a reliable basis for a thorough evaluation of BISAR’s effectiveness.
Moreover, it is worth noting that besides the above raw data sets, two types of noise have also been injected into the raw data to further test the effectiveness of our BISAR.
- (1) Feature noise. Given the raw data, if the noise ratio is $\tau$, then the injection is realized by randomly selecting $\lceil \tau \cdot |AT| \rceil$ features and replacing the values over these features with random numbers (value range is [0, 1]).
- (2) Label noise. Given the raw data, if the noise ratio is $\tau$, then the injection is realized by randomly selecting $\lceil \tau \cdot |U| \rceil$ samples and replacing the labels of these samples randomly. (A sketch of both injections is given after this list.)
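A minimal sketch of both injections, assuming the data are stored as NumPy arrays and normalized into [0, 1]; the ceiling-based counts follow the ratio definitions above:

```python
import numpy as np

def inject_feature_noise(X, ratio=0.2, rng=None):
    """Replace a `ratio` share of randomly chosen features (columns of the
    [0, 1]-normalized data X) with uniform random numbers."""
    rng = np.random.default_rng(rng)
    X = X.copy()
    cols = rng.choice(X.shape[1], size=int(np.ceil(ratio * X.shape[1])),
                      replace=False)
    X[:, cols] = rng.random((X.shape[0], len(cols)))
    return X

def inject_label_noise(y, ratio=0.2, rng=None):
    """Replace the labels of a `ratio` share of randomly chosen samples
    with labels drawn randomly from the existing label set."""
    rng = np.random.default_rng(rng)
    y = y.copy()
    rows = rng.choice(len(y), size=int(np.ceil(ratio * len(y))),
                      replace=False)
    y[rows] = rng.choice(np.unique(y), size=len(rows))
    return y
```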
Finally, the feature noise ratio and the label noise ratio are both set as 20%, and the hyperparameters $n$ and $w$ of BISAR are both set as 20. The following five strategies for obtaining reducts are also reproduced for comparison with our BISAR.
- (1) Attribute group based attribute reduction (AGAR) [15];
- (2) Dissimilarity based attribute reduction (DAR) [45];
- (3) Data-guidance based attribute reduction (DGAR) [42];
- (4) Ensemble selector-based attribute reduction (ESAR) [21];
- (5) Forward-backward greedy searching based attribute reduction (FBGSAR) [15].
4.3. Comparisons of Stability-Based Reducts
In this subsection, the stabilities of the reducts derived by AGAR, DAR, DGAR, ESAR, FBGSAR and our proposed strategy are compared. The stability reflects how data perturbation influences the reduct; therefore, the stabilities of reducts obtained by different strategies can be computed based on 5-fold cross-validation, following Equation (1). Since 20 different radii are used to obtain reducts in our experiments, Table 2, Table 3 and Table 4 report the mean stability over the 20 different reducts.
Table 2.
The stabilities of reducts (raw data).
Table 3.
The stabilities of reducts (label noise).
Table 4.
The stabilities of reducts (feature noise).
- Compared with reducts generated by AGAR, DAR, DGAR, ESAR and FBGSAR, the reduct obtained by our proposed strategy can possess higher stability in most cases over raw data. Take the “LSVT Voice Rehabilitation (ID: 6)” data set as an example, the values with respect to stabilities of reducts obtained by AGAR, DAR, DGAR, ESAR, FBGSAR and our proposed strategy are 0.1003, 0.2194, 0.1133, 0.1380, 0.1133 and 0.6188, respectively. It is obvious that the reduct with higher stability can be effectively generated by our BISAR.
- Whether the label noise data or feature noise data are considered, the reduct obtained by our proposed strategy can always possess high stability. Take “LSVT Voice Rehabilitation (ID: 6)” data set as an example, over the label noise data, the values with respect to stabilities of reducts obtained by AGAR, DAR, DGAR, ESAR, FBGSAR and our proposed strategy are 0.0600, 0.1926, 0.0650, 0.0838, 0.0792 and 0.5738; over the feature noise data, the values are 0.0787, 0.1938, 0.0851, 0.1070, 0.1127 and 0.6256. It is not difficult to draw a conclusion that our proposed strategy can better adapt to the data with label noise or feature noise.
In addition, the Wilcoxon signed rank test is employed to characterize the differences in stability among the strategies, in which the significance level is set as 0.05. The results of the Wilcoxon signed rank test are shown in Table 5, Table 6 and Table 7.
Table 5.
p-values for comparing stabilities (raw data).
Table 6.
p-values for comparing stabilities (label noise).
Table 7.
p-values for comparing stabilities (feature noise).
Through carefully observing Table 5, Table 6 and Table 7, it is obvious that the returned p-values are less than 0.05 in most cases. Combined with the results shown in Table 2, Table 3 and Table 4, it is obvious that the stability of the reduct derived by using BISAR is significantly higher than that of the reducts derived by the other approaches.
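The comparison is paired over the 16 data sets; a minimal sketch of the test with SciPy, where `ours` and `other` stand for the per-data-set stability (or accuracy) columns being compared:

```python
from scipy.stats import wilcoxon

def wilcoxon_compare(ours, other, alpha=0.05):
    """Paired Wilcoxon signed rank test over per-data-set values; a p-value
    below alpha indicates a statistically significant difference."""
    stat, p = wilcoxon(ours, other)
    return p, p < alpha
```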
4.4. Comparisons of Classification Performances
In this subsection, the classification performances of different reducts derived by AGAR, DAR, DGAR, ESAR, FBGSAR and our proposed strategy will be compared. The k-nearest neighbor (KNN) classifier and the support vector machine (SVM) classifier are employed to test the classification performance.
Presently, although many classifiers have been designed [46,47,48], the SVM and KNN classifiers are still the two most commonly used in the field of feature selection [29,41,42,45,49]. The SVM classifier is a nonlinear classifier based on a sparse kernel. Such a classifier maps the data from a low-dimensional space into a high-dimensional space, converting nonlinearly separable data into linearly separable data; the classification task is then completed over the linearly separable data. Note that the linear kernel is used in our experiments [49]. The KNN classifier is representative of lazy learning. Such a classifier calculates the distance between each test instance and every instance in the training set based on a distance measure (such as the Euclidean distance or the Hamming distance), and the k nearest neighbors of each test instance are then selected. Finally, the test instances are classified based on the classification decision rule. In our experiments, the value of k is set to 5 [49].
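For reference, the following sketch shows the evaluation step with scikit-learn stand-ins for the Matlab packages mentioned in Section 4.1 (fitcknn versus KNeighborsClassifier with k = 5, and fitcecoc with linear learners versus SVC with a linear kernel; the correspondence is approximate, not exact):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def accuracy_on_reduct(X_tr, y_tr, X_te, y_te, reduct):
    """Evaluate a reduct with the two classifiers used in the experiments:
    KNN with k = 5 and SVM with a linear kernel."""
    cols = list(reduct)
    scores = {}
    for name, clf in (("KNN", KNeighborsClassifier(n_neighbors=5)),
                      ("SVM", SVC(kernel="linear"))):
        clf.fit(X_tr[:, cols], y_tr)
        scores[name] = clf.score(X_te[:, cols], y_te)
    return scores
```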
Since 20 different radii are used to generate reducts in experiments, the following Table 8, Table 9, Table 10, Table 11, Table 12 and Table 13 show the mean accuracy related to 20 different reducts.
Table 8.
Classification accuracies based on the KNN classifier (raw data).
Table 9.
Classification accuracies based on the KNN classifier (label noise).
Table 10.
Classification accuracies based on the KNN classifier (feature noise).
Table 11.
Classification accuracies based on SVM classifier (raw data).
Table 12.
Classification accuracies based on SVM classifier (label noise).
Table 13.
Classification accuracies based on SVM classifier (feature noise).
With a thorough investigation of Table 8, Table 9, Table 10, Table 11, Table 12 and Table 13, we can observe that BISAR does not result in poorer classification accuracy over either the KNN or the SVM classifier in most cases, whether raw data or noise data are considered. Take the “Sonar (ID: 13)” data set as an example: over the raw data, the KNN classification accuracies of the reducts obtained by AGAR, DAR, DGAR, ESAR, FBGSAR and our proposed strategy are 0.7400, 0.6963, 0.7351, 0.7534, 0.7351 and 0.8176; over the label noise data, the values are 0.6646, 0.6251, 0.6510, 0.6663, 0.6610 and 0.7090; over the feature noise data, the values are 0.7215, 0.6805, 0.7146, 0.7217, 0.7212 and 0.7873. Obviously, the reduct obtained by our proposed strategy always possesses a justifiable classification ability.
Furthermore, the Wilcoxon signed rank test is also employed to compare the classification accuracies. The results with respect to the Wilcoxon signed rank test are shown in the following Table 14, Table 15 and Table 16.
Table 14.
p-values for comparing classification accuracies (raw data).
Table 15.
p-values for comparing classification accuracies (label noise).
Table 16.
p-values for comparing classification accuracies (feature noise).
From Table 14, Table 15 and Table 16, in terms of the KNN and SVM classifiers, no matter which algorithm is compared with our BISAR, the returned p-values are higher than 0.05 in most cases. This result further illustrates that the reducts generated based on BISAR provide well-matched performance in classification tasks.
5. Conclusions, Limitations, and Future Research
To generate a reduct with higher stability when data perturbation happens, the beam-influenced selector (BIS) is designed in this study. Different from other popular selectors, on the one hand, our selector does not consider the original distribution of samples, because the attribute evaluation is based on the local data obtained by the strategy of random partition; on the other hand, our selector does not rely heavily on the optimal attribute, because each candidate attribute can be fully considered based on the strategy of the beam. Therefore, attributes with stronger adaptability can be selected by using BIS, and then the reduct generated based on our selector will possess higher stability. The experimental results verify that our proposed selector can significantly enhance the stability of the derived reduct without leading to poor generalization ability. However, in terms of hyperparameter selection and the time consumption of deriving the reduct, some limitations exist in our strategy. Consequently, the following topics deserve further investigation.
- The time consumption of deriving a reduct may be further reduced by fusing some acceleration strategies [15,18];
- The hyperparameter selection may be further optimized based on some parameter optimization approaches;
- The effectiveness of our strategy can be further verified by comparison with other state-of-the-art strategies [50,51,52] of feature selection.
Author Contributions
Data curation, W.Y.; methodology, W.Y.; software, W.Y.; supervision, T.X.; visualization, W.Y.; writing—original draft, W.Y.; writing—review & editing, W.Y., J.B., T.X., H.Y., J.S. and B.H. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the Natural Science Foundation of China (Nos. 62076111, 62006099, 62006128, 61906078), the Postgraduate Research and Practice Innovation Program of Jiangsu Province (No. KYCX21_3507).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Publicly available datasets were analyzed in this study. These data can be found here: http://archive.ics.uci.edu/ml/datasets (accessed on 10 January 2022). The code with respect to this study can be found here: https://github.com/syscode-yxb/yww-experimentCode (accessed on 10 January 2022).
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
| AGAR | attribute group based attribute reduction |
| BIS | beam-influenced selector |
| BISAR | beam-influenced selector-based attribute reduction |
| DAR | dissimilarity based attribute reduction |
| DGAR | data-guidance based attribute reduction |
| ESAR | ensemble selector-based attribute reduction |
| FBGSAR | forward-backward greedy searching based attribute reduction |
| KNN | k-nearest neighbor |
| SVM | support vector machine |
References
- Xu, W.H.; Yu, J.H. A novel approach to information fusion in multi-source datasets: A granular computing viewpoint. Inf. Sci. 2017, 378, 410–423. [Google Scholar] [CrossRef]
- Emani, C.K.; Cullot, N.; Nicolle, C. Understandable big data: A survey. Comput. Sci. Rev. 2015, 17, 70–81. [Google Scholar] [CrossRef]
- Xu, W.H.; Li, W.T. Granular computing approach to two-way learning based on formal concept analysis in fuzzy datasets. IEEE Trans. Cyber. 2016, 46, 366–379. [Google Scholar] [CrossRef] [PubMed]
- Yuan, K.H.; Xu, W.H.; Li, W.T.; Ding, W.Q. An incremental learning mechanism for object classification based on progressive fuzzy three-way concept. Inf. Sci. 2022, 584, 127–147. [Google Scholar] [CrossRef]
- Elaziz, M.A.; Abualigah, L.; Yousri, D.; Oliva, D.; Al-Qaness, M.A.A.; Nadimi-Shahraki, M.H.; Ewees, A.A.; Lu, S.; Ibrahim, R.A. Boosting atomic orbit search using dynamic-based learning for feature selection. Mathematics 2021, 9, 2786. [Google Scholar] [CrossRef]
- Khurma, R.A.; Aljarah, I.; Sharieh, A.; Elaziz, M.A.; Damaševičius, R.; Krilavičius, T. A review of the modification strategies of the nature inspired algorithms for feature selection problem. Mathematics 2022, 10, 464. [Google Scholar] [CrossRef]
- Li, J.D.; Liu, H. Challenges of feature selection for big data analytics. IEEE Intell. Syst. 2017, 32, 9–15. [Google Scholar] [CrossRef] [Green Version]
- Pérez-Martín, A.; Pérez-Torregrosa, A.; Rabasa, A.; Vaca, M. Feature selection to optimize credit banking risk evaluation decisions for the example of home equity loans. Mathematics 2020, 8, 1971. [Google Scholar] [CrossRef]
- Cai, J.; Luo, J.W.; Wang, S.L.; Yang, S. Feature selection in machine learning: A new perspective. Neurocomputing 2018, 300, 70–79. [Google Scholar] [CrossRef]
- Li, Y.; Li, T.; Liu, H. Recent advances in feature selection and its applications. Knowl. Inf. Syst. 2017, 53, 551–577. [Google Scholar] [CrossRef]
- Pawlak, Z. Rough Sets: Theoretical Aspects of Reasoning about Data; Kluwer Academic Publishers: Dordrecht, Netherlands, 1992. [Google Scholar]
- Ju, H.R.; Yang, X.B.; Yu, H.L.; Li, T.J.; Yu, D.J.; Yang, J.Y. Cost-sensitive rough set approach. Inf. Sci. 2016, 355–356, 282–298. [Google Scholar] [CrossRef]
- Liu, D.; Yang, X.; Li, T.R. Three-way decisions: Beyond rough sets and granular computing. Int. J. Mach. Learn. Cybern. 2020, 11, 989–1002. [Google Scholar] [CrossRef]
- Wang, C.Z.; Huang, Y.; Shao, M.W.; Fan, X.D. Fuzzy rough set-based attribute reduction using distance measures. Knowl. Based Syst. 2019, 164, 205–212. [Google Scholar] [CrossRef]
- Chen, Y.; Liu, K.Y.; Song, J.J.; Fujita, H.; Yang, X.B.; Qian, Y.H. Attribute group for attribute reduction. Inf. Sci. 2020, 535, 64–80. [Google Scholar] [CrossRef]
- Liu, K.Y.; Yang, X.B.; Yu, H.L.; Fujita, H.; Chen, X.J.; Liu, D. Supervised information granulation strategy for attribute reduction. Int. J. Mach. Learn. Cybern. 2020, 11, 2149–2163. [Google Scholar] [CrossRef]
- Liu, K.Y.; Yang, X.B.; Yu, H.L.; Mi, J.S.; Wang, P.X.; Chen, X.J. Rough set based semi-supervised feature selection via ensemble selector. Knowl. Based Syst. 2019, 165, 282–296. [Google Scholar] [CrossRef]
- Qian, Y.H.; Liang, J.Y.; Pedrycz, W.; Dang, C.Y. Positive approximation: An accelerator for attribute reduction in rough set theory. Artif. Intell. 2010, 174, 597–618. [Google Scholar] [CrossRef] [Green Version]
- Du, W.; Cao, Z.B.; Song, T.C.; Li, Y.; Liang, Y.C. A feature selection method based on multiple kernel learning with expression profiles of different types. BioData Min. 2017, 10, 4. [Google Scholar] [CrossRef] [Green Version]
- Goh, W.W.B.; Wong, L. Evaluating feature-selection stability in next generation proteomics. J. Bioinform. Comput. Biol. 2016, 14, 1650029. [Google Scholar] [CrossRef] [Green Version]
- Yang, X.B.; Yao, Y.Y. Ensemble selector for attribute reduction. Appl. Soft Comput. 2018, 70, 1–11. [Google Scholar] [CrossRef]
- Wu, W.Z.; Leung, Y. A comparison study of optimal scale combination selection in generalized multi-scale decision tables. Int. J. Mach. Learn. Cybern. 2020, 11, 961–972. [Google Scholar] [CrossRef]
- Wu, W.Z.; Qian, Y.H.; Li, T.J.; Gu, S.M. On rule acquisition in incomplete multi-scale decision tables. Inf. Sci. 2017, 378, 282–302. [Google Scholar] [CrossRef]
- Chen, Z.; Liu, K.Y.; Yang, X.B.; Fujitae, H. Random sampling accelerator for attribute reduction. Int. J. Approx. Reason. 2022, 140, 75–91. [Google Scholar] [CrossRef]
- Freitag, M.; Al-Onaizan, Y. Beam search strategies for neural machine translation. In Proceedings of the First Workshop on Neural Machine Translation, Vancouver, BC, Canada, 4 August 2017; pp. 56–60. [Google Scholar]
- Hu, Q.H.; Yu, D.R.; Xie, Z.X. Neighborhood classifiers. Expert Syst. Appl. 2008, 34, 866–876. [Google Scholar] [CrossRef]
- Wang, C.Z.; Hu, Q.H.; Wang, X.Z.; Chen, D.G.; Qian, Y.H.; Dong, Z. Feature selection based on neighborhood discrimination index. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2986–2999. [Google Scholar] [CrossRef] [PubMed]
- Zhang, X.; Mei, C.L.; Chen, D.G.; Li, J.H. Feature selection in mixed data: A method using a novel fuzzy rough set-based information entropy. Pattern Recognit. 2016, 56, 1–15. [Google Scholar] [CrossRef]
- Chen, Y.; Song, J.J.; Liu, K.Y.; Lin, Y.J.; Yang, X.B. Combined accelerator for attribute reduction: A sample perspective. Math. Probl. Eng. 2020, 2020, 2350627. [Google Scholar] [CrossRef]
- Jiang, Z.H.; Liu, K.Y.; Yang, X.B.; Yu, H.L.; Fujita, H.; Qian, Y.H. Accelerator for supervised neighborhood based attribute reduction. Int. J. Approx. Reason. 2020, 119, 122–150. [Google Scholar] [CrossRef]
- Xu, S.P.; Yang, X.B.; Yu, H.L.; Yu, D.J.; Yang, J.Y.; Tsang, E.C.C. Multi-label learning with label-specific feature reduction. Knowl. Based Syst. 2016, 104, 52–61. [Google Scholar] [CrossRef]
- Wu, W.Z. Attribute reduction based on evidence theory in incomplete decision systems. Inf. Sci. 2008, 178, 1355–1371. [Google Scholar] [CrossRef]
- Quafafou, M. α-RST: A generalization of rough set theory. Inf. Sci. 2000, 124, 301–316. [Google Scholar] [CrossRef]
- Skowron, A.; Rauszer, C. The discernibility matrices and functions in information systems. In Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory; Springer: Dordrecht, The Netherlands, 1992; Volume 11, pp. 331–362. [Google Scholar]
- Zhang, W.X.; Wei, L.; Qi, J.J. Attribute reduction theory and approach to concept lattice. Sci. China F Inf. Sci. 2005, 48, 713–726. [Google Scholar] [CrossRef]
- Yan, W.W.; Chen, Y.; Shi, J.L.; Yu, H.L.; Yang, X.B. Ensemble and quick strategy for searching Reduct: A hybrid mechanism. Information 2021, 12, 25. [Google Scholar] [CrossRef]
- Xia, S.Y.; Zhang, Z.; Li, W.H.; Wang, G.Y.; Giem, E.; Chen, Z.Z. GBNRS: A Novel rough set algorithm for fast adaptive attribute reduction in classification. IEEE Trans. Knowl. Data Eng. 2020, 34, 1231–1242. [Google Scholar] [CrossRef]
- Yao, Y.Y.; Zhao, Y.; Wang, J. On reduct construction algorithms. Trans. Comput. Sci. II 2008, 5150, 100–117. [Google Scholar]
- Qian, Y.H.; Wang, Q.; Cheng, H.H.; Liang, J.Y.; Dang, C.Y. Fuzzy-rough feature selection accelerator. Fuzzy Sets Syst. 2015, 258, 61–78. [Google Scholar] [CrossRef]
- Yang, X.B.; Qi, Y.; Yu, H.L.; Song, X.N.; Yang, J.Y. Updating multigranulation rough approximations with increasing of granular structures. Knowl. Based Syst. 2014, 64, 59–69. [Google Scholar] [CrossRef]
- Liu, K.Y.; Yang, X.B.; Fujita, H.; Liu, D.; Yang, X.; Qian, Y.H. An efficient selector for multi-granularity attribute reduction. Inf. Sci. 2019, 505, 457–472. [Google Scholar] [CrossRef]
- Jiang, Z.H.; Dou, H.L.; Song, J.J.; Wang, P.X.; Yang, X.B.; Qian, Y.H. Data-guided multi-granularity selector for attribute reduction. Appl. Intell. 2021, 51, 876–888. [Google Scholar] [CrossRef]
- Xu, W.H.; Yuan, K.H.; Li, W.T. Dynamic updating approximations of local generalized multigranulation neighborhood rough set. Appl. Intell. 2022. [Google Scholar] [CrossRef]
- Ba, J.; Liu, K.Y.; Ju, H.R.; Xu, S.P.; Xu, T.H.; Yang, X.B. Triple-G: A new MGRS and attribute reduction. Int. J. Mach. Learn. Cybern. 2022, 13, 337–356. [Google Scholar] [CrossRef]
- Rao, X.S.; Yang, X.B.; Yang, X.; Chen, X.J.; Liu, D.; Qian, Y.H. Quickly calculating reduct: An attribute relationship based approach. Knowl. Based Syst. 2020, 200, 106041. [Google Scholar] [CrossRef]
- Borah, P.; Gupta, D. Functional iterative approaches for solving support vector classification problems based on generalized Huber loss. Neural Comput. Appl. 2020, 32, 9245–9265. [Google Scholar] [CrossRef]
- Borah, P.; Gupta, D. Unconstrained convex minimization based implicit Lagrangian twin extreme learning machine for classification (ULTELMC). Appl. Intell. 2020, 50, 1327–1344. [Google Scholar] [CrossRef]
- Adhikary, D.D.; Gupta, D. Applying over 100 classifiers for churn prediction in telecom companies. Multimed. Tools Appl. 2021, 80, 35123–35144. [Google Scholar] [CrossRef]
- Zhou, H.F.; Wang, X.Q.; Zhu, R.R. Feature selection based on mutual information with correlation coefficient. Appl. Intell. 2021. [Google Scholar] [CrossRef]
- Karakatič, S. EvoPreprocess-Data preprocessing framework with nature-inspired optimization algorithms. Mathematics 2020, 8, 900. [Google Scholar] [CrossRef]
- Karakatič, S.; Fister, I.; Fister, D. Dynamic genotype reduction for narrowing the feature selection search space. In Proceedings of the 2020 IEEE 20th International Symposium on Computational Intelligence and Informatics (CINTI), Budapest, Hungary, 5–7 November 2020; pp. 35–38. [Google Scholar]
- Yan, D.W.; Chi, G.T.; Lai, K.K. Financial distress prediction and feature selection in multiple periods by lassoing unconstrained distributed lag non-linear models. Mathematics 2020, 8, 1275. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).