Article

Improved Measures of Redundancy and Relevance for mRMR Feature Selection

1 Department of Data Science, Dankook University, Yongin 16890, Korea
2 Department of Software Science, Dankook University, Yongin 16890, Korea
* Author to whom correspondence should be addressed.
Computers 2019, 8(2), 42; https://doi.org/10.3390/computers8020042
Submission received: 4 April 2019 / Revised: 29 April 2019 / Accepted: 22 May 2019 / Published: 27 May 2019

Abstract

Many biological and medical datasets have numerous features. Feature selection is a data preprocessing step that can remove noise from the data and save computing time when a dataset has several hundred thousand or more features. Another goal of feature selection is to improve classification accuracy in machine learning tasks. Minimum Redundancy Maximum Relevance (mRMR) is a well-known feature selection algorithm that selects features by calculating the redundancy between features and the relevance between features and the class vector. mRMR adopts mutual information theory to measure redundancy and relevance. In this research, we propose a method to improve the performance of mRMR feature selection. We apply Pearson’s correlation coefficient as a measure of redundancy and the R-value as a measure of relevance. To compare the original mRMR and the proposed method, we selected features from various datasets using both methods and then performed classification tests. Classification accuracy was used as the measure of performance. In many cases, the proposed method showed higher accuracy than the original mRMR.


1. Introduction

With the rapid development of machine learning and the accumulation of data through the internet, traditional analysis techniques have become difficult to apply to modern big data problems, and various data preprocessing techniques have been developed. Among them, feature selection is the process of selecting a set of features (variables, attributes) that meets the purpose of analysis from a high-dimensional dataset having thousands or tens of thousands of features. Analysts can benefit from feature selection through better performance of predictive models and faster, more efficient data analysis. The advantages of feature selection are as follows:
(a) reduces the dimension of the dataset and therefore the cost of computing resources;
(b) improves classification model performance by reducing data noise;
(c) facilitates data visualization and understanding.
The main purpose of general feature selection is to determine a set of related features that is of interest regarding particular events or phenomena. Feature selection is usually divided into filter methods and wrapper methods, depending on how the relevant features are searched [1,2,3,4]. Filter techniques assess the relevance of features by evaluating only the intrinsic properties of the data [1]. In most cases, relevance scores between each feature and the class vector are calculated, and highly scored features are selected. Filter techniques are simple, fast, and easy to understand. However, they do not consider redundancy and interaction between features; they assume features are independent of each other. To capture the interactions between features, wrapper methods embed a classification model within the feature subset evaluation. However, as the space of feature subsets grows exponentially with the number of features, heuristic search methods such as forward search and backward elimination are used to guide the search toward an optimal subset [1]. Feature selection can also be categorized into supervised, unsupervised, and semisupervised methods [5,6,7]. Supervised feature selection algorithms evaluate a feature’s relevance through its correlation with the class information, whereas unsupervised feature selection algorithms may exploit data variance or data distribution to evaluate relevance without labels. Semisupervised feature selection algorithms use a small amount of labeled data as additional information to improve unsupervised feature selection [5]. Minimum Redundancy Maximum Relevance (mRMR) and the proposed method belong to the supervised category.
Ding and Peng [8,9] suggested the mRMR measure to reduce redundant features during the feature selection process. They measure both the redundancy among features and the relevance between features and the class vector for a given set of features. Their redundancy and relevance measures are based on mutual information, as follows:
I(x, y) = \sum_{i,j} p(x_i, y_j) \log \frac{p(x_i, y_j)}{p(x_i)\, p(y_j)}    (1)
In Equation (1), x and y are feature vectors or the class vector, and p() represents probability. Suppose S is a given set of features and h is a class variable. The redundancy of S is measured by Equation (2):
W_I = \frac{1}{|S|^2} \sum_{i,j \in S} I(i, j)    (2)
In Equation (2), |S| is the number of features in S. The relevance of S is measured by Equation (3):
V_I = \frac{1}{|S|} \sum_{i \in S} I(h, i)    (3)
There are two types of methods to evaluate S:
MID: V_I - W_I    (4)
MIQ: V_I / W_I    (5)
In many cases, MIQ (Mutual Information Quotient) shows better performance than MID (Mutual Information Difference). Since we cannot test all subsets of features S for a given dataset, the mRMR algorithm adopts a forward search in its implementation. The procedure is described in Algorithm 1.
Algorithm 1: Forward search
/*
M: size of the feature subset S that we want to get
S: set of selected features
F: whole set of features of the target dataset
*/
S ← ∅;
REPEAT UNTIL |S| = M
  Find fi ∈ F that maximizes MID/MIQ of S ∪ {fi};
  S ← S ∪ {fi};
  Remove fi from F;
END REPEAT
RETURN S;
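For readers who want to experiment, the following is a minimal R sketch of Algorithm 1 with the MIQ criterion. The function names mi(), miq(), and mrmr_miq() are ours for illustration, not the authors' released code, and mi() is a simple plug-in estimate of Equation (1) that assumes categorical (factor) columns.

# Plug-in estimate of Equation (1) for two categorical vectors.
mi <- function(x, y) {
  joint <- table(x, y) / length(x)              # p(x_i, y_j)
  px <- rowSums(joint); py <- colSums(joint)
  s <- 0
  for (i in seq_along(px))
    for (j in seq_along(py))
      if (joint[i, j] > 0)
        s <- s + joint[i, j] * log(joint[i, j] / (px[i] * py[j]))
  s
}

# MIQ = V_I / W_I for a feature subset S (column indices of `data`).
miq <- function(data, class, S) {
  V <- mean(sapply(S, function(i) mi(data[[i]], class)))
  W <- mean(outer(S, S, Vectorize(function(i, j) mi(data[[i]], data[[j]]))))
  V / W
}

# Forward search of Algorithm 1.
mrmr_miq <- function(data, class, M) {
  remaining <- seq_along(data); S <- integer(0)
  while (length(S) < M) {
    scores <- sapply(remaining, function(f) miq(data, class, c(S, f)))
    best <- remaining[which.max(scores)]
    S <- c(S, best); remaining <- setdiff(remaining, best)
  }
  S
}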
In the context of statistics or information theory, the term ‘variable’ is used instead of ‘feature’. We use ‘variable’ and ‘feature’ interchangeably according to context. Mutual information can only be applied to two categorical variables (x, y). Therefore, if a dataset has continuous variables, they need to be converted into categorical variables before performing mRMR. The performance of mRMR depends on the quality of its redundancy and relevance measures: if we can improve the measures, we can enhance the performance of mRMR. Several studies [2,10,11] have attempted to improve the redundancy measure W_I by introducing joint mutual information I(x1, x2, …, xn). Auffarth et al. [12] compared various redundancy and relevance measures and suggested the ‘Fit Criterion’ and the ‘Value Difference Metric’ as the best measures. These measures, however, can be applied only to two-class datasets. mRMR is widely used in bioinformatics, including gene selection and disease diagnosis [8,13,14,15].
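As noted above, continuous variables must be made categorical before mutual information can be computed. A minimal base-R approach is equal-width binning with cut(); the bin count of 3 below is an arbitrary illustrative choice, not a recommendation from the paper.

# Equal-width binning of a continuous feature into categorical codes.
discretize <- function(x, bins = 3) cut(x, breaks = bins, labels = FALSE)
iris_cat <- as.data.frame(lapply(iris[1:4], discretize))   # e.g., the iris features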
In this study, we propose new measures for redundancy and relevance. We suggest Pearson’s correlation coefficient [16] as a redundancy measure and the R-value [17] as a relevance measure. The R-value and the correlation coefficient are designed for continuous variables, whereas mutual information requires categorical variables. We also implement an advanced mRMR (AmRMR) using the new measures. Details of the new measures and AmRMR are provided in the next section.

2. Materials and Methods

2.1. Pearson’s Correlation Coefficient and R-Value

Pearson’s correlation coefficient is a measure of the linear correlation between two variables x and y, and it is defined by Equation (6):
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1) S_x S_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}    (6)

where \bar{x}, \bar{y} are the means of x and y, and S_x, S_y are the standard deviations of x and y.
It has the value range [−1, +1]. If the absolute value of the correlation coefficient is near 1, the variables (x, y) have a strong correlation. In the context of feature selection, if two features (x, y) take similar values, then the correlation coefficient of (x, y) will be high; this means that the correlation coefficient can be used to measure redundancy. If two features (a, b) have a strong negative correlation, their values will differ. However, from the point of view of information theory, the amount of information in a and b is similar, and they can be considered redundant features.
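Base R’s cor() computes Equation (6) directly. The tiny example below, with made-up data, shows why the absolute value is taken: a strongly negatively correlated pair carries nearly the same information and is treated as redundant.

set.seed(1)
x <- rnorm(100)
y <- -x + rnorm(100, sd = 0.1)   # strong negative correlation
cor(x, y)                        # close to -1
abs(cor(x, y))                   # close to 1: treated as high redundancy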
The R-value was proposed as an evaluation measure for datasets [17,18]. The motivation for using the R-value is that the quality of a dataset has a profound effect on classification accuracy, and the overlapping areas among classes in a dataset strongly determine that quality. For example, dataset D1 produces higher classification accuracy than dataset D2 in Figure 1. An overlapping area is a region where samples from different classes are gathered closely to one another. If an unknown sample is located in an overlapping area, it is difficult to determine its class label. Therefore, the size of the overlapping areas can be a criterion to measure the quality of features or of the entire dataset [19]. The R-value captures the overlapping areas among classes in a dataset, using a k-nearest neighbor algorithm to define them: if an instance has many neighbors with different class values, it may belong to an overlapping area. Suppose DS is a given dataset, S is a subset of features, and C is a class vector. Algorithm 2 describes the procedure to calculate the R-value of S. The R-value has the range [0, 1], and if the R-value of S is near 1, then S may produce lower classification accuracy.
Algorithm 2: Rvalue(S, C)
// K: number of nearest neighbors
Derive dataset DSs of S from DS;
OV ← 0;    // count of neighbors that belong to a different class
N ← number of instances in DSs;
FOR each instance DSs[i] in DSs DO
  Find the K nearest neighbors of DSs[i] and store their instance IDs in KNV;
  Count the elements of KNV whose class value differs from C[i], and add the count to OV;
END FOR
Rvalue ← OV / (K * N);
RETURN Rvalue;
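A base-R sketch of Algorithm 2 follows, using a Euclidean distance matrix to find the K nearest neighbors. The default K = 5 is our assumption; the paper does not fix K in the text.

# S: vector of feature (column) indices; C: class vector; DS: data frame.
rvalue <- function(DS, C, S, K = 5) {
  X <- as.matrix(DS[, S, drop = FALSE])
  D <- as.matrix(dist(X))                 # Euclidean distances
  N <- nrow(X); OV <- 0
  for (i in 1:N) {
    knv <- order(D[i, ])[2:(K + 1)]       # K nearest neighbors, excluding i itself
    OV <- OV + sum(C[knv] != C[i])        # neighbors with a different class
  }
  OV / (K * N)                            # in [0, 1]; lower is better
}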

2.2. Formal Description of AmRMR

Suppose we evaluate a feature set S that has m features. The new relevance measure V_R for S is simply defined using the Rvalue:
V_R = 1 - \mathrm{Rvalue}(S, C)    (7)
If a feature set S produces a high Rvalue, large overlapping areas exist between classes, which may cause lower classification accuracy. Therefore, the lower the Rvalue, the better the classification. We define the new relevance measure as 1 − Rvalue to give a higher score to a lower Rvalue.
To develop a better redundancy measure, we replace mutual information with the correlation coefficient. The original redundancy measure W_I is simply the mean of the mutual information over pairs of features in S. From several experiments, we found that a high value for a specific pair of features matters more than the mean over all pairs. Therefore, we calculate the maximum (maxC) and the mean (meanC) of the absolute correlation coefficients, and choose maxC as the new redundancy measure W_R if maxC ≥ 0.5; otherwise, W_R = meanC. If the absolute correlation coefficient of variables (x, y) is at least 0.5, we accept that they have a meaningful correlation. In Equations (8) and (9), Cor() is the correlation coefficient function, abs() is the absolute value function, and max() is the maximum value function.
maxC = \max\{ \mathrm{abs}(\mathrm{Cor}(f_i, f_j)) \},\quad f_i, f_j \in S,\ i \ne j,\ i, j = 1, 2, \ldots, m    (8)

meanC = \mathrm{mean}\{ \mathrm{abs}(\mathrm{Cor}(f_i, f_j)) \},\quad f_i, f_j \in S,\ i \ne j,\ i, j = 1, 2, \ldots, m    (9)

W_R = \begin{cases} maxC, & \text{if } maxC \ge 0.5 \\ meanC, & \text{if } maxC < 0.5 \end{cases}    (10)
From the new relevance measure V_R and redundancy measure W_R, we redefine MID and MIQ as RVD and RVQ, respectively. RVD is similar to MID; we define RVQ in a more sophisticated manner. In the evaluation function RVQ, V_R indicates benefit and W_R indicates penalty, so (V_R / W_R) should not be larger than V_R. However, since 0 ≤ V_R, W_R ≤ 1 in our equations, sometimes (V_R / W_R) > V_R. Therefore, we adjust for this discrepancy in Equation (12).
RVD = V_R - W_R    (11)
RVQ = \begin{cases} V_R, & \text{if } V_R / W_R > V_R \\ V_R / W_R, & \text{if } V_R / W_R \le V_R \end{cases}    (12)
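Equations (7)-(12) translate into a few lines of R. The sketch below reuses rvalue() from the previous sketch; the helper names wr() and rvq() are illustrative, not the authors' code.

# Redundancy W_R, Equations (8)-(10).
wr <- function(DS, S) {
  if (length(S) < 2) return(0)
  cors <- abs(cor(DS[, S]))
  vals <- cors[upper.tri(cors)]           # all pairs f_i, f_j with i < j
  if (max(vals) >= 0.5) max(vals) else mean(vals)
}

# Evaluation RVQ, Equations (7) and (12).
rvq <- function(DS, C, S, K = 5) {
  VR <- 1 - rvalue(DS, C, S, K)           # relevance V_R, Equation (7)
  WR <- wr(DS, S)
  if (WR == 0 || VR / WR > VR) VR else VR / WR
}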
We have described a new evaluation measure for feature subset S. As mentioned earlier, we cannot evaluate all candidate subsets S for a given dataset; thus, a heuristic approach is required. We implemented AmRMR based on the mRMR code. It applies a forward search to reduce the search space. Algorithm 3 describes the pseudocode for AmRMR. We only consider the case of RVQ.
Algorithm 3: AmRMR(DS, C, M)
/*
DS: target dataset
C: class vector of DS
M: size of the feature subset S that we want to get
F: set of features in DS
*/
Find fi ∈ F that produces min(Rvalue(fi, C));    // the most relevant single feature
S ← {fi};
Remove fi from F;
REPEAT UNTIL |S| = M
  max_eval ← 0;
  max_idx ← 0;
  FOR each fj ∈ F DO
    Target ← S ∪ {fj};
    Calculate RVQ for Target;
    IF RVQ > max_eval THEN
      max_eval ← RVQ;
      max_idx ← j;
    END IF
  END FOR
  S ← S ∪ {f_max_idx};
  Remove f_max_idx from F;
END REPEAT
RETURN S;
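Putting the pieces together, here is a hedged R sketch of Algorithm 3, reusing rvalue() and rvq() from the sketches above; the seed feature is the one with the lowest R-value, i.e., the highest single-feature relevance.

amrmr <- function(DS, C, M, K = 5) {
  remaining <- seq_len(ncol(DS))
  # Seed with the single most relevant feature (lowest R-value).
  first <- remaining[which.min(sapply(remaining, function(f) rvalue(DS, C, f, K)))]
  S <- first; remaining <- setdiff(remaining, first)
  while (length(S) < M) {
    scores <- sapply(remaining, function(f) rvq(DS, C, c(S, f), K))
    best <- remaining[which.max(scores)]
    S <- c(S, best); remaining <- setdiff(remaining, best)
  }
  S
}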

3. Results

To compare the mRMR and AmRMR algorithms, we collected several types of datasets that have different numbers of features, classes, and instances. Table 1 summarizes the datasets. We obtained GDS2546, GDS2547, and GDS3715 from the NCBI Gene Expression Omnibus [20], and arcene and madelon from the NIPS2003 feature selection challenge [21]; the others were obtained from the UCI Machine Learning Repository [22]. We selected 5–25 features using mRMR and AmRMR and performed classification tests using k-nearest neighbor (KNN), support vector machine (SVM), C5.0 (C50), and random forest (RF). To avoid overfitting, we adopted 10-fold cross-validation. In the case of arcene and madelon, we took the feature set from the training dataset and performed classification tests on the validation dataset, because these datasets provide separate training/validation sets. Table 2, Table 3, Table 4 and Table 5 summarize the results. In most cases, AmRMR produces better performance than mRMR. Figure 2 summarizes the classification results in Table 2, Table 3, Table 4 and Table 5; each accuracy value is the average classification accuracy over 5 to 25 selected features. Each graph clearly shows that AmRMR chooses better features than mRMR.
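The evaluation protocol for the KNN case can be sketched as follows. class::knn is a standard implementation; k = 3 and the random fold assignment are our assumptions, not the authors' exact settings.

library(class)

# 10-fold cross-validated KNN accuracy on feature subset S.
cv_knn_accuracy <- function(DS, C, S, folds = 10, k = 3) {
  set.seed(42)
  id <- sample(rep(1:folds, length.out = nrow(DS)))
  acc <- sapply(1:folds, function(f) {
    train <- DS[id != f, S, drop = FALSE]
    test  <- DS[id == f, S, drop = FALSE]
    pred  <- knn(train, test, cl = C[id != f], k = k)
    mean(pred == C[id == f])
  })
  mean(acc)
}

# e.g., cv_knn_accuracy(dataset, classes, amrmr(dataset, classes, M = 10))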

4. Discussion

In general, the R-value is better than mutual information as a measure of relevance between features and the class vector. Mutual information is a statistical measure and needs categorical values to calculate probabilities. Therefore, if a target dataset contains continuous values, we need to discretize them before applying mRMR, and information loss is inevitable in discretization. The R-value does not need discretization and is therefore more advantageous than mutual information when a dataset has continuous values. Another weak point of mutual information is that it can calculate I(f_i, C), where f_i is a feature and C is a class vector, but it cannot calculate I({f1, f2, f3}, C) directly because it is based on probability. Therefore, it uses (I(f1, C) + I(f2, C) + I(f3, C))/3 to calculate the relevance between {f1, f2, f3} and C. This calculation cannot fully capture the interactions among {f1, f2, f3}. In contrast, the R-value is a dimensionless distance-based measure, so Rvalue({f1, f2, f3}, C) can be calculated directly.
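The difference can be made concrete in two lines, reusing mi(), discretize(), and rvalue() from the earlier sketches (the subset {1, 2, 3} is illustrative):

S <- c(1, 2, 3)
rel_mi <- mean(sapply(S, function(i) mi(discretize(DS[[i]]), C)))  # (I(f1,C)+I(f2,C)+I(f3,C))/3
rel_rv <- 1 - rvalue(DS, C, S)                                     # the R-value scores the subset jointly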
mRMR and AmRMR output different feature sets for the same dataset, resulting in different classification accuracies. Table 6 lists the 25 features of the GDS3715 dataset selected by mRMR and AmRMR. In the case of Arcene, there is only one shared feature (9970) between mRMR and AmRMR; in the case of Madelon, there are five shared features. This means that mRMR and AmRMR have different evaluation criteria for feature selection. Figure 3 shows PCA (Principal Component Analysis) plots for Arcene and Madelon using five features selected by mRMR and by AmRMR. As we can see, the PCA plots of AmRMR show a clearer separation of class instances than those of mRMR. This explains why the feature set of AmRMR produces better classification accuracy than the one selected by mRMR.
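Plots in the style of Figure 3 can be reproduced with base R's prcomp(); S_mrmr and S_amrmr below stand for the five-feature subsets produced by each method.

# Project the selected features onto the first two principal components.
plot_pca <- function(DS, C, S, main) {
  pc <- prcomp(DS[, S], scale. = TRUE)
  plot(pc$x[, 1:2], col = as.integer(factor(C)), pch = 19,
       xlab = "PC1", ylab = "PC2", main = main)
}
# par(mfrow = c(1, 2)); plot_pca(DS, C, S_mrmr, "mRMR"); plot_pca(DS, C, S_amrmr, "AmRMR")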
Table 7 shows the average improvement in classification accuracy over the 10 datasets. Across the four classifiers, accuracies improve by 4–10%. This result indicates that the proposed redundancy and relevance measures enhance performance compared with the original mRMR measures. The KNN classifier shows a remarkably large improvement (10.7%). The reason lies in the R-value, the measure of relevance: both KNN and the R-value are based on k-nearest neighbors, so a set of features with a good R-value is likely to produce good classification accuracy under KNN. The relationship between the R-value and KNN is similar to the relationship between the classifier and the feature evaluation measure in the wrapper method.
The proposed redundancy and relevance measures are tailored to datasets that have continuous values. This means they are not suitable for datasets with categorical values; the mutual information measure in the original mRMR method is more suitable for such datasets. Nevertheless, AmRMR is useful, because many high-dimensional continuous datasets exist, such as microarray data, disease diagnosis data, and image analysis data.
To show the effect of AmRMR, we compared it with three filter feature selection methods: mutual information (MI), linear correlation (Linear), and rank correlation (Rank.Corr). The comparison conditions are the same as in the mRMR case. For simplicity, we tested KNN and SVM. Figure 4 and Figure 5 show the results. AmRMR produces the highest performance of all the methods.
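The three baseline filters score each feature individually against the class and take the top M. A hedged sketch follows, reusing mi() and discretize() from the earlier sketches; treating class codes as numeric for the two correlation filters is a simplification on our part.

filter_scores <- function(DS, C) {
  y <- as.numeric(factor(C))                 # class codes treated as numeric
  data.frame(
    MI        = sapply(DS, function(x) mi(discretize(x), C)),
    Linear    = sapply(DS, function(x) abs(cor(x, y))),
    Rank.Corr = sapply(DS, function(x) abs(cor(x, y, method = "spearman"))))
}
# top M by one filter: order(filter_scores(DS, C)$Linear, decreasing = TRUE)[1:M]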

5. Conclusions

In this study, we proposed new redundancy and relevance measures to improve mRMR feature selection. The proposed method provides better performance than mRMR on suitable target datasets. However, it should be noted that the proposed method has limitations regarding the types of datasets it can analyze. The performance of feature selection depends on the characteristics of the target dataset. Therefore, users are encouraged to test both mRMR and AmRMR, and choose the better feature subset according to the test results. The entire set of R code for AmRMR is available at https://bitldku.github.io/home/sw/AmRMR.html.

Author Contributions

Conceptualization, S.O.; methodology, I.J. and S.O.; software, I.J.; validation, S.O. and S.L.; formal analysis, S.O.; investigation, I.J.; resources, I.J.; data curation, I.J.; writing—original draft preparation, S.O.; writing—review and editing, S.L.; visualization, I.J.; supervision, S.O. and S.L.; project administration, S.O.; funding acquisition, S.O.

Funding

This work was supported by the ICT R&D program of MSIT/IITP [2018-0-00242, Development of AI ophthalmologic diagnosis and smart treatment platform based on big data].

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Saeys, Y.; Inza, I.; Larrañaga, P. A review of feature selection techniques in bioinformatics. Bioinformatics 2007, 23, 2507–2517.
2. Wang, Z.; Li, M.; Li, J. A multi-objective evolutionary algorithm for feature selection based on mutual information with a new redundancy measure. Inf. Sci. 2015, 307, 73–88.
3. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
4. Liu, H.; Yu, L. Toward integrating feature selection algorithms for classification and clustering. IEEE Trans. Knowl. Data Eng. 2005, 17, 491–502.
5. Liu, H.; Motoda, H.; Setiono, R.; Zhao, Z. Feature selection: An ever evolving frontier in data mining. J. Mach. Learn. Res. Proc. Track 2010, 10, 4–13.
6. Ang, J.C.; Mirzal, A.; Haron, H.; Hamed, H.N.A. Supervised, unsupervised, and semi-supervised feature selection: A review on gene selection. IEEE/ACM Trans. Comput. Biol. Bioinform. 2016, 13, 971–989.
7. Han, Y.; Yang, Y.; Yan, Y.; Ma, Z.; Sebe, N.; Zhou, X. Semisupervised feature selection via spline regression for video semantic recognition. IEEE Trans. Neur. Net. Lear. 2015, 26, 252–264.
8. Ding, C.; Peng, H. Minimum redundancy feature selection from microarray gene expression data. J. Bioinform. Comput. Biol. 2005, 3, 185–205.
9. mRMR Homepage. Available online: http://home.penglab.com/proj/mRMR/ (accessed on 28 January 2019).
10. Ponsa, D.; López, A. Feature selection based on a new formulation of the minimal-redundancy-maximal-relevance criterion. In Proceedings of the Pattern Recognition and Image Analysis, Third Iberian Conference, IbPRIA 2007, Girona, Spain, 6–8 June 2007; pp. 47–54.
11. Hejazi, M.I.; Ximing, C. Input variable selection for water resources systems using a modified minimum redundancy maximum relevance (mMRMR) algorithm. Adv. Water Resour. 2009, 32, 582–593.
12. Auffarth, B.; López, M.; Cerquides, J. Comparison of redundancy and relevance measures for feature selection in tissue classification of CT images. In Proceedings of the Industrial Conference on Data Mining, Berlin, Germany, 12–14 July 2010; pp. 47–54.
13. Aggarwal, N.; Rana, B.; Agrawal, R.K.; Kumaran, S. A combination of dual-tree discrete wavelet transform and minimum redundancy maximum relevance method for diagnosis of Alzheimer's disease. J. Bioinform. Res. 2015, 11, 433–461.
14. Alomari, O.A.; Khader, A.T.; Al-Betar, M.A.; Abualigah, L.M. Gene selection for cancer classification by combining minimum redundancy maximum relevancy and bat-inspired algorithm. J. Data Min. Bioinform. 2017, 19, 32–51.
15. Mundra, P.A.; Rajapakse, J.C. SVM-RFE with MRMR filter for gene selection. IEEE Trans. Nanobiosci. 2009, 9, 31–37.
16. Pearson, K. Note on regression and inheritance in the case of two parents. Proc. R. Soc. Lond. 1895, 58, 240–242.
17. Oh, S. A new dataset evaluation method based on category overlap. Comput. Biol. Med. 2011, 41, 115–122.
18. Lee, J.; Nomin, B.; Oh, S. RFS: Efficient feature selection method based on R-value. Comput. Biol. Med. 2013, 43, 91–99.
19. Li, Y.; Liang, C.; Wong, K.C.; Luo, J.; Zhang, Z. Mirsynergy: Detecting synergistic miRNA regulatory modules by overlapping neighbourhood expansion. Bioinformatics 2014, 30, 2627–2635.
20. NCBI Gene Expression Omnibus. Available online: http://www.ncbi.nlm.nih.gov/geo/ (accessed on 20 January 2019).
21. NIPS2003 Workshop on Feature Extraction and Feature Selection Challenge. Available online: http://clopinet.com/isabelle/Projects/NIPS2003/ (accessed on 15 December 2018).
22. UCI Machine Learning Repository. Available online: http://archive.ics.uci.edu/ml/ (accessed on 18 January 2019).
Figure 1. Two datasets that have different overlapping areas. Dataset D2 is more confused than Dataset D1. Therefore, Dataset D1 produces higher classification accuracy than Dataset D2.
Figure 2. Summary of average classification accuracy from 5 to 25 features of datasets. Each graph clearly shows AmRMR chooses better features than mRMR.
Figure 3. PCA plots for GDS3715 dataset. PCA plots of AmRMR show a clearer distribution of class instances than mRMR.
Figure 4. Comparison of feature selection methods by KNN test.
Figure 5. Comparison of feature selection methods by SVM test.
Table 1. Summary of benchmark datasets.

Dataset       Instances   Features   Classes
GDS2546       167         1000       4
GDS2547       164         1000       4
GDS3715       109         1000       4
Hill Valley   1212        100        2
Isolet        7797        617        26
Madelon       2000        500        2
Phoneme       4509        256        5
MLL           72          12533      3
Arcene        99          10001      2
Gisette       5999        5000       2
Table 2. Summary of classification accuracy tested by KNN.

Dataset       Method    Number of Features
                        5      10     15     20     25
GDS2546       mRMR      0.623  0.647  0.628  0.598  0.628
              AmRMR     0.695  0.719  0.719  0.719  0.719
GDS2547       mRMR      0.598  0.610  0.628  0.634  0.653
              AmRMR     0.726  0.762  0.744  0.762  0.793
GDS3715       mRMR      0.780  0.790  0.808  0.770  0.798
              AmRMR     0.890  0.936  0.964  0.936  0.955
Hill Valley   mRMR      0.546  0.553  0.542  0.549  0.557
              AmRMR     0.601  0.609  0.600  0.605  0.612
Isolet        mRMR      0.374  0.560  0.607  0.688  0.713
              AmRMR     0.528  0.756  0.830  0.875  0.893
Madelon       mRMR      0.702  0.824  0.830  0.819  0.800
              AmRMR     0.866  0.894  0.895  0.896  0.896
Phoneme       mRMR      0.830  0.857  0.863  0.879  0.895
              AmRMR     0.884  0.916  0.921  0.922  0.925
MLL           mRMR      0.941  0.821  0.906  0.917  0.930
              AmRMR     1.000  1.000  1.000  1.000  1.000
Arcene        mRMR      0.520  0.580  0.560  0.660  0.650
              AmRMR     0.790  0.790  0.820  0.810  0.820
Gisette       mRMR      0.608  0.670  0.790  0.828  0.825
              AmRMR     0.886  0.900  0.900  0.901  0.901
Table 3. Summary of classification accuracy tested by SVM.

Dataset       Method    Number of Features
                        5      10     15     20     25
GDS2546       mRMR      0.677  0.736  0.688  0.677  0.695
              AmRMR     0.652  0.647  0.706  0.743  0.713
GDS2547       mRMR      0.701  0.652  0.681  0.695  0.689
              AmRMR     0.701  0.701  0.738  0.739  0.719
GDS3715       mRMR      0.790  0.771  0.771  0.808  0.789
              AmRMR     0.853  0.899  0.908  0.899  0.890
Hill Valley   mRMR      0.518  0.516  0.516  0.520  0.520
              AmRMR     0.526  0.527  0.529  0.525  0.526
Isolet        mRMR      0.382  0.603  0.667  0.741  0.775
              AmRMR     0.556  0.793  0.870  0.902  0.919
Madelon       mRMR      0.719  0.714  0.696  0.678  0.674
              AmRMR     0.829  0.810  0.752  0.705  0.685
Phoneme       mRMR      0.848  0.866  0.872  0.889  0.905
              AmRMR     0.895  0.923  0.928  0.927  0.932
MLL           mRMR      0.899  0.899  0.942  0.957  0.942
              AmRMR     0.971  0.957  0.971  0.985  0.958
Arcene        mRMR      0.719  0.714  0.696  0.678  0.674
              AmRMR     0.829  0.810  0.752  0.705  0.685
Gisette       mRMR      0.848  0.866  0.872  0.889  0.905
              AmRMR     0.895  0.923  0.928  0.927  0.932
Table 4. Summary of classification accuracy tested by C50.

Dataset       Method    Number of Features
                        5      10     15     20     25
GDS2546       mRMR      0.612  0.653  0.641  0.641  0.628
              AmRMR     0.653  0.611  0.664  0.623  0.658
GDS2547       mRMR      0.537  0.512  0.591  0.598  0.659
              AmRMR     0.622  0.634  0.628  0.677  0.640
GDS3715       mRMR      0.798  0.743  0.771  0.733  0.689
              AmRMR     0.752  0.752  0.752  0.764  0.754
Hill Valley   mRMR      0.475  0.475  0.475  0.475  0.475
              AmRMR     0.475  0.475  0.475  0.475  0.475
Isolet        mRMR      0.388  0.566  0.609  0.684  0.750
              AmRMR     0.511  0.715  0.769  0.805  0.815
Madelon       mRMR      0.693  0.712  0.720  0.729  0.743
              AmRMR     0.741  0.807  0.804  0.786  0.786
Phoneme       mRMR      0.815  0.825  0.830  0.843  0.878
              AmRMR     0.876  0.898  0.889  0.888  0.884
MLL           mRMR      0.845  0.818  0.804  0.901  0.800
              AmRMR     0.859  0.859  0.859  0.859  0.830
Arcene        mRMR      0.690  0.830  0.840  0.820  0.800
              AmRMR     0.780  0.860  0.860  0.820  0.830
Gisette       mRMR      0.849  0.854  0.866  0.882  0.904
              AmRMR     0.916  0.943  0.949  0.947  0.949
Table 5. Summary of classification accuracy tested by Random Forest (RF).

Dataset       Method    Number of Features
                        5      10     15     20     25
GDS2546       mRMR      0.653  0.737  0.761  0.755  0.744
              AmRMR     0.629  0.665  0.731  0.742  0.713
GDS2547       mRMR      0.677  0.658  0.652  0.683  0.695
              AmRMR     0.744  0.780  0.762  0.750  0.768
GDS3715       mRMR      0.844  0.799  0.835  0.808  0.817
              AmRMR     0.872  0.890  0.890  0.900  0.890
Hill Valley   mRMR      0.543  0.556  0.560  0.582  0.583
              AmRMR     0.626  0.648  0.643  0.640  0.637
Isolet        mRMR      0.412  0.627  0.682  0.760  0.790
              AmRMR     0.556  0.798  0.868  0.898  0.911
Madelon       mRMR      0.754  0.789  0.794  0.791  0.772
              AmRMR     0.849  0.861  0.848  0.840  0.832
Phoneme       mRMR      0.846  0.864  0.874  0.890  0.902
              AmRMR     0.893  0.917  0.920  0.924  0.926
MLL           mRMR      0.958  0.986  0.971  0.971  0.971
              AmRMR     0.986  0.986  0.971  1.000  0.956
Arcene        mRMR      0.790  0.900  0.870  0.840  0.820
              AmRMR     0.890  0.900  0.900  0.880  0.880
Gisette       mRMR      0.857  0.867  0.884  0.900  0.921
              AmRMR     0.918  0.948  0.961  0.964  0.967
Table 6. List of features selected by mRMR and AmRMR.

Dataset    Method    Selected Feature IDs
GDS3715    mRMR      1, 510, 4, 153, 48, 84, 2, 5, 516, 6, 32, 19, 700, 662, 270, 240, 9, 450, 129, 122, 25, 7, 29, 238, 12
           AmRMR     25, 269, 132, 90, 15, 108, 577, 301, 121, 991, 167, 273, 334, 661, 447, 19, 873, 210, 583, 26, 751, 248, 197, 558, 215
Table 7. Improved classification accuracy by AmRMR.

Classifier   Number of Features                    Average
             5      10     15     20     25
KNN          0.100  0.116  0.108  0.108  0.102    0.107
SVM          0.056  0.063  0.071  0.058  0.044    0.058
C50          0.048  0.057  0.050  0.034  0.030    0.044
RF           0.046  0.043  0.045  0.044  0.036    0.043
