Open Access Article

Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE

by Qi Chen 1,2, Zhaopeng Meng 1,3, Xinyi Liu 1, Qianguo Jin 1 and Ran Su 1,4,*
1 School of Computer Software, Tianjin University, Tianjin 300350, China
2 The Military Transportation Command Department, Army Military Transportation University, Tianjin 300361, China
3 Tianjin University of Traditional Chinese Medicine, Tianjin 300193, China
4 State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin 300074, China
* Author to whom correspondence should be addressed.
Genes 2018, 9(6), 301; https://doi.org/10.3390/genes9060301
Received: 25 April 2018 / Revised: 30 May 2018 / Accepted: 6 June 2018 / Published: 15 June 2018
Feature selection, which identifies the most informative features in the original feature space, is widely used to simplify predictors. Recursive feature elimination (RFE), one of the most popular feature selection approaches, is effective at reducing data dimensionality and improving efficiency. RFE produces a ranking of features together with a list of candidate subsets and their corresponding accuracies. The subset with the highest accuracy (HA) or the subset with a preset number of features (PreNum) is often used as the final subset. However, choosing the highest-accuracy subset may lead to a large number of features being selected, and when there is no prior knowledge about a suitable preset number, the choice of final subset is often ambiguous and subjective. A proper decision variant is therefore needed to determine the optimal subset automatically. In this study, we conduct pioneering work to explore the decision variant applied after a list of candidate subsets has been obtained from RFE. We provide a detailed analysis and comparison of several decision variants for automatically selecting the optimal feature subset. We introduce a random forest-based recursive feature elimination (RF-RFE) algorithm and a voting strategy. We validated the variants on two entirely different molecular biology datasets, one from a toxicogenomic study and the other from protein sequence analysis. The study provides an automated way to determine the optimal feature subset when using RF-RFE.
Keywords: feature selection; RFE; decision variant; random forest; voting
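
The two conventional rules named in the abstract, HA and PreNum, can be illustrated with a minimal Python sketch. It assumes RFE has already produced a feature ranking and one accuracy per candidate subset. The `Candidate` container, the function names, the tie-breaking rule (prefer the smaller subset) and all numbers are hypothetical and chosen for illustration only; they are not taken from the article, and the sketch does not implement the decision variants the article proposes.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One candidate subset from an RFE run (hypothetical container)."""
    features: List[str]
    accuracy: float


def select_ha(candidates: List[Candidate]) -> Candidate:
    """HA rule: pick the subset with the highest accuracy.

    Assumption: ties are broken in favour of the smaller subset.
    """
    return max(candidates, key=lambda c: (c.accuracy, -len(c.features)))


def select_prenum(candidates: List[Candidate], n_features: int) -> Candidate:
    """PreNum rule: pick the subset with exactly the preset number of features."""
    for c in candidates:
        if len(c.features) == n_features:
            return c
    raise ValueError(f"no candidate subset with {n_features} features")


# Made-up ranking and accuracies, for illustration only.
ranking = ["f1", "f2", "f3", "f4", "f5"]       # most to least important
accuracies = [0.70, 0.78, 0.81, 0.83, 0.83]    # accuracy with the top 1..5 features
candidates = [
    Candidate(ranking[:k], acc) for k, acc in enumerate(accuracies, start=1)
]

ha = select_ha(candidates)
prenum = select_prenum(candidates, n_features=2)
print(len(ha.features), ha.accuracy)
print(prenum.features, prenum.accuracy)
```

With these made-up numbers, HA keeps four of the five features for a small accuracy gain over the three-feature subset, while PreNum depends entirely on the preset value passed in. This is the ambiguity that motivates an automatic decision variant.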
MDPI and ACS Style

Chen, Q.; Meng, Z.; Liu, X.; Jin, Q.; Su, R. Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE. Genes 2018, 9, 301.
