Open Access Article

Feature Selection for High-Dimensional and Imbalanced Biomedical Data Based on Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm

1 Department of Computer Sciences, Yusuf Maitama Sule University, 700222 Kofar Nassarawa, Kano, Nigeria
2 School of Computer Sciences, Universiti Sains Malaysia, 11800 Gelugor, Malaysia
* Author to whom correspondence should be addressed.
Genes 2020, 11(7), 717; https://doi.org/10.3390/genes11070717
Received: 26 September 2019 / Revised: 19 December 2019 / Accepted: 7 January 2020 / Published: 27 June 2020
(This article belongs to the Section Technologies and Resources for Genetics)
Training a machine learning algorithm on an imbalanced data set is inherently challenging. It becomes more demanding when samples are limited but the number of features is massive (high dimensionality). High-dimensional, imbalanced data sets pose severe challenges in many real-world applications, such as biomedical data sets. Numerous researchers have investigated either class imbalance or high dimensionality and proposed various methods. Nonetheless, few approaches reported in the literature address the intersection of the two problems, owing to their complicated interactions. Lately, feature selection has become a well-known technique for overcoming this problem by selecting discriminative features that represent both the minority and majority classes. This paper proposes a new method called Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm (rCBR-BGOA), which employs an ensemble of multi-filters coupled with the Correlation-Based Redundancy method to select optimal feature subsets. A binary Grasshopper optimisation algorithm (BGOA) is used to formulate feature selection as an optimisation problem and to select the best (near-optimal) combination of features from the majority and minority classes. The results, supported by statistical analysis, indicate that rCBR-BGOA can improve classification performance on high-dimensional and imbalanced data sets in terms of the G-mean and Area Under the Curve (AUC) performance metrics.
Keywords: multi-filter; high dimensionality; class-imbalanced dataset; Grasshopper optimisation algorithm
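The abstract reports results in terms of G-mean and AUC without defining them. The sketch below illustrates why such metrics suit imbalanced data, assuming the standard definitions: G-mean as the geometric mean of sensitivity and specificity, and AUC as the probability that a randomly chosen minority sample is scored above a randomly chosen majority sample. The function names `g_mean` and `auc` and the toy labels are hypothetical and are not taken from the paper.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (standard definition, assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced labels (hypothetical): 2 minority samples, 8 majority samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts the majority class is 80% accurate,
# yet it never detects the minority class, so its G-mean is zero.
always_majority = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
gm_majority = g_mean(y_true, always_majority)

# A classifier that finds one of the two minority samples at the cost of one false alarm.
some_minority = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
gm_some = g_mean(y_true, some_minority)

# Hypothetical classifier scores for the same ten samples.
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.1, 0.2, 0.3, 0.1]
auc_value = auc(y_true, scores)
```

On this toy data, plain accuracy rewards ignoring the minority class while G-mean penalises it. This is the usual motivation for reporting G-mean and AUC on class-imbalanced data sets.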
MDPI and ACS Style

Abdulrauf Sharifai, G.; Zainol, Z. Feature Selection for High-Dimensional and Imbalanced Biomedical Data Based on Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm. Genes 2020, 11, 717.

