Open Access Article

On the Performance of Variable Selection and Classification via Rank-Based Classifier

Department of Mathematical Sciences, The University of Texas at El Paso, El Paso, TX 79968, USA
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2019, 7(5), 457; https://doi.org/10.3390/math7050457
Received: 26 April 2019 / Revised: 11 May 2019 / Accepted: 14 May 2019 / Published: 21 May 2019
(This article belongs to the Special Issue Uncertainty Quantification Techniques in Statistics)

Abstract

In high-dimensional gene expression data analysis, the accuracy and reliability of cancer classification and the selection of important genes play a crucial role. To identify these important genes and predict future outcomes (tumor vs. non-tumor), various methods have been proposed in the literature. However, only a few of them take into account correlation patterns and grouping effects among the genes. In this article, we propose a rank-based modification of the popular penalized logistic regression procedure based on a combination of ℓ1 and ℓ2 penalties capable of handling possible correlation among genes in different groups. While the ℓ1 penalty maintains sparsity, the ℓ2 penalty induces smoothness based on the information from the Laplacian matrix, which represents the correlation pattern among genes. We combined logistic regression with the BH-FDR (Benjamini and Hochberg false discovery rate) screening procedure and a newly developed rank-based selection method to arrive at an optimal model retaining the important genes. Through simulation studies and a real-world application to high-dimensional colon cancer gene expression data, we demonstrated that the proposed rank-based method outperforms currently popular methods such as lasso, adaptive lasso and elastic net in both gene selection and classification.
Keywords: gene-expression data; ℓ2 ridge; ℓ1 lasso; adaptive lasso; elastic net; BH-FDR; Laplacian matrix
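
To make two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The function `benjamini_hochberg` applies the standard Benjamini and Hochberg step-up rule to a vector of p-values. The function `network_penalty` evaluates one common form of a combined ℓ1 and Laplacian-based ℓ2 penalty, λ1·‖β‖1 + λ2·βᵀLβ; this form is an assumption, since the abstract does not state the exact formula. The function names, the parameters `lam1` and `lam2`, and the default level `alpha=0.05` are all hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of hypotheses rejected by the BH step-up rule."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices that sort the p-values
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds             # sorted p_(i) <= alpha * i / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()         # largest rank meeting its threshold
        reject[order[: k + 1]] = True          # reject everything up to that rank
    return reject


def network_penalty(beta, laplacian, lam1, lam2):
    """Assumed penalty form: lam1 * ||beta||_1 + lam2 * beta^T L beta."""
    beta = np.asarray(beta, dtype=float)
    laplacian = np.asarray(laplacian, dtype=float)
    l1_term = np.abs(beta).sum()               # encourages sparsity
    smooth_term = beta @ laplacian @ beta      # penalizes differences between linked genes
    return lam1 * l1_term + lam2 * smooth_term
```

In a screening step, genes whose mask entry is `True` would be retained for the penalized fit. The quadratic term is zero when the coefficients of linked genes are equal, which is why a Laplacian-based penalty is described as inducing smoothness.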

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

Sarker, M.S.R.; Pokojovy, M.; Kim, S. On the Performance of Variable Selection and Classification via Rank-Based Classifier. Mathematics 2019, 7, 457.


Note that, from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Mathematics, EISSN 2227-7390, published by MDPI AG, Basel, Switzerland