
Combination of Ensembles of Regularized Regression Models with Resampling-Based Lasso Feature Selection in High Dimensional Data

1 Computational Science, University of Texas at El Paso, El Paso, TX 79968, USA
2 Department of Mathematical Sciences, University of Texas at El Paso, El Paso, TX 79968, USA
* Author to whom correspondence should be addressed.
Mathematics 2020, 8(1), 110; https://doi.org/10.3390/math8010110
Received: 14 October 2019 / Revised: 3 January 2020 / Accepted: 4 January 2020 / Published: 10 January 2020
(This article belongs to the Special Issue Uncertainty Quantification Techniques in Statistics)
In high-dimensional data, the performance of a classifier depends largely on the selection of important features. Most individual classifiers combined with existing feature selection (FS) methods do not perform well on highly correlated data. Obtaining important features with an FS method and selecting the best-performing classifier is a challenging task in high-throughput data. In this article, we propose a combination of resampling-based least absolute shrinkage and selection operator (LASSO) feature selection (RLFS) and ensembles of regularized regression models (ERRM) capable of dealing with data that have high correlation structures. The ERRM boosts prediction accuracy using the top-ranked features obtained from RLFS. The RLFS uses the lasso penalty with the sure independence screening (SIS) condition to select the top k ranked features. The ERRM includes five individual penalty-based classifiers: LASSO, adaptive LASSO (ALASSO), elastic net (ENET), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). It is built on the idea of bagging and rank aggregation. Through simulation studies and an application to smokers’ cancer gene expression data, we demonstrate that the proposed combination of ERRM with RLFS achieves superior performance in terms of accuracy and geometric mean.
Keywords: ensembles; feature selection; high-throughput; gene expression data; resampling; lasso; adaptive lasso; elastic net; SCAD; MCP
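The resampling idea behind RLFS can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the algorithmic details, so several choices here are assumptions made for illustration. The sketch uses a squared-error lasso fitted by coordinate descent (the article concerns classification, where a penalized logistic model would be the natural choice), omits the intercept and the SIS step, and ranks features by how often they receive a nonzero coefficient across bootstrap resamples. The function names, the penalty value `lam`, and the synthetic data are all hypothetical.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=30):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def resampled_lasso_ranking(X, y, lam, k, n_resamples=30, seed=0):
    """Rank features by selection frequency over bootstrap resamples (assumed criterion)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of rows
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    ranked = sorted(range(p), key=lambda j: -counts[j])
    return ranked[:k], counts


# Synthetic data (hypothetical): only features 0 and 1 carry signal.
data_rng = random.Random(1)
n_obs, n_feat = 60, 20
X = [[data_rng.gauss(0, 1) for _ in range(n_feat)] for _ in range(n_obs)]
y = [3 * row[0] - 2 * row[1] + data_rng.gauss(0, 0.5) for row in X]

top, counts = resampled_lasso_ranking(X, y, lam=0.3, k=2)
print("top-ranked features:", top)
print("selection counts:", counts)
```

In a pipeline of the kind the abstract describes, the top k features returned by such a ranking would then be passed to the ensemble of penalized classifiers.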

MDPI and ACS Style

Patil, A.R.; Kim, S. Combination of Ensembles of Regularized Regression Models with Resampling-Based Lasso Feature Selection in High Dimensional Data. Mathematics 2020, 8, 110.
