LASSO and Elastic Net Tend to Over-Select Features
Abstract
1. Introduction
2. Materials and Methods
2.1. Statistical Models
2.1.1. Logistic Regression
2.1.2. Cox’s Proportional Hazards Model (PHM)
2.1.3. Stepwise Variable Selection (SVS)
2.2. Machine Learning Methods
2.2.1. LASSO
2.2.2. Elastic Net
2.3. Performance Measurements
3. Results
3.1. Impact of Over-Selection
3.2. Comparison of Prediction Methods
3.3. Real-Data Analysis
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AIC | Akaike information criterion |
| AUC | area under the curve |
| BIC | Bayesian information criterion |
| C-SVS | Cox regression with forward stepwise selection |
| EN | elastic net |
| LASSO | least absolute shrinkage and selection operator |
| L-SVS | logistic regression with forward stepwise selection |
| ML | machine learning |
| MLE | maximum likelihood estimation |
| PHM | proportional hazards model |
| RF | random forest |
| ROC | receiver operating characteristic |
| R-SVS | regression methods with stepwise selection |
| SLN | sentinel lymph node |
| SVS | stepwise variable selection |
**Simulation results for logistic regression under scenarios (S1) and (S2).**

| | LASSO (S1) | EN (S1) | L-SVS (S1) | LASSO (S2) | EN (S2) | L-SVS (S2) |
|---|---|---|---|---|---|---|
| **(i)** | | | | | | |
| Total Selections | 31.15 | 74.59 | 5.32 | 28.59 | 56.61 | 5.32 |
| True Selections | 4.03 | 4.49 | 3.25 | 4.28 | 4.60 | 3.25 |
| AUC (Training) | 0.92 | 0.95 | 0.84 | 0.92 | 0.95 | 0.85 |
| AUC (Validation) | 0.70 | 0.69 | 0.71 | 0.72 | 0.71 | 0.71 |
| **(ii)** | | | | | | |
| Total Selections | 34.64 | 101.47 | 5.63 | 44.56 | 84.76 | 6.88 |
| True Selections | 6.73 | 7.92 | 3.60 | 7.76 | 8.19 | 4.31 |
| AUC (Training) | 0.92 | 0.96 | 0.83 | 0.96 | 0.97 | 0.87 |
| AUC (Validation) | 0.69 | 0.68 | 0.67 | 0.73 | 0.72 | 0.70 |
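As an illustration of how such a comparison can be run, the following minimal Python sketch (not the authors' simulation code; the sample sizes, signal strengths, and scikit-learn settings are illustrative assumptions) fits L1- and elastic-net-penalized logistic regressions to synthetic data with five true signals among many noise features, then reports selection counts and training/validation AUC analogous to the rows above.

```python
# Minimal sketch: penalized logistic regression on synthetic data with
# 5 true signals among 100 features; counts selected (nonzero) coefficients
# and computes training/validation AUC.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, p, n_true = 400, 100, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:n_true] = 1.0                                   # only 5 features carry signal
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))  # logistic outcome model
Xtr, Xva, ytr, yva = X[: n // 2], X[n // 2 :], y[: n // 2], y[n // 2 :]

settings = {
    "LASSO": dict(penalty="l1", solver="liblinear"),
    "EN": dict(penalty="elasticnet", solver="saga", l1_ratios=[0.5]),
}
for name, kw in settings.items():
    model = LogisticRegressionCV(Cs=10, cv=5, max_iter=5000, **kw).fit(Xtr, ytr)
    coef = model.coef_.ravel()
    auc_tr = roc_auc_score(ytr, model.predict_proba(Xtr)[:, 1])
    auc_va = roc_auc_score(yva, model.predict_proba(Xva)[:, 1])
    print(f"{name}: total={np.sum(coef != 0)}, true={np.sum(coef[:n_true] != 0)}, "
          f"AUC train/val = {auc_tr:.2f}/{auc_va:.2f}")
```

With this kind of setup, the gap between the total and true selection counts gives a direct view of over-selection, while the training-versus-validation AUC gap shows the accompanying overfitting.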
**Simulation results for Cox's proportional hazards model under scenarios (S1) and (S2).**

| | LASSO (S1) | EN (S1) | C-SVS (S1) | LASSO (S2) | EN (S2) | C-SVS (S2) |
|---|---|---|---|---|---|---|
| **(i), 30% censoring** | | | | | | |
| Total Selections | 16.51 | 24.30 | 4.80 | 19.28 | 25.89 | 5.37 |
| True Selections | 4.11 | 4.46 | 3.53 | 4.45 | 4.53 | 3.78 |
| −log p-value (Training) | 22.79 | 26.25 | 17.32 | 25.76 | 28.24 | 19.68 |
| −log p-value (Validation) | 9.06 | 8.76 | 9.79 | 10.45 | 10.20 | 10.41 |
| C-index (Training) | 0.74 | 0.76 | 0.71 | 0.76 | 0.77 | 0.73 |
| C-index (Validation) | 0.64 | 0.64 | 0.65 | 0.66 | 0.66 | 0.66 |
| **(ii), 10% censoring** | | | | | | |
| Total Selections | 20.16 | 23.89 | 5.74 | 20.96 | 25.00 | 5.86 |
| True Selections | 4.61 | 4.82 | 4.44 | 4.78 | 4.82 | 4.43 |
| −log p-value (Training) | 27.60 | 29.72 | 22.51 | 29.73 | 31.61 | 23.83 |
| −log p-value (Validation) | 12.89 | 12.88 | 14.95 | 14.66 | 14.39 | 15.69 |
| C-index (Training) | 0.74 | 0.75 | 0.71 | 0.75 | 0.76 | 0.72 |
| C-index (Validation) | 0.66 | 0.66 | 0.67 | 0.67 | 0.67 | 0.68 |
| **(iii), 30% censoring** | | | | | | |
| Total Selections | 26.68 | 36.83 | 6.57 | 30.35 | 37.73 | 7.85 |
| True Selections | 7.52 | 8.26 | 4.89 | 8.44 | 8.61 | 5.69 |
| −log p-value (Training) | 30.12 | 34.26 | 20.61 | 34.18 | 36.50 | 24.90 |
| −log p-value (Validation) | 9.89 | 9.87 | 9.07 | 13.07 | 12.87 | 11.48 |
| C-index (Training) | 0.78 | 0.81 | 0.73 | 0.81 | 0.82 | 0.76 |
| C-index (Validation) | 0.66 | 0.66 | 0.64 | 0.69 | 0.69 | 0.67 |
| **(iv), 10% censoring** | | | | | | |
| Total Selections | 29.46 | 36.69 | 8.52 | 32.47 | 37.53 | 9.65 |
| True Selections | 8.56 | 8.85 | 6.69 | 9.10 | 9.32 | 7.71 |
| −log p-value (Training) | 36.21 | 38.68 | 27.61 | 39.76 | 41.38 | 32.14 |
| −log p-value (Validation) | 14.77 | 14.44 | 15.62 | 18.44 | 18.26 | 19.54 |
| C-index (Training) | 0.78 | 0.79 | 0.74 | 0.80 | 0.80 | 0.76 |
| C-index (Validation) | 0.67 | 0.67 | 0.68 | 0.70 | 0.70 | 0.70 |
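A comparable sketch for the survival setting follows, assuming the third-party `lifelines` package. The penalizer strength, data-generating mechanism, and censoring scheme are illustrative assumptions, and because lifelines implements the L1 penalty with a smooth approximation, coefficients end up near zero rather than exactly zero, so selection is counted with a tolerance.

```python
# Minimal sketch: penalized Cox PHM on synthetic survival data;
# reports the number of selected features and Harrell's C-index.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(1)
n, p, n_true = 300, 30, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:n_true] = 0.7                                   # 5 true signals
T = rng.exponential(np.exp(-X @ beta))                # exponential event times
C = rng.exponential(np.quantile(T, 0.7), size=n)      # rough censoring times
df = pd.DataFrame(X, columns=[f"x{j}" for j in range(p)])
df["time"] = np.minimum(T, C)                         # observed time
df["event"] = (T <= C).astype(int)                    # 1 = event, 0 = censored

for name, l1 in [("LASSO", 1.0), ("EN", 0.5)]:
    cph = CoxPHFitter(penalizer=0.1, l1_ratio=l1)
    cph.fit(df, duration_col="time", event_col="event")
    # lifelines' L1 is smoothed, so count "selected" with a tolerance
    n_sel = int((cph.params_.abs() > 1e-2).sum())
    risk = cph.predict_partial_hazard(df)             # higher risk = shorter survival
    cidx = concordance_index(df["time"], -risk, df["event"])
    print(f"{name}: selected={n_sel}, C-index={cidx:.2f}")
```

In practice one would repeat this over many simulated datasets and separate training/validation splits, as the tables above do, rather than evaluating on the fitting data alone.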
**Real-data analysis: logistic regression.**

| Method | # Selected Features | AUC (Training) | AUC (Validation) | Selected Features |
|---|---|---|---|---|
| LASSO | 13 | 1.0000 | 0.6619 | add_trt, GE_CCL1, GE_CLEC6A, GE_HLA_DQA1, GE_IL1RL1, GE_IL25, GE_MAGEA12, GE_MASP1, GE_MASP2, GE_PRAME, GE_S100B, GE_SAA1, GE_USP9Y |
| Elastic Net | 13 | 1.0000 | 0.6381 | add_trt, GE_CCL1, GE_CLEC6A, GE_HLA_DQA1, GE_IL1RL1, GE_IL1RL2, GE_IL25, GE_MASP1, GE_MASP2, GE_PRAME, GE_S100B, GE_SAA1, GE_USP9Y |
| L-SVS | 4 | 0.9833 | 0.6905 | add_trt, GE_IL1RL1, GE_IL17F, GE_IL1RL2 |
**Real-data analysis: Cox's proportional hazards model.**

| Method | # Selected Features | −log p-Value (Training) | −log p-Value (Validation) | C-Index (Training) | C-Index (Validation) | Selected Features |
|---|---|---|---|---|---|---|
| LASSO | 5 | 3.7153 | 1.3623 | 0.8848 | 0.6959 | add_trt, GE_CCL3, GE_CCL4, GE_IL17A, GE_NEFL |
| Elastic Net | 11 | 3.8872 | 1.1965 | 0.9058 | 0.6701 | add_trt, GE_CCL3, GE_CCL4, GE_CRP, GE_CXCL1, GE_CXCR4, GE_HLA_DRB4, GE_IL17A, GE_IL8, GE_MAGEA12, GE_NEFL |
| C-SVS | 4 | 3.1182 | 1.4518 | 0.9634 | 0.6907 | add_trt, GE_NEFL, GE_IFNL1, GE_MAGEC1 |
**Fitted coefficients and p-values of the features selected by each method in the Cox model.**

| LASSO | Coef. | p-Value | Elastic Net | Coef. | p-Value | C-SVS | Coef. | p-Value |
|---|---|---|---|---|---|---|---|---|
| add_trt | 4.558 | 0.004 | add_trt | 3.021 | 0.251 | add_trt | 5.352 | 0.001 |
| GE_CCL3 | 3.634 | 0.153 | GE_CCL3 | 0.094 | 0.908 | GE_NEFL | | 0.006 |
| GE_CCL4 | 1.163 | 0.563 | GE_CCL4 | 6.818 | 0.693 | GE_IFNL1 | 0.146 | 0.003 |
| GE_IL17A | 6.268 | 0.073 | GE_CRP | | 0.191 | GE_MAGEC1 | | 0.011 |
| GE_NEFL | | 0.023 | GE_CXCL1 | 7.499 | 0.164 | | | |
| | | | GE_CXCR4 | | 0.548 | | | |
| | | | GE_HLA_DRB4 | | 0.412 | | | |
| | | | GE_IL17A | 5.879 | 0.202 | | | |
| | | | GE_IL8 | | 0.802 | | | |
| | | | GE_MAGEA12 | | 0.846 | | | |
| | | | GE_NEFL | | 0.736 | | | |
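Per-feature p-values such as those above are typically obtained by refitting an ordinary (unpenalized) Cox model on just the selected features, since penalized fits do not directly provide them. A minimal sketch of that refit step, again assuming `lifelines`, with `selected` a hypothetical list of feature-column names (not the authors' code):

```python
# Minimal sketch: refit an unpenalized Cox PHM on only the selected columns
# to obtain per-feature coefficient estimates and p-values.
from lifelines import CoxPHFitter

def refit_cox(df, selected, duration_col="time", event_col="event"):
    """MLE refit of the Cox model restricted to the selected features."""
    cph = CoxPHFitter()                               # no penalty: standard MLE
    cph.fit(df[selected + [duration_col, event_col]], duration_col, event_col)
    return cph.summary[["coef", "p"]]                 # coefficient and p-value
```

Large, non-significant p-values among the extra features kept by LASSO and the elastic net, as seen in the table above, are what the title's over-selection claim refers to.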
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).