This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

A Two-Step Variable Selection Strategy for Multiply Imputed Survival Data Using Penalized Cox Models

by Qian Yang 1, Bin Luo 2, Chenxi Yu 3 and Susan Halabi 3,*

1 Division of Infectious Diseases, Department of Medicine, Emory University School of Medicine, Atlanta, GA 30322, USA
2 School of Data Science and Analytics, Kennesaw State University, Marietta, GA 30060, USA
3 Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, USA
* Author to whom correspondence should be addressed.
Bioengineering 2025, 12(11), 1278; https://doi.org/10.3390/bioengineering12111278 (registering DOI)
Submission received: 1 September 2025 / Revised: 7 November 2025 / Accepted: 18 November 2025 / Published: 20 November 2025
(This article belongs to the Section Biosignal Processing)

Abstract

Multiple imputation (MI) is widely used for handling missing data. However, applying penalized methods after MI can be challenging because variable selection may be inconsistent across imputations. We propose a two-step variable selection method for multiply imputed datasets with survival outcomes: first, apply LASSO or adaptive LASSO (ALASSO) to each MI dataset; second, fit ridge regression and combine estimates using the variables selected in any, or in at least d% (d = 50, 70, 90, 100), of the MI datasets. For comparison, we also fit stacked MI datasets with weighted penalized regression and a group LASSO approach that enforces consistent selection across imputations. Simulations with Cox models evaluated tuning parameter selection by AIC, BIC, cross-validation at the minimum error, and the one-standard-error (1SE) rule. Across scenarios, performance depended on both the penalty and the selection rule. More conservative choices, such as ALASSO tuned by BIC with a 50% inclusion frequency, tended to control false positives and gave more stable calibration. The group LASSO approach achieved comparable selection with modestly higher estimation error. Overall, no single method consistently outperformed the others across all scenarios. Our findings suggest that practitioners should weigh the trade-offs among selection stability, estimation accuracy, and calibration when applying penalized methods to multiply imputed survival data.
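The bookkeeping around the two-step procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it omits the penalized Cox fits themselves (those would come from a survival modeling library) and covers only what happens around them. The hypothetical `select_by_frequency` applies the "any" or d% inclusion rule to the variable sets chosen by LASSO/ALASSO in each imputed dataset; `one_se_lambda` implements the 1SE tuning rule in the glmnet convention (the largest penalty whose cross-validated error lies within one standard error of the minimum); and `rubin_pool` combines the per-imputation ridge estimates of one coefficient. Pooling with Rubin's rules, and refitting every imputation on the same selected set, are assumptions here, since the abstract does not specify how estimates are combined. All function names and data are hypothetical.

```python
from statistics import mean, variance


def select_by_frequency(selections, d):
    """Return variables chosen in at least d% of the imputed datasets.

    selections: one set of selected variable names per imputed dataset
    d: inclusion threshold in percent; d = 0 reproduces the "any" rule,
       because every counted variable was selected at least once.
    """
    m = len(selections)
    counts = {}
    for chosen in selections:
        for var in chosen:
            counts[var] = counts.get(var, 0) + 1
    # Integer comparison (100 * count >= d * m) avoids floating-point edge cases.
    return sorted(v for v, c in counts.items() if 100 * c >= d * m)


def one_se_lambda(lambdas, cv_mean, cv_se):
    """1SE rule: largest lambda whose CV error is within one SE of the minimum."""
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    bound = cv_mean[i_min] + cv_se[i_min]
    return max(lam for lam, err in zip(lambdas, cv_mean) if err <= bound)


def rubin_pool(estimates, variances):
    """Pool one coefficient across m >= 2 imputations with Rubin's rules.

    Returns (pooled estimate, total variance). Assumes the same variable set
    was refitted in every imputation, so each supplies an estimate.
    """
    m = len(estimates)
    q_bar = mean(estimates)   # pooled point estimate
    w_bar = mean(variances)   # average within-imputation variance
    b = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = w_bar + (1 + 1 / m) * b
    return q_bar, total


if __name__ == "__main__":
    # Hypothetical LASSO/ALASSO selections from five imputed datasets (step 1).
    selections = [{"x1", "x2", "x3"}, {"x1", "x3"}, {"x1", "x2", "x3", "x4"},
                  {"x1", "x3"}, {"x1", "x2"}]
    for d in (0, 50, 70, 90, 100):
        print(d, select_by_frequency(selections, d))

    # Hypothetical cross-validation curve; prints the 1SE choice of lambda.
    print(one_se_lambda([0.5, 0.2, 0.1, 0.05, 0.01],
                        [1.30, 1.10, 1.00, 1.02, 1.04],
                        [0.05] * 5))

    # Hypothetical ridge estimates of one log hazard ratio (step 2).
    print(rubin_pool([0.50, 0.55, 0.45, 0.52, 0.48], [0.01] * 5))
```

With five imputations in which x1, x2, x3, and x4 are selected 5, 3, 4, and 1 times, d = 50 keeps x1, x2, and x3; d = 70 keeps x1 and x3; and d = 90 or 100 keeps only x1. Because raising d can only remove variables, it can only reduce false positives, at the cost of sensitivity.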
Keywords: multiple imputation; penalized method; proportional hazards model; missing data

Share and Cite

MDPI and ACS Style

Yang, Q.; Luo, B.; Yu, C.; Halabi, S. A Two-Step Variable Selection Strategy for Multiply Imputed Survival Data Using Penalized Cox Models. Bioengineering 2025, 12, 1278. https://doi.org/10.3390/bioengineering12111278

AMA Style

Yang Q, Luo B, Yu C, Halabi S. A Two-Step Variable Selection Strategy for Multiply Imputed Survival Data Using Penalized Cox Models. Bioengineering. 2025; 12(11):1278. https://doi.org/10.3390/bioengineering12111278

Chicago/Turabian Style

Yang, Qian, Bin Luo, Chenxi Yu, and Susan Halabi. 2025. "A Two-Step Variable Selection Strategy for Multiply Imputed Survival Data Using Penalized Cox Models" Bioengineering 12, no. 11: 1278. https://doi.org/10.3390/bioengineering12111278

APA Style

Yang, Q., Luo, B., Yu, C., & Halabi, S. (2025). A Two-Step Variable Selection Strategy for Multiply Imputed Survival Data Using Penalized Cox Models. Bioengineering, 12(11), 1278. https://doi.org/10.3390/bioengineering12111278

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
