A Two-Step Variable Selection Strategy for Multiply Imputed Survival Data Using Penalized Cox Models

Yang, Qian; Luo, Bin; Yu, Chenxi; Halabi, Susan

doi:10.3390/bioengineering12111278

by Qian Yang ¹, Bin Luo ², Chenxi Yu ³ and Susan Halabi ³,*

¹ Division of Infectious Diseases, Department of Medicine, Emory University School of Medicine, Atlanta, GA 30322, USA

² School of Data Science and Analytics, Kennesaw State University, Marietta, GA 30060, USA

³ Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, USA

* Author to whom correspondence should be addressed.

Bioengineering 2025, 12(11), 1278; https://doi.org/10.3390/bioengineering12111278 (registering DOI)

Submission received: 1 September 2025 / Revised: 7 November 2025 / Accepted: 18 November 2025 / Published: 20 November 2025

(This article belongs to the Section Biosignal Processing)

Abstract

Multiple imputation (MI) is widely used for handling missing data. However, applying penalized methods after MI can be challenging because variable selection may be inconsistent across imputations. We propose a two-step variable selection method for multiply imputed datasets with survival outcomes: apply LASSO or adaptive LASSO (ALASSO) to each MI dataset, followed by ridge regression, and combine estimates using the variables selected in any or in d% (d = 50, 70, 90, 100) of the MI datasets. For comparison, we also fit stacked MI datasets with weighted penalized regression and a group LASSO approach that enforces consistent selection across imputations. Simulations with Cox models evaluated tuning by AIC, BIC, cross-validation at the minimum error, and the one-standard-error (1SE) rule. Across scenarios, performance differed by both the penalization and the selection rule. More conservative choices, such as ALASSO with BIC and a 50% inclusion frequency, tended to control false positives and gave more stable calibration. The grouped approach achieved comparable selection with modestly higher estimation error. Overall, no single method consistently outperformed the others across all scenarios. Our findings suggest that practitioners should weigh trade-offs between selection stability, estimation accuracy, and calibration when applying penalized methods to multiply imputed survival data.

Keywords: multiple imputation; penalized method; proportional hazards model; missing data
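
The inclusion-frequency rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it skips the Cox model fitting entirely and starts from made-up coefficient matrices, one row per imputed dataset. The function names, the reading of "d%" as "at least d% of the imputed datasets", and pooling by a simple average of the second-step estimates are all assumptions made for illustration.

```python
import numpy as np

def select_by_inclusion_frequency(step1_coefs, d=None):
    """Return a boolean mask of variables kept across imputations.

    step1_coefs: (M, p) array of first-step (e.g. LASSO) coefficients,
                 one row per imputed dataset (hypothetical input).
    d:           inclusion threshold in percent; None means "selected in any".
    Assumption: "d%" is read as "selected in at least d% of the datasets".
    """
    step1_coefs = np.asarray(step1_coefs, dtype=float)
    n_imputations = step1_coefs.shape[0]
    counts = (step1_coefs != 0).sum(axis=0)  # times each variable was selected
    if d is None:
        return counts > 0
    # Integer comparison avoids floating-point issues at the threshold.
    return counts * 100 >= d * n_imputations

def pool_estimates(step2_coefs, keep):
    """Average second-step estimates over imputations for the kept variables.

    Assumption: pooling is a plain mean of the per-imputation estimates;
    variables that are not kept are set to zero.
    """
    step2_coefs = np.asarray(step2_coefs, dtype=float)
    return np.where(keep, step2_coefs.mean(axis=0), 0.0)

# Illustrative numbers only: 4 imputed datasets, 3 candidate variables.
step1 = np.array([[0.5, 0.0, 0.1],
                  [0.4, 0.0, 0.0],
                  [0.6, 0.2, 0.0],
                  [0.5, 0.0, 0.0]])
step2 = np.array([[0.4, 0.1, 0.1],
                  [0.4, 0.1, 0.1],
                  [0.6, 0.1, 0.1],
                  [0.6, 0.1, 0.1]])

keep_any = select_by_inclusion_frequency(step1)        # selected in any dataset
keep_50 = select_by_inclusion_frequency(step1, d=50)   # selected in >= 50%
pooled_50 = pool_estimates(step2, keep_50)
```

In this toy input the first variable is selected in all four datasets while the other two appear only once, so the "any" rule keeps all three and the 50% rule keeps only the first. In practice the two coefficient matrices would come from penalized Cox fits on each imputed dataset.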
