# The Missing Indicator Approach for Accelerated Failure Time Model with Covariates Subject to Limits of Detection

## Abstract

## 1. Introduction

## 2. Notations and Model

The `survreg()` function from `R`’s `survival` package [23] is available for fitting such a parametric AFT model.
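
As a rough illustration of what a maximum-likelihood fit of a parametric AFT model optimizes, the sketch below writes out the right-censored log-likelihood under a log-normal error assumption. It is a minimal Python sketch, not the `survreg()` implementation; the function name, argument names, and the toy data are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for pdf and cdf


def lognormal_aft_loglik(times, events, covariates, beta, sigma):
    """Log-likelihood of a log-normal AFT model with right censoring.

    times      : observed follow-up times (all > 0)
    events     : 1 if the failure was observed, 0 if right-censored
    covariates : list of rows, each with a leading 1 for the intercept
    beta       : regression coefficients on the log-time scale
    sigma      : scale of the normal error (> 0)
    """
    total = 0.0
    for t, d, x in zip(times, events, covariates):
        linear_predictor = sum(b * v for b, v in zip(beta, x))
        z = (math.log(t) - linear_predictor) / sigma
        if d == 1:
            # observed failure: log density of T, phi(z) / (sigma * t)
            total += math.log(_STD_NORMAL.pdf(z)) - math.log(sigma) - math.log(t)
        else:
            # right-censored: log survival function of T, 1 - Phi(z)
            total += math.log(1.0 - _STD_NORMAL.cdf(z))
    return total


# Hypothetical toy data: intercept plus one covariate
times = [2.3, 0.8, 5.1]
events = [1, 0, 1]
covariates = [[1.0, 0.2], [1.0, -0.5], [1.0, 1.4]]
loglik = lognormal_aft_loglik(times, events, covariates, beta=[0.5, 0.3], sigma=1.0)
```

Maximizing this function over `beta` and `sigma` with a numerical optimizer gives the parametric estimates; every covariate value must be observed for the sum to be computable, which is where limits of detection become a problem.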

## 3. Estimating Procedures in the Presence of LOD

#### 3.1. Complete-Case Analysis

`survreg()` when data contain missing values.
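
A complete-case analysis amounts to discarding every subject with at least one unobserved covariate before the model is fitted. The Python sketch below shows that filtering step on hypothetical data in which `None` marks a value outside the limits of detection; the helper name is illustrative.

```python
def complete_cases(rows):
    """Keep only rows whose covariates are all observed (no None)."""
    return [row for row in rows if all(value is not None for value in row)]


# Hypothetical covariate pairs (x1, x2); None marks a value outside the LOD
rows = [(0.4, 1.2), (None, 0.7), (1.1, None), (0.9, 0.3)]
kept = complete_cases(rows)
n_dropped = len(rows) - len(kept)
```

With two covariates that are each subject to an LOD, the retained sample can shrink quickly, and it can be empty when the sample is small and the missing proportions are high.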

#### 3.2. Parametric Substitution Approaches

#### 3.3. Parametric Multiple Imputation Approaches

`mice` [28], in that the proposed method targets imputation values outside of the observed region. Our MI method is easy to implement and flexible, in that different parametric assumptions can be imposed on different covariates.
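
One way to draw imputation values that fall outside the observed region is inverse-CDF sampling from a truncated distribution. The Python sketch below assumes a normal covariate with known mean `mu` and standard deviation `sigma` and a lower limit of detection `lower_lod`; these names and the numerical values are hypothetical, and in practice the parameters would have to be estimated from the data.

```python
import random
from statistics import NormalDist


def draw_below_lod(mu, sigma, lower_lod, rng):
    """Draw one value from N(mu, sigma^2) conditional on X < lower_lod.

    Assumes 0 < P(X < lower_lod) < 1 so that the inverse CDF is defined.
    """
    dist = NormalDist(mu, sigma)
    p_below = dist.cdf(lower_lod)            # P(X < L)
    u = p_below * (1.0 - rng.random())       # uniform on (0, P(X < L)]
    # min() guards against floating-point round-off just above the LOD
    return min(dist.inv_cdf(u), lower_lod)


# Five imputed values for one subject whose covariate fell below the LOD
rng = random.Random(2022)
imputed = [draw_below_lod(mu=0.0, sigma=1.0, lower_lod=-0.5, rng=rng) for _ in range(5)]
```

Because each covariate gets its own `NormalDist`, a different parametric family could be substituted per covariate by swapping in that family's CDF and inverse CDF.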

#### 3.4. Missing Indicator Approaches

## 4. Simulation

**Complete-case analysis:**

- **M1**: removal of subjects with missing ${X}_{ij}^{\ast}$.

**Substitution methods:**

- **M2**: substitution of the missing ${X}_{ij}^{\ast}$ by ${L}_{j}/2$ or $2{U}_{j}$.
- **M3**: substitution of the missing ${X}_{ij}^{\ast}$ by ${L}_{j}/\sqrt{2}$ or $\sqrt{2}{U}_{j}$.
- **M4**: substitution of the missing ${X}_{ij}^{\ast}$ by $E({X}_{ij}^{\ast} \mid {X}_{ij}^{\ast}<{L}_{j})$ or $E({X}_{ij}^{\ast} \mid {X}_{ij}^{\ast}>{U}_{j})$ under normal assumptions.
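
The three substitution rules can be written out directly. In the Python sketch below, the M4 values use the standard truncated-normal means, $E(X \mid X<L)=\mu-\sigma\,\phi(a)/\Phi(a)$ with $a=(L-\mu)/\sigma$ and $E(X \mid X>U)=\mu+\sigma\,\phi(b)/\{1-\Phi(b)\}$ with $b=(U-\mu)/\sigma$. The function names and the parameters `mu` and `sigma` are hypothetical; how the normal parameters are estimated is not covered here.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def substitute_lower(lower_lod, method, mu=None, sigma=None):
    """Replacement value for a covariate below the lower LOD (M2, M3 or M4)."""
    if method == "M2":
        return lower_lod / 2.0
    if method == "M3":
        return lower_lod / math.sqrt(2.0)
    if method == "M4":
        a = (lower_lod - mu) / sigma
        # mean of a normal truncated to (-inf, L)
        return mu - sigma * _STD_NORMAL.pdf(a) / _STD_NORMAL.cdf(a)
    raise ValueError(f"unknown method: {method}")


def substitute_upper(upper_lod, method, mu=None, sigma=None):
    """Replacement value for a covariate above the upper LOD (M2, M3 or M4)."""
    if method == "M2":
        return 2.0 * upper_lod
    if method == "M3":
        return math.sqrt(2.0) * upper_lod
    if method == "M4":
        b = (upper_lod - mu) / sigma
        # mean of a normal truncated to (U, inf)
        return mu + sigma * _STD_NORMAL.pdf(b) / (1.0 - _STD_NORMAL.cdf(b))
    raise ValueError(f"unknown method: {method}")


# Hypothetical example: lower LOD of 1.0 for a covariate assumed N(2, 1)
values = {m: substitute_lower(1.0, m, mu=2.0, sigma=1.0) for m in ("M2", "M3", "M4")}
```

M2 and M3 depend only on the detection limit, whereas M4 also depends on the assumed distribution, so the M4 value moves with `mu` and `sigma`.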

**Multiple imputation approaches:**

- **M5**: MI of the missing ${X}_{ij}^{\ast}$ using the predictive mean matching (PMM) algorithm implemented in the `R` package `mice` [28].
- **M6**: MI of the missing ${X}_{ij}^{\ast}$ using conditional densities derived under normal assumptions, as described in Section 3.3.

**Missing indicator approaches:**

- **M7**: the missing indicator (MDI) model.
- **M8**: the expanded MDI model.
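
The exact MDI design is not spelled out in the text above, so the Python sketch below only illustrates a common form of the missing indicator method, taken here as an assumption: each unobserved value is replaced by a fixed constant and a 0/1 indicator of missingness enters the model as an additional covariate. The helper name and the choice of fill value are hypothetical, and the extra terms of the expanded model are not sketched.

```python
def mdi_columns(values, fill_value):
    """Return (filled covariate, missing indicator) for one covariate.

    values     : list of observations, with None where the value is outside the LOD
    fill_value : constant used in place of unobserved values (an assumption here)
    """
    filled = [fill_value if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return filled, indicator


# Hypothetical covariate with two values below the detection limit
x1 = [0.4, None, 1.1, None, 0.9]
x1_filled, x1_missing = mdi_columns(x1, fill_value=0.0)

# Design rows: intercept, filled covariate, missing indicator
design = [[1.0, f, float(r)] for f, r in zip(x1_filled, x1_missing)]
```

Every subject stays in the design matrix, so the fit uses the full sample rather than only the complete cases.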

**Missing-indicator-embedded multiple imputation approaches (MI + MDI):**

- **M9**: MI by PMM, fitted with the MDI model.
- **M10**: MI under normal assumptions, fitted with the MDI model.
- **M11**: MI by PMM, fitted with the expanded MDI model.
- **M12**: MI under normal assumptions, fitted with the expanded MDI model.

All models were fitted using the `survreg()` function in the `survival` package [23] in R [29] under the normal error assumption, e.g., with the argument `dist = "lognormal"`. For the scenarios considered, the CC approach (M1) sometimes failed to converge because the sample remaining after removing subjects with missing observations was too small or empty. The convergence rates for the CC approach under the different scenarios, presented in the Supplementary Materials, show fewer converged replications when the sample size is small (e.g., $n=50$) or the missing proportions are high (e.g., ${m}_{1}=60\%$ or ${m}_{2}=60\%$). For this reason, the simulation results for the CC approach were based on the converged replications only. For the MI methods, the number of imputations $M$ was set to 5.
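
How the $M=5$ imputed-data fits are combined is not stated in this passage. The Python sketch below assumes the standard pooling by Rubin's rules: the pooled estimate is the mean of the $M$ estimates, and the total variance is the average within-imputation variance plus $(1+1/M)$ times the between-imputation variance. The function name and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)             # average within-imputation variance
    between = variance(estimates)        # sample variance across imputations (divisor m - 1)
    total_variance = within + (1.0 + 1.0 / m) * between
    return pooled_estimate, total_variance


# Hypothetical coefficient estimates and squared standard errors from M = 5 fits
beta_hats = [1.0, 1.2, 0.8, 1.1, 0.9]
squared_ses = [0.04, 0.04, 0.04, 0.04, 0.04]
pooled_beta, pooled_var = pool_rubin(beta_hats, squared_ses)
```

The between-imputation term inflates the variance to reflect uncertainty about the unobserved covariate values, which a single substitution value ignores.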

## 5. Discussion

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Conflicts of Interest

## References

1. Bernhardt, P.W.; Wang, H.J.; Zhang, D. Statistical methods for generalized linear models with covariates subject to detection limits. Stat. Biosci. **2015**, 7, 68–89.
2. Kong, S.; Nan, B. Semiparametric approach to regression with a covariate subject to a detection limit. Biometrika **2016**, 103, 161–174.
3. Arnaout, R.; Lee, R.A.; Lee, G.R.; Callahan, C.; Yen, C.F.; Smith, K.P.; Arora, R.; Kirby, J.E. SARS-CoV2 testing: The limit of detection matters. bioRxiv **2020**.
4. Lou, Y.; Chen, C.; Long, X.; Gu, J.; Xiao, M.; Wang, D.; Zhou, X.; Li, T.; Hong, Z.; Li, C.; et al. Detection and Quantification of Chimeric Antigen Receptor Transgene Copy Number by Droplet Digital PCR versus Real-Time PCR. J. Mol. Diagn. **2020**, 22, 699–707.
5. Lin, D.Y.; Zeng, D.; Couper, D. A general framework for integrative analysis of incomplete multiomics data. Genet. Epidemiol. **2020**, 44, 646–664.
6. Jones, M.P. Indicator and stratification methods for missing explanatory variables in multiple linear regression. J. Am. Stat. Assoc. **1996**, 91, 222–230.
7. Nie, L.; Chu, H.; Liu, C.; Cole, S.R.; Vexler, A.; Schisterman, E.F. Linear regression with an independent variable subject to a detection limit. Epidemiology **2010**, 21, S17.
8. Arunajadai, S.G.; Rauh, V.A. Handling covariates subject to limits of detection in regression. Environ. Ecol. Stat. **2012**, 19, 369–391.
9. Schisterman, E.F.; Vexler, A.; Whitcomb, B.W.; Liu, A. The limitations due to exposure detection limits for regression models. Am. J. Epidemiol. **2006**, 163, 374–383.
10. Tran, T.M.; Abrams, S.; Aerts, M.; Maertens, K.; Hens, N. Measuring association among censored antibody titer data. Stat. Med. **2021**, 40, 3740–3761.
11. Richardson, D.B.; Ciampi, A. Effects of exposure measurement error when an exposure variable is constrained by a lower limit. Am. J. Epidemiol. **2003**, 157, 355–363.
12. Anderson, A.B.; Basilevsky, A.; Hum, D.P. Missing data: A review of the literature. In Handbook of Survey Research; Academic Press: Cambridge, MA, USA, 1983; pp. 415–494.
13. Chow, W.K. A look at various estimators in logistic models in the presence of missing values. In Technical Report; Rand Corp: Santa Monica, CA, USA, 1979.
14. Cohen, J.; Cohen, P.; West, S.G.; Aiken, L.S. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences; Taylor & Francis: Oxfordshire, UK, 2013.
15. Chiou, S.H.; Betensky, R.A.; Balasubramanian, R. The missing indicator approach for censored covariates subject to limit of detection in logistic regression models. Ann. Epidemiol. **2019**, 38, 57–64.
16. Ortega-Villa, A.M.; Liu, D.; Ward, M.H.; Albert, P.S. New insights into modeling exposure measurements below the limit of detection. Environ. Epidemiol. **2021**, 5, e116.
17. Blackhurst, M. Identifying Lead Service Lines with Field Tap Water Sampling. ACS ES T Water **2021**, 1, 1983–1991.
18. Choi, J.; Dekkers, O.M.; le Cessie, S. A comparison of different methods to handle missing data in the context of propensity score analysis. Eur. J. Epidemiol. **2019**, 34, 23–36.
19. Sperrin, M.; Martin, G.P. Multiple imputation with missing indicators as proxies for unmeasured variables: Simulation study. BMC Med. Res. Methodol. **2020**, 20, 185.
20. Lee, S.; Park, S.; Park, J. The proportional hazards regression with a censored covariate. Stat. Probab. Lett. **2003**, 61, 309–319.
21. Dinse, G.E.; Jusko, T.A.; Ho, L.A.; Annam, K.; Graubard, B.I.; Hertz-Picciotto, I.; Miller, F.W.; Gillespie, B.W.; Weinberg, C.R. Accommodating measurements below a limit of detection: A novel application of Cox regression. Am. J. Epidemiol. **2014**, 179, 1018–1024.
22. Bernhardt, P.W.; Wang, H.J.; Zhang, D. Flexible modeling of survival data with covariates subject to detection limits via multiple imputation. Comput. Stat. Data Anal. **2014**, 69, 81–91.
23. Therneau, T.M. A Package for Survival Analysis in R; R Package Version 3.2-13. Available online: https://CRAN.R-project.org/package=survival (accessed on 23 March 2022).
24. Hughes, R.A.; Heron, J.; Sterne, J.A.; Tilling, K. Accounting for missing data in statistical analyses: Multiple imputation is not always the answer. Int. J. Epidemiol. **2019**, 48, 1294–1304.
25. Hornung, R.W.; Reed, L.D. Estimation of average concentration in the presence of nondetectable values. Appl. Occup. Environ. Hyg. **1990**, 5, 46–51.
26. Baccarelli, A.; Pfeiffer, R.; Consonni, D.; Pesatori, A.C.; Bonzini, M.; Patterson Jr, D.G.; Bertazzi, P.A.; Landi, M.T. Handling of dioxin measurement data in the presence of non-detectable values: Overview of available methods and their application in the Seveso chloracne study. Chemosphere **2005**, 60, 898–906.
27. Rubin, D.B. Statistical matching using file concatenation with adjusted weights and multiple imputations. J. Bus. Econ. Stat. **1986**, 4, 87–94.
28. van Buuren, S.; Groothuis-Oudshoorn, K. mice: Multivariate Imputation by Chained Equations in R. J. Stat. Softw. **2011**, 45, 1–67.
29. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2017.
30. Groenwold, R.H.; White, I.R.; Donders, A.R.T.; Carpenter, J.R.; Altman, D.G.; Moons, K.G. Missing covariate data in clinical research: When and when not to use the missing-indicator method for analysis. CMAJ **2012**, 184, 1265–1269.
31. Zhuchkova, S.; Rotmistrov, A. A Comparison Of The Missing-Indicator Method And Complete Case Analysis In Case Of Categorical Data. In Higher School of Economics Research Paper No. WP BRP; Social Science Research Network: Rochester, NY, USA, 2019; Volume 87.
32. Blake, H.A.; Leyrat, C.; Mansfield, K.E.; Tomlinson, L.A.; Carpenter, J.; Williamson, E.J. Estimating treatment effects with partially observed covariates using outcome regression with missing indicators. Biom. J. **2020**, 62, 428–443.
33. Blake, H.A.; Leyrat, C.; Mansfield, K.E.; Seaman, S.; Tomlinson, L.A.; Carpenter, J.; Williamson, E.J. Propensity scores using missingness pattern information: A practical guide. Stat. Med. **2020**, 39, 1641–1657.
34. Qian, J.; Chiou, S.H.; Maye, J.E.; Atem, F.; Johnson, K.A.; Betensky, R.A. Threshold regression to accommodate a censored covariate. Biometrics **2018**, 74, 1261–1270.

**Figure 1.** Violin plots showing the empirical distribution of the bias associated with the MLE of ${\beta}_{1}$ (red) and ${\beta}_{2}$ (green) when covariates are independent and ${X}_{ij}^{\ast}$, $j=1,2$, are subject to a lower LOD. (**a**) Bias under $n=50$ and ${m}_{1}={m}_{2}=20\%$. (**b**) Bias under $n=100$ and ${m}_{1}={m}_{2}=20\%$. (**c**) Bias under $n=50$ and ${m}_{1}={m}_{2}=40\%$. (**d**) Bias under $n=100$ and ${m}_{1}={m}_{2}=40\%$. (**e**) Bias under $n=50$ and ${m}_{1}={m}_{2}=60\%$. (**f**) Bias under $n=100$ and ${m}_{1}={m}_{2}=60\%$.

**Figure 2.** Violin plots showing the empirical distribution of the bias associated with the MLE of ${\beta}_{1}$ (red) and ${\beta}_{2}$ (green) when covariates are correlated and ${X}_{ij}^{\ast}$, $j=1,2$, are subject to a lower LOD. (**a**) Bias under $n=50$ and ${m}_{1}={m}_{2}=20\%$. (**b**) Bias under $n=100$ and ${m}_{1}={m}_{2}=20\%$. (**c**) Bias under $n=50$ and ${m}_{1}={m}_{2}=40\%$. (**d**) Bias under $n=100$ and ${m}_{1}={m}_{2}=40\%$. (**e**) Bias under $n=50$ and ${m}_{1}={m}_{2}=60\%$. (**f**) Bias under $n=100$ and ${m}_{1}={m}_{2}=60\%$.

**Table 1.** Summary of the AAB ($\times 1000$) when covariates are independent and ${X}_{ij}^{\ast}$, $j=1,2$, are subject to a lower LOD. M1 is the complete-case analysis; M2–M4 are the variants of the substitution methods; M5–M6 are the variants of the MI methods; M7–M8 are the variants of the MDI methods; M9–M12 are the variants of the MDI-embedded MI (MI + MDI) methods. AABs less than 0.1 are highlighted in gray, with darker tones corresponding to smaller AABs.

**Table 2.** Summary of the MSE ($\times 1000$) when covariates are independent and ${X}_{ij}^{\ast}$, $j=1,2$, are subject to a lower LOD. M1 is the complete-case analysis; M2–M4 are the variants of the substitution methods; M5–M6 are the variants of the MI methods; M7–M8 are the variants of the MDI methods; M9–M12 are the variants of the MDI-embedded MI (MI + MDI) methods. MSEs less than 0.1 are highlighted in gray, with darker tones corresponding to smaller MSEs.

**Table 3.** Summary of the AAB ($\times 1000$) when covariates are correlated and ${X}_{ij}^{\ast}$, $j=1,2$, are subject to a lower LOD. M1 is the complete-case analysis; M2–M4 are the variants of the substitution methods; M5–M6 are the variants of the MI methods; M7–M8 are the variants of the MDI methods; M9–M12 are the variants of the MDI-embedded MI (MI + MDI) methods. AABs less than 0.1 are highlighted in gray, with darker tones corresponding to smaller AABs.

**Table 4.** Summary of the MSE ($\times 1000$) when covariates are correlated and ${X}_{ij}^{\ast}$, $j=1,2$, are subject to a lower LOD. M1 is the complete-case analysis; M2–M4 are the variants of the substitution methods; M5–M6 are the variants of the MI methods; M7–M8 are the variants of the MDI methods; M9–M12 are the variants of the MDI-embedded MI (MI + MDI) methods. MSEs less than 0.1 are highlighted in gray, with darker tones corresponding to smaller MSEs.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Alyabs, N.; Chiou, S.H.
The Missing Indicator Approach for Accelerated Failure Time Model with Covariates Subject to Limits of Detection. *Stats* **2022**, *5*, 494-506.
https://doi.org/10.3390/stats5020029
