Article

The Missing Indicator Approach for Accelerated Failure Time Model with Covariates Subject to Limits of Detection

1
Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, TX 75080, USA
2
College of Sciences and Theoretical Studies, Saudi Electronic University, Riyadh 13316, Saudi Arabia
*
Author to whom correspondence should be addressed.
Stats 2022, 5(2), 494-506; https://doi.org/10.3390/stats5020029
Submission received: 24 March 2022 / Revised: 29 April 2022 / Accepted: 7 May 2022 / Published: 10 May 2022
(This article belongs to the Special Issue Survival Analysis: Models and Applications)

Abstract

The limit of detection (LOD) problem is commonly encountered in observational studies when one or more covariate values fall outside the measuring range. Although the complete-case (CC) approach is widely employed in the presence of missing values, it can result in biased estimates or even become inapplicable in small-sample studies. On the other hand, approaches such as the missing indicator (MDI) approach are attractive alternatives because they preserve the sample size. This paper compares the effectiveness of different alternatives to the CC approach under different LOD settings with a survival outcome. These alternatives include substitution methods, multiple imputation (MI) methods, MDI approaches, and MDI-embedded MI approaches. Through extensive simulation, we found that the MDI approach outperformed its competitors in terms of bias and mean squared error when the sample size is small.

1. Introduction

We consider situations where the covariates of interest are only observable within a detection interval, referred to as the limit of detection (LOD) problem, e.g., [1,2]. The LOD problem is commonly encountered in observational studies. For example, a positive real-time polymerase chain reaction (PCR) test result for coronavirus disease usually indicates that the viral RNA load is greater than the lower LOD, which varies from $10^2$ to $10^6$ copies per milliliter [3]. On the other hand, a typical droplet digital PCR test is restricted to an upper LOD or interval LOD [4]. Assays with high lower LODs or low upper LODs will likely result in higher false-negative or false-positive rates, respectively [3]. Another example where incomplete data due to the LOD are inevitable is multi-omics data analysis, where quantitative omics measurements, such as metabolite levels and protein expressions, are missing because the measurement assay fails at levels outside of its detection limits [5]. Proper adjustments are needed for valid data analysis with missing values due to the LOD.
In the presence of LOD, one of the most straightforward approaches is the complete-case (CC) analysis, which discards observations that fall outside the detection limits. Although the CC analysis yields unbiased estimates for the regression coefficients, e.g., [1,2,6,7,8], it could suffer from efficiency loss and become unstable when the sample size is small or when there are multiple covariates subject to the LOD. An approach that does not require discarding observations is to substitute the unobserved values with fixed values outside of the detection limits [9]. Common substitution methods replace missing values with ad hoc fixed values or values derived from parametric assumptions, e.g., [7,8,10]. Such substitution methods could result in considerable bias when the imputed value is very different from the unobserved values or when the parametric assumptions are misspecified [1,2,7,10,11]. An approach that handles covariates subject to the LOD without discarding observations or imposing a parametric distributional assumption is the missing indicator (MDI) approach, e.g., [6,10,12,13]. The idea of the MDI approach is to create a binary variable that indicates whether the covariate of interest is observed and to include the indicator variable in the model as an additional covariate [6]. Since the MDI approach uses all of the available observations, estimating procedures that utilize the MDI approach are expected to yield a more efficient estimator and be more computationally stable when the sample size is small or multiple covariates are subject to LOD [14].
Approaches for LOD have been well studied in the literature. For example, the MDI approach was justified theoretically and numerically and compared to the CC approach in the context of linear regression [6] and logistic regression [15,16,17]. Extensions that combine the MDI and the multiple imputation (MI) approaches have also been studied under the generalized linear regression setting [18,19]. On a separate note, MDI-based and MI-based copula models were used to estimate the association between two continuous variables subject to a lower LOD [10]. Relatively few works have compared approaches for LOD with survival outcomes. Among those, most focus on proportional hazards models, e.g., [20,21]. Despite the more favorable interpretability of the accelerated failure time (AFT) model, approaches for LOD under the AFT framework were less explored until recently [22], where a seminonparametric distribution is recommended to model the error term.
Recent studies on the MDI approach yield encouraging results when the required conditions are met. However, most existing works are based on scenarios in which the covariates of interest are subject to a lower LOD. We extended the MDI approach's applicability to general scenarios in which covariates may be subject to an upper or interval LOD. We applied the MDI approach in the context of survival analysis when the survival time is subject to independent right censoring. We assumed the survival time is related to the covariates via a parametric accelerated failure time model. We compared the performance of the MDI approach to that of existing methods under different types of LOD via large-scale simulation studies. We compared the approaches by evaluating the absolute value of the average biases (AAB) and the mean squared error (MSE) for the regression parameter of interest.
The rest of the paper is organized as follows. Notation and model formulation are presented in Section 2. A brief description of the estimating procedures in the presence of covariates subject to the LOD is provided in Section 3. Results of large-scale simulation studies on the performance of the estimators are reported in Section 4. Section 5 concludes with a discussion.

2. Notations and Model

Let $T_i$ be the time to an event of interest related to covariates via a parametric AFT model,
$$\log(T_i) = \alpha + X_i^\top \beta + Z_i^\top \gamma + \epsilon_i, \quad i = 1, \ldots, n, \tag{1}$$
where $X_i$ is a $p \times 1$ covariate vector whose elements are partially missing due to LOD, $Z_i$ is a $q \times 1$ fully observed covariate vector, the $\epsilon_i$s are independent and identically distributed random variables with a known distribution, and $(\alpha, \beta, \gamma)$ are the corresponding conformable regression parameters. In the absence of the missing covariate $X_i$, the regression coefficients can be estimated by maximizing the likelihood function. For example, when $\epsilon_i$ has a normal distribution with mean zero and variance $\sigma^2$, $T_i$ follows a log-normal distribution and the regression coefficients can be obtained by maximizing the likelihood
$$L(\alpha, \gamma, \sigma; \Theta) = \prod_{i=1}^{n} \left[ \frac{1}{\sigma} \phi\left( \frac{\log(Y_i) - \alpha - Z_i^\top \gamma}{\sigma} \right) \right]^{\Delta_i} \left[ \Phi\left( -\frac{\log(Y_i) - \alpha - Z_i^\top \gamma}{\sigma} \right) \right]^{1 - \Delta_i},$$
where $Y_i = \min(T_i, C_i)$ is the observed survival time, $\Delta_i = I(T_i \le C_i)$ is the censoring indicator, and $C_i$ is the censoring time. The functions $\phi(\cdot)$ and $\Phi(\cdot)$ are the probability density function and the cumulative distribution function of the standard normal distribution, so that $\Phi(-z) = 1 - \Phi(z)$ is the survival probability contributed by a censored observation. The maximum likelihood estimator (MLE) can be obtained by a standard numerical optimization algorithm, such as the Newton–Raphson method. The variance–covariance matrix of the MLE can be estimated via the information matrix, and the asymptotic normality of the MLE follows directly from standard likelihood theory. The survreg() function from R's survival package [23] is available for fitting such a parametric AFT model.
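To make the censored likelihood concrete, the following is a minimal Python sketch of its logarithm for the log-normal case, using only the standard library. The function name `lognormal_aft_loglik` and its argument layout are illustrative assumptions, not part of the paper or of the survival package; the sketch only evaluates the log-likelihood and does not perform the maximization.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: pdf is phi, cdf is Phi


def lognormal_aft_loglik(alpha, gamma, sigma, y, delta, z):
    """Log-likelihood of a log-normal AFT model under right censoring.

    y     : observed times Y_i = min(T_i, C_i)
    delta : event indicators (1 = event observed, 0 = right-censored)
    z     : covariate rows Z_i, each the same length as gamma
    (Hypothetical helper; names are chosen for illustration only.)
    """
    total = 0.0
    for y_i, d_i, z_i in zip(y, delta, z):
        linear = alpha + sum(g * v for g, v in zip(gamma, z_i))
        resid = (math.log(y_i) - linear) / sigma
        if d_i == 1:
            # Event: density of log(T_i) at log(Y_i), phi(resid) / sigma.
            total += math.log(STD_NORMAL.pdf(resid) / sigma)
        else:
            # Censored: survival probability 1 - Phi(resid) = Phi(-resid).
            total += math.log(STD_NORMAL.cdf(-resid))
    return total
```

In practice this function would be handed to a numerical optimizer; the density is written on the log-time scale, as in the likelihood above, which differs from the density of $T_i$ only by a factor that does not involve the parameters.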
When $X_i$ is subject to LOD, we assume $X_{ij}$ is observable only if $L_j \le X_{ij} \le U_j$, where $L_j$ and $U_j$ are the lower and upper bounds of the measurement range, respectively. When $X_{ij}$ falls outside of $[L_j, U_j]$, we record $\max\{L_j, \min(X_{ij}, U_j)\}$ in its place, $j = 1, \ldots, p$. That is, the recorded value is $L_j$ if $X_{ij} < L_j$ and $U_j$ if $X_{ij} > U_j$, so that the direction of missingness is always known; with a slight abuse of notation, we continue to write $X_{ij}$ for the recorded value. Accompanying $X_i = (X_{i1}, \ldots, X_{ip})^\top$ is the missing indicator $V_i = (V_{i1}, \ldots, V_{ip})^\top$, where $V_{ij} = I(L_j \le X_{ij} \le U_j)$ and $I(\cdot)$ is the indicator function. The observed data then consist of independent copies of $\Theta = \{Y_i, \Delta_i, X_i, Z_i, V_i, L_1, \ldots, L_p, U_1, \ldots, U_p\}$, $i = 1, \ldots, n$. We assume the censoring time $C_i$ is conditionally independent of $T_i$ given $X_i$ and $Z_i$. Throughout the manuscript, we allow $X_{ij}$ to be subject to different types and levels of LOD and discuss approaches that are applicable under these scenarios.
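The recording rule and the missing indicator described above can be sketched in a few lines of Python. The helper name `apply_lod` is hypothetical; a missing lower or upper limit is represented here by an infinite bound, which is an illustrative convention rather than something specified in the paper.

```python
def apply_lod(x, lower=float("-inf"), upper=float("inf")):
    """Return (recorded value, indicator V) for one covariate value.

    The recorded value is max{L, min(x, U)}; V is 1 when L <= x <= U
    and 0 when x falls outside the detection interval.
    """
    recorded = max(lower, min(x, upper))
    observed = 1 if lower <= x <= upper else 0
    return recorded, observed
```

For example, `apply_lod(0.2, lower=0.5)` records the value at the lower limit and flags it as unobserved, while a value inside the interval is returned unchanged with indicator 1.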

3. Estimating Procedures in the Presence of LOD

3.1. Complete-Case Analysis

The CC analysis is commonly used in the presence of missing covariates. The fundamental idea is to discard observations with covariate values outside of the measurement range. Though the idea of the CC approach is straightforward, discarding observations loses information and could potentially bias the estimation when the missingness depends on exposure, e.g., [16,24], as in the LOD cases. Additional convergence issues arise when the original sample size is small; in the extreme case where every subject has at least one missing covariate, the CC approach is inapplicable. With the missing indicator, the CC model can be expressed as a modification of (1) as follows:
$$Q_i \log(T_i) = Q_i \alpha_c + Q_i X_i^\top \beta_c + Q_i Z_i^\top \gamma_c, \quad i = 1, \ldots, n,$$
where $Q_i = \prod_j V_{ij} = 1$ if all of $X_{ij}$, $j = 1, \ldots, p$, are observed, and is zero otherwise. The regression coefficients $\{\alpha_c, \beta_c, \gamma_c\}$ can be obtained by maximizing the modified likelihood,
$$\prod_{i=1}^{n} \left[ \frac{1}{\sigma} \phi\left( \frac{\log(Y_i) - \alpha_c - X_i^\top \beta_c - Z_i^\top \gamma_c}{\sigma} \right) \right]^{\Delta_i Q_i} \left[ \Phi\left( -\frac{\log(Y_i) - \alpha_c - X_i^\top \beta_c - Z_i^\top \gamma_c}{\sigma} \right) \right]^{(1 - \Delta_i) Q_i}.$$
The CC method is the default approach in survreg() when data contain missing values.
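As a small illustration of the complete-case rule, the sketch below keeps only the subjects whose indicators are all equal to one, which corresponds to a product of indicators equal to one. The function name `complete_case_rows` and the row-based data layout are assumptions made for this example.

```python
from math import prod


def complete_case_rows(rows, indicators):
    """Keep the rows whose missing indicators are all 1 (product equals 1).

    rows       : per-subject records (any objects)
    indicators : per-subject lists of 0/1 indicators, one per covariate
    """
    return [row for row, v in zip(rows, indicators) if prod(v) == 1]
```

If every subject has at least one indicator equal to zero, the returned list is empty, which is the situation in which the CC approach cannot be applied.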

3.2. Parametric Substitution Approaches

Instead of discarding missing observations, imputation methods replace the missing values with their expectations. Most existing imputation methods replace missing values with predicted values from models trained on the observed data, resulting in imputed values inside the observable region. Such imputation methods are not suitable for missing values due to LOD, where the missing values lie outside of the observable region. For this reason, it is more appropriate to consider imputation methods that replace missing values subject to LOD with the conditional expectations $E(X_{ij} \mid X_{ij} < L_j)$ or $E(X_{ij} \mid X_{ij} > U_j)$, depending on the direction of missingness. These quantities can be estimated parametrically using likelihood methods. For example, for a positive $X_{ij}$ subject to a lower LOD $L_j > 0$, common substitution values such as $L_j/2$ and $L_j/\sqrt{2}$ are derived by imposing a uniform distribution or a triangular distribution on the data below $L_j$, respectively, e.g., [25,26]. On the other hand, ad hoc substituting values such as 0 and $L_j$ have also been considered but generally lead to biased estimates of the regression coefficients [9].
Although the aforementioned substituting values have simple forms, they are derived without using information from the observed $X_{ij}$. An alternative approach is to derive the substituting values by imposing a distributional assumption on the whole data. For example, if $X_{ij}$ is assumed to follow a normal distribution with mean $\mu_j$ and variance $\varsigma_j^2$, then $(\mu_j, \varsigma_j^2)$ can be estimated by maximizing the likelihood
$$\prod_{i=1}^{n} \left[ \frac{1}{\varsigma_j} \phi\left( \frac{X_{ij} - \mu_j}{\varsigma_j} \right) \right]^{V_{ij}} \left[ \Phi\left( \frac{L_j - \mu_j}{\varsigma_j} \right) \right]^{I(V_{ij} = 0,\, X_{ij} < L_j)} \left[ \Phi\left( \frac{\mu_j - U_j}{\varsigma_j} \right) \right]^{I(V_{ij} = 0,\, X_{ij} > U_j)}.$$
Let $r(x) = \phi(x)/\Phi(x)$, and let $\hat{\mu}_j$ and $\hat{\varsigma}_j^2$ be the MLEs of $\mu_j$ and $\varsigma_j^2$, respectively. Once $\hat{\mu}_j$ and $\hat{\varsigma}_j^2$ are obtained, the conditional expectations
$$E(X_{ij} \mid X_{ij} < L_j) = \hat{\mu}_j - \hat{\varsigma}_j \, r\left( \frac{L_j - \hat{\mu}_j}{\hat{\varsigma}_j} \right) \quad \text{and} \quad E(X_{ij} \mid X_{ij} > U_j) = \hat{\mu}_j + \hat{\varsigma}_j \, r\left( \frac{\hat{\mu}_j - U_j}{\hat{\varsigma}_j} \right)$$
can be used as the substituting values for those $X_{ij}$ censored by $L_j$ and $U_j$, respectively. The estimates of the regression coefficients in (1) under the parametric substitution methods are then obtained by maximizing the likelihood in (1) with the missing $X_{ij}$ replaced by the desired substituting values.
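The two conditional expectations under the normal assumption are straightforward to evaluate numerically. The sketch below uses Python's standard library; the function names are illustrative, and the mean and standard deviation are passed in as if they had already been estimated by maximum likelihood.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def r_ratio(x):
    """r(x) = phi(x) / Phi(x) for the standard normal distribution."""
    return _STD.pdf(x) / _STD.cdf(x)


def mean_below(lower, mu, sd):
    """E(X | X < lower) for X ~ N(mu, sd^2)."""
    return mu - sd * r_ratio((lower - mu) / sd)


def mean_above(upper, mu, sd):
    """E(X | X > upper) for X ~ N(mu, sd^2)."""
    return mu + sd * r_ratio((mu - upper) / sd)
```

By construction, `mean_below` always returns a value below the lower limit and `mean_above` a value above the upper limit, so the substituted values lie outside the observable region as intended.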

3.3. Parametric Multiple Imputation Approaches

Single imputation methods such as those mentioned in Section 3.2 are less computationally demanding than the MI [27] approaches, but the latter could be more efficient as they better reflect uncertainty about the imputed values. The general idea of MI methods is to impute the missing $X_{ij}$ repeatedly with values generated from its predictive distribution given the observed data. Once the $M$ completed data sets are generated, the complete-data analysis is applied to each of them. The separate results are then pooled to provide the final inference. Building on the aforementioned substitution method under normal assumptions, we consider imputing the missing $X_{ij}$s by random values generated from the densities $f(x \mid X_{ij} < L_j, \hat{\mu}_j, \hat{\varsigma}_j^2)$ or $f(x \mid X_{ij} > U_j, \hat{\mu}_j, \hat{\varsigma}_j^2)$. Under the normal assumption on $X_{ij}$, $f(x \mid \cdot)$ corresponds to truncated normal density functions and the random values are generated via the inverse cumulative distribution function method. Let $\hat{\theta}_m = (\hat{\alpha}_m, \hat{\beta}_m, \hat{\gamma}_m)$, $m = 1, \ldots, M$, be the coefficient estimate obtained by maximizing (1) at the $m$th imputation. Using Rubin's rules [27], the pooled MI coefficient estimate and variance estimate are
$$\hat{\theta}_{MI} = \frac{1}{M} \sum_{m=1}^{M} \hat{\theta}_m \quad \text{and} \quad \mathrm{Var}(\hat{\theta}_{MI}) = \frac{1}{M} \sum_{m=1}^{M} \mathrm{Var}(\hat{\theta}_m) + \left( 1 + \frac{1}{M} \right) \frac{\sum_{m=1}^{M} (\hat{\theta}_m - \hat{\theta}_{MI})^2}{M - 1},$$
where $\mathrm{Var}(\hat{\theta}_m)$ is the variance estimate for $\hat{\theta}_m$. The proposed MI method differs from existing MI methods, such as the ones implemented in mice [28], in that the proposed method targets imputation values outside of the observed region. Our MI method can be easily implemented and is flexible in that different parametric assumptions can be imposed for different covariates.
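The two computational ingredients of this MI procedure, inverse-CDF sampling from a truncated normal distribution and pooling with Rubin's rules, can be sketched as follows for a scalar coefficient. This is a minimal illustration using Python's standard library, not the authors' implementation; the function names, the `lo`/`hi` arguments (the bounds of the region to sample from), and the small clamping constant are assumptions made for the example.

```python
import random
from statistics import NormalDist, mean, variance


def draw_truncated_normal(mu, sd, lo=None, hi=None, rng=random):
    """Draw from N(mu, sd^2) restricted to (lo, hi) by the inverse-CDF method.

    To impute a value below a lower LOD L, pass hi=L; to impute a value
    above an upper LOD U, pass lo=U.
    """
    dist = NormalDist(mu, sd)
    p_lo = 0.0 if lo is None else dist.cdf(lo)
    p_hi = 1.0 if hi is None else dist.cdf(hi)
    u = rng.uniform(p_lo, p_hi)
    # inv_cdf requires 0 < u < 1, so guard against the exact endpoints.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(u)


def pool_rubin(estimates, variances):
    """Pool M scalar estimates and their variance estimates by Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # sample variance, divisor M - 1
    return pooled, within + (1.0 + 1.0 / m) * between
```

For a vector of coefficients, the same pooling would be applied to each component separately.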

3.4. Missing Indicator Approaches

A useful alternative that does not require discarding or imputing missing values is the MDI approach [6]. The idea of the MDI approach is to include the missing status as additional covariates in the model so that all available information remains in the analysis to maintain statistical power. Specifically, we consider the MDI-embedded AFT model
$$\log(T_i) = \alpha_m + (V_i \circ X_i)^\top \beta_m + Z_i^\top \gamma_m + (1 - V_i)^\top \theta_m + \epsilon_i, \quad i = 1, \ldots, n, \tag{3}$$
where $u \circ v$ is the element-wise product of vectors $u$ and $v$ and $\theta_m$ is an additional $p \times 1$ regression coefficient. The MLE of $(\alpha_m, \beta_m, \gamma_m, \theta_m)$ can be obtained by maximizing the modified likelihood
$$\prod_{i=1}^{n} \left[ \frac{1}{\sigma} \phi\left( \frac{e_i(\alpha_m, \beta_m, \gamma_m, \theta_m)}{\sigma} \right) \right]^{\Delta_i} \left[ \Phi\left( -\frac{e_i(\alpha_m, \beta_m, \gamma_m, \theta_m)}{\sigma} \right) \right]^{1 - \Delta_i}, \tag{4}$$
where $e_i(\alpha_m, \beta_m, \gamma_m, \theta_m) = \log(Y_i) - \alpha_m - (V_i \circ X_i)^\top \beta_m - Z_i^\top \gamma_m - (1 - V_i)^\top \theta_m$. In the context of linear regression, the least-squares estimator for $\beta_m$ was shown to be asymptotically unbiased for $\beta$ in (1) if $X_i$ and $Z_i$ are uncorrelated [6]. The performance of the MDI approach has also been studied under the generalized linear model, e.g., [15]. Since the parametric AFT model has a log-linear form, the MLE obtained from maximizing (4) is expected to be asymptotically unbiased in the absence of censoring. We also conjecture that the asymptotic unbiasedness continues to hold in the presence of censoring. The MDI approach is easy to implement and can be extended in several directions. For example, the fully expanded MDI model extends (3) by including interaction terms between the missing indicators and the observed covariates [15], resulting in the revised AFT model
$$\log(T_i) = \alpha_m + (V_i \circ X_i)^\top \beta_m + Z_i^\top \gamma_m + (1 - V_i)^\top \theta_m + [(1 - V_i) \circ Z_i]^\top \phi_m + \epsilon_i,$$
where $\phi_m$ is an additional $q \times 1$ regression coefficient. On the other hand, the MDI approach could be embedded into the MI approach, e.g., [18,19], resulting in the revised AFT model
$$\log(T_i) = \alpha_m + \tilde{X}_i^\top \beta_m + Z_i^\top \gamma_m + (1 - V_i)^\top \theta_m + \epsilon_i, \quad i = 1, \ldots, n,$$
where $\tilde{X}_i = (\tilde{X}_{i1}, \ldots, \tilde{X}_{ip})^\top$, $\tilde{X}_{ij} = X_{ij}$ if $V_{ij} = 1$, and $\tilde{X}_{ij}$ is the value imputed by MI if $V_{ij} = 0$. The MI coefficient estimates are then pooled by Rubin's rules. These extensions of the MDI approach are implemented and compared in simulation.
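In practice, fitting the basic MDI model amounts to replacing each partially observed covariate by its product with the indicator and appending one "missing" flag per such covariate, after which any standard AFT fitting routine can be used. The sketch below builds one such design row in Python; the function name `mdi_design_row` and the column ordering are illustrative assumptions. The expanded model's interaction columns are not shown.

```python
def mdi_design_row(x, v, z):
    """Build the covariate row of the basic MDI model for one subject.

    x : recorded values of the covariates subject to LOD
    v : missing indicators (1 = observed, 0 = outside the detection limits)
    z : fully observed covariates
    Returns [V*X columns] + [Z columns] + [(1 - V) columns].
    """
    masked = [v_j * x_j for v_j, x_j in zip(v, x)]   # element-wise V * X
    flags = [1 - v_j for v_j in v]                    # 1 where X is missing
    return masked + list(z) + flags
```

Because the recorded value is multiplied by zero whenever the indicator is zero, the placeholder stored for a value outside the detection limits has no influence on the fit; its effect is absorbed by the coefficient of the corresponding flag.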

4. Simulation

A series of simulation studies was conducted to compare the methods discussed in Section 3. The failure time $T_i$ was generated from the AFT model
$$\log(T_i) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \gamma_1 Z_i + \epsilon, \tag{5}$$
where $X_{i1}$ was a Weibull random variable with shape 1 and scale $1/3$, $X_{i2}$ was a normal random variable with mean 0 and variance 0.64, $Z_i$ was a standard normal random variable, the regression parameter $(\beta_0, \beta_1, \beta_2, \gamma_1) = (2, 1, 1, 1)$, and the error term $\epsilon$ followed a standard normal distribution. We considered scenarios where the covariates are independent and where the covariates are correlated. In the latter case, the Clayton copula with a Spearman's rho of 0.4 was used to specify the correlation between $X_1$ and $Z$. The censoring time was independently generated from a uniform distribution over $[0, 1.25]$, yielding a 30% censoring rate on $T_i$. We considered three types of LOD: lower LOD, upper LOD, and interval LOD, where $X_{ij}$ is observable in $[L_j, \infty)$, $(-\infty, U_j]$, and $[L_j, U_j]$, respectively. The detection limits, $L_j$ and $U_j$, were quantiles of $X_{ij}$ chosen to achieve three levels of missing proportions, 20%, 40%, and 60%, for light missing, moderate missing, and heavy missing, respectively. For interval LOD, we additionally assumed $L_j$ to be the $(100 \cdot m_j/4)$th quantile of $X_{ij}$, where $m_j$ is the missing proportion for $X_{ij}$, $j = 1, 2$.
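The way detection limits follow from a target missing proportion can be sketched with closed-form quantile functions. The code below is an illustration under stated assumptions, not the authors' simulation code: the function names are hypothetical, the missing proportion `m` is expressed as a fraction (for example 0.2), and for the interval case the upper limit is taken to be the $(1 - 3m/4)$ quantile, which is what the stated lower limit implies if the total missing proportion is to equal `m`.

```python
import math
from statistics import NormalDist


def weibull_shape1_quantile(p, scale):
    """Quantile of a Weibull(shape=1, scale) variable, i.e. an exponential
    distribution with CDF F(x) = 1 - exp(-x / scale)."""
    return -scale * math.log(1.0 - p)


def lod_limits(quantile, m, kind):
    """Return (L, U) so that a fraction m of values falls outside [L, U].

    quantile : function mapping a probability to the covariate's quantile
    m        : target missing proportion as a fraction, e.g. 0.2
    kind     : "lower", "upper", or "interval"
    """
    if kind == "lower":
        return quantile(m), math.inf
    if kind == "upper":
        return -math.inf, quantile(1.0 - m)
    if kind == "interval":
        # m/4 of the mass below L; the remaining 3m/4 above U (assumption).
        return quantile(m / 4.0), quantile(1.0 - 3.0 * m / 4.0)
    raise ValueError("kind must be 'lower', 'upper', or 'interval'")
```

For the normal covariate with variance 0.64, the quantile function would be `NormalDist(0.0, 0.8).inv_cdf`, since the standard deviation is 0.8.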
For each configuration, we compared the performance of the following approaches to handling missing data.
Complete-case analysis:
M1: removal of subjects with missing $X_{ij}$.
Substitution methods:
M2: substitution of the missing $X_{ij}$ by $L_j/2$ or $2U_j$.
M3: substitution of the missing $X_{ij}$ by $L_j/\sqrt{2}$ or $\sqrt{2}\,U_j$.
M4: substitution of the missing $X_{ij}$ by $E(X_{ij} \mid X_{ij} < L_j)$ or $E(X_{ij} \mid X_{ij} > U_j)$ under normal assumptions.
Multiple imputation approaches:
M5: MI of the missing $X_{ij}$ using the predictive mean matching (PMM) algorithm implemented in the R package mice [28].
M6: MI of the missing $X_{ij}$ using conditional densities derived under normal assumptions as described in Section 3.3.
Missing indicator approaches:
M7: the missing indicator (MDI) model.
M8: the expanded MDI model.
Missing-indicator-embedded multiple imputation approaches (MI + MDI):
M9: MI by PMM and fit with the MDI model.
M10: MI by normal assumptions and fit with the MDI model.
M11: MI by PMM and fit with the expanded MDI model.
M12: MI by normal assumptions and fit with the expanded MDI model.
The simulation was repeated 10,000 times with sample sizes $n = 50$, 100, and 500. The MLE of the regression parameter of the AFT model (5) was obtained using the survreg() function in the survival package [23] in R [29] under the normal error assumption, i.e., with argument dist = "lognormal". For the scenarios considered, the CC approach (M1) sometimes failed to converge because the resultant sample was too small or empty after removing missing observations. The convergence rate for the CC approach under different scenarios, presented in the Supplementary Materials, shows fewer converged replications when the sample size is small (e.g., $n = 50$) or the missing proportions are high (e.g., $m_1 = 60\%$ or $m_2 = 60\%$). For this reason, the simulation results for the CC approach were based on the converged replications. For MI methods, the number of imputations $M$ was set to 5.
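The two performance measures used to summarize the replications, the absolute value of the average bias (AAB) and the mean squared error (MSE), can be computed as in the following sketch. The function name and the list-based input are illustrative assumptions; in the CC case only the converged replications would be passed in.

```python
from statistics import mean


def aab_and_mse(estimates, truth):
    """Summarize replicated estimates of one scalar parameter.

    AAB is |mean(estimate - truth)|; MSE is mean((estimate - truth)^2).
    """
    errors = [est - truth for est in estimates]
    aab = abs(mean(errors))
    mse = mean([e * e for e in errors])
    return aab, mse
```

Because the AAB averages signed errors before taking the absolute value, positive and negative errors can cancel, which is one reason the AAB need not decrease with the sample size even when the MSE does.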
Table 1 and Table 2 summarize the AAB and MSE associated with the MLEs of $\beta_1$, $\beta_2$, and $\gamma_1$ in the AFT model (5) when the covariates are independent and the censored covariates are subject to a lower LOD. The MDI approaches (M7 and M8) have among the smallest AAB and MSE across the considered scenarios. Moreover, the MDI approaches outperform the CC approach (M1) when the sample size is small or the missing proportions ($m_1$ and $m_2$) are high. Overall, the AAB and the MSE generally increase with increasing missing proportions. On the other hand, whereas the MSE generally decreases with increasing sample size, the trend of the AAB varies by model. Among the substitution methods, both M2 and M3 yield smaller AAB for $\beta_1$ than for $\beta_2$; this is because the substituting values under these approaches are close to $E(X_{i1} \mid X_{i1} < L_1)$. On the contrary, M4 yields smaller AAB for $\beta_2$ when the parametric assumption for $X_2$ is satisfied. The same trend can be seen in the parametric MI approach, M6. In particular, none of the imputation approaches, including the PMM-based MI approach (M5), improved on the performance of the MDI approach. Combining MDI models with MI approaches does not necessarily improve on the performance of the MDI or MI approaches applied alone. In situations where the combined approach shows improved AAB over the MI approaches, there are trade-offs in MSE. Of those, the expanded MDI-embedded MI approaches (M11 and M12) yield smaller AAB than the MDI-embedded MI approaches (M9 and M10), but they result in a comparable MSE. In addition, biases associated with the MLEs of $\beta_1$ and $\beta_2$ summarized in Figure 1 provide insight into the direction of bias. Among those that yield a substantial bias, approaches with uniform and triangular assumptions, i.e., M2 and M3, tend to overestimate $\beta_1$ and underestimate $\beta_2$.
In contrast, approaches with normal assumptions, i.e., M4, M6, and M10, tend to underestimate $\beta_1$ and correctly estimate $\beta_2$. The pattern is reversed in the case of an upper or interval LOD. These observations suggest that the direction of bias is imposed by the underlying parametric assumption and highlight the robustness of the MDI approach. Similar trends are observed in scenarios where the covariates are subject to the upper or interval LOD and where $n = 500$, as presented in the Supplementary Materials. On the other hand, the results when the covariates are correlated are presented in Table 3 and Table 4 and Figure 2. For all approaches, correlation generally results in higher AAB and MSE but does not alter the direction of bias. This observation is consistent with the literature, where the asymptotic bias of the regression coefficient associated with the censored covariate is shown to increase with an increasing magnitude of the correlation [6]. However, these theoretical results do not apply directly to a small sample setting, as the MDI approaches remain at least as good as, if not better than, the CC approach.

5. Discussion

The MDI approach minimizes the loss of information and does not require making parametric assumptions, making it an attractive alternative to some of the more widely used approaches for handling missing covariates. Moreover, the MDI approaches show clear advantages over the competitors and are recommended in models with survival outcomes, as in our simulation. Our simulation shows no apparent difference between the MDI and the expanded MDI models, but embedding the expanded MDI model in MI could result in a greater bias reduction. The advantage of the MDI approach is more substantial when there is a large proportion of missing covariates or when the distributional assumption is violated in the MI approach. The MDI approaches continue to perform well under additional simulation settings, including scenarios where the survival time is not subject to censoring and scenarios under a Cox proportional hazards model setting.
It has been noted that, even though the MDI approach generally results in a reduced bias, it might have minimal improvements when the missing mechanism is associated with the outcome [30] or when the missing covariate is categorical [31]. Those phenomena were verified in the context of generalized linear regression, and it would be worth investigating those scenarios in our setting with survival outcomes. Moreover, extending the assessments of the validity of the MDI approach, e.g., [32,33], to our settings will be of interest.
We only considered scenarios where the direction of missingness is known in this paper. Nevertheless, the MDI approach is still applicable when the direction of missingness is unknown. The aforementioned parametric imputation methods can easily be extended to the case when the direction of missingness is unknown. For example, suppose that $X_{ij}$ follows a normal distribution with mean $\mu_j$ and variance $\varsigma_j^2$ as in Section 3.3. The MLEs of $\mu_j$ and $\varsigma_j^2$ can be obtained by maximizing the likelihood
$$\prod_{i=1}^{n} \left[ \frac{1}{\varsigma_j} \phi\left( \frac{X_{ij} - \mu_j}{\varsigma_j} \right) \right]^{V_{ij}} \left[ \Phi\left( \frac{L_j - \mu_j}{\varsigma_j} \right) + \Phi\left( \frac{\mu_j - U_j}{\varsigma_j} \right) \right]^{1 - V_{ij}}.$$
The corresponding MI procedure can then be carried out with the missing $X_{ij}$s imputed by values generated from the density $p f(x \mid X_{ij} < L_j, \hat{\mu}_j, \hat{\varsigma}_j^2) + (1 - p) f(x \mid X_{ij} > U_j, \hat{\mu}_j, \hat{\varsigma}_j^2)$, where $p = 1$ with probability $\Phi[(L_j - \hat{\mu}_j)/\hat{\varsigma}_j] / \{\Phi[(L_j - \hat{\mu}_j)/\hat{\varsigma}_j] + \Phi[(\hat{\mu}_j - U_j)/\hat{\varsigma}_j]\}$ and $p = 0$ otherwise. Due to its simplicity, the MDI method can also be easily embedded into other methods to improve the overall performance. An immediate example is the MI + MDI approaches considered in Section 4. Another extension is to embed the MDI approach in threshold regression approaches [34] to accommodate multiple censored covariates.
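The imputation step for an unknown direction of missingness can be sketched as a two-stage draw: first pick the tail with the probability given above, then sample from the corresponding truncated normal distribution by the inverse-CDF method. The following Python sketch is an illustration only; the function name, the `rng` argument, and the clamping constant are assumptions, and the mean and standard deviation are treated as already estimated.

```python
import random
from statistics import NormalDist


def draw_unknown_direction(mu, sd, lower, upper, rng=random):
    """Impute one value known only to lie outside [lower, upper].

    The lower tail is chosen with probability
    Phi((L - mu)/sd) / {Phi((L - mu)/sd) + Phi((mu - U)/sd)}.
    """
    dist = NormalDist(mu, sd)
    p_below = dist.cdf(lower)            # Phi((L - mu) / sd)
    p_above = 1.0 - dist.cdf(upper)      # Phi((mu - U) / sd)
    if rng.random() < p_below / (p_below + p_above):
        u = rng.uniform(0.0, p_below)            # sample below L
    else:
        u = rng.uniform(1.0 - p_above, 1.0)      # sample above U
    # inv_cdf requires 0 < u < 1, so guard against the exact endpoints.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(u)
```

Repeating this draw for each missing value and each of the $M$ imputations yields the completed data sets to which the analysis model is then fitted.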

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/stats5020029/s1.

Author Contributions

Conceptualization, N.A. and S.H.C.; methodology, N.A. and S.H.C.; software, N.A.; validation, N.A. and S.H.C.; formal analysis, N.A. and S.H.C.; writing—original draft preparation, N.A.; writing—review and editing, S.H.C.; visualization, N.A. and S.H.C.; supervision, S.H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bernhardt, P.W.; Wang, H.J.; Zhang, D. Statistical methods for generalized linear models with covariates subject to detection limits. Stat. Biosci. 2015, 7, 68–89.
  2. Kong, S.; Nan, B. Semiparametric approach to regression with a covariate subject to a detection limit. Biometrika 2016, 103, 161–174.
  3. Arnaout, R.; Lee, R.A.; Lee, G.R.; Callahan, C.; Yen, C.F.; Smith, K.P.; Arora, R.; Kirby, J.E. SARS-CoV2 testing: The limit of detection matters. bioRxiv 2020.
  4. Lou, Y.; Chen, C.; Long, X.; Gu, J.; Xiao, M.; Wang, D.; Zhou, X.; Li, T.; Hong, Z.; Li, C.; et al. Detection and Quantification of Chimeric Antigen Receptor Transgene Copy Number by Droplet Digital PCR versus Real-Time PCR. J. Mol. Diagn. 2020, 22, 699–707.
  5. Lin, D.Y.; Zeng, D.; Couper, D. A general framework for integrative analysis of incomplete multiomics data. Genet. Epidemiol. 2020, 44, 646–664.
  6. Jones, M.P. Indicator and stratification methods for missing explanatory variables in multiple linear regression. J. Am. Stat. Assoc. 1996, 91, 222–230.
  7. Nie, L.; Chu, H.; Liu, C.; Cole, S.R.; Vexler, A.; Schisterman, E.F. Linear regression with an independent variable subject to a detection limit. Epidemiology 2010, 21, S17.
  8. Arunajadai, S.G.; Rauh, V.A. Handling covariates subject to limits of detection in regression. Environ. Ecol. Stat. 2012, 19, 369–391.
  9. Schisterman, E.F.; Vexler, A.; Whitcomb, B.W.; Liu, A. The limitations due to exposure detection limits for regression models. Am. J. Epidemiol. 2006, 163, 374–383.
  10. Tran, T.M.; Abrams, S.; Aerts, M.; Maertens, K.; Hens, N. Measuring association among censored antibody titer data. Stat. Med. 2021, 40, 3740–3761.
  11. Richardson, D.B.; Ciampi, A. Effects of exposure measurement error when an exposure variable is constrained by a lower limit. Am. J. Epidemiol. 2003, 157, 355–363.
  12. Anderson, A.B.; Basilevsky, A.; Hum, D.P. Missing data: A review of the literature. In Handbook of Survey Research; Academic Press: Cambridge, MA, USA, 1983; pp. 415–494.
  13. Chow, W.K. A look at various estimators in logistic models in the presence of missing values. In Technical Report; Rand Corp: Santa Monica, CA, USA, 1979.
  14. Cohen, J.; Cohen, P.; West, S.G.; Aiken, L.S. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences; Taylor & Francis: Oxfordshire, UK, 2013.
  15. Chiou, S.H.; Betensky, R.A.; Balasubramanian, R. The missing indicator approach for censored covariates subject to limit of detection in logistic regression models. Ann. Epidemiol. 2019, 38, 57–64.
  16. Ortega-Villa, A.M.; Liu, D.; Ward, M.H.; Albert, P.S. New insights into modeling exposure measurements below the limit of detection. Environ. Epidemiol. 2021, 5, e116.
  17. Blackhurst, M. Identifying Lead Service Lines with Field Tap Water Sampling. ACS ES&T Water 2021, 1, 1983–1991.
  18. Choi, J.; Dekkers, O.M.; le Cessie, S. A comparison of different methods to handle missing data in the context of propensity score analysis. Eur. J. Epidemiol. 2019, 34, 23–36.
  19. Sperrin, M.; Martin, G.P. Multiple imputation with missing indicators as proxies for unmeasured variables: Simulation study. BMC Med. Res. Methodol. 2020, 20, 185.
  20. Lee, S.; Park, S.; Park, J. The proportional hazards regression with a censored covariate. Stat. Probab. Lett. 2003, 61, 309–319.
  21. Dinse, G.E.; Jusko, T.A.; Ho, L.A.; Annam, K.; Graubard, B.I.; Hertz-Picciotto, I.; Miller, F.W.; Gillespie, B.W.; Weinberg, C.R. Accommodating measurements below a limit of detection: A novel application of Cox regression. Am. J. Epidemiol. 2014, 179, 1018–1024.
  22. Bernhardt, P.W.; Wang, H.J.; Zhang, D. Flexible modeling of survival data with covariates subject to detection limits via multiple imputation. Comput. Stat. Data Anal. 2014, 69, 81–91.
  23. Therneau, T.M. A Package for Survival Analysis in R; R Package Version 3.2-13. Available online: https://CRAN.R-project.org/package=survival (accessed on 23 March 2022).
  24. Hughes, R.A.; Heron, J.; Sterne, J.A.; Tilling, K. Accounting for missing data in statistical analyses: Multiple imputation is not always the answer. Int. J. Epidemiol. 2019, 48, 1294–1304.
  25. Hornung, R.W.; Reed, L.D. Estimation of average concentration in the presence of nondetectable values. Appl. Occup. Environ. Hyg. 1990, 5, 46–51.
  26. Baccarelli, A.; Pfeiffer, R.; Consonni, D.; Pesatori, A.C.; Bonzini, M.; Patterson Jr, D.G.; Bertazzi, P.A.; Landi, M.T. Handling of dioxin measurement data in the presence of non-detectable values: Overview of available methods and their application in the Seveso chloracne study. Chemosphere 2005, 60, 898–906.
  27. Rubin, D.B. Statistical matching using file concatenation with adjusted weights and multiple imputations. J. Bus. Econ. Stat. 1986, 4, 87–94.
  28. van Buuren, S.; Groothuis-Oudshoorn, K. mice: Multivariate Imputation by Chained Equations in R. J. Stat. Softw. 2011, 45, 1–67.
  29. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2017.
  30. Groenwold, R.H.; White, I.R.; Donders, A.R.T.; Carpenter, J.R.; Altman, D.G.; Moons, K.G. Missing covariate data in clinical research: When and when not to use the missing-indicator method for analysis. CMAJ 2012, 184, 1265–1269.
  31. Zhuchkova, S.; Rotmistrov, A. A Comparison Of The Missing-Indicator Method And Complete Case Analysis In Case Of Categorical Data. In Higher School of Economics Research Paper No. WP BRP; Social Science Research Network: Rochester, NY, USA, 2019; Volume 87.
  32. Blake, H.A.; Leyrat, C.; Mansfield, K.E.; Tomlinson, L.A.; Carpenter, J.; Williamson, E.J. Estimating treatment effects with partially observed covariates using outcome regression with missing indicators. Biom. J. 2020, 62, 428–443.
  33. Blake, H.A.; Leyrat, C.; Mansfield, K.E.; Seaman, S.; Tomlinson, L.A.; Carpenter, J.; Williamson, E.J. Propensity scores using missingness pattern information: A practical guide. Stat. Med. 2020, 39, 1641–1657.
  34. Qian, J.; Chiou, S.H.; Maye, J.E.; Atem, F.; Johnson, K.A.; Betensky, R.A. Threshold regression to accommodate a censored covariate. Biometrics 2018, 74, 1261–1270. [Google Scholar] [CrossRef]
Figure 1. Violin plots showing the empirical distribution of the bias associated with the MLE of $\beta_1$ (red) and $\beta_2$ (green) when the covariates are independent and $X_{ij}$, $j = 1, 2$, is subject to a lower LOD. (a) Bias under $n = 50$ and $m_1 = m_2 = 20\%$. (b) Bias under $n = 100$ and $m_1 = m_2 = 20\%$. (c) Bias under $n = 50$ and $m_1 = m_2 = 40\%$. (d) Bias under $n = 100$ and $m_1 = m_2 = 40\%$. (e) Bias under $n = 50$ and $m_1 = m_2 = 60\%$. (f) Bias under $n = 100$ and $m_1 = m_2 = 60\%$.
Figure 2. Violin plots showing the empirical distribution of the bias associated with the MLE of $\beta_1$ (red) and $\beta_2$ (green) when the covariates are correlated and $X_{ij}$, $j = 1, 2$, is subject to a lower LOD. (a) Bias under $n = 50$ and $m_1 = m_2 = 20\%$. (b) Bias under $n = 100$ and $m_1 = m_2 = 20\%$. (c) Bias under $n = 50$ and $m_1 = m_2 = 40\%$. (d) Bias under $n = 100$ and $m_1 = m_2 = 40\%$. (e) Bias under $n = 50$ and $m_1 = m_2 = 60\%$. (f) Bias under $n = 100$ and $m_1 = m_2 = 60\%$.
Table 1. Summary of the AAB ($\times 1000$) when the covariates are independent and $X_{ij}$, $j = 1, 2$, is subject to a lower LOD. M1 is the complete-case analysis; M2–M4 are variants of the substitution methods; M5–M6 are variants of the MI methods; M7–M8 are variants of the MDI methods; M9–M12 are variants of the MDI-embedded MI (MI + MDI) methods. AAB values less than 0.1 are highlighted in gray, with darker tones corresponding to smaller AAB.
Table 2. Summary of the MSE ($\times 1000$) when the covariates are independent and $X_{ij}$, $j = 1, 2$, is subject to a lower LOD. M1 is the complete-case analysis; M2–M4 are variants of the substitution methods; M5–M6 are variants of the MI methods; M7–M8 are variants of the MDI methods; M9–M12 are variants of the MDI-embedded MI (MI + MDI) methods. MSEs less than 0.1 are highlighted in gray, with darker tones corresponding to smaller MSEs.
Table 3. Summary of the AAB ($\times 1000$) when the covariates are correlated and $X_{ij}$, $j = 1, 2$, is subject to a lower LOD. M1 is the complete-case analysis; M2–M4 are variants of the substitution methods; M5–M6 are variants of the MI methods; M7–M8 are variants of the MDI methods; M9–M12 are variants of the MDI-embedded MI (MI + MDI) methods. AAB values less than 0.1 are highlighted in gray, with darker tones corresponding to smaller AAB.
Table 4. Summary of the MSE ($\times 1000$) when the covariates are correlated and $X_{ij}$, $j = 1, 2$, is subject to a lower LOD. M1 is the complete-case analysis; M2–M4 are variants of the substitution methods; M5–M6 are variants of the MI methods; M7–M8 are variants of the MDI methods; M9–M12 are variants of the MDI-embedded MI (MI + MDI) methods. MSEs less than 0.1 are highlighted in gray, with darker tones corresponding to smaller MSEs.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Alyabs, N.; Chiou, S.H. The Missing Indicator Approach for Accelerated Failure Time Model with Covariates Subject to Limits of Detection. Stats 2022, 5, 494-506. https://doi.org/10.3390/stats5020029

