
Using a Counting Process Method to Impute Censored Follow-Up Time Data

Centre for Clinical Epidemiology and Biostatistics (CCEB), School of Medicine and Public Health, The University of Newcastle (UoN), Callaghan, NSW 2308, Australia
Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2018, 15(4), 690
Submission received: 17 March 2018 / Revised: 1 April 2018 / Accepted: 3 April 2018 / Published: 5 April 2018


Censoring occurs when complete follow-up time information is unavailable for patients enrolled in a clinical study. The process is considered to be informative (non-ignorable) if the likelihood function for the model cannot be partitioned into a set of response parameters that are independent of the censoring parameters. In such cases, estimated survival time probabilities may be biased, prompting the need for special statistical methods to remedy the situation. The problem is especially salient when censoring occurs early in a study. In this paper, we describe a method to impute censored follow-up times using a counting process method.

1. Introduction

Ideally, censoring in a survival analysis should be non-informative and not related to any aspect of the study that could bias results [1,2,3,4,5,6,7]. For example, toxic side effects of an investigational drug may prompt the most ill patients to withdraw early from the study. Other patients may opt to leave before the intended end of a trial if the treatment is effective and they feel well. Even when censoring is non-informative (e.g., relocation to another city because of plant closure), by chance alone, it may still have a serious effect on estimated survival probabilities, especially if the dropouts occur early in the study.
In this paper, we present an example of early censoring to illustrate how the resulting survival probabilities may be biased. We then describe a method to impute censored follow-up times by rearranging the data as a counting process and generating jump-point plots.

2. Materials and Methods

Imputing Censored Follow-Up Times
Let $\xi_{(j)}$ denote the follow-up time for the $j$th ordered observation, given a total of $n$ observations and $k$ integer-valued time points $t$. Accordingly,
$$t_2 = t_1 + 1,\; t_3 = t_2 + 1,\; \ldots,\; t_k = t_{k-1} + 1; \quad t_k \le n$$
$$t_1 \le \xi_{(1)} \le \xi_{(2)} \le \cdots \le \xi_{(n)} \le t_k.$$
Then, for all values of $\xi_{(j)}$ and $t_i$ over their respective ranges (i.e., $j = 1$ to $n$, $i = 1$ to $k$), the counting process “indicator” variable $\zeta$ assumes the value 0 if $t_i \le \xi_{(j)}$ and $\xi_{(j)}$ corresponds to a censored follow-up time (i.e., the outcome event, such as death, has yet to occur by time $\xi_{(j)}$). Otherwise, when the observation denotes an event, $\zeta$ assumes the value 0 if $t_i < \xi_{(j)}$ and the value 1 if $t_i \ge \xi_{(j)}$ [8].
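The indicator rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `expand_counting_process` and its arguments are hypothetical and do not come from the authors' SAS code. It expands a single patient's record into one row per integer time point.

```python
from typing import List, Tuple


def expand_counting_process(follow_up: int, is_event: bool,
                            t_first: int, t_last: int) -> List[Tuple[int, int]]:
    """Return (t_i, zeta) rows for one subject (hypothetical helper).

    Censored subject: zeta = 0 for every t_i <= follow_up, no rows afterwards.
    Event subject:    zeta = 0 for t_i < follow_up and 1 for t_i >= follow_up.
    """
    rows = []
    for t in range(t_first, t_last + 1):
        if is_event:
            rows.append((t, 0 if t < follow_up else 1))
        elif t <= follow_up:
            rows.append((t, 0))
    return rows


# A patient censored at month 50 versus one with an event at month 50,
# assuming (for illustration) a study spanning months 1 to 99.
censored_rows = expand_counting_process(50, False, 1, 99)
event_rows = expand_counting_process(50, True, 1, 99)
```

Under these assumptions, the censored patient contributes 50 rows, all with indicator 0, whereas the patient with an event contributes 49 rows with indicator 0 followed by one row with indicator 1 for each remaining month.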
Next, we fit a multiple logistic regression model with $(p + 1)$ covariates to the above counting process data, i.e.,
$$P(\zeta = 1 \mid \beta_0, \beta_1, \beta_2, \ldots, \beta_{p+1}) = \frac{1}{1 + e^{-\{\beta_0 + \beta_1 t_i + \beta_2 x_1 + \beta_3 x_2 + \cdots + \beta_{p+1} x_p\}}}$$
where the first covariate denotes time $t_i$ and the remaining $p$ covariates $(x_1, x_2, \ldots, x_p)$ represent outcome-related risk factors. Model adequacy may be assessed by examining diagnostic plots, in lieu of standard goodness-of-fit tests, which would assume that the underlying counting process data are independently distributed [9,10,11,12]. In some cases, including higher-order or trigonometric terms in the logistic regression equation may improve model fit.
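To make the model-fitting step concrete, the following is a minimal sketch of a maximum likelihood fit by Newton-Raphson in plain Python. It is a stand-in for illustration only, since the analyses in this paper were run with SAS PROC LOGISTIC. The helper names (`sigmoid`, `solve`, `fit_logistic`) are hypothetical, and the sketch assumes the standard logistic form with a negative exponent and data that are not perfectly separable.

```python
import math
from typing import List, Sequence


def sigmoid(eta: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-eta))."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logistic(rows: Sequence[Sequence[float]], zeta: Sequence[int],
                 n_iter: int = 25) -> List[float]:
    """Newton-Raphson fit; each row is (t_i, x_1, ..., x_p), intercept added here."""
    X = [[1.0] + list(r) for r in rows]
    d = len(X[0])
    beta = [0.0] * d
    for _ in range(n_iter):
        grad = [0.0] * d
        hess = [[0.0] * d for _ in range(d)]
        for xi, z in zip(X, zeta):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1.0 - p)
            for a in range(d):
                grad[a] += (z - p) * xi[a]          # score vector
                for c in range(d):
                    hess[a][c] += w * xi[a] * xi[c]  # information matrix
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta
```

The returned list holds the intercept first, then the time coefficient, then the coefficients of the remaining covariates. Because the counting process rows are not independent, the sketch reports point estimates only and does not attempt standard errors or goodness-of-fit tests.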
A jump-point plot for a particular covariate pattern corresponding to a censored follow-up time $\xi_{(j)}^{c}$ may be obtained by plotting the model predicted values $\hat{p}$ against the time variable $t_i$. Let $\tilde{\xi}_{(j)}^{c}$ denote the value of $t_i$ that maps to $\hat{p} = 0.50$ (i.e., maximum likelihood estimate of the mean jump-point observation). The imputed censored follow-up time is given as
$${}^{\mathrm{imp}}\tilde{\xi}_{(j)}^{c} = \inf\{\sup(\tilde{\xi}_{(j)}^{c},\, \xi_{(j)}^{c}),\, \xi_{\max}\},$$
where $\xi_{\max}$ is the natural upper limit for a patient’s follow-up time. For example, in the case of cancer therapy, the maximum life expectancy of a patient rarely exceeds 101 years of age. For a patient aged 89 at diagnosis, $\xi_{\max}$ is computed as $101 - 89 = 12$ years. The original censored follow-up times are then replaced with ${}^{\mathrm{imp}}\tilde{\xi}_{(j)}^{c}$. However, it is important to note that these follow-up times are still treated as censored values rather than events when computing survival probability estimates.
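The imputation rule can be illustrated with a short sketch. It assumes, for simplicity, a fitted model that is linear in time on the logit scale, so that the jump-point has a closed form; with higher-order or trigonometric time terms, a numerical search over the predicted curve would be needed instead. The function names and the coefficient values in the usage lines are hypothetical.

```python
from typing import Sequence


def jump_point(beta: Sequence[float], x: Sequence[float]) -> float:
    """Time at which the predicted probability equals 0.50.

    Assumes logit(p) = beta[0] + beta[1] * t + beta[2] * x[0] + ...,
    so p = 0.50 exactly when the linear predictor equals zero.
    """
    offset = beta[0] + sum(b * v for b, v in zip(beta[2:], x))
    return -offset / beta[1]


def impute_censored_time(jump: float, censored_time: float, xi_max: float) -> float:
    """inf{ sup(jump-point, original censored time), natural upper limit }."""
    return min(max(jump, censored_time), xi_max)


# Hypothetical coefficients (intercept, time, one covariate) and covariate value.
example_jump = jump_point([-6.0, 0.25, 0.5], [2.0])
example_imputed = impute_censored_time(example_jump, censored_time=1.0, xi_max=144.0)
```

The inner `max` ensures that an imputed time is never earlier than the time at which the patient was actually last observed, and the outer `min` caps it at the natural upper limit (here 144 months, i.e., the 12 years from the example above expressed in months).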
Assuming martingale independence and considering the event times to be binomially distributed (i.e., the probability $\pi_\nu$ of an event $\varsigma_\nu$ occurring in a particular risk set $\nu$ is equal to its expectation $E(\varsigma_\nu)$ divided by the risk set size $n_\nu$ (accounting for censored and event times), with corresponding variance equal to $n_\nu \pi_\nu (1 - \pi_\nu)$), it follows that the resulting data may be analyzed using standard methods for handling censored time-to-event observations (e.g., Kaplan-Meier and Cox proportional-hazards models) [13,14]. Here, the sampling variability for estimating the imputed censored follow-up times is absorbed into the sampling variability for each binomially distributed event time, given that the estimates are asymptotically consistent and adhere to certain regularity conditions.
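As an illustration of how imputed times enter a standard analysis, the sketch below computes a product-limit (Kaplan-Meier) estimate in plain Python. The function name `kaplan_meier` and the three-patient toy data are hypothetical and are not taken from Table A1. The key point is that an imputed follow-up time keeps its censoring flag of 0.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate; returns (event time, S(t)) at each distinct event time.

    events[i] is 1 for an observed event and 0 for a censored
    (or imputed censored) follow-up time.
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy data: the first patient is censored at month 1 (original) or has an
# imputed censored time of month 8; the censoring flag stays 0 in both cases.
original = kaplan_meier([1, 5, 10], [0, 1, 1])
imputed = kaplan_meier([8, 5, 10], [0, 1, 1])
```

In this toy case, the early-censored patient leaves the risk set before the first event in the original data, whereas after imputation the patient remains at risk at month 5, which changes the estimated survival probability at that time.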

3. Results

3.1. Kaplan-Meier (Product-Limit) Example

Consider the data in Table A1, which provides the event and censored follow-up times for 250 cancer patients undergoing treatment and their simulated complete values (for illustrative purposes). Approximately 10% of the follow-up times were censored, with the majority of these values occurring early in the study. Figure 1 shows a Kaplan-Meier (KM) plot comparing the original censored data with the dataset of complete follow-up times. The probability of surviving 5 years (60 months) was ~26% for the original censored data compared with ~18% for the complete dataset. In Figure 2, we compare the complete dataset with the imputed censored dataset. At 5 years, rounding to the nearest whole number, we see that the survival probabilities are identical (i.e., 18%). Indeed, the curves are similar until ~90 months, at which point the survival probabilities for the imputed censored time curve diverge and fall below those for the complete dataset. In Figure 3A–C, we observed a few outlying values that indicate a nominal departure from the underlying model goodness-of-fit.

3.2. Generating the Jump-Point Plot

The jump-point plot corresponding to the covariate pattern
$$(x_1 = 76,\; x_2 = 3,\; x_3 = 0,\; x_4 = 0,\; x_5 = 1,\; x_6 = 0)$$
is shown in Figure 4. Here, the variables $(x_1, x_2, \ldots, x_6)$ correspond to age (years), grade (I–IV), lymph node invasion (1 = yes, 0 = no), positive margins (1 = yes, 0 = no), no hormone therapy (1 = yes, 0 = no), and no radiation therapy (1 = yes, 0 = no), respectively. We see that the imputed censored follow-up time of 29.33 months closely matches the actual event time of 29 months. This observation was originally censored at 1 month.

3.3. SAS Code

The SAS code used to generate the jump-point plot is shown in Figure 5. In this code, it is assumed that the data contained in Appendix A have previously been read into the dataset “a”. The counting process variables are created in dataset “b” and then modeled using the “PROC LOGISTIC” procedure. The predicted probabilities generated by this procedure are plotted against time (ranging from 1 to 99 months) to obtain the jump-point plot. Analyses were performed using SAS Version 9.4 (SAS Institute, Cary, NC, USA).

4. Discussion

In this manuscript, we have introduced a simple method to partially correct for non-ignorable early censoring. By rearranging the data as a counting process, we are able to account for the follow-up times of all patients, regardless of their censoring status. For example, if a patient is censored at 50 months, the counting process creates 50 observations, one for each month, and assigns the value of 0 to the indicator variable in each. On the other hand, if the patient had an event at 50 months, the counting process would create 49 observations (months 1 to 49) with the indicator variable set to 0, followed by one observation for each month from month 50 until the last month of the study with the indicator variable set to 1.
These counting responses are then analyzed with logistic regression. In addition to including outcome-related covariates, a variable denoting the time (e.g., month) associated with the indicator variable is added to the model. The predicted value generated by the model for a particular covariate pattern (associated with a censored observation) is then plotted against the time variable (spanning each month of the study) to obtain a jump-point plot. Dropping a line to the x-axis from the point on this curve where the predicted value equals 0.50 gives an imputed censored follow-up time. We then replace the original censored time with this value if it is the larger of the two values. Additionally, the imputed value is constrained by a natural upper bound to prevent impossible censored follow-up times (e.g., the value must be consistent with a patient’s maximum biologic lifespan).
An important aspect of this technique is identifying a well-fitting model for the counting process data so that it is able to accurately predict if the outcome is 0 or 1. Understanding the dynamics of the disease or process under study will aid in the selection of appropriate outcome-related covariates. However, formally testing for model goodness-of-fit is not practical given the highly correlated nature of the counting process data. In theory, while it may be possible to assess model goodness-of-fit for dependent data using a robust “Huber-White” approach, the regularity conditions for such estimates are quite stringent [15,16,17].
Instead, we recommend using diagnostic plots to rule out ill-fitting models [9,10,11]. As shown in our example, when the Jacobian leverages $\hat{j}_{i,k}$ (defined as the instantaneous rate of change in the $i$th predictive value with respect to the $k$th outcome) were plotted against the studentized residuals, we observed outlying values towards the upper right-hand side of this plot (Figure 3B). Correspondingly, we also noticed the presence of extreme points in the lower right-hand side of Figure 3A (which plots predicted values against their residuals) and the upper right portion of Figure 3C (which plots predictive values against their expected raw residuals). Nonetheless, the plots provided in Figure 3A–C were relatively symmetrical with respect to the positive and negative residuals, suggesting that the logistic regression model provided a reasonable fit to the counting process data [18].
The advantage of using a counting process approach is that the imputed censored follow-up times, when appropriately constructed, will better reflect the survival prospects of those who continued in the study. However, because the method is model based, it may not be suitable for small datasets or those lacking a set of reasonably predictive covariates. Additionally, it may not always be possible to identify a well-fitting model if there are abrupt changes in the hazard function of the underlying data. The method should not be used if patients who enroll later in the study survive longer (e.g., treatment improves over time) or if enrollment criteria change over the course of the study (e.g., the worst patients are excluded midway through recruitment) [3].

5. Conclusions

Overall, the best means for handling informative censoring is to avoid the problem in the first place. Careful planning at the study design stage, routine patient monitoring, and implementing proactive strategies to minimize patient dropout are some important steps to ensure the fidelity of a survival time study.
While it was beyond the scope of the current manuscript, it will be informative in future analyses to compare our method with other approaches for dealing with censored values, especially highlighting best- and worst-case scenarios [3,19,20,21,22,23,24].


The authors would like to thank the Centre for Clinical Epidemiology and Biostatistics, University of Newcastle, for support and technical assistance during the preparation of this manuscript.

Author Contributions

Jimmy T. Efird and Charulata Jindal conceived and designed the project, analyzed the data, and wrote the manuscript. Both authors approved the final version of the manuscript.

Conflicts of Interest

The authors declared no conflict of interest.

Appendix A

Table A1. Example Cancer Dataset (N = 250). A = Observation number, B = Age, C = Grade, D = Lymph node invasion, E = Positive margin, F = No hormonal therapy, G = No radiation therapy, H = Actual follow-up time, I = Original censored time, J = Imputed censored time, K = Censoring variable.


  1. Lagakos, S.W. General right censoring and its impact on the analysis of survival data. Biometrics 1979, 35, 139–156.
  2. Leung, K.M.; Elashoff, R.M.; Afifi, A.A. Censoring issues in survival analysis. Annu. Rev. Public Health 1997, 18, 83–104.
  3. Zhang, J.; Heitjan, D.F. Nonignorable censoring in randomized clinical trials. Clin. Trials 2005, 2, 488–496.
  4. Shih, W. Problems in dealing with missing data and informative censoring in clinical trials. Curr. Control Trials Cardiovasc. Med. 2002, 3, 4.
  5. Miettinen, O.S. Survival analysis: Up from Kaplan-Meier-Greenwood. Eur. J. Epidemiol. 2008, 23, 585–592.
  6. Wu, M.C.; Bailey, K.R. Estimation and comparison of changes in the presence of informative right censoring: Conditional linear model. Biometrics 1989, 45, 939–955.
  7. Ranganathan, P.; Pramesh, C.S. Censoring in survival analysis: Potential for bias. Perspect. Clin. Res. 2012, 3, 40.
  8. Collett, D. Modelling Survival Data in Medical Research, 3rd ed.; Taylor & Francis Group: Boca Raton, FL, USA, 2015.
  9. St. Laurent, R.T.; Cook, R.D. Leverages and superleverages in nonlinear regression. J. Am. Stat. Assoc. 1992, 87, 985–990.
  10. St. Laurent, R.T.; Cook, R.D. Leverages, local influence, and curvature in nonlinear regression. Biometrika 1993, 80, 99–106.
  11. Pregibon, D. Logistic regression diagnostics. Ann. Stat. 1981, 9, 705–724.
  12. Hosmer, D.W.; Hosmer, T.; Le Cessie, S.; Lemeshow, S. A comparison of goodness-of-fit tests for the logistic regression model. Stat. Med. 1997, 16, 965–980.
  13. Williams, D. Probability with Martingales; Cambridge University Press: New York, NY, USA, 1991.
  14. Crowder, M.J.; Kimber, A.C.; Smith, R.L.; Sweeting, T.J. Statistical Analysis of Reliability Data; Chapman & Hall: London, UK, 1991.
  15. White, H. Maximum likelihood estimation of misspecified models. Econometrica 1982, 50, 1–25.
  16. Freedman, D.A. On The So-Called “Huber Sandwich Estimator” and “Robust Standard Errors”. Am. Stat. 2006, 60, 299–302.
  17. Huber, P.J. The Behavior of Maximum Likelihood Estimates under Nonstandard Conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 21 June–18 July 1965 and 27 December 1965–7 January 1966; Volume I; University of California Press: Berkeley, CA, USA, 1967; pp. 221–233.
  18. Stukel, T.A. Generalized logistic models. J. Am. Stat. Assoc. 1988, 83, 426–431.
  19. Diggle, P.; Kenward, M.G. Informative drop-out in longitudinal data analysis. J. R. Stat. Soc. Ser. C Appl. Stat. 1994, 43, 49–93.
  20. Plante, J.F. About an adaptively weighted Kaplan-Meier estimate. Lifetime Data Anal. 2009, 15, 295–315.
  21. Hogan, J.W.; Laird, N.M. Model-based approaches to analysing incomplete longitudinal and failure time data. Stat. Med. 1997, 16, 259–272.
  22. Wu, M.C.; Albert, P.S.; Wu, B.U. Adjusting for drop-out in clinical trials with repeated measures: Design and analysis issues. Stat. Med. 2001, 20, 93–108.
  23. Wei, L.; Shih, W.J. Partial imputation approach to analysis of repeated measurements with dependent drop-outs. Stat. Med. 2001, 20, 1197–1214.
  24. Little, R.J.A. Modeling the drop-out mechanism in repeated-measures studies. J. Am. Stat. Assoc. 1995, 90, 1112–1121.
Figure 1. Kaplan-Meier (KM) plot comparing the original censored data with the dataset of complete follow-up times.
Figure 2. Kaplan-Meier (KM) plot comparing the imputed censored data with the dataset of complete follow-up times.
Figure 3. Diagnostic plots. (A) Residuals by predicted values; (B) Studentized residuals (RStudent) by Jacobian leverages; (C) Expected raw residuals by predicted values.
Figure 4. Jump-point plot.
Figure 5. SAS code used to generate jump-point plot.

