Nonparametric Inference for Mixture Cure Model When Cure Information Is Partially Available †

Abstract: We introduce nonparametric estimators of the conditional survival function, cure probability and latency function in the setting of a mixture cure model when the cure status is partially known. For the sake of illustration, we present an application concerning patients hospitalized with COVID-19 in Galicia (Spain) during the first outbreak of the epidemic.


Introduction
Survival analysis arises in many applications where the quantity of interest is the time until an event occurs. A common assumption in standard survival modeling is that every individual would eventually experience the event if observed for long enough. Cure models [1] were developed for situations where this assumption does not hold, for example, when the event is the recurrence of a disease or death from some types of cancer. One challenge with time-to-event data is that the event is not always observed (censored observations). Standard cure models typically treat the cure status as an unobserved (latent) variable: an individual who experiences the event is known to be uncured, but it is unknown whether a censored individual is cured or not. There are situations, however, where the cure status is known for some of the censored individuals because they can be identified as insusceptible to the event, that is, known to be cured, for example, when a medical test ascertains that a disease has entirely disappeared after treatment.
In this paper, we present kernel methods to estimate the conditional survival function, cure probability and latency function in the presence of cure status information. The proposed approach contributes to the state of the art in the analysis of time-to-event data, as it extends previous work on the mixture cure model.

Estimation When the Cure Status Is Partially Available
Let Y be the time until the event of interest, X a vector of covariates, and F(t | x) = P(Y ≤ t | X = x) the distribution function of Y conditional on X = x. In follow-up studies, the event of interest may not be observed due to, for example, the end of the study or loss to follow-up, which occurs at censoring time C* with conditional distribution function G(t | x) = P(C* ≤ t | X = x). As a consequence, instead of Y, only the possibly censored survival time T* = min(Y, C*) and the event indicator δ = 1(Y < C*) can be observed. The random variables Y and C* are assumed to be conditionally independent given X = x, a widely used assumption in survival studies. We set Y = ∞ if the subject will never experience the event and is therefore cured. Let ν = 1(Y = ∞) be the indicator of being cured. Note that ν is only partially observed: the individual is known not to be cured (ν = 0) when the event is observed (δ = 1), but, in general, ν is unknown when δ = 0. When the cure status is partially known, some censored individuals are identified as cured, so ν = 1 is observed for them.
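The observation scheme just described can be illustrated with a small simulation. This is only a sketch under assumed settings: the scalar covariate, the logistic form of the uncure probability, the exponential event and censoring times, and all variable names are hypothetical choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical simulation settings (not from the paper).
x = rng.uniform(0.0, 1.0, size=n)                    # scalar covariate X
p_uncured = 1.0 / (1.0 + np.exp(-(1.0 - 2.0 * x)))   # P(Y < inf | X = x)

# nu = 1(Y = inf): cured subjects never experience the event.
nu = (rng.uniform(size=n) >= p_uncured).astype(int)
y = np.where(nu == 1, np.inf, rng.exponential(scale=1.0, size=n))

# Censoring time C*, drawn independently of Y given X
# (assumed proper and exponential here, for simplicity).
c_star = rng.exponential(scale=2.0, size=n)

t_star = np.minimum(y, c_star)       # T* = min(Y, C*)
delta = (y < c_star).astype(int)     # delta = 1(Y < C*)

# Without extra information, nu is known (and equal to 0) only when delta = 1;
# for censored subjects it is missing (NaN).
nu_observed = np.where(delta == 1, 0.0, np.nan)
```

In this sketch every cured subject is censored, and every censored subject has an unknown cure status, which is the situation that standard cure models assume.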
To accommodate the cure status information, we include an additional random variable ξ, which indicates whether the cure status ν is known (ξ = 1) or not (ξ = 0). Furthermore, let the censoring distribution be an improper distribution function G(t | x) = (1 − π(x)) G_0(t | x). Thus, with probability π(x), the censoring variable is C* = ∞, and with probability 1 − π(x), the value of the censoring variable C* corresponds to the value of a random variable C with proper continuous distribution function G_0(t | x). A cured individual is identified with probability P(ξ = 1 | ν = 1, X = x). In this setup, the observations {(X_i, T_i, δ_i, ξ_i, ξ_i ν_i), i = 1, . . . , n} can be classified into three groups: (a) the individual is observed to have experienced the event and, therefore, is known to be uncured (X_i, T_i = Y_i, δ_i = 1, ξ_i = 1, ξ_i ν_i = 0); (b) the lifetime is censored and the cure status is unknown (X_i, T_i = C_i, δ_i = 0, ξ_i = 0, ξ_i ν_i = 0); and (c) the lifetime is censored and the individual is known to be cured (X_i, T_i = C_i, δ_i = 0, ξ_i = 1, ξ_i ν_i = 1). In standard cure models, where the cure status is unknown for all the censored observations, only groups (a) and (b) are considered.
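The three groups can be told apart from the observed indicators alone. The following sketch shows one way to encode that classification; the function name `classify` and the toy records are hypothetical and serve only as an illustration.

```python
def classify(delta: int, xi: int, xi_nu: int) -> str:
    """Return the group label of one observation from (delta, xi, xi * nu)."""
    if delta == 1 and xi == 1 and xi_nu == 0:
        return "a"  # event observed, known to be uncured
    if delta == 0 and xi == 0 and xi_nu == 0:
        return "b"  # censored, cure status unknown
    if delta == 0 and xi == 1 and xi_nu == 1:
        return "c"  # censored, known to be cured
    raise ValueError("indicator combination not covered by groups (a)-(c)")


# Toy records (delta, xi, xi * nu), one per group.
records = [(1, 1, 0), (0, 0, 0), (0, 1, 1)]
groups = [classify(*r) for r in records]
```

Any other combination of indicators, such as an observed event with ξ = 0, is inconsistent with the setup above, so the sketch raises an error instead of guessing a group.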
The probability of cure is 1 − p(x) = P(Y = ∞ | X = x), and the conditional survival function of the uncured individuals, also known as the latency, is S_0(t | x) = P(Y > t | Y < ∞, X = x). The mixture cure model specifies the survival function S(t | x) = P(Y > t | X = x) as follows:
$$S(t \mid x) = 1 - p(x) + p(x)\, S_0(t \mid x). \tag{1}$$
Assuming model (1) and the availability of a suitable estimator of S(t | x), estimators of the cure probability and the latency can be derived from the following relationships:
$$1 - p(x) = \lim_{t \to \infty} S(t \mid x), \qquad S_0(t \mid x) = \frac{S(t \mid x) - (1 - p(x))}{p(x)}. \tag{2}$$
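As a numerical check of how the cure probability and the latency follow from the survival function, the sketch below builds S(t | x) from the standard mixture form S = (1 − p) + p S_0 for a fixed x and then recovers both components. The value p = 0.7 and the exponential latency are hypothetical choices made only for this illustration.

```python
import numpy as np

p = 0.7                                  # hypothetical P(uncured) at a fixed x
t = np.linspace(0.0, 50.0, 501)
latency = np.exp(-t)                     # hypothetical S_0(t | x)

# Mixture cure model: S(t | x) = (1 - p) + p * S_0(t | x).
survival = (1.0 - p) + p * latency

# The cure probability is the limit of S(t | x) as t grows;
# here it is approximated by the value at the largest grid point.
cure_prob = survival[-1]

# The latency is recovered by removing the cure plateau and rescaling.
recovered_latency = (survival - cure_prob) / (1.0 - cure_prob)
```

The survival curve levels off at the cure probability instead of decaying to zero, which is the feature the estimators of the cure probability exploit.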
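Safari et al. [2] proposed the generalized product-limit estimator of the conditional survival function S(t | x) when the cure status is partially known:
$$\hat{S}^c_h(t \mid x) = \prod_{i=1}^{n} \left(1 - \frac{\delta_{[i]} B_{h[i]}(x)\, 1(T_{(i)} \le t)}{\sum_{j=i}^{n} B_{h[j]}(x) + \sum_{j=1}^{i-1} B_{h[j]}(x)\, 1(\xi_{[j]} \nu_{[j]} = 1)}\right), \tag{3}$$
where X_[i], δ_[i], ξ_[i], and ν_[i] are the concomitants of the ordered observed times T_(1) ≤ . . . ≤ T_(n), and B_h[i](x) is the Nadaraya-Watson (NW) weight
$$B_{h[i]}(x) = \frac{K_h(x - X_{[i]})}{\sum_{j=1}^{n} K_h(x - X_{[j]})},$$
with K_h(·) = K(·/h)/h a kernel function K(·) rescaled with bandwidth h. The corresponding estimator of the cure rate 1 − p(x) [3] is
$$1 - \hat{p}^c_h(x) = \hat{S}^c_h(T^1_{(n)} \mid x), \tag{4}$$
where T^1_(n) is the largest uncensored observed time. In light of (3), (4), and the relationships in (2), a nonparametric estimator of the latency function is given by
$$\hat{S}^c_{0,h_1,h_2}(t \mid x) = \frac{\hat{S}^c_{h_1}(t \mid x) - (1 - \hat{p}^c_{h_2}(x))}{\hat{p}^c_{h_2}(x)}. \tag{5}$$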
The optimal bandwidth for $\hat{S}^c_h(t \mid x)$ in (3) is not necessarily the optimal bandwidth for $1 - \hat{p}^c_h(x)$ in (4); therefore, the estimator in (5) is a more general estimator that uses two different bandwidths, h_1 and h_2, for estimating S(t | x) and 1 − p(x), respectively. Note that if h = h_1 = h_2, then the estimator in (5) reduces to
$$\hat{S}^c_{0,h}(t \mid x) = \frac{\hat{S}^c_h(t \mid x) - (1 - \hat{p}^c_h(x))}{\hat{p}^c_h(x)}. \tag{6}$$
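A minimal sketch of how such estimators might be computed is given below. It rests on several assumptions that should be checked against [2,3]: a scalar covariate, an Epanechnikov kernel, no tied observed times, and a product-limit form in which the risk set at each ordered time contains the subjects still under observation plus the earlier-censored subjects known to be cured. All function names are hypothetical.

```python
import numpy as np


def nw_weights(x0, x, h):
    """Nadaraya-Watson weights at x0 (Epanechnikov kernel, an assumed choice)."""
    u = (x0 - x) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    total = k.sum()
    if total == 0.0:
        raise ValueError("no observations within the bandwidth window")
    return k / total


def pl_factors(x0, x, t, delta, xi_nu, h):
    """Ordered times and the product-limit factor attached to each of them."""
    x, t = np.asarray(x, float), np.asarray(t, float)
    delta, xi_nu = np.asarray(delta), np.asarray(xi_nu)
    order = np.argsort(t, kind="stable")
    t_s, d_s, cured = t[order], delta[order], xi_nu[order] == 1
    b = nw_weights(x0, x[order], h)
    tail = np.cumsum(b[::-1])[::-1]                    # sum of weights, j >= i
    cured_before = np.cumsum(b * cured) - b * cured    # known cured, j < i
    denom = tail + cured_before
    ratio = np.divide(d_s * b, denom, out=np.zeros_like(b), where=denom > 0)
    return t_s, 1.0 - ratio


def survival(t0, x0, x, t, delta, xi_nu, h):
    """Estimate of S(t0 | x0): product of the factors with ordered time <= t0."""
    t_s, factors = pl_factors(x0, x, t, delta, xi_nu, h)
    return float(np.prod(factors[t_s <= t0]))


def cure_probability(x0, x, t, delta, xi_nu, h):
    """Estimate of 1 - p(x0): the survival estimate at the largest event time.

    Censored observations contribute a factor of 1, so the full product
    equals the product up to the largest uncensored time.
    """
    _, factors = pl_factors(x0, x, t, delta, xi_nu, h)
    return float(np.prod(factors))


def latency(t0, x0, x, t, delta, xi_nu, h1, h2):
    """Estimate of S_0(t0 | x0) using bandwidth h1 for S and h2 for 1 - p."""
    s = survival(t0, x0, x, t, delta, xi_nu, h1)
    cure = cure_probability(x0, x, t, delta, xi_nu, h2)
    if cure >= 1.0:
        raise ValueError("estimated cure probability is 1; latency undefined")
    return (s - cure) / (1.0 - cure)
```

When no individual is known to be cured (all ξ_i ν_i = 0), the extra term in the risk set vanishes and this sketch reduces to a kernel-weighted product-limit estimator that ignores cure status information. Calling `latency` with h1 = h2 gives the single-bandwidth version.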

Application to COVID-19 Data
To illustrate the nonparametric estimators introduced in Section 2, we present an application concerning patients hospitalized with COVID-19 in Galicia (Spain) during the first outbreak of the epidemic. We have a medical database of 10,454 COVID-19 patients reported by the Galician Healthcare Service between 6 March and 7 May 2020. This database contains information on sex, age, and the dates of different medical outcomes such as admission to the intensive care unit (ICU), discharge, or death. The aim was to estimate the time from admission to the hospital ward until admission to the ICU while adjusting for age and sex. Our analysis included only the 2380 patients who had been hospitalized for at least a day. Among them, 8.3% were admitted to the ICU and 91.7% were censored. In the censored group, 68.8% of the patients were discharged from the hospital alive and without the need for ICU, and 13.8% died without entering the ICU. Therefore, these patients were identified as "cured" from the event of interest, which is admission to the ICU. Note that in this example, "being cured" means never experiencing admission to the ICU, not being cured in the medical sense.