Multivariate Threshold Regression Models with Cure Rates: Identification and Estimation in the Presence of the Esscher Property

The first hitting time of a boundary or threshold by the sample path of a stochastic process is the central concept of threshold regression models for survival data analysis. Regression functions for the process and threshold parameters in these models are multivariate combinations of explanatory variates. The stochastic process under investigation may be a univariate stochastic process or a multivariate stochastic process. The stochastic processes of interest to us in this report are those that possess stationary independent increments (i.e., Lévy processes) as well as the Esscher property. The Esscher transform is a transformation of probability density functions that has applications in actuarial science, financial engineering, and other fields. Lévy processes with this property are often encountered in practical applications. Frequently, these applications also involve a ‘cure rate’ fraction because some individuals are susceptible to failure and others not. Cure rates may arise endogenously from the model alone or exogenously from mixing of distinct statistical populations in the data set. We show, using both theoretical analysis and case demonstrations, that model estimates derived from typical survival data may not be able to distinguish between individuals in the cure rate fraction who are not susceptible to failure and those who may be susceptible to failure but escape the fate by chance. The ambiguity is aggravated by right censoring of survival times and by minor misspecifications of the model. Slightly incorrect specifications for regression functions or for the stochastic process can lead to problems with model identification and estimation. In this situation, additional guidance for estimating the fraction of non-susceptibles must come from subject matter expertise or from data types other than survival times, censored or otherwise. The identifiability issue is confronted directly in threshold regression but is also present when applying other kinds of models commonly used for survival data analysis. Other methods, however, usually do not provide a framework for recognizing or dealing with the issue and so the issue is often unintentionally ignored. The theoretical foundations of this work are set out, which presents new and somewhat surprising results for the first hitting time distributions of Lévy processes that have the Esscher property.


Introduction
Event times for systems and their components are of research interest in many fields including medicine, engineering, the natural sciences, economics and the social sciences. These events are important milestones, happenings or outcomes such as deaths, hospitalizations, divorces, business bankruptcies, engineering failures, and so on. We will use the generic term 'failure' to describe the event and 'failure time' or 'survival time' to describe Stats 2022, 5 the time to occurrence of the event. These failure times are often determined by an underlying stochastic process reaching a threshold, boundary or critical condition that triggers the failure event. The failure times are therefore first hitting times or first passage times. First-hitting-time models for events are ubiquitous in applications across diverse fields.
Study populations are usually a mixture of individuals, such as patients, marriages, business establishments, medical devices, plants, and the like. An individual in the study population may be at risk of eventual failure from the specific cause being investigated in the study. We refer to this type of individual as a susceptible. On the other hand, the study population may include an individual who is not at risk of failure; an individual we call a non-suspectible. Those individuals who are susceptible to failure may nevertheless escape eventual failure. They may avoid failure because their condition strengthens over time, their exposure to adverse conditions declines, or simple luck may be on their side. Susceptibles may also avoid failure of the type under study because they experience a failure event of a different type (often called a competing risk). If the non-susceptibles in a study population can be identified ahead they would usually not enter the study or at least be segregated for separate investigation. Often, however, it is not possible to know in advance (or ever) if a particular individual is truly non-susceptible. In this report, we are interested in the formulation, estimation and interpretation of survival models for study populations in diverse application fields that are a mixture of susceptibles and non-susceptibles who cannot be identified in advance.
We start our investigation by looking carefully at the idea of susceptibility to failure. We then present a setting for our investigation that involves a wide class of first hitting time models for study populations of mixed susceptibility. We examine several first hitting time models in some detail to reveal the underlying mechanisms at work. Our investigation includes the impact of regression structures on the interpretation of these first hitting time models, drawing upon the growing body of work on threshold regression. We also consider the important influences of interval-and right-censoring in survival data analysis. In the medical literature, we note that the proportion of a study population that will avoid eventual failure is often referred to as the cured subpopulation and the corresponding fraction of the study population as the cure rate. We will use this terminology even in settings where avoiding failure is not usually described as a 'cure'. Thus, a marriage is 'cured' if it does not end in divorce or a business establishment is 'cured' if it does not go bankrupt.
In our study we consider threshold regession applications that involve Lévy processes having the Esscher property (which is a common circumstance). In these contexts, the correct sign of the process drift parameter cannot be identified from censored failure data alone when the study population has some fraction of non-susceptibles. The subject matter under study or other kinds of data must provide additional information about the fraction of susceptibles, if any, who might escape failure by chance. Having censored survival data aggravates the situation. This susceptibility identification challenge can be faced squarely and analyzed in threshold regression methods. The ambiguity surrounding the susceptibility fraction originates in the subject matter of the study and therefore is present irrespective of the method chosen for survival data analysis. Aside from threshold regression, other methods for survival analysis generally provide no way to recognize or deal with the issue. As a consequence, the problem usually goes unaddressed when other methods are used.

Exogenous and Endogenous Cure Rates
Consider a univariate stochastic process {D(t), t ≥ 0} with D(0) = 0. Let S denote the first hitting time of a threshold at d 0 > 0 for this process. This first hitting event is the failure event and S is the failure time or survival time. We assume the study population has a proportion p of individuals who are at risk of experiencing the event dictated by this process. As already stated, we refer to these individuals as susceptibles because each individual may experience failure within the study's time horizon. The stochastic sample paths of susceptibles are assumed to be probabilistically independent. The remaining proportion 1 − p of the study population is assumed to be not susceptible to this kind of failure because their individual experiences are not governed by stochastic process {D(t)}. We refer to proportion 1 − p as the exogenous cure rate. As an example, a medical study population may have a subpopulation of individuals who are at risk of contracting a progressive fatal disease (fraction p) while the remainder of the population are not at risk (fraction 1 − p). As a second example, a study population may have a subpopulation of individuals who engage in a high-risk sport (say, mountain climbing) and are therefore exposed to a high-risk sport injury or death while the remainder of the population never participate in the high-risk sport. The setting just described implies that a randomly chosen individual from the study population has a mixed susceptibility survival function (s.f.), which we denote by F R (s), where subscript R reminds us of the reference study population. This survival function has the following mixture form: Here p denotes the susceptibility probability with 0 < p ≤ 1 and F(s) is the s.f. of susceptibles. Our notation suppresses the dependence of s.f. F(s) on model parameters.
As we are interested later in the behavior of F(s) as s goes to infinity, we use the following notation 1 − p 0 = lim s→∞ F(s) = lim s→∞ P(S > s) for the relevant limit. Proportion p 0 is the fraction of susceptible individuals who would eventually experience failure if process {D(t)} were allowed to operate indefinitely without competition from other causes of failure or encountering a study time horizon. Thus, a situation where p 0 = 1 is one where every susceptible individual would eventually experience failure if process {D(t)} goes on without end. In real applications, of course, a failure time S is often unobservable because it is right censored by a competing failure event or the end of study follow-up. For some processes, there is no guarantee that a susceptibile individual will eventually experience failure, that is, p 0 < 1. In this situation, the individual is at risk of failure to a varying degree over time (i.e., is susceptible) but, in fact, may not experience the failure event, irrespective of the length of observation. This situation also results in a right-censored failure time with censoring produced by either a competing failure event or the end of study follow-up. The practical consequence is that susceptibles and non-susceptibles are lumped together and indistinguishable when we only know their censored failure times.
To explain and study this phenomenon, we investigate a large collection of stochastic processes that are widely found in practical applications of threshold regression. Specifically, we consider stochastic processes {D(t)} known as Lévy processes. The only type of Lévy process that possesses continuous sample paths is a Wiener process. The sample paths of all other Lévy processes exhibit jump behavior. As a result, when we speak of the first hitting time S of a level d 0 by a process {D(t)}, we know that the level D(S) at the moment of hitting the threshold will overshoot d 0 by some positive amount, except in two circumstances: (1) when {D(t)} is a Wiener process and (2) when {D(t)} is a process with fixed jump increments and d 0 equals a multiple of the increment. We distinguish between cases where D(S) = d 0 (no overshoot) and D(S) > d 0 (overshoot) in subsequent developments.
All Lévy processes possess the property of stationary independent increments. We consider that large subclass of Lévy processes {D(t)} which are (1) bidirectional (i.e., their sample paths move up and down through time) and (2) possess a cumulant generating function (c.g.f.) κ(ζ), defined as follows: where Z is an open interval containing 0. We let δ denote the mean parameter of the process. The c.g.f. assures the mean parameter exists. We will refer to such a process {D(t)} with a positive mean parameter δ > 0 as a primal process. Every primal process has a twin process {D * (t)}, called its dual process, which follows the same probability law as {−D(t)}, the negative of the primal process. This dual process has c.g.f κ * (ζ) = κ(−ζ) and mean parameter δ * = −δ. Notice that we add an asterisk to identify characteristics of the dual process. As δ > 0 for the primal process {D(t}), its sample paths tend to drift toward the failure threshold d 0 and the failure event will eventually be observed if it is not censored, that is, p 0 = lim s→∞ P(S ≤ s) = 1. In contrast, for the dual process {D * (t)} for which δ * < 0, the process tends to drift away from the threshold at d 0 and p 0 * = lim s→∞ P * (S ≤ s) < 1. The reversed sign of the dual process transforms its stochastic behavior from one in which eventual failure is guaranteed to one in which it is not guaranteed. As we will show, the existence of these twinned processes produces an identity crisis for interpreting threshold regression models with a cure rate. We now define the conditional s.f. K(s) = P(S > s|S < ∞). Thus, K(s) is the survival function for those individuals who eventually will experience failure. For the primal process, we have F(s) = K(s) because p 0 = P(S < ∞) = 1. For the dual process, we have F * (s) = p 0 * K * (s) + 1 − p 0 * because p 0 * = P * (S < ∞) < 1. As shown in the Appendix A, the primal-dual pair of processes {D(t)} and {D * (t)} have identical conditional survival functions K(s) and K * (s) if (1) the primal process possesses the Esscher property and, in the case of processes that overshoot, (2) the primal process is in equilibrium. We now briefly define the two properties. The Appendix A provides more technical background and explanations of the properties.
Esscher Property. The Esscher property holds if the c.g.f. of the primal process has the form κ(ζ) = κ(−ζ − U), where U is a positive constant defined by This property is named for the Esscher transformation of a probability density function. Mathematically, the transformation is f * (x) = C(a) exp(ax) f (x) for a constant a and normalizing constant C(a). In our use of the transformation, we set a = −U so C(−U) = 1.
In this situation, the the original and transformed probability density functions f (x) and f * (x) have c.g.f.s of form κ(ζ) and κ(−ζ − U), respectively, with κ(ζ) = κ(−ζ − U). Equilibrium Property. The equilibrium property holds if the process {D(t)} has been operating for an extended earlier period and observation of the process starts at a random moment when the process has an upward crossing of level D = 0. This random moment is taken as the origin of the time scale (t = 0) and, hence, survival time S is measured from this moment. The precise mathematical definition of the equilibrium property, which requires a relaxation of the initial condition D(0) = 0, is set out in the Appendix A.
The presence of identical conditional survival functions under the Esscher and equilibrium properties gives rise to the identity crisis we anticipated earlier. The fact that K(s) ≡ K * (s) provides the following competing representations of the s.f. F R (s) for the study population under the primal and dual processes.
= pK(s) + (1 − p) for the primal process. (4) We see from (4) and (5) that the s.f. for the study population is the same mathematical function of survival time s under the primal and dual processes with the exception that the apparent susceptibility rate for the dual process is p * p 0 * whereas its true susceptibility rate is p * . The true susceptibility rates for the twinned processes, namely, p and p * , are different but cannot be distinguished from survival data alone. Furthermore, with a limited study horizon and perhaps interruptions from competing events, the observed survival data will be right-censored. So, it is only some fraction of the at-risk subpopulation whose failure times will actually be observed in the study. All others will have censored outcomes.
The proportion 1 − p in the primal process and 1 − p * in the dual process are different exogenous cure rates. We now refer to probability p * (1 − p 0 * ) as an endogenous cure rate for the dual process because it is this fraction of the susceptible population that will not experience failure under the dual model. The potential presence of both cure rates in the dual model leads to a combined cure rate for the study population of 1 − p * + p * (1 − p 0 * ) = 1 − p * p 0 * . The challenge for survival data analysis that is raised by the mixed susceptibility models (4) and (5) is the difficulty of distinguishing non-susceptible cases from susceptible cases that happen to escape failure.

Examples of Mixed Susceptibility Models for Lévy Processes
Numerous families of Lévy processes have the Esscher propery and all of these have potential application as mixed survival models that include a cure rate component. The Appendix A presents mathematical details for four families of Lévy processes, namely, Wiener processes, Poisson-Bernoulli random walks, bilateral gamma processes and bilateral inverse Gaussian processes. We have chosen the Wiener family for detailed development as a case illustration in order to make the preceding ideas and issues concrete. We start with a brief overview of threshold regression to provide a context for the case illustration that follows.

Threshold Regression
Threshold regression refers to a category of regression models and methods for survival data analysis. The main idea of threshold regression is that a survival or failure time is often well-described mathematically as the first hitting time of a boundary or threshold by the sample path of a stochastic process. Regression functions for the process and threshold parameters in these models are multivariate combinations of explanatory variates. Estimation methods for threshold regression models are well developed and these have found extensive application. Wiener processes in particular have been widely used in threshold regression models. See, for example, [1][2][3][4][5][6]. The book by [7] provides an extensive overview of both theory and applications. Different variants of threshold regression models and notation are encountered. A common variant of what we presented earlier considers a process {Y(t)} which starts at Y(0) = y 0 > 0 and has mean drift parameter µ and variance parameter σ 2 > 0. In this variant, the first hitting time S of a threshold at the zero level of the process defines the survival time. Note the following correspondence between this notation and our earlier notation: For the Wiener process model, the dual process is one with drift away from the threshold, that is, one where δ * = −µ < 0. For the dual model, the probability of eventual failure is given by: Parameter σ 2 is usually set equal to 1 because the Wiener model is overparameterized for survival data. The Esscher property holds for all Wiener processes and, because the Wiener sample path is continuous, there is no overshoot. Our theory shows that the conditional survival functions of both the primal and dual processes are identical, i.e., K * (s) ≡ K(s). This conditional s.f. is an inverse Gaussian distribution [7][8][9].

Case Illustration
In an application of survival model (1), estimates of the model parameters might be calculated from interval-censored failure data or, more specifically, from right-censored failure data using maximum likelihood estimation. We assume the sample is a simple random sample of individuals from the study population. We also assume that the censoring mechanism is independent of whether the individual is susceptible or not and, if susceptible, independent of the individual's failure time. The sample log-likelihood function for right-censored failure data takes the form: where θ = (d 0 , δ, p, other parameters) is the vector of model parameters to be estimated and f R is the probability density function (p.d.f.) corresponding to the population s.f. F R . Index set N 1 in (7) includes subjects i who failed at times t i , i ∈ N 1 , while index set N 0 includes subjects i who were censored by end of follow-up or a competing failure at times t i , i ∈ N 0 . Observe in this last case that the censoring time or time horizon may vary from one subject to the next. Finally, we note that in setting up the estimation problem we use the notation for the primal model. We now give an illustration for the Wiener model. The estimation problem described by log-likelihood expression (7) seems to present nothing new in principle. Yet, a comparison of versions (4) and (5) of the population survival functions shows that there is an unresolved ambiguity. The sign of mean parameter µ = −δ cannot be determined from the survival data alone. Table 1 shows alternate sets of estimates for model parameters from a single simulated data set. The simulated study population corresponds to a dual process with ln(y 0 ) = ln(d 0 ) = 1, µ = −δ * = 0.1, and p * = 0.9. The sample size is n = 100,000. All failure times are right censored at 70. The table shows the estimates in columns headed by the generic notation ln(y0), mu and logit(p). Estimation of the model is done using a numerical gradient routine in Stata 12. The table shows the log-likelihood value and parameter estimates for two different sets of starting values.  The two estimation runs in Table 1 produce identical sample log-likelihood values and almost identical estimates of ln(y0). The parameter estimates for mu and logit(p) differ sharply. Closer study shows that the estimates match model variants (4) and (5). Estimation run 1 happens to provide estimates for the dual model in (5) with µ = −δ * = 0.1 and logit p * = ln[0.9/(1 − 0.9)] = 2.1972. Here, the true value of p 0 * is 0.5806, calculated from Formula (6). Estimation run 2 happens to provide estimates for the corresponding primal model. The mu estimate is reversed in sign relative to the dual model and the logit(p) estimate is actually the estimate of logit(p * p 0 ), which equals ln{0.9(0.5806)/[1 − 0.9(0.5806)]} = 0.09030. The example demonstrates that the sign of mean parameter µ is not resolvable from censored survival data for this Wiener model. A further implication is that one cannot know if the estimated cure rate represents a purely non-susceptible subpopulation (exogenous fraction 1 − p * ) or a mixed subpopulation with some fraction of true non-susceptibles (exogenous fraction 1 − p * ) combined with an added fraction of susceptibles who escaped failure by good fortune (endogenous fraction p * − p * p 0 * ). The latter case produces an apparent cure rate of 1 − p * p 0 * . For the simulation scenario selected here, it is the first estimation run that gives correct estimates of the model. An outside observer, however, cannot know this fact from either the censored survival data set or the estimates derived from it.

Identifiability Challenge for the Mixed Susceptibility Model
The preceding case illustration, with the results displayed in Table 1, has demonstrated that the sign of the mean parameter for the threshold regression model cannot be decided unambiguously in the presence of possible exogenous and endogenous 'cures' in the study population. The sign of the estimated mean parameter depends, by chance, on the starting point for the maximum likelihood optimization. Because the conditional survival functions of the primal and dual processes are identical, i.e., K * (s) ≡ K(s), the survival data yield different but equally valid sets of estimates in terms of sample likelihood. If the study population were partitioned into subpopulations (say, subpopulations defined by age group, sex, marriage status, or other categorical variables) the same problem arises for each subpopulation, namely, the sign of the estimated mean parameter for each subpopulation would depend on the starting point for the maximum likelihood optimization. Independent subject matter knowledge would be needed to decide on the correct sign for the mean parameter.
In typical real-world applications of threshold regression, the mean parameter, threshold and exogenous cure rate fraction are made to depend on linear combinations of covariates through suitable regression link functions. The multivariate regression structures are rarely sufficiently detailed to define actual subpopulations and are usually misspecified to some degree. In these practical circumstances, estimation of the model from interval-or right-censored failure data would almost certainly lead to a unique maximum likelihood solution without any hint of the identifiability issue. But, our mathematical analysis for twinned stochastic processes anticipates that the computed threshold regression solution may have a competing local solution that is a close competitor to the global optimum. This close competitor would be a twinned process with an estimated mean having the opposite sign. Moreover, the solution and its twin would be associated with different cure rates.
As just mentioned, the drift parameter δ for our model often has a multivariate regression structure in threshold regression applications. A common formulation is an identity link function for δ connecting it to a linear combination of covariates. In this formulation, δ = zβ where z is a k-component row vector of covariates and β is a matching column vector of regression coefficients. The leading covariate may be the unit value 1 so the leading regression coefficient is an intercept term. For a given covariate vector z, the sign of the regression parameter vector must be reversed in order to reverse the sign of the drift parameter itself. In other words, δ * = zβ * = z(−β) = −δ. This is an elementary mathematical change but it does suggest a way in which subject matter knowledge can check on the identifiability problem. If the regression vector is estimated from censored survival data, the investigator might examine the sign of the regression coefficient for each covariate (and the intercept) to see if the direction of effect is consistent with subject matter knowledge. Many applications, however, do not offer this opportunity to settle the ambiguity. Sampling error in estimates may make it unclear if the estimated sign of the coefficient is reliable. Moreover, the presence of multicollinearity in a regression model with multiple covariates may make it difficult to judge the directional effect of any single covariate. An investigator may be inclined to replace the identity link function in threshold regression with a log-linear link function of form ln δ = zβ so that the estimated drift is positive. This approach may be adopted because the investigator is confident that the drift should be towards failure. The strategy may be self-defeating. Although the mathematical device preserves the anticipated sign of the drift parameter, a substantively different statistical model is being estimated. The estimated model may be the wrong model and, in consequence, hides the ambiguity about the direction of drift.
The identifiability issue will not be present if it is known with certainty that the study population consists totally of susceptible individuals. In this circumstance, it is only the dual model which has some fraction of the study population that survives indefinitely. In the threshold regression context, the endogenous cure rate is determined by parameters of the stochastic process {D(t)}, as illustrated by Formula (6) for the Wiener process. Thus, no separate regression function should be employed for logit p 0 * in this case.
The presence of censoring aggravates the identifiabilty issue because the true failure times for censored individuals are masked. Thus, precise estimates of the susceptibility rate and the other parameters are not available. In this situation, even available subject matter knowledge may not be sufficient to untangle p from p * p 0 * in the primal and dual models. Of course, small study sample sizes also make the situation more difficult.
The identifiability issue might be resolved practically if the process level of a survivor is observed at the time of censoring. For example, the extent of deterioration of a knee joint for a patient with osteoarthritis might be measured at a clinic visit that precedes the knee's replacement. The visit censors the replacement, which will be triggered if and when deterioration reaches a prespecified critical level. In a situation where the process level of a survivor is observed, the sample log-likelihood function, such as the one presented in (7), would have the term ∑ i∈N 0 ln F R (t i ) replaced by a sum of log-densities for known process levels D(t i ) = d i at censoring times t i for subjects i ∈ N 0 . This revised likelihood function will help to infer primal cases for which δ > 0 from dual cases for which δ * < 0. See [10] for a discussion of statistical methods for this kind of data situation for Lévy processes.

Multidimensional and Other Complex Extensions
Our investigation of the identifiability challenge arising in mixed susceptibility models has been framed for a univariate stochastic process because this context provides the simplest demonstration of the issue. We now present a couple of extensions that illustrate where and how care may be needed to deal with the issue in multidimensional and other complex contexts.
Multidimensional stochastic processes are ubiquitous in many fields. Chemometrics provides an example of an important field that deals with multidimensional processes and has considerable interest in relation to time trends of complex chemical processes and their critical thresholds. See [11] for an extensive overview of the methodological challenges in chemometrics applied to environmental monitoring and [12,13] for specific application settings in chemometrics that reflect the complexity of data modelling in this field. Applications in chemometrics and many other fields may concern the first hitting time of a plane surface by a multidimensional process. When drawing statistical inferences from high-dimensional data about whether a multidimensional process is moving toward or away from a critical barrier in these applications, one can encounter the kind of identifiabily issue being considering in our study here. And when the issue is encountered, it is likely to be a significant scientific concern. The first hitting time of a plane in a high dimensional space is a prototypical problem for multivariate stochastic processes. This kind of application presents an exact analog of our univariate model setup and so our findings translate immediately to this context. There are many families of multidimensional processes. Our research has not attempted a general investigation of this interesting topic. To illustrate the extension, however, we consider an obvious case example, namely, a multidimensional Wiener process, which is frequently a valid model for real-world phenomena. To be specific, let D(t) denote a k-dimensional Wiener process operating in a k-dimensional linear vector space X with D(0) = 0, where 0 is the origin of the space. Let superscript T denote the transpose of a vector. Consider a plane a T x = c in the X space, where we let c > 0 so the origin 0 lies outside the plane. Let the plane serve as a barrier or boundary for the process (i.e., a multidimensional threshold). We denote the projection of the sample path of the process D(t) onto a normal axis by D(t). The normal axis is a line directed by a normal vector from the plane through the origin 0. D(t) is a univariate process that has temporal movements toward or away from the origin in a direction that is orthogonal to the plane. In this situation, we know D(t) is a Wiener process, with a mean δ and variance σ 2 that can be derived from the mean vector and covariance matrix of the parent Wiener process D(t). A first hitting time S of the plane occurs if and when the sample path of D(t) first encounters the plane at distance d 0 = c/ √ a T a along the normal axis. D(t) is a primal process in our terminology if its drift parameter is positive, i.e., if δ > 0. D(t) also possesses the Esscher property. This application includes the ambiguity of mixed susceptibility in relation to first hitting times of the plane if the study population includes a proportion 1 − p of individual cases that are not susceptible to ever reaching the plane. For other families of multidimensional processes (such as multidimensional bilateral gamma processes), one would need to verify that the Esscher property holds for the univariate projection of the process onto a normal axis to the plane through the process origin.
For another extension of interest, we consider the conditions under which the Esscher property is preserved for stochastic processes formed from mathematical combinations of component stochastic processes. Again, this research question will not be investigated in depth here, but we sketch the following result to show the nature of the topic. Consider a set of k independent Lévy processes D j (t), j = 1, . . . , k, which each has the Esscher property. Let D(t) = ∑ j a j D j (t) be a linear combination of the k processes, where the a j are scalar coefficients. If κ j (ζ) is the c.g.f. for a unit increment in process D j (t), j = 1, . . . , k, we know that the c.g.f. of D(t) is given by φ(ζ) = ∑ j κ j (a j ζ). Moreover, as each component process satisfies the Esscher property, we have κ j (ζ) = κ j (−ζ − U j ), j = 1, . . . , k, where U j denotes the Esscher transform parameter for process D j (t), defined by (3). The Esscher property also implies that κ j (a j ζ) = κ j (−a j ζ − a j U j ). We now consider the sum ∑ j κ j (−a j ζ − a j U j ). It is evident that if U 1 = · · · = U k = U then we have: Thus, process D(t) possesses the Esscher property when all component processes share the same Esscher transform parameter U j . Reference to the case-demonstration process families considered in Appendix A.6 shows the kind of restrictions on the component process parameters that are implied by requiring that U = U j for all j. For example, if all component processes are Wiener processes then the ratio 2δ j /σ 2 j must be the same for all component processes. The preceding demonstration shows that the Esscher property is preserved in this linear combination of component processes even if the components come from different Lévy process families.

Final Remarks
The identifiability issue in this research arises under highly specialized circumstances and, hence, is an idealized construction. The exacting formulation is used to show the precise mathematical nature of the problem. Yet, it is clear that even when the exact conditions are not satisfied perfectly, the ambiguity of mixed susceptibility can still be present and problematic. Survival functions for first hitting times can be very similar for processes that might be drifting toward or away from a threshold, even if the generating process is not a true Levy process or the Esscher property is not quite valid. The situation reminds us that we may not be able to untangle susceptible from non-susceptible cases in the 'cure rate' fraction of a population based on censored survival data alone.
Our research focuses on explaining the ambiguous composition of the cure rate when both susceptible and non-susceptiable fractions may be present. We show that the ambiguity arises precisely when the conditional survival functions for the primal and dual processes are identical. This research program can be broadened by asking what theoretical circumstances must hold for identical survival time distributions to arise from different generating models within the same model class. We created pairs of processes which were twinned by simple reversal of the sign of a drift parameter. Discovery of more theoretical circumstances that produce identical survival functions will require an extended and challenging research program, but the program may produce deep insights and results that advance mathematical statistics. Natural research questions for the program might include such questions as: (1) Can identical conditional survival functions be generated from nonlinear thresholds for some family of univariate processes or non-planar boundaries for some family of multidimensional processes? (2) Do generalizations of the Esscher property exist whereby survival data or censored survival data necessarily produce ambiguous inferences for model parameters when the models have this property? Much interesting future work lies ahead. If the process has no overshoot then V = 0 and we take h(v|s) as a limiting density function concentrated on 0 (a Dirac delta function).
We will refer to {D(t)} as our primal process and its dual process by {D * (t)}, where an asterisk subscript is used to designate properties of the dual process. The dual process {D * (t)} has the same probability law as {−D(t)}, that is, {D * (t)} ∼ {−D(t)}. So {D * (t)} has c.g.f. κ * (ν) = κ(−ν) defined for all ν ∈ Z * = −Z. We are interested in the situation where the primal process {D(t)} has a positive mean parameter δ > 0 so that its dual process has a negative mean parameter δ * = −δ < 0. Sample paths of the primal process must eventually hit threshold d 0 so P(S < ∞) = 1 for its first hitting time S when δ > 0. Sample paths of the dual process {D * (t)} are not guaranteed to reach the threshold d 0 so P * (S < ∞) < 1 for its first hitting time S.
A sample path of the primal process {D(t)} either reaches level D at time t without hitting threshold d 0 during interval (0, t] or it does hit d 0 at some time S = s ≤ t and subsequently travels from D(s) to D in the interval (s, t]. This line of reasoning gives us the following probability identity connecting the mathematical constituents of this setting: The corresponding probability identity for the dual process is: Here the functions with asterisks are the corresponding entities for the dual process that are found in (A1) for the primal process.

Appendix A.2. The Esscher Property
We now narrow the subfamily of stochastic processes of interest and consider primal processes {D(t)} for which the c.g.f. κ(ζ) has the property: where U > 0 is defined as follows: Later case demonstrations show that Lévy processes possessing property (A3) form a large family.
The property κ(ζ) = κ(−ζ − U) implies that: where we recall that f (D|t) denotes the p.d.f. of D = D(t). For the dual process {D * (t)}, we use (A5) to obtain: Equation (A6) shows that the dual p.d.f. is the following exponentially tilted version of the p.d.f. for the primal process: A transformation of a p.d.f. p(x) into a second p.d.f. of form p(x|a) = C(a) exp(ax)p(x), where C(a) is a normalizing constant, is known as an Esscher transform. The transformation found early application in actuarial science and more recently in financial engineering and simulation sampling methods [14][15][16]. The property κ(ζ) = κ(−ζ − U) can be seen from (A7) to imply an Esscher transform of f (D|t) with transform parameter a = −U. This particular choice of a ensures that normalizing constant C(a) equals 1 so the transformed function f * (D|t) is itself a probability density function. Because of this connection to the Esscher transform, we refer to this property for our stochastic processes as the Esscher property.
where f (i) (∆D i |∆τ i ) and f (i) * (∆D i |∆τ i ) denote the respective probability densities. It follows then that the probability densities of any given sample path J from 0 at time 0 to D(τ) at time τ for the primal and dual processes are related by: where f (J|τ) and f * (J|τ) denote the respective probability densities for path J. The relationship in (A9) shows that the probability densities of the primal and dual processes for any path J under the Esscher property are proportional, with the multiplier exp[UD(τ)] being determined only by the cumulative vertical distance D(τ) travelled by the sample path J.
The result in (A9) provides a mechanism for evaluating probabilities of different bundles or sets of sample paths for a stochastic process observed over a given time interval (0, t]. The mechanism visualizes discrete simulations of many sample paths for the process using a fixed partition of the time interval (0, t]. In particular, we consider the simulation of a set J of independent sample paths from the primal process where f (J|t) denotes the probability density of a path J ∈ J . If we are interested in estimating the probability of experiencing a simulated sample path that meets a specified condition C then we sort through set J and keep only sample paths that meet the condition, resulting in a subset that we can denote by J C ⊆ J . The probability density for this subset is therefore ∑ J∈J C f (J|t). We can exploit the correspondence of path densities in (A9) to evaluate the relationships between corresponding terms in the probability identities (A1) and (A2). P.d.f. f (D|t), for example, is the probability density for the primal process of the set of sample paths that connect the origin to D = D(t) at time t, which we now denote by P. Using (A9), we then have the following link between f (D|t) and f * (D|t) for the primal and dual processes.
Next we consider the subset of P that contains sample paths that do not cross the threshold d 0 in the interval (0, t]. We denote this subset by P 0 ⊆ P. Using (A9), we have the following link between the conditional p.d.f.s g(D|t) and g * (D|t) for the primal and dual processes.
Note that probabilities P(S > t) and P * (S > t) must be included to adjust for the condition that the first hitting time lies beyond t.
The density functions in the rightmost terms of (A1) and (A2) arise from sample paths that cross the threshold d 0 in the interval (0, t]. The correspondence of these primal and dual densities is established as follows. Let subset P (s,v) ⊆ P − P 0 consist of paths that first cross the threshold d 0 at a specified time s in the interval (0, t], experience an overshoot of v, and then proceed to level D = D(t) at time t. So, again using (A9), we have the following equation: The distribution of process increment D − D(s) = D − d 0 − v does not depend on D(s) for either the primal or dual processes. Therefore, when we integrate both sides of (A12) with respect to D − D(s), the following integrals evaluate to 1: Using this result, Equation (A12) reduces to: Of particular interest is the limit of (A14) if we now let t go to infinity. As a first hitting is inevitable for the primal process, we have P(S < ∞) = 1. For the dual process, a first hitting may not occur in finite time so the limiting probability P * (S < ∞) will be less than 1. Equation (A14) therefore reduces to: To proceed further, we will now separate processes according to whether they overshoot threshold d 0 or not when their sample paths first exit the threshold.

Appendix A.4. First Hitting Times without Overshoot
Lévy processes that do not overshoot a threshold are of two types, namely, Wiener processes or processes with fixed jump increments where d 0 is a multiple of that fixed increment. With no overshoot, Equation (A15) simplifies as follows because random variable V(S) equals 0 with probability 1. k(s) = P * (S < ∞)k * (s)e Ud 0 ) for all s > 0. (A16) As k(s) and k * (s) are both p.d.f.s and P * (S < ∞) and e Ud 0 are both constants, it follows necessarily that k(s) ≡ k * (s) and P * (S < ∞) = e −Ud 0 in order to maintain the equality in (A16). Thus, the identity K(s) ≡ K * (s) is assured for all processes without overshoot.

Appendix A.5. First Hitting Times with Overshoot
We next consider bidirectional Lévy processes with jump components that do overshoot thresholds. To deal with these processes, we reformulate the first hitting time of the process. Instead of having primal process {D(t)} start at the origin with D(0) = 0, we let the process operate until it encounters its first positive level and shift the time origin to that position. We denote this initial positive level by V 1 so D(0) = V 1 . To proceed, we return to a discrete version of the sample path for the re-scaled process {D(t)} and consider a partition of time interval [0, τ], with equally spaced time points τ 0 = 0 < τ 1 < · · · < τ k = τ where τ i = i∆τ and ∆τ = τ/k. Let the successive levels D i , i = 0, 1, . . . , k, correspond to D i = D(τ i ), so D 0 = D(0) = V 1 and D k = D(τ). We then consider the successive maxima M j = max i≤j D i of the process, which starts with M 1 = V 1 . We define the successive increments in the maxima by ∆M j = M j − M j−1 , j = 2, . . . , k. Now we keep only positive increments ∆M j , discarding all ∆M j that are zero. These positive increments are necessarily independent and identically distributed. The increments mark out the vertical advance of the partitioned primal process from starting level V 1 toward the failure threshold at d 0 . We re-index the successive positive increments and denote them by R n = ∆M j , n = 2, . . . , m, where ∆M j is the nth positive increment in the sequence. It is evident that the set {V 1 , R 2 , . . . , R m } is a sequence of m renewal intervals for a delayed renewal process [17]. The delay V 1 in the process is, in effect, the overshoot of the zero-axis by the process in its first passage to that level.
Building on this formulation as a delayed renewal process, we denote the s.f. of the successive renewal intervals R 2 , . . . , R m by H(r). Next, we assume that the delayed renewal process is, in fact, in equilibrium so M 1 = V 1 is a random draw from the equilibrium c.d.f.: where µ denotes the mean of s.f. H(r). Parameter µ is the mean vertical advance of the process whenever a positive advance occurs. With the assumption that the renewal process is in equilibrium, it follows therefore that H E (v) is also the c.d.f. of the overshoot that will be experienced when the primal process crosses threshold d 0 . Moreover, the equilibrium condition implies that the overshoot c.d.f. H E (v) will be independent of the first hitting time S of threshold d 0 .
Finally, we consider the corresponding dual process {D * (t)} which, of course, is also bidirectional with a jump component. The preceding line of development for constructing an equilibrium renewal process for advances toward the failure threshold at d 0 can be applied to the dual process. For the dual process, we denote the s.f. of the renewal interval by H * (r) and the equilibrium c.d.f. for overshoot by H E * (v).
We return now to Equation (A14) and rearrange it into a product of ratios for matched terms as follows: It can be seen that only the rightmost ratio on the left side of Equation (A18) contains terms that are functions of overshoot v. It follows therefore that this ratio cannot depend on v although it may depend on survival time s. Denoting this rightmost ratio by Q(s), we have: Now, however, we can add that if the primal process and dual processes are in equilibrium then the densities in (A19) do not depend on s and, hence, Q(s) equals some constant Q for all s. In this case, the two density functions in the ratio are those corresponding to c.d.f.s H E (v) and H E * (v), which we denote by h E (v) and h E * (v), respectively. Moreover, in this same equilibrium condition, it follows that the density functions k(s) and k * (s) for the survival time S of the primal and dual processes must be identical. Thus, the identity K(s) ≡ K * (s) is assured for all bidirectional Lévy processes with overshoot, provided survival time is measured from a time point where the process is in equilibrium. As a final consequence, we see that constant Q = e Ud 0 P * (S < ∞) so the probability that S is finite is given by P * (S < ∞) = Qe −Ud 0 . When the primal process has no overshoot, we have from (A19) that Q(s) = Q = 1 for all s and, hence, the probability that S is finite is given by P * (S < ∞) = e −Ud 0 , a result we saw earlier.
An implication of adopting the equilibrium formulation for the overshoot case is that the first hitting time of d 0 will be immediate if V 1 ≥ d 0 . In other words, if the initial process maximum M 1 = V 1 falls above threshold d 0 then S = 0 and failure occurs at the outset. The survival function therefore has a probability mass at zero equal to P(S = 0) = P(V 1 ≥ d 0 ).

. Case Demonstrations
We now present case demonstrations of stochastic process families in which the primal and dual processes have the same survival distributions. We begin by looking at two families whose sample paths do not exhibit overshoot. We then consider an additional two families that do exhibit overshoot.

Poisson-Bernoulli Random Walk
As the primal process, consider a Poisson-Bernoulli random walk {D(t)} defined by the following doubly stochastic process. Bernoulli events occur according to a Poisson process with rate parameter λ and each Bernoulli event is either an up-step of +1 with probability p or a down-step of −1 with probability 1 − p. Assume the event outcomes occur in a mutually independent fashion. We take the threshold d 0 to be a natural number so there is no overshoot. The process therefore takes the following form: Here {N(t)} denotes the Poisson process and {B[N(t)]} denotes the binomial process for the number of up-step outcomes among the N(t) Bernoulli trials occurring during time interval (0, t]. The probability function of D(1) = D for this process is evaluated by expanding the probability product P[B = b|N(1) = n]P[N(1) = n] and then summing over all pairs (b, n) for which d = 2b − n, taking account of the condition that b = 0, . . . , n for n = 0, 1, . . .. The result is the following form, which involves a tractable infinite series: where I(·) denotes an indicator function. A little mathematics gives the following c.g.f. for the process {D(t)}: which is defined for all real numbers ζ. The mean parameter is easily evaluated from this c.g.f. as δ = (2p − 1)λ. To ensure a positive mean parameter, we set p > 1/2.
The first hitting time S of the primal and dual processes has the following c.g.f.:

Cases with Overshoot
For both cases with overshoot, we consider bilateral processes that are created by taking the difference of independent monotonically non-decreasing jump processes with stationary independent increments.

Bilateral Inverse Gaussian Process
As the primal process, consider a bilateral inverse Gaussian process {D(t)}, defined as the difference between two independent inverse Gaussian processes {H 1 (t)} and {H 2 (t)}, as follows: In the most general case, the processes may have different positive scale parameters η i , i = 1, 2, and different positive shape parameters ν i , i = 1, 2. The general bilateral inverse Gaussian process has c.g.f.: for ζ ∈ Z = (− (A36) Equal shape parameters. The special case in which the component processes {H 1 (t)} and {H 2 (t)} share the same shape parameter, say, ν 1 = ν 2 = ν, provides a family of processes {D(t)} for which δ = (η 2 − η 1 )/(η 1 η 2 ) > 0 if η 2 > η 1 . The c.g.f. in (A36) simplifies to: It can be verified that the Esscher property κ(ζ) = κ(−ζ − U) holds in this case with U = (η 2 2 − η 2 1 )/(2ν). Equal scale parameters. The counterpart of the equal-shape case is the equal-scale case in which the component processes {H 1 (t)} and {H 2 (t)} share the same scale parameter, say, η 1 = η 2 = η. In this case, the family of processes {D(t)} have mean δ = 0 and, hence, do not fall into the class of processes that interest us here. Equation κ(ζ) = 0 has only a single root at 0 in the interval ζ ∈ Z so the Esscher property does not hold.