Asymptotic Properties for Cumulative Probability Models for Continuous Outcomes

Regression models for continuous outcomes frequently require a transformation of the outcome, which is often specified a priori or estimated from a parametric family. Cumulative probability models (CPMs) nonparametrically estimate the transformation by treating the continuous outcome as if it were ordered categorical. They thus represent a flexible analysis approach for continuous outcomes. However, it is difficult to establish asymptotic properties for CPMs due to the potentially unbounded range of the transformation. Here we show asymptotic properties for CPMs when applied to slightly modified data where bounds, one lower and one upper, are chosen and the outcomes outside the bounds are collapsed into two ordinal categories. We prove the uniform consistency of the estimated regression coefficients and of the estimated transformation function between the bounds. We also describe their joint asymptotic distribution, and show that the estimated regression coefficients attain the semiparametric efficiency bound. We show with simulations that results from this approach and those from using the CPM on the original data are very similar when a small fraction of the data are modified. We reanalyze a dataset of HIV-positive patients with CPMs to illustrate and compare the approaches.


Introduction
Regression analyses of continuous outcomes often require a transformation of the outcome to meet modeling assumptions. Because the correct transformation is often unknown and may not fall in a prespecified family, it is desirable to estimate the transformation in a flexible way. Transformation models have been introduced to address this issue (Cheng et al., 1995; Hothorn et al., 2017).
These models involve a latent intermediate variable between the input and outcome variables and two model components: one connecting the latent variable to the outcome variable through an unknown transformation, and the other connecting the latent variable to the input variables as in traditional regression models. In semiparametric transformation models, the first component is modeled nonparametrically and the second parametrically.
The Cox proportional hazards model (Cox, 1972) for time-to-event outcomes is an example of a semiparametric transformation model, in which the effects of covariates are modeled parametrically and the baseline hazard function is modeled nonparametrically. More general transformation models, including proportional hazards and proportional odds models, have been studied extensively in the literature. Among them, Zeng and Lin (2007) proposed a general nonparametric maximum likelihood framework for right-censored data, where the cumulative hazard function is estimated as a step function with non-negative jumps at the observed failure times. They further established the asymptotic properties of the resulting estimators, including consistency and asymptotic efficiency.
However, these approaches cannot be applied directly to transformation models for a continuous outcome, because the outcome has no bounds on its range and there is no clear definition of a hazard rate function for such transformation models. Furthermore, the algorithms proposed in Zeng and Lin (2007) are based either on brute-force optimization, which may not guarantee convergence, or on slow expectation-maximization algorithms. Liu et al. (2017) studied the performance of semiparametric linear transformation models for continuous outcomes. They showed that linear transformation models are cumulative probability models (CPMs), and that their nonparametric likelihood function is equivalent to a multinomial likelihood that treats the outcome variable as if it were ordered categorical with the observed values as its categories. This result allows us to fit semiparametric transformation models for continuous outcomes using ordinal regression methods. They showed with simulations that CPMs perform well in a wide range of scenarios. However, no asymptotic theory has been established for the method.
One main hurdle is that the unknown transformation of the continuous outcome variable can have an unbounded range of values, which makes it hard to establish asymptotic properties of CPMs across the whole range.
In this paper, we prove several asymptotic properties for CPMs when they are applied to data that are slightly modified from the original. Briefly, a lower bound L and an upper bound U for the outcome are chosen prior to analysis, the outcomes are censored at these bounds, and then a CPM is fit to the censored data. We prove that in this censored approach, the nonparametric estimate of the underlying transformation function is uniformly consistent in the interval [L, U]. We then describe its asymptotic distribution as well as the joint asymptotic distribution for both the estimate of the transformation function and the estimates of the coefficients for the input variables.
We also show that the results from this censored approach and those from the CPM on the original data are very similar when a small fraction of data are censored at the bounds.

Cumulative Probability Models
Let Y be the outcome of interest and Z be a vector of p covariates. The semiparametric linear transformation model is
$$Y = H(\beta^{\mathsf T} Z + \varepsilon), \qquad (1)$$
where H is a transformation function assumed to be non-decreasing but otherwise unknown, β is a vector of coefficients, and ε is independent of Z and is assumed to follow a continuous distribution with cumulative distribution function G(·). An alternative expression of model (1) is
$$A(Y) = \beta^{\mathsf T} Z + \varepsilon, \qquad (2)$$
where $A = H^{-1}$ is the inverse of H. For mathematical clarity, we assume H is left continuous and define A(y) = sup{z : H(z) ≤ y}; then A is non-decreasing and right continuous.
Model (1) is equivalent to the cumulative probability model (CPM) presented in Liu et al. (2017):
$$G^{-1}\{P(Y \le y \mid Z)\} = A(y) - \beta^{\mathsf T} Z. \qquad (3)$$
Suppose the data are i.i.d. and denoted as $(Y_i, Z_i)$, i = 1, ..., n. Liu et al. (2017) proposed to model the transformation A nonparametrically. The corresponding likelihood is
$$\prod_{i=1}^{n} \left[ G\{A(Y_i) - \beta^{\mathsf T} Z_i\} - G\{A(Y_i-) - \beta^{\mathsf T} Z_i\} \right],$$
where $A(y-) = \lim_{t \uparrow y} A(t)$. Since A can be any non-decreasing function, this likelihood is maximized when the increments of A(·) are concentrated at the observed $Y_i$. Liu et al. (2017) showed that this is equivalent to treating the outcomes as if they were ordered categorical with the observed distinct values as the ordered categories, and that the nonparametric maximum likelihood estimates can be obtained by fitting an ordinal regression model. They showed in simulations that CPMs perform well under a wide range of scenarios. However, since some observed $Y_i$ can be extremely large or small and the observations at both tails are often sparse, the estimate of A is highly variable at the tails. Moreover, the unboundedness of the transformation at the tails makes it difficult to control the compactness of the estimator of A, so most standard asymptotic theory no longer applies.
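To make the reduction to an ordinal likelihood concrete, the Python sketch below evaluates the log of $\prod_i [G\{A(Y_i) - \beta^{\mathsf T} Z_i\} - G\{A(Y_i-) - \beta^{\mathsf T} Z_i\}]$ when A is a step function that jumps only at the observed distinct outcomes. It assumes a probit link (G = Φ, via Python's `statistics.NormalDist`); the function name `cpm_loglik` and the argument `alpha`, which holds the values of A at the sorted distinct outcomes (the intercepts of an ordinal regression), are hypothetical illustrations rather than any package's API. It only evaluates the likelihood and does not maximize it.

```python
import math
from statistics import NormalDist

G = NormalDist().cdf  # assumed link: standard normal CDF (probit CPM)


def cpm_loglik(y, z, beta, alpha):
    """Nonparametric CPM log-likelihood with A a step function on the observed outcomes.

    y     : outcomes Y_i
    z     : covariate vectors Z_i (lists of floats), one per outcome
    beta  : coefficient vector
    alpha : hypothetical values A(y_(1)) < ... < A(y_(J-1)) at the J-1 smallest
            distinct outcomes; A is +infinity at the largest distinct outcome.
    """
    levels = sorted(set(y))
    if len(alpha) != len(levels) - 1:
        raise ValueError("need one value of A per distinct outcome except the largest")
    a_at = dict(zip(levels, list(alpha) + [math.inf]))       # A(y)
    a_left = dict(zip(levels, [-math.inf] + list(alpha)))    # A(y-), the previous step
    total = 0.0
    for yi, zi in zip(y, z):
        eta = sum(b * x for b, x in zip(beta, zi))           # beta^T Z_i
        total += math.log(G(a_at[yi] - eta) - G(a_left[yi] - eta))
    return total


# Four distinct outcomes, beta = 0, and A chosen so each category has probability 1/4.
inv = NormalDist().inv_cdf
ll = cpm_loglik([3.0, 1.0, 2.0, 5.0], [[0.0]] * 4, [0.0], [inv(0.25), inv(0.5), inv(0.75)])
```

Here the values of A are fixed by hand so that each of the four observations has probability 1/4, which gives a log-likelihood of −4 log 4; maximizing over `beta` and an increasing `alpha` would correspond to fitting a cumulative probit model.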

Cumulative Probability Models on Censored Data
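In view of the challenges above, we describe an approach in which the outcomes are censored at a lower bound and an upper bound before a CPM is fit. We will then describe the asymptotic properties of this approach in Section 2.3, and show with simulations that the results from this approach and those of the CPM on the original data are similar when a small fraction of data are censored.
More specifically, we predetermine a lower bound L and an upper bound U, and consider any observation outside the interval (L, U) as censored. In other words, those with $Y_i \le L$ are treated as left-censored at L, and those with $Y_i \ge U$ are treated as right-censored at U. The censored data may be denoted as $(\tilde Y_i, Z_i)$, i = 1, ..., n, where
$$\tilde Y_i = \begin{cases} L, & Y_i \le L, \\ Y_i, & L < Y_i < U, \\ U, & Y_i \ge U. \end{cases}$$
The bounds L and U should satisfy P(L < Y < U) > 0, P(Y ≤ L) > 0, and P(Y ≥ U) > 0.
The variable $\tilde Y_i$ follows a mixture distribution. When $\tilde Y_i \in (L, U)$, the distribution is continuous with the same cumulative distribution function as that for $Y_i$; that is, $P(\tilde Y_i \le y \mid Z_i) = P(Y_i \le y \mid Z_i) = G\{A(y) - \beta^{\mathsf T} Z_i\}$ for y ∈ (L, U). In addition, $\tilde Y_i$ has point masses $P(\tilde Y_i = L \mid Z_i) = G\{A(L) - \beta^{\mathsf T} Z_i\}$ and $P(\tilde Y_i = U \mid Z_i) = 1 - G\{A(U-) - \beta^{\mathsf T} Z_i\}$. Then the nonparametric likelihood for the censored data is
$$\prod_{i:\,\tilde Y_i = L} G\{A(L) - \beta^{\mathsf T} Z_i\} \prod_{i:\,L < \tilde Y_i < U} \left[ G\{A(\tilde Y_i) - \beta^{\mathsf T} Z_i\} - G\{A(\tilde Y_i-) - \beta^{\mathsf T} Z_i\} \right] \prod_{i:\,\tilde Y_i = U} \left[ 1 - G\{A(U-) - \beta^{\mathsf T} Z_i\} \right]. \qquad (4)$$
Since A(·) can be any non-decreasing function over the interval [L, U), the likelihood (4) is maximized when the increments of A(·) are concentrated at the observed $\tilde Y_i$. Hence it suffices to consider only step functions with a jump at each distinct value of $\tilde Y_i$ in [L, U).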
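A minimal Python sketch of the censored approach follows, assuming a probit link. Outcomes are first clamped at the bounds (values ≤ L become L and values ≥ U become U), and the log of the censored-data likelihood is then evaluated with three kinds of contributions: G{A(L) − β^T Z} for left-censored observations, 1 − G{A(U−) − β^T Z} for right-censored ones, and the jump G{A(y) − β^T Z} − G{A(y−) − β^T Z} for observations inside (L, U). The names `censor`, `censored_cpm_loglik`, and `a_values` are hypothetical.

```python
import math
from statistics import NormalDist

G = NormalDist().cdf  # assumed probit link


def censor(y, lower, upper):
    """Set values <= lower to lower and values >= upper to upper (the bounds L and U)."""
    return [min(max(v, lower), upper) for v in y]


def censored_cpm_loglik(y, z, beta, a_values, lower, upper):
    """Log of the censored-data likelihood for outcomes censored at [lower, upper].

    a_values : dict {v: A(v)} for every distinct censored value v in [lower, upper),
               including lower itself; A is a right-continuous step function.
    """
    yc = censor(y, lower, upper)
    levels = sorted(a_values)
    a_left = {v: (a_values[levels[k - 1]] if k > 0 else -math.inf)  # A(v-)
              for k, v in enumerate(levels)}
    a_upper_minus = a_values[levels[-1]]                            # A(U-)
    total = 0.0
    for v, zi in zip(yc, z):
        eta = sum(b * x for b, x in zip(beta, zi))
        if v == lower:        # left-censored: P(Y <= L | Z)
            total += math.log(G(a_values[lower] - eta))
        elif v == upper:      # right-censored: P(Y >= U | Z)
            total += math.log(1.0 - G(a_upper_minus - eta))
        else:                 # observed in (L, U): jump of the step function at v
            total += math.log(G(a_values[v] - eta) - G(a_left[v] - eta))
    return total


inv = NormalDist().inv_cdf
a_vals = {-1.0: inv(0.25), 0.2: inv(0.5), 0.5: inv(0.75)}
ll = censored_cpm_loglik([-5.0, 0.2, 0.5, 9.0], [[0.0]] * 4, [0.0], a_vals, -1.0, 1.0)
```

In the example, −5.0 is censored to L = −1 and 9.0 to U = 1, and A is chosen so that every observation contributes probability 1/4.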

Asymptotic Results
From now on we assume the outcome is continuous. Without loss of generality, we assume that in our models (1)-(3), the support of Y contains 0, the vector Z contains an intercept and has p dimensions, and A(0) = 0. Furthermore, the bounds for censoring satisfy L < 0 and U > 0. To establish the asymptotic properties described below, we further assume the following conditions.
Condition 2.1 G(x) is thrice-continuously differentiable, G'(x) > 0 for any x, G''(x) sign(x) < 0 for |x| ≥ M, where M > 0 is a constant, and …
Condition 2.2 The covariance matrix of Z is non-singular. In addition, Z and β are bounded so that $\beta^{\mathsf T} Z \in [-m, m]$ almost surely for some large constant m.
Condition 1 imposes restrictions on G(x) at both tails; it holds for many residual distributions, including the standard normal distribution, the extreme value distribution and the logistic distribution.Conditions 2 and 3 are minimal assumptions for establishing asymptotic properties for linear transformation models.
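As a quick numerical sanity check of the tail sign requirement in Condition 2.1 for the three distributions named above, the Python sketch below evaluates analytic second derivatives G''(x) and checks that G''(x) sign(x) < 0 on a grid with 0.5 ≤ |x| ≤ 6. A grid check is not a proof, and the grid limits are arbitrary choices for this sketch (for x above roughly 6.6 the extreme value G'' underflows to zero in double precision). For these three distributions the sign requirement in fact holds for every x ≠ 0, so any M > 0 works. The function names are hypothetical.

```python
import math


def normal_g2(x):
    # G = Phi, G' = phi, so G''(x) = -x * phi(x)
    return -x * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def extreme_value_g2(x):
    # G = 1 - exp(-e^x), G' = exp(x - e^x), so G''(x) = G'(x) * (1 - e^x)
    return math.exp(x - math.exp(x)) * (1.0 - math.exp(x))


def logistic_g2(x):
    # G = 1 / (1 + e^-x), so G''(x) = G (1 - G) (1 - 2G)
    g = 1.0 / (1.0 + math.exp(-x))
    return g * (1.0 - g) * (1.0 - 2.0 * g)


def tail_sign_holds(g2, m=0.5, bound=6.0, steps=200):
    """Check G''(x) * sign(x) < 0 on a grid of points with m <= |x| <= bound."""
    for k in range(steps + 1):
        x = m + (bound - m) * k / steps
        for point in (x, -x):
            if g2(point) * math.copysign(1.0, point) >= 0.0:
                return False
    return True


results = {f.__name__: tail_sign_holds(f) for f in (normal_g2, extreme_value_g2, logistic_g2)}
```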
Let $(\hat\beta, \hat A)$ denote the nonparametric maximum likelihood estimate of (β, A) that maximizes the likelihood (4) of the censored approach described in Section 2.2. Then $\hat A$ is a step function with a jump at each of the distinct $Y_i$ in the censored data. To establish the asymptotic properties for $(\hat\beta, \hat A)$, we consider $\hat A$ as a function over the closed interval [L, U] by defining $\hat A(U) = \hat A(U-)$. We have the following consistency theorem.
Theorem 2.1 Under Conditions 1-3, with probability one,
$$\sup_{y \in [L, U]} |\hat A(y) - A(y)| + \|\hat\beta - \beta\| \to 0 \quad \text{as } n \to \infty.$$
The proof of Theorem 1 is in the Supplementary material. Core steps of the proof include showing that $\hat A$ is bounded in [L, U] with probability one. Then, since $\hat A(\cdot)$ is bounded and increasing in [L, U], by the Helly selection theorem, every subsequence has a further subsequence that converges weakly.
Thus, without loss of generality, we assume that $\hat A \to A^*$ weakly in [L, U] and $\hat\beta \to \beta^*$. We then show that with probability one, $A^*(y) = A(y)$ for y ∈ [L, U] and $\beta^* = \beta$. With this result, consistency is established. Furthermore, since A is continuously differentiable, and weak convergence of monotone functions to a continuous limit on a compact interval is uniform, we conclude that $\hat A(y)$ converges to A(y) uniformly in [L, U] with probability one.
We next describe the asymptotic distribution for ( β, A).The asymptotic distribution of A will be expressed as that of a random functional in a metric space.We first define some notation.Let BV [L, U ] be the set of all functions defined over [L, U ] for which the total variation is at most one.
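Let lin(BV[L, U]) be the set of all linear functionals over BV[L, U]; that is, every element f in lin(BV[L, U]) maps each h ∈ BV[L, U] linearly to a real number f[h]. A metric over lin(BV[L, U]) can then be derived from the supremum norm $\|f\| = \sup_{h \in BV[L,U]} |f[h]|$. Given any non-decreasing function A over [L, U], a corresponding linear functional in lin(BV[L, U]), also denoted as A, can be defined such that for any h ∈ BV[L, U],
$$A[h] = \int_L^U h(y)\, dA(y).$$
Similarly, for a nonparametric maximum likelihood estimate $\hat A$, its corresponding linear functional is $\hat A[h] = \int_L^U h(y)\, d\hat A(y)$. The functional $\hat A$ is a random element in the metric space lin(BV[L, U]). For any y ∈ (L, U), $\hat A(y)$ can be recovered by evaluating the functional $\hat A$ at a suitable indicator function. For example, suppose the estimated jump sizes at the distinct outcome values of a dataset, $\{a_1, \ldots, a_J\}$, are $\{\hat s_1, \ldots, \hat s_J\}$. Then at $y_0 > 0$,
$$\hat A(y_0) = \sum_{j:\,0 < a_j \le y_0} \hat s_j = \hat A[h_0],$$
where $h_0(y) = I(0 < y \le y_0)$; and similarly, at $y_0 < 0$, $\hat A(y_0) = -\sum_{j:\,y_0 < a_j \le 0} \hat s_j = \hat A[h_0]$, where $h_0(y) = -I(y_0 < y \le 0)$.
Theorem 2.2 Under Conditions 1-3, $n^{1/2}(\hat\beta - \beta, \hat A - A)$ converges weakly to a tight mean-zero Gaussian process in $\mathbb{R}^p \times$ lin(BV[L, U]). Furthermore, the asymptotic variance of $n^{1/2}(\hat\beta - \beta)$ attains the semiparametric efficiency bound.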
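The evaluation of $\hat A[h] = \int h\, d\hat A = \sum_j h(a_j)\hat s_j$ for a step-function estimate, and the recovery of $\hat A(y_0)$ through the indicator functions $h_0$, can be sketched in Python as follows. The jump locations and sizes in the example are made-up numbers and the function names are hypothetical; the normalization A(0) = 0 is the one assumed in the text.

```python
def functional(h, points, jumps):
    """A_hat[h]: the integral of h against a step function with jumps s_j at points a_j."""
    return sum(h(a) * s for a, s in zip(points, jumps))


def a_hat_at(y0, points, jumps):
    """Recover A_hat(y0) as A_hat[h0] under the normalization A(0) = 0."""
    if y0 >= 0:
        def h0(y):
            return 1.0 if 0 < y <= y0 else 0.0      # I(0 < y <= y0)
    else:
        def h0(y):
            return -1.0 if y0 < y <= 0 else 0.0     # -I(y0 < y <= 0)
    return functional(h0, points, jumps)


a = [-1.0, -0.5, 0.5, 1.0, 2.0]      # hypothetical distinct outcome values a_j
s = [0.1, 0.2, 0.3, 0.4, 0.5]        # hypothetical estimated jump sizes s_j
```

For instance, `a_hat_at(1.5, a, s)` adds the jumps at 0.5 and 1.0, while `a_hat_at(-0.7, a, s)` returns minus the jump at −0.5.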
The proof of Theorem 2 is in the Supplementary material and makes use of modern empirical process and semiparametric efficiency theory. Its proof relies on verifying all the technical conditions in the Master Z-Theorem in van der Vaart and Wellner (1996). In particular, it entails verification of the invertibility of the information operator for (β, A).
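Because the information operator for (β, A) is invertible, the arguments given in Murphy and van der Vaart (2000) and Zeng and Lin (2006) imply that the asymptotic variance-covariance matrix of $(\hat\beta, \hat A[h_1], \ldots, \hat A[h_m])$ for any $h_1, \ldots, h_m \in BV[L, U]$ can be consistently estimated based on the information matrix for β and the jump sizes of $\hat A$. Specifically, suppose the estimated jump sizes at the distinct outcome values of a dataset, $\{a_1, \ldots, a_J\}$, are $\{\hat s_1, \ldots, \hat s_J\}$. Let $I_n$ be the estimated information matrix for both β and $\{\hat s_1, \ldots, \hat s_J\}$. Then, since each $\hat A[h_k] = \sum_j h_k(a_j)\hat s_j$ is linear in the jump sizes, the variance-covariance matrix for $(\hat\beta, \hat A[h_1], \ldots, \hat A[h_m])$ can be estimated by
$$D\, I_n^{-1} D^{\mathsf T}, \qquad D = \begin{pmatrix} \mathrm{Id}_p & 0 \\ 0 & W \end{pmatrix}, \qquad W_{kj} = h_k(a_j),$$
where $\mathrm{Id}_p$ is the p × p identity matrix and W is the m × J matrix of the functions $h_k$ evaluated at the jump locations.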
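Since each $\hat A[h_k] = \sum_j h_k(a_j)\hat s_j$ is linear in the jump sizes, a delta-method computation maps an inverse information matrix for $(\beta, s_1, \ldots, s_J)$ to a covariance matrix for $(\hat\beta, \hat A[h_1], \ldots, \hat A[h_m])$. The NumPy sketch below forms the matrix D whose first p rows pick out β and whose remaining rows hold $h_k(a_j)$, and returns $D I_n^{-1} D^{\mathsf T}$. It assumes $I_n$ is the information of the full-sample log-likelihood; if it is instead averaged per observation, the result should be divided by n. The function name and argument layout are hypothetical.

```python
import numpy as np


def functional_covariance(info, beta_dim, points, hs):
    """Estimated covariance of (beta_hat, A_hat[h_1], ..., A_hat[h_m]).

    info     : (p + J) x (p + J) estimated information matrix for (beta, s_1, ..., s_J),
               assumed to be the information of the full-sample log-likelihood
    beta_dim : p, the number of regression coefficients
    points   : the J distinct outcome values a_1, ..., a_J
    hs       : functions h_1, ..., h_m
    """
    p, n_jumps = beta_dim, len(points)
    d = np.zeros((p + len(hs), p + n_jumps))
    d[:p, :p] = np.eye(p)                       # beta_hat passes through unchanged
    for k, h in enumerate(hs):
        d[p + k, p:] = [h(a) for a in points]   # A_hat[h_k] = sum_j h_k(a_j) s_j
    # D I^{-1} D^T, using a linear solve instead of an explicit inverse
    return d @ np.linalg.solve(info, d.T)


cov = functional_covariance(np.eye(3), 1, [0.5, 1.0], [lambda y: 1.0 if y <= 1.0 else 0.0])
```

With an identity information matrix, the functional that sums both jumps has variance 2, the sum of the two (uncorrelated) jump variances.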

Simulation Study
CPMs have been extensively evaluated by simulation elsewhere to justify their use, and have largely been seen to behave well (Liu et al., 2017; Tian et al., 2020). Here we perform a more limited set of simulations to illustrate three major points that are particularly relevant to our study:
1. Estimation of A(y) using CPMs can be biased at extreme values of y. Even though $\hat A(y)$ may be consistent pointwise for any y, $\hat A(y)$ may not be uniformly consistent over all y ∈ (−∞, ∞).
2. In the censored approach, $\hat A(y)$ is uniformly consistent over y ∈ (L, U).
3. Except for estimation of extreme quantiles and of A(y) at extreme levels, results are largely similar between the uncensored and the censored approaches.

Simulation Set-up
We roughly followed the simulation settings of Liu et al. (2017). Let $X_1 \sim$ Bernoulli(0.5), $X_2 \sim N(0, 1)$, and $Y = \exp(\beta_1 X_1 + \beta_2 X_2 + \varepsilon)$, where $\beta_1 = 1$, $\beta_2 = -0.5$, and $\varepsilon \sim N(0, 1)$. In this set-up, the correct transformation function is A(y) = log(y). We generated datasets $\{(X_1, X_2, Y)\}$ with sample sizes n = 100, 1000, and 5000. We fit CPMs with correctly specified link function (probit) and model form (linear). (Performance of misspecified models was extensively studied via simulations in Liu et al., 2017.) In CPMs, the transformation A and the parameters $(\beta_1, \beta_2)$ were estimated semiparametrically. We evaluated how well the transformation was estimated by comparing $\hat A(y)$ with the correct transformation, A(y) = log(y), for various values of y.
We fit CPMs on the original data without censoring and CPMs on the data censored at L and U, with [L, U] set to $[e^{-4}, e^{4}]$, $[e^{-2}, e^{2}]$, and $[e^{-1/2}, e^{1/2}]$; these values correspond to approximately 0.2%, 13%, and 71% of Y being censored, respectively. All simulations had 1000 replications.
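A Python sketch of this design is below. It assumes the data-generating model $Y = \exp(\beta_1 X_1 + \beta_2 X_2 + \varepsilon)$ with $X_2 \sim N(0, 1)$, which is consistent with A(y) = log(y) and a probit link; under that assumption the exact censoring fraction for bounds $[e^{-c}, e^{c}]$ follows from $\log Y \mid X_1 \sim N(\beta_1 X_1, \beta_2^2 + 1)$. The function names are hypothetical, and the sketch only generates data and computes censoring fractions; it does not fit CPMs.

```python
import math
import random
from statistics import NormalDist

BETA1, BETA2 = 1.0, -0.5


def simulate(n, rng):
    """X1 ~ Bernoulli(0.5), X2 ~ N(0, 1), Y = exp(BETA1*X1 + BETA2*X2 + eps), eps ~ N(0, 1)."""
    rows = []
    for _ in range(n):
        x1 = 1.0 if rng.random() < 0.5 else 0.0
        x2 = rng.gauss(0.0, 1.0)
        rows.append((x1, x2, math.exp(BETA1 * x1 + BETA2 * x2 + rng.gauss(0.0, 1.0))))
    return rows


def censored_fraction(c):
    """Exact P(Y <= e^-c or Y >= e^c): given X1, log Y ~ N(BETA1*X1, BETA2^2 + 1)."""
    sd = math.sqrt(BETA2 ** 2 + 1.0)
    total = 0.0
    for x1 in (0.0, 1.0):                     # each value of X1 has probability 0.5
        dist = NormalDist(BETA1 * x1, sd)
        total += 0.5 * (dist.cdf(-c) + 1.0 - dist.cdf(c))
    return total


rows = simulate(5000, random.Random(2024))
empirical = sum(1 for _, _, y in rows if abs(math.log(y)) >= 2) / len(rows)
```

Under these assumptions, `censored_fraction(4)`, `censored_fraction(2)`, and `censored_fraction(0.5)` come out at roughly 0.002, 0.13, and 0.71, in line with the censoring percentages quoted above, and the empirical fraction from one simulated dataset is close to the exact value.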

Simulation Results
Figure 1 shows the average estimate of A(y) across 1000 simulation replicates compared with the true transformation, log(y). The left, center, and right panels are results based on sample sizes of 100, 1000, and 5000, respectively. With uncensored data, for all sample sizes, estimates are unbiased when y is in the center of the distribution: approximately in the range $[e^{-2}, e^{3}]$ when n = 100, in $[e^{-3}, e^{4}]$ when n = 1000, and in a wider range when n = 5000. However, at extreme values of y, estimation is biased. This illustrates that for a fixed y, one can find a sample size large enough that estimation of A(y) is unbiased, but there will always be a more extreme value of y for which $\hat A(y)$ may be biased. This motivates the need to censor values outside the bounds [L, U].
Not surprisingly, with increasing levels of censoring, $\hat\beta_1$ becomes slightly more variable (Table 1) and slightly less correlated with that estimated from uncensored data. The results for $\hat\beta_2$ have similar patterns (Supplementary material Fig. 1).
Table 1 shows further results for five estimands: $\beta_1$, $\beta_2$, $A(e^{0.5})$, and the conditional median and mean of Y given $X_1 = 0$ and $X_2 = 0$. For each estimand, we compute the bias of the corresponding estimate, its standard deviation across replicates, the mean of the estimated standard errors, and the mean squared error. For the estimands $\beta_1$, $\beta_2$, and $A(e^{0.5})$, estimation using uncensored data appears to be consistent, and the behavior of our estimators in the censored approach is as expected from the asymptotic theory. When n = 100, there appears to be only a modest amount of bias, even with 71% censoring; when n = 1000 and 5000 (shown in Supplementary material), bias is quite small. Although in Fig. 1 we saw that estimates of A(y) for extreme values of y were biased, we see no evidence that this affects the estimation of $\beta_1$ and $\beta_2$. The average standard errors are very similar to the empirical standard deviations of the parameter estimates across replicates, suggesting that the standard errors are estimated correctly. These results hold regardless of the amount of censoring in our simulations. With increasing levels of censoring, as expected, both absolute bias and standard deviation increase, and as a result, the mean squared error increases. However, all these measures become smaller as the sample size increases.
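The four summaries reported for each estimand can be computed from the replicate-level estimates and standard errors as in the Python sketch below (names hypothetical). It uses the sample standard deviation with an n − 1 divisor for the empirical SD; the text does not say which divisor was used, so that is an assumption. The MSE equals the squared bias plus the variance of the estimates computed with an n divisor.

```python
from statistics import mean, stdev


def summarize(estimates, std_errors, truth):
    """Bias, empirical SD, mean estimated SE, and MSE of an estimator across replicates."""
    return {
        "bias": mean(estimates) - truth,
        "sd": stdev(estimates),             # empirical SD across replicates (n - 1 divisor)
        "mean_se": mean(std_errors),        # average of the model-based standard errors
        "mse": mean((e - truth) ** 2 for e in estimates),
    }


example = summarize([0.9, 1.1, 1.0], [0.1, 0.12, 0.08], truth=1.0)
```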
We cannot compute the standard error for the conditional median. Censoring also prohibits sound estimation of the conditional mean; one could instead estimate a trimmed conditional mean (e.g., …). The bias of $\hat A(y)$ for extreme values of y had little impact on the estimation of …, which is computed using $\hat A(y)$ over the entire range of observed y.
The CD4:CD8 ratio tends to be right skewed (Fig. 3a), but there is no standard transformation for analyzing it. In various studies, it has been untransformed (Castilho et al., 2016), log-transformed (Sauter et al., 2016), dichotomized (CD4:CD8 > 1 vs. ≤ 1; Petoumenos et al., 2017), put into ordered categories roughly based on quantiles (Serrano-Villar et al., 2015), square-root transformed (Silva et al., 2018), and fifth-root transformed (Gras et al., 2019). In contrast, CPMs do not require specifying the transformation.
We fit three CPMs: Model 1 using the original data, Model 2 censoring all CD4:CD8 ratios below L = 0.1 and above U = 2.0, and Model 3 censoring below L = 0.2 and above U = 1.5.In a similar group of patients in a prior study (Serrano-Villar et al., 2014), these values of L and U were approximately the 1.5th and 99.5th percentiles, respectively, for Model 2, and the 7th and 95th percentiles for Model 3. In our dataset, there were 19 (0.9%) CD4:CD8 ratios below 0.1 and 21 (1%) above 2.0, and 156 (7.7%) below 0.2 and 74 (3.7%) above 1.5.In our models, age was modeled using restricted cubic splines with four knots at the 0.05, 0.35, 0.65, and 0.95 quantiles.
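Restricted cubic spline terms for age of the kind described above can be built as in the Python sketch below. It uses Harrell's truncated-power parameterization (k knots give one linear term and k − 2 nonlinear terms, with linearity beyond the outer knots) and linearly interpolated sample quantiles for knot placement. The ages are made-up values and the function names are hypothetical; statistical software may differ in the quantile definition and in how the nonlinear terms are scaled, so this is a sketch rather than a reproduction of the analysis.

```python
def quantile(values, q):
    """Linearly interpolated sample quantile (the common 'type 7' definition)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def rcs_basis(x, knots):
    """Restricted cubic spline basis [x, s_1(x), ..., s_{k-2}(x)] in truncated-power form.

    Each nonlinear term is zero left of the first knot, and the whole basis is linear
    beyond the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least three knots")

    def cube_pos(u):
        return max(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2        # rescaling so the terms are on the scale of x
    gap = t[-1] - t[-2]
    basis = [x]
    for j in range(k - 2):
        term = (cube_pos(x - t[j])
                - cube_pos(x - t[-2]) * (t[-1] - t[j]) / gap
                + cube_pos(x - t[-1]) * (t[-2] - t[j]) / gap)
        basis.append(term / scale)
    return basis


ages = [25, 31, 38, 42, 47, 53, 58, 64]  # hypothetical ages
knots = [quantile(ages, q) for q in (0.05, 0.35, 0.65, 0.95)]
row = rcs_basis(45.0, knots)
```

With four knots, each age contributes three columns (one linear and two nonlinear) to the design matrix.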
All models were fit using a logit link function; quantile-quantile plots of probability-scale residuals (Shepherd et al., 2016) from the models suggested good model fit (Supplementary material Fig. 2). All three models produced nearly identical results. Female sex had regression coefficients 0.6002, 0.6000, and 0.5994 in Models 1, 2, and 3, respectively (likelihood ratio p < 0.0001 in all models), suggesting that the odds of having a higher CD4:CD8 ratio, after controlling for all other variables in the model, were about $e^{0.6} = 1.82$ times higher for females than for males (95% Wald confidence interval 1.44-2.31). The median CD4:CD8 ratio holding all other covariates fixed at their medians/modes was estimated to be 0.67 (0.60-0.74) for females compared to 0.53 (0.51-0.56) for males; all models had the same estimates to two decimal places. The mean CD4:CD8 ratio holding all other covariates constant was estimated to be 0.73 (0.67-0.79) for females and 0.61 (0.58-0.63) for males from Model 1. The mean estimates from Models 2 and 3 were slightly different (e.g., 0.72 for females); however, the mean should not be reported after censoring because the estimates arbitrarily assigned the censored values to be L and U.
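The odds ratio and Wald interval quoted above follow from exp(β) and exp(β ∓ z₀.₉₇₅ SE). In the Python sketch below, the coefficient 0.6002 is from Model 1, but the standard error is not reported in the text: the value 0.1206 is a hypothetical figure back-calculated from the quoted interval 1.44-2.31, used only to show the arithmetic. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(beta, se, level=0.95):
    """Odds ratio e^beta and its Wald interval exp(beta -/+ z * se) for a logit-link coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for a 95% interval
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# beta for female sex from Model 1; the SE is a hypothetical back-calculated value.
or_, lo, hi = odds_ratio_ci(0.6002, 0.1206)
```

With the logit link and model form $G^{-1}\{P(Y \le y \mid Z)\} = A(y) - \beta^{\mathsf T} Z$, a positive coefficient shifts the conditional distribution upward, which is why $e^{\beta}$ is read as the odds ratio for a higher CD4:CD8 ratio.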
Older age was strongly associated with a lower CD4:CD8 ratio (p < 0.0001 in all models), and the association was non-linear (p = 0.0080, 0.0081, and 0.0086, respectively). Fig. 3b-d show the estimated median and mean CD4:CD8 ratio and the probability that CD4:CD8 > 1 as functions of age, all derived from the CPMs and holding other covariates fixed at their medians/modes. The median CD4:CD8 ratio and P(CD4:CD8 > 1) were not discernibly different between the three models. The mean as a function of age is shown only as derived from the uncensored Model 1.
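Quantities such as the conditional median, the conditional mean, and P(CD4:CD8 > 1) can be read off a fitted CPM from the conditional CDF $F(y_j \mid Z) = G(A(y_j) - \beta^{\mathsf T} Z)$ at the distinct outcome values. The Python sketch below uses a logistic G by default, takes the median as the smallest $y_j$ with F ≥ 0.5 (one simple convention; implementations may interpolate instead), and computes the mean as $\sum_j y_j P(Y = y_j \mid Z)$. As noted above, the mean is only meaningful for the uncensored model. The function and argument names are hypothetical, and the toy inputs are made up.

```python
import math


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


def conditional_summaries(levels, alpha, eta, threshold, cdf=logistic_cdf):
    """Median, mean, and exceedance probability of Y given Z from a fitted CPM.

    levels    : sorted distinct outcome values y_1 < ... < y_J
    alpha     : fitted A(y_1), ..., A(y_{J-1}); F(y_J | Z) = 1 by construction
    eta       : linear predictor beta^T Z at the covariate values of interest
    threshold : c in P(Y > c | Z)
    """
    # F(y_j | Z) = G(A(y_j) - beta^T Z)
    cdf_vals = [cdf(a - eta) for a in alpha] + [1.0]
    probs = [cdf_vals[0]] + [cdf_vals[j] - cdf_vals[j - 1] for j in range(1, len(cdf_vals))]
    median = next(y for y, f in zip(levels, cdf_vals) if f >= 0.5)
    mean = sum(y * p for y, p in zip(levels, probs))
    # The fitted F is a right-continuous step function, so F(c) is F at the largest y_j <= c.
    f_c = max((f for y, f in zip(levels, cdf_vals) if y <= threshold), default=0.0)
    return median, mean, 1.0 - f_c


# Toy fit: three outcome values, each with conditional probability 1/3 at eta = 0.
median, mean_y, p_above = conditional_summaries([1.0, 2.0, 3.0], [math.log(0.5), math.log(2.0)], 0.0, 1.0)
```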

Discussion
We have now established the asymptotic properties for censored CPMs, which are flexible semiparametric regression models for continuous outcomes because the outcome transformation is nonparametrically estimated. We proved uniform consistency of the estimated coefficients $\hat\beta$ and the estimated transformation function $\hat A$ over the uncensored interval [L, U], and showed that their joint asymptotic distribution is a tight Gaussian process. We demonstrated with simulations that these estimators perform well, and illustrated their use in practice with a real data example.
Establishing uniform consistency requires a bounded range of the transformation function A, which is achieved by censoring the outcome variable at both ends.Even if an outcome variable has a bounded support, the transformed values may not be bounded, and censoring will still be needed to establish uniform consistency.The proof of uniform consistency for β also required a bounded range of A even though β and A are separate components of the model.
Although the asymptotic properties for a similar nonparametric maximum likelihood approach in survival analysis have been established (Zeng and Lin, 2007), the proofs here for CPMs based on censored data are different because we consider the nonparametric maximum likelihood estimate for the transformation in CPMs rather than the cumulative hazard function in survival analysis.
In addition, in the proofs the transformation is treated directionally and separately for the two tails, which also differs from prior work.
For data without natural lower and upper bounds, the choice of L and U might be challenging in practice.In our CD4:CD8 ratio analysis, we were able to select values of L and U that corresponded with small and large CD4:CD8 percentiles in a prior study, therefore likely ensuring that a small fraction of the data would be censored in our analysis.In general, it is desirable to choose bounds so that only a small fraction of the data are censored, although it should be reiterated that these bounds should be chosen prior to analysis.Both our simulations and data example suggest that results are robust to the specific choices of L and U as long as they do not severely censor data.
For example, in our simulations, results were nearly identical when censoring varied between 0.2% and 13%; in the data example, results were also nearly identical when censoring varied between 1.9% and 11.4%.
In addition, our simulations and data example actually suggest that without censoring, the estimators also perform well, which may support the use of uncensored CPMs in practice.Uncensored CPMs do not require specifying L and U , and they permit calculation of conditional means.However, the asymptotic theory presented here does not cover uncensored CPMs; hence, there might be some risk to analyses using uncensored CPMs.
Continuous data that are skewed or subject to detection limits are common in applied research.
Because of their ability to nonparametrically estimate a proper transformation, their robust rank-based nature, and their desirable properties proved and illustrated in this manuscript, CPMs are often an excellent choice for analyzing these types of data. Extensions of CPMs to more complicated settings, e.g., clustered and longitudinal data, multivariate outcome data, or data with multiple detection limits, are warranted and are areas of ongoing research.

Appendix
A.1 Proof of Theorem 1
Core steps of the proof: Let $(\beta_0, A_0)$ be the true value of (β, A), and $(\hat\beta, \hat A)$ be the NPMLE from the censored approach of the CPM. We will first prove that (I) $\hat A$ is bounded in [L, U] with probability one. Since $\hat A(\cdot)$ is bounded and increasing in [L, U], by the Helly selection theorem, every subsequence has a further subsequence that converges weakly. Thus, without loss of generality, we assume that $\hat A \to A^*$ weakly in [L, U] and $\hat\beta \to \beta^*$. We will then prove that (II) with probability one, $A^*(y) = A_0(y)$ for y ∈ [L, U] and $\beta^* = \beta_0$. With this result, consistency is established. Furthermore, since $A_0$ is continuously differentiable, we conclude that $\hat A(y)$ converges to $A_0(y)$ uniformly in [L, U] with probability one.
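Proof of (I): Given a dataset of i.i.d. observations $\{(Y_i, Z_i)\}$, the nonparametric log-likelihood for the censored approach of the CPM is
$$l_n(\beta, A) = \mathbb{P}_n\Big[ I(Y \le L)\log G\{A(L) - \beta^{\mathsf T} Z\} + I(L < Y < U)\log\big(G\{A(Y) - \beta^{\mathsf T} Z\} - G\{A(Y-) - \beta^{\mathsf T} Z\}\big) + I(Y \ge U)\log\big(1 - G\{A(U-) - \beta^{\mathsf T} Z\}\big) \Big].$$
Here, $\mathbb{P}_n$ denotes the empirical measure, i.e., $\mathbb{P}_n f = n^{-1}\sum_{i=1}^n f(Y_i, Z_i)$. We first show that $\limsup_{n \to \infty} \hat A(U) < \infty$ with probability one. Below we assume that … Because of this, we differentiate $l_n(\beta, A)$ with respect to $A\{Y_i\}$ and then set the derivative to zero to obtain the following equation (A.1): … According to (C.1), $G'(x)$ is decreasing when x ≥ M. The left-hand side of (A.1) is … For the right-hand side, we use the mean-value theorem on the denominator and then the decreasing property of $G'$ …, and this holds for any $Y_i$ between 0 and U satisfying $\hat A(Y_i-) > M + m$.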
Let $i_0$ be the maximal index i for which $Y_i > 0$ and $\hat A(Y_i-) \le M + m$. Summing over all $Y_i$ between 0 and U, we obtain …
We now show that $\hat A(U)$ cannot diverge to ∞. Otherwise, suppose that $\hat A(U) \to \infty$ along some subsequence. From the second half of Condition (C.1), when n is large enough in the subsequence, for any Z, … and therefore, …, in which the last term converges to a constant. We thus have a contradiction. Hence, $\limsup \hat A(U) < \infty$ with probability 1.
We can reverse the order of the $Y_i$ (change $Y_i$ to $-Y_i$, so that computing the NPMLE is equivalent to maximizing the same form of likelihood with −A(y) considered in place of A(y)). The same arguments as above then show that $\limsup\{-\hat A(L)\} < \infty$ with probability 1, or equivalently, $\liminf \hat A(L) > -\infty$ with probability 1.
Proof of (II): We first show that $n\hat A\{Y_i\}$ is bounded for all $Y_i \in [L, U]$. From the proof above, we know … We prove that this is true for any $Y_i$. To do so, we define … First, we note that $H_n(y)$ has bounded total variation in [0, U]. In fact, for any 0 < t < s < U, … where … By choosing a subsequence, we assume that $H_n(y)$ converges weakly to $H^*(y)$. From the above inequality and taking limits, it is clear that … where … Following this expression, we define another step function, denoted by $\bar A(y)$, whose jump size at $Y_i$ satisfies …
By the strong law of large numbers and the monotonicity of $\bar A$, it is straightforward to show that $\bar A(y)$ converges … The limit can be verified to be the same as $A_0(y)$. Furthermore, we notice …
As a result, $A^*(y) = \int_0^y g(t)\, dA_0(t)$, or equivalently, $dA^*(y)/dA_0(y) = g(y)$. Define … That is, … We take limits on both sides. Applying the Glivenko-Cantelli theorem to the first three terms on the left-hand side and noting that $\hat A\{Y_i\}/\bar A\{Y_i\} - g(Y_i)$ converges to zero uniformly, we obtain … The left-hand side is the negative Kullback-Leibler information for the density with parameter $(\beta^*, A^*)$. Thus, the density function with parameter $(\beta^*, A^*)$ must be the same as the true density. Immediately, we obtain … where …
Since $(\hat\beta, \hat A)$ maximizes $l_n(\beta, A)$, we have, for any v and h, … The rest of the proof contains the following main steps: we first show that $(\hat\beta, \hat A)$ satisfies equation (A.6) (details below), then (A.8), and finally (A.10), from which the asymptotic distribution of $(\hat\beta, \hat A)$ will be derived.

We know max L≤Y
and $F_4(Y, Z; \beta, A) = 1 + O_p(n^{-1})$ uniformly in (Y, Z). Consequently, we obtain (A.7). On the other hand, we note that the first term on the right-hand side of (A.7) is zero if $(\hat\beta, \hat A)$ is replaced by $(\beta_0, A_0)$. Thus, the right-hand side of (A. … We linearize the first two terms in the above expression. After some algebra, we obtain that this expression is equivalent to …
In model (3), $G^{-1}(\cdot)$ serves as a link function. One example of the distribution for ε is the standard normal distribution, G(x) = Φ(x). In this case, the CPM becomes a normal linear model after a transformation, which includes log-linear models and linear models with a Box-Cox transformation as special cases. The CPM becomes a Cox proportional hazards model when ε follows the extreme value distribution, i.e., $G(x) = 1 - \exp(-e^{x})$, or a proportional odds model when ε follows the logistic distribution, i.e., $G(x) = e^{x}/(1 + e^{x})$.

Figure 2
Figure 2 compares estimates of $\beta_1$ for the various sample sizes using uncensored data and using data censored at the three ranges of [L, U]. As the sample size increases, $\hat\beta_1$ becomes less biased under all approaches. At n = 5000, $\hat\beta_1$ is approximately unbiased even with severely censored data.
… y) is bounded and increasing, and $\beta^{\mathsf T} Z$ and $\hat A(Y) - \beta^{\mathsf T} Z$ belong to a VC-hull class and hence to a Donsker class. By the preservation property under monotone transformations, $G^{(k)}(\hat A(U) - \beta^{\mathsf T} Z)$, k = 0, 1, 2, also belong to a Donsker class. Therefore, the right-hand side of (A.5) converges uniformly in … Z) with probability one. From Condition C.2, we conclude that $\beta^* = \beta_0$ and $A^*(y) = A_0(y)$ for y ∈ [L, U].
A.2 Proof of Theorem 2
Let BV[L, U] be the set of functions over [L, U] with $\|h\|_{TV} \le 1$, where $\|\cdot\|_{TV}$ denotes the total variation in [L, U]. For any $\nu \in \mathbb{R}^p$ with $\|\nu\| \le 1$ and any h ∈ BV[L, U], we define the score function $\Psi_n(\beta, A)[\nu, h]$ along the submodel for β with tangent direction ν and for A with tangent function $\int_0^{\cdot} h(t)\, dA(t)$:

Table 1: Simulation results for estimates from CPMs on uncensored data and on data censored at [L, U]; based on 1,000 replicates.
… race, probable route of transmission, hepatitis C co-infection, hepatitis B co-infection, and year of antiretroviral therapy initiation. Here we re-analyze their data using CPMs. We focus on the associations of the CD4:CD8 ratio with age and sex, treating the other factors as covariates.
… The latter property ensures that $H_n(y)$ converges uniformly to $H^*(y)$ for y ∈ [0, U].