Joint Model for Estimating the Asymmetric Distribution of Medical Costs Based on a History Process

: In this paper, we modify a semi-parameter estimation of the joint model for the mean medical cost function with time-dependent covariates to enable it to describe the nonlinear relationship between the longitudinal variable and time points by using polynomial approximation. The observation time points are discrete and not exactly the same for all subjects; in order to use all of the information, we ﬁrst estimate the mean medical cost at the same observed time points for all subjects, and then we weigh these values using the kernel method. Therefore, a smooth mean function of medical costs can be obtained. The proposed estimating method can be used for asymmetric distribution statistics. The consistency of the estimator is demonstrated by theoretical analysis. For the simulation study, we ﬁrst set up the values of parameters and non-parametric functions, and then we generated random samples for covariates and censored survival times. Finally, the longitudinal data of response variables could be produced based on the covariates and survival times. Then, numerical simulation experiments were conducted by using the proposed method and applying the JM package in R to the generated data. The estimated results for parameters and non-parametric functions were compared with different settings. Numerical results illustrate that the standard deviations of the parametric estimators decrease as the sample sizes increases and are much smaller than preassigned threshold value. The estimates of non-parametric functions in the model almost coincide with the true functions as shown in the ﬁgures of simulation study. We apply the proposed model to a real data set from a multicenter automatic deﬁbrillator implantation trial.


Introduction
Survival analysis consists in analyzing and inferring the occurrence times of the concerned events.Survival time represents the time from observation to the occurrence of the event (e.g., the patient's death time), which will be affected by random factors [1].The distribution characteristics of survival time are usually described by three functions: the probability density function, the survival function, and the hazard rate function.In the research on survival analysis, researchers need to use appropriate methods to evaluate the medical insurance-related expenditures of a specific patient group within a specific time frame.This type of data includes longitudinal data of repeated measurements of the sample, time-event censored data, and supplementary information from covariates.As we can see, the information from additional covariates and the censorship of time-event data complicate the estimation of mark variables.The Kaplan-Meier estimator [2] can solve the mark survival function if the historical process is constant before the event time, which leads to the proportional mark variable and to the terminal variable.
However, due to the censoring mechanism of 'inducing' information, the survival function is unidentifiable [3].In clinical treatment, more and more researchers realized that the patient's historical process is usually more informative than a simple endpoint measurement, which is called a mark variable in the point process theory; it is also called a 'cumulative variable of the historical process'.Lin et al. [4] considered the average total cost and minimized the bias from censoring by partitioning the entire time period into a series of small intervals, and the estimators were proven to be asymptotically normal.Huang and Lovato [5] formulated weighted log-rank statistics in a marked point process framework, and developed the asymptotic theory.
Weighted estimators for censoring data proposed by Bang and Tsiatis [6] performed well even when the survival data were heavily censored.Then, Zhao and Tsiatis [7] and Zhao and Tian [8] suggested the refinements of the weighted estimators.Furthermore, some researchers estimated the mean of the truncated mark variable indirectly.For instance, Korn [9] improved the Kaplan-Meier estimator by estimating the cumulative distribution function of an interpolated area under a quality-of-life curve.Strawderman [10] estimated the mean of the stopped longitudinal process and proved the asymptotic relative efficiency of the proposed estimators.They also proposed some new estimators with finite-sample optimality properties.
However, it should be noticed that the entire time period has to be partitioned into several small intervals to make the censored events occur at the boundaries, and the method proposed by Lin et al. [4] was applicable.The estimator derived by Bang and Tsiatis [6] works only if the observation is complete, and the refined versions are very complicated to compute.Fang et al. [11] derived an estimator of the mean mark variable with right censoring while the time-dependent/independent covariates are not considered.Liu et al. [12] presented a 'turn back time' method with time-dependent covariates to keep the actual censoring time for the survival data while the observations of cost data of time are censored before the actual censoring time.However, they did not consider repeated measurements of the longitudinal variable and the effects from time-independent covariates.
Correlation between repeated measurements for the individuals of marked variables over time and time-to-event variables needs to be considered jointly.Fortunately, Deng [13] combined the joint model technique [1] and inverse probability weighting method, and developed a novel approach to estimating the cumulative mean mark variable with timedependent covariates under right censoring.Deng [13] considered a mixed effect model for longitudinal data to examine the relationship between longitudinal data and other covariates, and an additive risk model for survival data to demonstrate the relationship between the longitudinal process and the hazard rate function.The proposed estimator for the state function from the subjects that are observed at discrete time points has been proven to be consistent.
However, there is little literature that considers the non-linear relationship in a joint model.Zhao et al. [14] characterized the relationship between the longitudinal variable and the observed time points with a latent variable and an unspecified link function to describe the joint model, but the estimation of the link function was not given.Li et al. [15] linked the multivariate sparse functional data to event-time data by a functional joint model.Do et al. [16] classified the under-representation group using a joint fairness model (JFM) approach for logistic regression models and proposed a joint modeling objective function to predict risk.Tang et al. [17] considered the multivariate longitudinal and bivariate correlated survival data and proposed the method of Bayesian penalized splines to approximate baseline hazard functions.In fact, the expected value of the longitudinal variable may depend on the time's non-linearly.Thus, such a relationship should be addressed by using the semi-parametric joint model.The baseline hazard function in the joint model can be seen as an infinite-dimensional parameter in the procedure of calculation.Generally, the estimation for the parameters in the joint model is practiced by the expectation maximization (EM) (refer to [18] for the specific algorithm).The drawback of the EM algorithm is that standard error (SE) estimates are not automatically produced.The estimation of SE in this paper is attained by calling "method = peicewise-PH-GH" in the JM package, where the baseline risk function is assumed to be piecewise constant.In this paper, we modify the linear model in the longitudinal sub-model in the traditional joint model into a semi-parametric model, in which the non-parametric part adopts the method of polynomial regression to make the estimates more accurate.In Section 2, we suggest a modified semi-parametric joint model and obtain the consistency of the proposed estimator.In Section 3, we show the feasibility of this method through numerical simulations.In Section 4, we apply the method established in this paper to a real data set from a MADIT.In Section 5, we conclude this paper with a discussion.

Description of Models
The target data involve the variables Y(t), X(t), Z(t), W, T, and δ.T is the terminal event time, and Y(t) is the patient's state history process, which is related to the vectors of the time-dependent covariates X(t), Z(t) and the time-independent covariate W. Furthermore, the observations of {Y(t), t 0} will be ceased by the terminal event T, Y(t) = 0 when t T.
Furthermore, L is the artificial end point according to the data situation.For convenience, let ν(t) = E[Y(t)] be the mean state function of Y(t), and τ ∈ [0, L] Γ(τ) = τ 0 Y(t)dt be the cumulative variable in the time period [0, τ], µ(τ) = E[Γ(τ)] and be the mean of the cumulative variable Γ(τ).Let S T (t) = P(T > t) be the survival function of T, S C (t) = P(C > t) and be the survival function of C, and S T * (t) = P(T * > t) be the survival function of T * = min{T, C}, δ = I(T ≤ C) and be the corresponding event indicator.Now, for the ith subject, the true event time is denoted by T i , i = 1, . . ., m, observed values of Y(t), Z(t), X(t), W at the time point t are denoted by y i (t), z i (t), x i (t) and w i , i = 1, 2, . . ., m, and the event indicator is denoted by We should notice that Y i (t) cannot be observed for all the time points, but at some special times t ij until the event times T * i .Thus, the measurements of the longitudinal data consist of the vectors y i = {y i (t ij ), j = 1, 2, . . ., n i }, x i = {x i (t ij ), j = 1, 2, . . ., n i } and z i = {z i (t ij ), j = 1, 2, . . ., n i } with t in i T * i .Now, we can describe the semi-parametric joint model for the data {(y i , x i , z i , w i , T * i , δ i ); i = 1, 2, . . ., m} as follows: where β is a fixed effects regression parameter, g(t) is an unknown smooth and bounded function, b i denotes the random effects coefficients with b i ∼ N(0, D), α is a fixed effects parameter that describes the correlation between longitudinal outcome and the hazard rate of event occurrence, h 0 (t) is the baseline hazards function, and i (t) is the random error that is assumed to be independent of T conditional on time-independent covariates W.

Proposed Estimator for the Parameter
It is assumed that g(t) is independent of ε i and b i .The existing approaches to approximate the unknown smooth function g(t) include the kernel method, wavelet-based methods, smoothing splines, regression splines, and so on.
We consider the polynomial basis regression splines method to express the nonparametric part.Now, for a given sequence 0 a linear combination f (t) of a set of power basis functions can be used to recover g(t): where: For convenience, Let θ = (θ t , θ y , θ T b ) be the vector of all parameters, where θ t = (α, γ) denotes the parameters related to the event-time outcome, θ y = (β, η, σ 2 ) are the parameters of the longitudinal outcomes, θ b = D are the parameters in the random-effects covariance matrix, q b is the dimensionality of the random-effects vector, and x is the Euclidean vector norm.Then, the log-likelihood contribution for the ith subject is given by: where: Now, the estimators β, α, γ, η, D, σ2 for the parameters β, α, γ, η, D, σ 2 can be derived, and the corresponding bi ∼ (0, D), ĝ(t) = ηB(t) can be obtained.

Remark 1.
The proposed semi-parametric model can handle the situation for the inferences of asymmetric distribution statistics.K interior knots describe the distribution with a kurtosis coefficient smaller than 0 well.Remark 2. The estimator β derived by maximum likelihood is consistent.Theorem 3.1 in Zeng and Cai [19] states the strong consistency of the maximum likelihood estimator.Remark 3. The estimator ĝ(t) is consistent in the case where g(t) is a smooth and bounded function.By Taylor expansion, we have that g The fitted values of Y(t) and h(t) can be estimated by the maximum likelihood method based on the observations.For convenience, we define the set are the observed distinct time points for all subjects, and N is the total number of the distinct observed time points from all subjects.
Using the inverse probability weighting method, the proposed estimator for the mean state function ν at any observed time point is as follows: where Ŷi (t (k) ) is the fitted value of Y i (t (k) ) at time point t (k) from the joint model: ) is a Kaplan-Meier estimator for S C (t (k) ), and ŜT (t (k) ) is the estimator of the survival function S T (t (k) ), which can be obtained from the joint model: Theoretically, the estimator ŜT (t) is related to x(s) for s ∈ [0, t].It can be obtained only when x(t) can be observed continuously.When x(t) are not continuously observed, we replace ŜT (t) with the Kaplan-Meier estimator.
The general values of ν(s) for s ∈ [0, t] cannot be observed continuously, and thus the estimator ν(t) cannot be obtained continuously.Therefore, we can use the kernel method to 'smooth' the estimator of the state function: where: is the kernel weight function that can be selected according to the actual situation.The bandwidth h scales the distance of t and t (k) .Then, the mean of the cumulative variable µ(t) for any time point t can be estimated as: Now, we introduce some notations in order to derive the consistency of the proposed estimator.Let Q(t) = {x(t), w} denote all the observed covariate history processes such as baseline information and observance time; LQ (t) = {Q(s) : s < t} denotes the longitudinal covariate history processes until time t, and LY (t) = {Y(s) : s < t} denotes the mark state response history processes until time t.• is the Euclidean norm in the real space.Q (t) is the derivative of Q(t) with respect to time t.The following assumptions should be imposed for the joint model.Assumption 1. Q(t) is fully observed prior to time t and is conditional on LQ (t), LY (t), and T t for any t ∈ [0, T]; the distribution of Q(t) depends only on LQ (t).Furthermore, Q(t) is continuously differentiable in [0, T], and max t∈[o,T] Q (t) < ∞ with a probability of one.
Assumption 2. The intensity of the counting process N c (t) depends only on LQ (t) and Q(t) conditional on LQ (t), LY (t), Q(t), and T t for any t < T. Assumption 3. P(X T X is full rank) is positive.Moreover, if there exists a constant vector R 0 such that with positive probability, (w, X(t)) T R 0 = l(t) for a deterministic function l(t) for all t ∈ [0, T], then R 0 = 0 and l(t) = 0. Assumption 6.There exists a positive constant a > 0 such that S T (L) a.
Now, we consider the asymptotic property of the estimator μ(t).
The proof can be found in Appendix A.

Simulation
In this section, we present some numerical results.The joint model used in our simulation study is: where Y(t) is the mark state function, is the vector of timedependent covariates for fixed parameters β = (β 1 , • • • , β p ), and p is the dimension of x(t).g(t) is a smooth and bounded function, z(t) = (1, z 1 (t)) is the vector of the time-dependent covariates for the random effects coefficients b = (b 0 , b 1 ), w is the timeindependent covariates for the fixed parameter γ, and α is the fixed parameter for the association of longitudinal outcome and the hazard rate of event occurrence.
The accuracy of the overall parametric estimates β, α, γ is evaluated using the bias and standard deviation (std.dev).We use g(t) = ηB(t) with η = (η 0 , η 1 , η 2 , η 3 ), B(t) = [1, t, t 2 , t 3 ] to simulate the smooth function.The performance of the overall non-parametric estimate ĝ(t) is evaluated by the sample standard deviation (std.dev) and the square-root of average square errors (RASE . We summarize the steps in the following Algorithm 1:

Algorithm 1: Estimation for the cumulative state function
Input: The sample size n, the covariate x, w, z(t), h 0 (t), w, the true function g(t), the true value of parameters β, α, γ, the censoring rate r.Process: Generate a random sample u i ∼ U[0, 1]; Generate a random sample x i ∼ N(µ, σ 2 ); Generate a lifetime random sample v i with the hazards function h(t); Generate a random censoring sample Output: The estimated function ĝ(t), the estimated value of parameters β, α, γ, the bias and the std.err of parameters β, α, γ, the RASE of estimated function, the std.err of η.Estimator: Give the estimation of parameters β, α, γ and the estimation of parameters in g(t) with the likelihood method.Now, we consider the following two scenarios: Scenario 1: Set x(t) = x 1 , x 1 ∼ N(2, 1.0), z(t) = t, g(t) = 1.5 t + 1.2 t 0.5 , w ∼ U[0, 1] and h 0 (t) = h 0 .The following is the form based on the joint model: (1+ t 30 ) 2 } + ( t 30 − 0.75) 0.5 , w ∼ U[0, 1] and h 0 (t) = h 0 .The following is the form based on the joint model: After 1000 repetitions in R, we obtain the results for the estimation of the parametric and non-parametric functions at sample sizes of n = 125, 250, and 500.In scenario 1, We set the censoring rate to be about 25%.The random effect b i follows a Gaussian distribution with ρ = −0.3,σ 1 = 0.3, and σ 2 = 0.2, and the error i follows i ∼ N(0, 1).The true values of parameters α, β, γ are chosen to be 0.25, −0.9, 1.0.In scenario 2, we set the censoring rate to be about 30%.The random effect b i follows a Gaussian distribution with ρ = −0.03873,σ 1 = 0.03873, and σ 2 = 0.02582, and the error i follows i ∼ N(0, 1).The true value of parameters α, β, γ is chosen by 0.5, −0.9, 1.0.
Tables 1 and 2 summarize the main findings over 1000 simulations.As the tables show, the biases are small and decrease as the sample size increases.The performance is empirical and improves with a larger sample size.
Tables 3 and 4 present the results for the estimation of g(t) of scenario 1 and scenario 2. The results show that rather large standard deviations of η 0 may be incurred, although the RASE of η is pretty small.As the sample size increases, such biases may attenuate.The proposed estimators of real functions of g(t) with two different settings are illustrated as a continuous curve in Figures 1 and 2. Additionally, Figures 3 and 4 show the true curves and fitted values of the state function v(t) and the mean of the cumulative variable µ(t) of scenario 1 and scenario 2 correspondingly.In all figures, the estimated functions approximate to the true functions.Thus, the polynomial basis regression splines method works well regardless of simple linear functions or complex exponential functions.

An Application to MADIT Data
In the previous discussion, we presented an efficient estimation for the state function from the subjects observed at discrete time points.Here, we validate our proposed method with MADIT data.A total of 181 patients from 36 centers in the USA enrolled in MADIT data.
Of the 181 patients, 89 people were implanted with cardiac defibrillators, while another 92 were not.For convenience, we encode them as ICD 1 or 0. The effect of treatment ICD is considered in the survival model because it did not directly produce any medical costs but it directly affected the life expectancy of the survival part.These patients have six types of costs: 1.
Hospitalization and emergency department visits; 2.
Outpatient tests and procedures; 3.
These costs can be incured daily from the start to the completion of the trial.It should be pointed out that R-code does not work for the daily cost data because it contains a large amount of data points.Hence, we used 12 days as a time unit.These covariates should be considered in the longitudinal process sub-model.We used the log-cost as the price process Y(t).
These data also include the patients' ID code, observed survival time in days, and death indicator.
In the data set, a total 181 patients have been encoded as follows: 1.
Daily cost of the whole trial; 6.
Cost type 1-6 of patients at all observed time points.Now, the merged data points consist of 11,321 observed points for a total of 181 subject.
We first used the JM package in R with the longitudinal data and survival data to obtain the estimated vector ( β0 , . . ., β6 ) for the fixed coefficients, the estimated parameter α for the association parameter, the estimated parameter γ for the time-independent regression parameter, and the estimated vector ( η0 , . . ., η3 ) for the non-parametric function g(t).The estimated mean daily medical cost ν(t) at observed time points can be figured out in terms of the proposed model in Equation (10), and then a smooth function for ν(t) can be obtained using the kernel method.Finally, the cumulative mean medical cost µ(t) can be calculated by integrating ν(t).
The estimated coefficients for the parameters and function are presented in Tables 5  and 6.As one can see, all estimated parameters are significantly even at a level of 0.0001.Table 7 shows the estimated results of the event process.The association between survival and hazard rate is positive, which means that higher medical costs correspond to lower survival rates, and as we know, a patient needs higher medical costs if they are severely ill.This further demonstrates that our model is efficient and reasonable.Parametric regression estimation for the joint model [13] describes the relationship between the natural logarithm of medical costs and the time unit as the fixed coefficient In this case, β 7 turns out to be −0.001389,which is negative.Compared with this, the non-parametric part in the proposed model can indicate such a relationship as a precise descent function as Figure 5 shows, and not just a negative correlation.Moreover, the function decreases over time, which is consistent with the result in [13] and further shows that the model we proposed is feasible.
Compared with the Kaplan-Meier estimator, the estimated marginal survival function of the survival time T is smoother, as Figure 6 shows.The solid line is the estimated survival rate, and the dotted line represents the 95% confidence interval of the estimated survival rate.
Figure 7 illustrates the fitted points for the mean medical cost.Figure 8 gives the fitted points for the cumulative mean medical cost, which increases linearly with unit time.Over time, the patient's cumulative medical cost increases.The estimates suggest that this increase is almost linear and show the rate of increase in medical costs per unit time.Table 8 presents the estimated cumulative cost for 5 years and the total treatment period.In practical situations, the treatment a patient was going to select could not be prognosticated.The results can provide government agencies or insurance companies with a statistical decision recommendation.

Discussion
The history process is closely related to the quality-adjusted life of patients.The academic community has carried out some work on the estimation of medical costs using the joint model, and it has paid no attention to the complex nonlinear relationships that may exist between longitudinal data and covariates.There is not much discussion of the method of estimating the cumulative mean state function.
Therefore, this paper examines the existing estimation method, and then introduces the novel semi-parametric estimation based on a joint-model technique to assist government or insurance companies in better prediction for average cost and survival probability.At first, the fitted values Ŷ(t ij ) for the state process Y(t) at discrete times are obtained by using a semi-parametric model, and the mean medical cost ν(t) can be estimated at the observed time points t ij , accordingly using the inverse probability weighting method.Then, using the kernel function method, the smooth function of ν(t) can be derived.Finally, for any time points, the estimate of the cumulative mean state function µ(t) can be obtained because ν(t) can be calculated at any time point.
Although the results for the estimation of the longitudinal process in this paper demonstrate that the proposed method performs well, the estimation of the survival process is not satisfactory.The biases of the time-independent covariate coefficients are considerably large, and the performance is only improved when the sample size is large enough.Meanwhile, although the standard deviation decreases as the sample size increases, it is still not very small.In the future, researchers can pay attention to the modeling of the survival process by considering the nonlinear survival function or the time-varying association coefficient.
To sum up, the proposed estimator is more robust because the non-parametric part in the longitudinal process is designed to compare with the traditional models and the kernel function method is added to the estimation of the cumulative mean state function.The consistent property of the proposed estimator and the numerical studies are quite supportive of such a conclusion.We have assumed that (t) is the independent of the T conditional on X(t) and W, and thus we have that Ỹ(t) = X(t)β + g(t) + (t) is independent of terminal time T, and E Ỹ(t) = EY(t).Moreover, we have: (A4) From Assumption (a1), x i (t) R 0 for some positive constants R 0 , and, based on Theorem 3.1 in Zeng and Cai [19] and Assumption (a6), we have: (IV) ≡(V I I) + (V I I I).
This completes the proof of Theorem 1.

Figure 3 .
Figure 3. True values and estimated values of estimated function of v(t) and µ(t) for scenario 1.

Figure 4 .
Figure 4. True values and estimated values of estimated function of v(t) and µ(t) for scenario 2.

Figure 6 .
Figure 6.Marginal survival compared with the Kaplan-Meier estimator.

Figure 8 .
Figure 8. Fitted points of cumulative cost.
show that (I), (I I), and (I I I), converge to 0, as m → ∞.

Table 1 .
Theestimates of parameters in the joint model for scenario 1.

Table 2 .
The estimates of parameters in the joint model for scenario 2.

Table 3 .
The estimates of non-parameters in the joint model for scenario 1.

Table 4 .
The estimates of non-parameters in the joint model for scenario 2.

Table 5 .
Estimate results of the longitudinal process.

Table 6 .
Estimate results of g(t).

Table 7 .
Estimate results of the event process.

Table 8 .
Estimated cumulative costs for 5 years and total period.