Bayesian Analysis of Tweedie Compound Poisson Partial Linear Mixed Models with Nonignorable Missing Response and Covariates

Under the Bayesian framework, this study proposes a Tweedie compound Poisson partial linear mixed model on the basis of Bayesian P-spline approximation to nonparametric function for longitudinal semicontinuous data in the presence of nonignorable missing covariates and responses. The logistic regression model is simultaneously used to specify the missing response and covariate mechanisms. A hybrid algorithm combining the Gibbs sampler and the Metropolis–Hastings algorithm is employed to produce the joint Bayesian estimates of unknown parameters and random effects as well as nonparametric function. Several simulation studies and a real example relating to the osteoarthritis initiative data are presented to illustrate the proposed methodologies.


Introduction
Semicontinuous data, characterized by nonnegative continuous value with a discrete mass of zero, appear frequently in many fields, such as medicine, health, economics, and ecology. Models for longitudinal semicontinuous data have, in particular, been receiving a lot of attention in two ways. The first approach is the two-part mixed model wherein a mixture of Bernoulli with positive support distribution is used to model zero and positive components separately (Olsen and Schafer [1]; Berk and Lachenbruch [2]; Tooze et al. [3]; Su et al. [4,5]; Liu et al. [6]; Zhou et al. [7]). However, Hasan et al. [8] and Yan and Ma [9] pointed out that such artificial separation based on the two-part modeling method breaks down the serial patterns in the analysis of time series and longitudinal data. The second approach is the compound Poisson mixed model for modelling longitudinal and repeated measurement or cluster data in an integral way. For example, Zhang [10] investigated several statistical inference methods for Tweedie compound Poisson linear mixed models from the frequentist and Bayesian perspective. Swallow et al. [11] developed a Bayesian hierarchical Tweedie regression model by incorporating serial temporal and spatial correlation into the Tweedie distribution in the analysis of longitudinal semicontinuous ecological data. Ye et al. [12] investigated the sensitivity analysis for priors in Tweedie compound Poisson random effect models under a Bayesian framework. In particular, Yan and Ma [9] incorporated serially dependent distribution-free random effects into the compound Poisson regression model for longitudinal semicontinuous data. However, all the abovementioned compound Poisson mixed models have limitations in that they either do not consider nonlinear smooth effects of covariates, such as time and age variables, or do not deal with missing responses and covariates.
It is well known that handling missing data has become an active research field in data analysis. Many methods have been proposed to make statistical inference on various regression models with nonignorable missing response or covariates. For example, Ibrahim et al. [13,14] proposed two methods by which to estimate unknown parameters in generalized linear models with nonignorable missing covariates and generalized linear mixed models with nonignorable missing responses by using the EM algorithm, respectively. In addition, based on these frequentist approaches of handling nonignorable missing response or covariate data, their Bayesian analogues have been extended to various regression models. For example, from a Bayesian perspective, see Huang et al. [15] for generalized linear models with nonignorably missing covariates, Lee and Tang [16] for nonlinear structural equation models with nonignorable missing data, Tang and Zhao [17] for nonlinear reproductive dispersion mixed models for longitudinal data with nonignorable missing covariates, Tang et al. [18] for a nonlinear dynamic factor analysis model with nonparametric prior and possible nonignorable missingness, Zhou et al. [7] for two-part hidden Markov models for semicontinuous longitudinal data with nonignorable missing covariates, Wang and Tang [19] for Bayesian quantile regression with mixed discrete and nonignorable missing covariates, and Wang et al. [20] for Bayesian latent factor on image regression with nonignorable missing data. Therefore, we propose a fully Bayesian method by which to simultaneously estimate unknown parameters, random effects and nonparametric function in a Tweedie compound Poisson partial linear mixed models on the basis of Bayesian P-spline approximation to nonparametric function in the presence of nonignorable missing covariates and responses, where the nonignorable missing data mechanism is specified by a logistic regression model.
For the sake of brevity and readability, we first introduce the main mathematical symbols and their descriptions in the rest of paper summarized in Table 1. Table 1. Symbols and description.

U
A Poisson distribution random variable µ The mean parameter of Tweedie compound Poisson distribution φ The dispersion parameter of Tweedie compound Poisson distribution p The power parameter of Tweedie compound Poisson distribution β A q × 1 vector of unknown regression parameter b i A d × 1 vector of random effect (b i ∼ N d (0, Σ)) Σ A d × d covariance matrix g(·) An unknown nonparametric function ξ An The B-spline basis function τ 2 ξ A global smoothing parameter Q The H × H penalty matrix with elements δ = (δ 2 , . . . , Vector of covariates relating to random effects (Z = {z ij : i = 1, . . . , n, j = 1, . . . n i }) T Vector of time effects (T = {t ij : i = 1, . . . , n, j = 1, . . . , n i }) r Vector of indicator variables relating to missing data mechanism (r = {r y , r x }) ϕ Vector of parameters relating to missing data mechanism (ϕ = {ϕ x , ϕ y }) α The parameter in the covariables' distribution (α = (α 1 , . . . , α m )) σ 2 m The parameter in the covariables' distribution (σ 2 m = {σ 2 xk : k = 1, . . . , m}) The paper is organized as follows. In Section 2, we give a description of the data. In Section 3, we describe a Tweedie compound Poisson partial linear mixed models in the presence of nonignorable missing covariates and responses. We present the Bayesian P-spline to model the nonparametric function. The logistic regression model is simultaneously used to specify the missing response and covariate mechanisms, and a sequence of one-dimensional conditional distributions is used to model the joint probability function of the missing covariates. In Section 4, the prior distributions and posterior distributions of unknown parameters and latent variables are presented. In Section 5, two simulation studies and an example are given to illustrate our proposed methodologies. In Section 6, we give some conclusions. In the Appendixes A and B, the conditional distributions for Gibbs sampling and the Metropolis-Hastings algorithm are given.

Data Description
In this section, we describe the Osteoarthritis Initiative (OAI) database, which is available at https://www.oai.ucsf.edu (accessed on 4 April 2017). The OAI cohort study investigated the causes of knee osteoarthritis for 4796 patients aged 45 and older, and collected some information such as age, sex, and body mass index (BMI) for these patients at baseline, 12 months, 24 months, 36 months, and 48 months. Thus, this information is collected at most five times because of the missing data involved. In addition, this OAI study adopted the Western Ontario and McMaster Universities Arthritis Index (WOMAC) disability scores to assess the pain intensity in these patients with hip and/or knee osteoarthritis. Higher scores on the WOMAC score indicate worse pain, stiffness, and functional limitations for these patients. A sample of two patients (denoted by ID 9019406 and ID 9025191) from the OAI study is presented in Table 2. The missing rates for the longitudinal WOMAC scores outcome at baseline, 12 months, 24 months, 36 months, 48 months are 0.3%, 7.1%, 10.9%, 12.1%, and 12.3%, respectively. Moreover, the missing rates for covariate BMI at five different time points are 0%, 11.5%, 16.0%, 18.9%, and 21%, respectively. It can be seen from Figure 1 that the observed WOMAC numeric score at 12 months, 24 months, 36 months, and 48 months are right-skewed with a large numerical proportion of zeros, where the bold line on the left of each histogram denotes the frequency for zero. Specifically, more than 36.9% of the observations of all time points are zeros; thus we consider the WOMAC numeric score as a longitudinal semicontinuous response with missing data in this article.

Tweedie Compound Poisson Distribution
As in Ma and Jørgensen [21], the probability density function of the Tweedie compound Poisson distribution has the following form, where p is the power parameter satisfying 1 < p < 2, µ and φ are the mean parameter and dispersion parameter, respectively, and the expression for c p (y; φ) is not analytically tractable when y > 0. If a nonnegative random variable Y is distributed as a Tweedie compound Poisson distribution, then we simply denote Y ∼ Tw p (µ, φ) in the rest of paper. Moreover, we have E(Y) = µ and Var(Y) = φµ p . Furthermore, the random number Y of the Tweedie compound Poisson distribution is readily generated from the following stochastic representation where U is distributed as a Poisson distribution with mean λ, X i is the independent and identically distributed gamma distribution with mean αγ and variance αγ 2 , and U and X i are assumed to be independent. After some calculations, the relationship between the two sets of parameters in Equations (1) and (2) are derived as It follows from Equation (2) that the joint probability distribution of (Y, U) is given by Thus, the marginal distribution of (Y, U) has the abovementioned form given in Equation (1).

The Model
For modeling, we first introduce some notations. Let y ij be the longitudinal semicontinuous outcome with missing data of the ith patient with osteoarthritis measured at time t ij (i = 1, . . . , n, j = 1, . . . , n i ). In the OAI study, n = 4796 is the number of patients with n i = 5 denoting the number of repeated observations per patient. Given random effects b i , Y i1 , . . . , Y in i are conditionally independent and each Y ij |b i is assumed to be the Tweedie compound Poisson distribution, that is where µ ij is the conditional expectation of the response Y ij , φ is the dispersion parameter to be estimated and 1 < p < 2. Inspired by GLMM method, the conditional expectation µ ij is modeled by where β is a q × 1 vector of unknown regression parameter of interest, x ij is a q × 1 vector of covariates in the presence of missing data, b i is distributed as N r (0, Σ), z ij is a r × 1 vector of covariates relating to the random effects b i , and g(t ij ) denotes an unknown nonparametric function satisfying the twice-differentiable property in term of time effects t ij . In this article, the model defined in Equations (1) and (2) is referred to as a Tweedie compound Poisson partial linear mixed model. Inspired by Lang and Brezger [22], we used the Bayesian P-spline method based on a linear combination of B-spline basic functions to approximate the unknown nonparametric function, that is where B h (·) is the hth B-spline basis function, H is the number of B-spline basis function, and ξ h is the B-spline coefficients to be estimated. Under the Bayesian framework, ξ h is treated as a random variable, and defined by the following first-order random walk; that is, . . , H and the diffuse prior ξ 1 is proportional to constant. The variance parameter τ 2 ξ is viewed as a global smoothing parameter. Although it is easy to estimate the global smoothing parameter, this global smoothing parameter is difficult to characterize in terms of the highly oscillating features for the underlying nonparametric functions g(t). To overcome this issue, we introduce the additional hyperparameters δ h as local smoothing parameters, which can improve the estimation of a function with significantly different curvatures at different points t ij . Thus, υ h is assumed to be the normal distribution with heterogeneous variance; that is, υ h ∼ N(0, τ 2 ξ /δ h ) for h = 2, . . . , H. Furthermore, let ξ = (ξ 1 , . . . , ξ H ) T and δ = (δ 2 , . . . , δ H ) T . The prior distribution for ξ is derived in the matrix form where the penalty matrix Q is given by Here, the prior distribution of smooth parameter τ 2 ξ is distributed as an inverse gamma distribution; that is,

Missing Data Mechanism Assumptions
In this article, let y i = (y i1 , . . . , y in i ) T be a n i × 1 vector of response (i = 1, . . . , n), and x ij be a q × 1 vector of covariates in the presence of missing data, respectively, whereas z ij are completely observed. In what follows, we assume that the missing data mechanism for response and covariates are nonignorable. Let y i = (y T oi , y T mi ) T and x ij = (x T oij , x T mij ) T , where y oi (n 1i × 1) and y mi (n 2i × 1) are vectors of the observed and missing components of responses in y i satisfying n 1i + n 2i = n i , respectively; x oij (q 1ij × 1) and x mij (q 2ij × 1) are vectors of the observed and missing covariate in x ij satisfying q 1ij + q 2ij = q, respectively. Let r yi = (r yi1 , . . . , r y in i ) T be an indicator variable which indicates whether y i = (y i1 , . . . , y in i ) T is missing; that is, Inspired by Ibrahim et al. [14], it is common to specify a Bernoulli distribution for the following nonignorable missing mechanism. Thus, given y i and unknown parameter ϕ y , the conditional probability function of r yij is distributed as where Pr(r yij = 1|y i , ϕ y ) is specified by a logistic regression model, in which logit Pr(r yij = 1|y i , ϕ y ) = log Pr(r yij =1|y i ,ϕ y ) Similarly, let r xij = (r xij1 , . . . , r xijq ) T be an indicator variable, which indicates whether x ij is missing, and each r xijk is defined as follows: For conditional probability density Pr(r xij |x ij , ϕ x ), we consider the following nonignorable data mechanisms, where ϕ xk = (ϕ xk0 , ϕ xk1 , . . . , ϕ xk,2k−1 ) T .
In this article, we consider the following other type of the nonignorable missing mechanism for response and covariates. Specifically, in the first type, the nonignorable missing mechanism for response is specified by a logistic regression model, For missing covariate, Pr(r xijk |x ij1 , . . . , x ijk , y ij , ϕ xk ) is given by a logistic regression model, where In what follows, we assume that the covariate x ij = (x ij1 , x ij2 , . . . , x ijq ) T is continuous, and there is missingness in the first m dimension and complete observation in the rest q − m dimension. According to Ibrahim et al. [13], the joint probability function of the missing covariates is simplied by a sequence of one-dimensional conditional distributions as follows, where i = 1, . . . , n and j = 1, . . . , n i , x 0 = (x ij,m+1 , x ij,m+2 , . . . , x ijq ), α = (α 1 , α 2 , . . . , α m ).
Here, covariates x 0 do not need to be modelled because they are always observed. In addition, continuous missing covariates are generally assumed to follow the normal distribution. For example, where mean parameter µ ijk is given by Here, α k = (α k,0 , α k1 , . . . , α k,q−1 ).

Bayesian Inference
To investigate the Bayesian inference on parameters of interest, we first introduce the following notations. Let Y o = {y o1 , . . . , y on } and Y m = {y m1 , . . . , y mn } be the sets of observed and missing values of response variables, respectively. Similarly, X o = x o11 , . . . , x o1n i , . . . , x on1 , . . . , x onn n and X m = x m11 , . . . , x m1n i , . . . , x mn1 , . . . , x mnn n are the sets of observed and missing values corresponding to covariates, respectively. Let U = {u ij : i = 1, . . . , n, j = 1, . . . , n i } denote the latent variable. Let b = {b 1 , . . . , b n } and Z = {z ij : i = 1, . . . , n, j = 1, . . . , n i } denote the vector of random effects and the vector of covariates relating to random effects. Let T = {t ij : i = 1, . . . , n, j = 1, . . . , n i } be the vector of time effects relating to the nonparametric part. Denote the vector of indicator variables and parameters relating to missing data mechanism by r = {r y , r x } and ϕ = {ϕ x ,ϕ y }, where r y = {r y1 , . . . , r yn } and r x = {r x11 , . . . , r x1n 1 , . . . , r xn1 , . . . , r xnn n }. On the whole, let where Clearly, it is difficult to generate the random sample from the posterior distribution p(θ, α, σ 2 m |Y o , X o , Z, T, r) because Equation (14) has high-dimensional integration. Thus, inspired by the data augmentation method (Tanner and Wong [23]), we adopt the following posterior distribution, p(θ, α, σ 2 m |Y, X, T, Z, r, b, U), to solve the high-dimensional integration issue. Meanwhile, it is easy to generate the random sample from p(Y m , X m , b, U, θ, α, σ 2 m |Y o , X o , Z, T, r) via the Gibbs sampler (Geman and Geman [24]). That is, random samples Y m , X m , U, b, θ, α, σ 2 m are iteratively generated by means of the following conditional dis- To derive the abovementioned conditional distributions, we adopt the following joint logarithmic likelihood function of (Y, U, b) Moreover, the prior distributions of β, p, φ, Σ, ξ, τ 2 ξ , δ ρ , ϕ, α and σ 2 xk are given by IG is the inverse gamma distribution, and logit(a) = log a 1−a . As for the choices of hyperparameters with regard to the Bayesian P-spline method, Lang and Brezger [22] pointed out that a τ = 1 and a small value for b τ for example, b τ = 0.005 or b τ = 0.0005, leading to an almost diffuse prior for τ 2 ξ . Moreover, the hyperparameters a δ and b δ are simultaneously taken to be 0.5, which can characterize the highly oscillating features for some nonparametric functions. As for the power parameter p, Ye et al. [12] adopted the following priors to conduct the sensitivity analysis: p ∼ Uniform(1, 2), logit(p − 1) ∼ N(0, 100), p − 1 ∼ Beta(0.1, 0.1) and p − 1 ∼ Beta(0.01, 0.01). As a result, Ye et al. [12] chose the logit(p − 1) ∼ N(0, 100) prior as the optimal for p in the Tweedie compound Poisson distribution based on the sensitivity analysis. The choices of hyperparameters for other prior distributions are discussed in Section 5. The conditional distributions, Gibbs sampling and Metropolis-Hastings algorithm are shown in the Appendixes A and B.
Similarly, the consistency estimates of the posterior covariance matrix var(β, φ, p, Σ, α, ϕ y , ϕ x , σ 2 xk |Y, X, Z, r, b) for parameters β, φ, p, Σ, α,ϕ y ,ϕ x and σ 2 xk can be obtained from the sample covariance matrix of their random samples. For example, the posterior covariance matrix var(β|Y, X, Z, r, b) can be consistently estimated by In addition, the corresponding standard deviation can be estimated by the diagonal elements of the sample covariance matrix of the random sample sequence.

Numerical Examples
In this section, two simulation studies and a real example relating to the OAI data are conducted to investigate the performance of our proposed Bayesian methodologies.
To investigate the effect of different prior information on the Bayesian estimate for unknown parameters, three types of prior information are considered as follows.
Type II: The hyperparameters β 0 , ϕ 0 y , ϕ 0 x1 , ϕ 0 x2 , α 0 x1 and α 0 x2 are taken to be 2 times truth values corresponding to their parameters; A, B, C x1 , C x2 , D x1 , and D x2 are taken to be 0.75I 3 , 0.75I 3 , 0.75I 2 , 0.75I 4 , while other hyperparameters are taken to be the same as those given in Type I. This scenario is viewed as an inaccurate prior information.
For each of the above-generated 50 datasets, the hybrid algorithm combining the block Gibbs sampler and Metropolis-Hastings algorithm is used to produce the joint Bayesian estimates of unknown parameters, random effects, and nonparametric function. To ensure the convergence of the hybrid algorithm for each replication, we collected 5000 observations after 5000 iterations to calculate Bayesian estimates, which are reported in Table 3, where "Bias" is the difference between the mean value of parameters obtained from 50 replication and the truth value, "SD" is the standard deviation of the estimates on the basis of 50 replications, and "RMS" is the root mean square between the estimates on the basis of 50 replications and its true value. It can be seen from Table 3 that (i) Bayesian estimates for unknown parameters were reasonably accurate in our considered three different prior information because all Bias values are less than 0.1, and (ii) the estimated values of SD and RMS are less than 0.5 and there is little difference between these two estimated values regardless of any priors. Thus, Bayesian estimates are not sensitive to our considered three prior pieces of information. In addition, examination of Figure 2 indicated that our proposed Bayesian P-spline method to approximate nonparametric function is validated to be feasible because the estimated curves of nonparametric function g(t) matched well with the true curve in our considered simulation studies.  In the second simulation study, the simulated setup is the same as the first simulation study except for the missing mechanism. Here, the other nonignorable missing mechanism model for y ij , x ij1 and x ij2 is given by where the true values of unknown parameters are taken to be ϕ y = (ϕ y0 , ϕ y1 , ϕ y2 , ϕ y3 ) T = (−2.2, 0.1, 0.1, 0.1) T , ϕ x1 = (ϕ x10 , ϕ x11 , ϕ x12 ) T = (−2.5, 0.2, 0.1) T , ϕ x2 = (ϕ x20 , ϕ x21 , ϕ x22 , andϕ x23 ) T = (−1.9, 0.05, 0.05, −0.3) T . The average proportion of missing data for x ij1 , x ij2 and y ij is 10%, 11%, and 8.5%, respectively. Similar to the first simulation study, we also considered three different prior pieces of information for their corresponding hyperparameters. These findings in Table 4 and Figure 3 show that (i) all Bias values corresponding to unknown parameters are less than 0.1 except that the Bias value for ϕ x10 is −0.1013 under the Type II prior and the Bias values for ϕ y0 and ϕ x23 are −0.1102 and −0.1029 under the Type III prior, respectively, and (ii) the estimated curves of nonparametric function g(t) also matched well with the true curve regardless of three different priors. Clearly, Bayesian estimates under the first two priors are better than those obtained from the Type III prior. All in all, our proposed Bayesian approach is feasible in our considered missing mechanism.

Real Example
In this section, the application of our proposed semiparametric Bayesian approach is illustrated by the analysis of longitudinal semicontinuous data from the OAI, which was discussed in Section 2. The OAI longitudinal data were analyzed by various approaches, such as Chen and Wehrly [25,26]. However, these authors only considered the observed data by reducing 4796 patients to 1499 patients and assumed the log transformation of the WOMAC score plus 1 to approximate the normal distribution. In this study, our scientific interest is to link the covariates, such as age, sex, and BMI with the outcome WOMAC score while accounting for nonignorable missing response with a point mass at zero and covariates data. In addition, we viewed age as the individual-level covariate modeled nonparametrically with the other covariate variables modeled parametrically. Let the outcome Y ij represent the WOMAC numeric score for the right knees of the ith (i = 1, 2, . . . , 4796) patient recorded at the jth time point (j = 1, 2, . . . , 5 corresponding to 0, 12, 24, 36, and 48 months). As discussed in Section 2, we regarded the WOMAC numerical score as a longitudinal semicontinuous outcome in this real example.
Here, given random effects b i , Y ij |b i follows the Tweedie compound Poisson distribution; that is Y ij |b i ∼ Tw p (µ ij , φ). The conditional mean µ ij is simultaneously linked to covariates, random effects, and nonparametric function as follows, where the covariates SEX ij (1 for male or 2 for female) and AGE ij are completely observed, while the outcome Y ij and covariate BMI ij are missing and their corresponding missing rates are 8.5% and 13.5%, respectively. Furthermore, we consider the following missing data mechanisms for covariate BMI ij and outcome Y ij , In addition, we assume that the missing distribution for covariate BMI ij follows the normal distribution N(α 10 + α 11 SEX ij , σ 2 BMI ) and random effect b i is distributed as the normal distribution N(0, Σ). Bayesian estimates of unknown parameters and their corresponding standard error as well as the nonparametric function are displayed in Table 5 and Figure 4. Table 5 indicates that the covariates BMI and Sex have the positive significant effect on the WOMAC score at the significance level of 0.05. The result shows that the WOMAC score increases as BMI increases. The higher the BMI score a patient has, the greater intensity of knee osteoarthritis the patient will suffer. The positive significant effect of the covariate Sex on the WOMAC score indicates that the average WOMAC score for females are higher compared with males. Women are more vulnerable to greater intensity than men. Chen and Wehrly [25,26] assumed a linear age effect on the WOMAC score parametrically, but an insignificant effect on the WOMAC score are presented in their studies. It appears from Figure 4 that the Bayesian estimates of nonparametric function g(AGE ij ) based on the P-spline method has a significant nonlinear trend. Specifically, there was a sharp decrease from age 45 to approximately age 49 and from age 60 to approximately age 73, respectively. Moreover, stabilization seems to have started at age 73. In the missing mechanism model, we found that the Bayesian estimates of unknown parameters ϕ BMI1 and ϕ Y2 significantly deviated from zero. Thus, it is reasonable to incorporate the missing data into our proposed semiparametric Bayesian model in the analysis of OAI dataset because missing data mechanisms for Y ij and BMI ij are nonignorable.  Figure 4. Nonparametric estimate of effects of age on the WOMAC numeric score in the OAI dataset.

Conclusions
In this paper, we have introduced a new Tweedie compound Poisson partial linear mixed model with nonignorable missing covariates and responses by assuming that the random effect is distributed as a multivariate normal distribution and the nonparametric function is modelled by the Bayesian P-splines simultaneously. The logistic regression model is simultaneously used to model the missing response and covariate mechanisms. This article has the following contributions: (i) our proposed Bayesian semiparametric mixed effects model can model both zero and positive components of the longitudinal semicontinuous data in an integral way while accounting for the nonignorable missing responses and covariates simultaneously; (ii) our proposed partial linear mixed models based on Bayesian P-spline can characterize the nonlinear smooth effects of covariate in the analysis of longitudinal semicontinuous data; (iii) the conditional distributions for the Gibbs sampling algorithm and Metropolis-Hastings algorithm of our proposed model are derived; and (iv) two simulation studies and a real example are used to illustrate the effectiveness and feasibility of our several considered missing mechanisms.

Acknowledgments:
We are grateful to Zhixian Yang for careful English editing during the preparation of the revision.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Conditional Distributions for the Gibbs Sampling Algorithm
First, the conditional distribution of the missing part used in the Gibbs sampling algorithm is as follows.