Bayesian Variable Selection and Estimation in Semiparametric Simplex Mixed-Effects Models with Longitudinal Proportional Data

In the development of simplex mixed-effects models, random effects in these mixed-effects models are generally distributed in normal distribution. The normality assumption may be violated in an analysis of skewed and multimodal longitudinal data. In this paper, we adopt the centered Dirichlet process mixture model (CDPMM) to specify the random effects in the simplex mixed-effects models. Combining the block Gibbs sampler and the Metropolis–Hastings algorithm, we extend a Bayesian Lasso (BLasso) to simultaneously estimate unknown parameters of interest and select important covariates with nonzero effects in semiparametric simplex mixed-effects models. Several simulation studies and a real example are employed to illustrate the proposed methodologies.


Introduction
Various mixed-effects models based on simplex distribution have increasingly become popular tools in the analysis of longitudinal continuous proportional data over time in many biological, medical and clinical studies. Under the framework of generalized linear mixed models, see Qiu et al. [1] for information on developing a simplex generalized linear mixed model on the basis of the penalized quasi-likelihood (PQL) and restricted maximum likelihood (REML) inference; see Zhang and Wei [2] for information on using the maximum likelihood estimation combining the stochastic approximation (SA) algorithm and the MCMC method to infer on simplex distribution nonlinear mixed models; see Zhao et al. [3] for information on implementing the MCMC algorithm to obtain the joint Bayesian estimate of simplex distribution nonlinear mixed models from the Bayesian perspective; see Bonat et al. [4] for information on investigating the likelihood analysis for a class of simplex mixed models with logit, probit, complement log-log and Cauchy link functions; see Quintero [5] for information on presenting the sensitivity analysis for variance parameters of random effects in Bayesian simplex mixed models. The random effects in the abovementioned mixed-effects models are assumed to have a multivariate normal distribution. However, in some practical applications, it is questionable for the normal assumption for random effects to analyze the skewed, bimodal and heavy-tailed longitudinal data. Therefore, it is essential to incorporate a semiparametric hierarchical structure via a Dirichlet process prior distribution for the random effects into the simplex mixed-effects models to accommodate longitudinal proportional data.
The nonparametric Bayesian approach based on Dirichlet process (DP) prior for random effects in mixed-effects models has been receiving a lot of attention in recent years.
For example, Kleinman and Ibrahim [6] used a Dirichlet process prior for the general distribution of the random effects in generalized linear mixed model. As a variant of Dirichlet process prior, the truncation approximation Dirichlet process with stick-breaking priors is widely incorporated into various mixed-effects models to specify the general distribution of random effects. For example, Tang and Duan [7] used this approach for a semiparametric Bayesian approach to generalized partial linear mixed model; Tang and zhao [8] used this approach for nonlinear reproductive dispersion mixed models; Zhao et al. [9] used this approach for a semiparametric Bayesian approach to binomial distribution logistic mixed-effects model. In particular, Duan et al. [10] used a truncated and centered Dirichlet process prior to specify random effects in semiparametric reproductive dispersion mixed model. However, the abovementioned DP with stick-breaking prior for random effects is inappropriate when the underlying density of random effects is continuous. In addition, this type of variant for Dirichlet process prior is rather timeconsuming in the calculation process for complicated models. Therefore, to address the above issues, the goal of this paper is to propose a new semiparametric simplex mixedeffects models with the random effects distribution specified by the centered Dirichlet process mixture model (CDPMM).
Although various methodologies have been developed to make statistical inference on the aforementioned simplex mixed-effects models, little work has been performed for the variable selection of simplex mixed-effects models. Classical model-selection methods, such as the step-wise selection method [11], the model comparison via Bayes factor [12], the Akaike information criterion [13] and Deviance information criterion [14], are often used to identify the important covariates in regression analysis; however, these approaches are generally computationally intensive and unstable for complicated mixed models with many covariates. On the other hand, the regularization (penalization) method has increasingly become a popular tool for conducting variable selection in regression analysis. Commonly used regularization methods in the context of linear regression include least absolute shrinkage and selection operator (Lasso) [15], elastic net [16] and adaptive lasso [17]. In addition, Park and Casella [18] proposed the Bayesian version of the Lasso (BLasso) by assigning the conditional Laplace prior of regression coefficients and the gamma distribution of shrinkage parameter under the Bayesian framework. The BLasso procedure has been extended to various complex models including semiparametric structural equation models [19] and semiparametric joint models of multivariate longitudinal and survival data [20]. In particular, Erd et al. [21] pointed out that Bayesian penalization methods perform similarly or sometimes even better than frequentist penalization methods, since Bayesian penalization methods can easily provide credible intervals (CIs) for parameters of interest and obtain the estimate of the penalty parameter by assigning an appropriate prior distribution. Therefore, the other main purpose of this paper is to extend the BLasso procedure to the considered semiparametric simplex mixed-effects models.
The paper is organized as follows: In Section 2, we propose a new semiparametric simplex mixed-effects models with random effects following the centered Dirichlet process mixture model (CDPMM) and incorporate a BLasso procedure into the proposed simplex mixed-effects models. The required conditional distributions are derived in Section 3. Two simulation studies and a real example are used to illustrate the proposed methodologies in Section 4. Some concluding remarks are given in Section 5.
In the context of longitudinal data analysis, let y ij denote the longitudinal percentage outcome for the ith individual at the jth follow-up time t ij , and 0 < y ij < 1, i = 1, . . . , n, j = 1, . . . , n i . We assume that, given a q × 1 random effects b i corresponding to the ith individual, the responses y ij are conditionally independent and each y ij |b i is distributed as a simplex distribution with conditional means, µ ij = E(y ij |b i ), and constant dispersion parameter, σ 2 : that is, y ij |b i ∼ S − (µ ij , σ 2 ). Under the framework of GLMM, the conditional mean is linked to explanatory variables and random effects as follows: where an unknown and monotone link function f (·) is chosen as the logit link; x ij is a (p + 1) × 1 vector of covariates which consist of the constant 1 and time-dependent covariates observed at time point t ij ; β is a (p + 1) × 1 vector of unknown regression parameters; z ij is a q × 1 vector of time-dependent variables which may include some elements of x ij corresponding to random effects b i . In classical random-effects models, the random effects in (2) are generally assumed to be a multivariate normal distribution, which may give rise to biased estimates of parameters or even misleading conclusions. Thus, inspired by Ohlssen and Spiegelhalter [23], we used the DP mixture of normals to specify the random effects: that is, b i where P is an unknown random probability. Clearly, it is rather difficult and inefficient to make Bayesian estimates for regression parameter β and dispersion parameter σ 2 in Equation (2) since an unknown form of P is involved. To address the difficulty, the Dirichlet process (DP) prior is usually introduced to approximate P, i.e., P ∼ DP(τF 0 ), in which F 0 is a given base distribution such as multivariate normal distribution that serves as a starting point for constructing the nonparametric distribution, and τ is a weight that indicates the researcher's certainty of F 0 as the distribution of P. In particular, Sethuraman [24] showed that the DP prior DP(τF 0 ) has the stick-breaking prior representation; however, this approach causes a nonzero mean of random effects [25] and a discrete probability distribution of random effects [23]. Generally, the variants of Dirichlet Process proposed by Ishwaran and Zarepour [26] and Yang et al. [25] were regarded as discrete Dirichlet processes (discrete DPs). A discrete DP with stick-breaking prior for random effects is inappropriate when the underlying density of random effects is continuous. Furthermore, violation of zero mean assumption on the random effects may lead to non-identifiability in the aforementioned random effects model. In addition, the discrete DP methods with stick-breaking prior for random effects are generally computationally intensive for the complicated models.
To overcome the above issues, inspired by Ohlssen and Spiegelhalter [23] and Yang et al. [25], we incorporated the following variant of Dirichlet process into the above model in (2) to specify random effects. That is, where π g is a random probability weight satisfying 0 ≤ π g ≤ 1 and ∑ ∞ g=1 π g = 1. In addition, π g is assumed to be be independent of (µ * g , Ω g ). This variant of Dirichlet process is referred to as the centered Dirichlet process mixture model (CDPMM). As in Ishwaran and Zarepour [26], we adopt the following mixture model of the truncated approximation DP for P: where G is a limited integer satisfying 1 ≤ G < ∞. As for the selection of G, Ishwaran and Zarepour [26] pointed out that a moderate value of G such as 25 may be enough to capture a good approximation in practical application. Thus, the value of G is chosen to be 25 in the rest of this paper. Furthermore, the random probability weight, π g , is specified by the following stick-breaking procedure: where ϑ g The prior distribution for the unknown parameter τ is chosen as τ ∼ Γ(a 1 , a 2 ), such that the posterior distribution for τ is conjugated. Here, we set the hyperparameters a 1 and a 2 to be 25 and 5, respectively, such that large value of τ is generated, which results in more unique b i values.
It is rather difficult and inefficient to generate observations from posterior distributions of b i with the above DP prior via MCMC algorithm. Furthermore, a latent variable L i ∈ {1, . . . , G} is introduced to solve sample issue since this latent variable can record each b i 's cluster membership and convey its parametric value to the distribution of b i .
As in Ishwaran and Zarepour [26], the hierarchical structure defined in (4) can be written as where δ g (·) denotes a discrete probability measure concentrated at g, f 1 (π) is defined in Equation (5), the prior for µ * g associated with and the prior for where Ψ = diag(ψ 1 , . . . , ψ q ), Γ(c 1 , c 2 ) denotes the Gamma distribution with parameters c 1 and c 2 , and ξ 0 , Thus, given the values of L i , µ * and Ω, the prior for random effect b i is assumed to be N q (µ L i , Ω L i ) with To estimate the unknown parameters β and σ 2 in Equation (2) from the Bayesian perspective, it is necessary to specify priors for β and σ 2 . In order to alleviate the computational burden, the conjugate prior distribution for dispersion parameter σ 2 is taken to be where the values of hyperparameters σ 2 a and σ 2 b are taken to be 1 and 0.01, respectively. In this paper, the main goal is to incorporate the Bayesian version of lasso into our proposed model (2) to conduct parameter estimation and model selection simultaneously. Similar to Park and Casella [18] and Tang et al. [20], the following Laplace prior on β is given by where ν is the regularization parameter. Because the mass of the above presented Laplace prior is quite highly concentrated around zero with a distinct peak at zero, posterior means or modes of β k 's are shrunk towards zero, which is the key principle in using BLasso method to select the important covariates. Following Robert [15], the Laplace distribution with the form a exp(−a|x|)/2 can be represented as a scale mixture of normal distributions with independent exponentially distributed variance: that is, Therefore, the aforementioned prior for β can be reformulated as the following hierarchical structure: where the hyperparameters ν 2 a and ν 2 b are selected as 1 and 0.1, respectively, which imply diffuse prior. Similar to Park and Casella [18], the posterior distribution for h 2 β k and ν 2 in the hierarchical structure (10) have closed expressions, such that this hierarchical representation greatly simplifies the computation. Therefore, it follows from Equation (10) that the posterior distribution of ν 2 is distributed as the following Gamma distribution In addition, the posterior distributions for h 2 β 0 , . . . , h 2 β p are derived as where IG(a, b) denotes the inverse Gaussian distribution with parameter a and the shape parameter b. As for sampling from the inverse Gaussian distribution, Tang et al. [20] gave a detailed procedure.
To obtain joint Bayesian estimates of unknown parameters β and σ 2 and the random effects, as well as to select important covariates in our considered models, a hybrid algorithm combining the block Gibbs sampler and the Metropolis-Hastings algorithm is employed to draw a sequence of random observations from the joint posterior distribution p(β, σ 2 , b|Y, X, Z), as follows. In this hybrid algorithm, observations {β, σ 2 , b} are iteratively drawn from the following conditional distributions: p(β|σ 2 , b, Y, X, Z), p(σ −2 |β, b, Y, X, Z) and p(b|β, σ 2 , Y, X, Z).

Block Gibbs Sampler (A): Conditional distribution related to β
It follows from Equations (2) and (10) that the conditional distribution p(β|σ 2 , b, Y, X, Z) is proportional to which is an unfamiliar distribution. Therefore, we used the well-known Metropolis-Hastings (MH) algorithm to generate observations from the aforementioned conditional distribution as follows. Given the current value β (l) , new candidate β is generated from the proposal distribution N(β (l) , σ 2 β Σ β ) and is accepted with probability (l) , and the variance coefficient σ 2 β can be chosen, such that the average acceptance rates are approximately 0.25 or more.

Block Gibbs Sampler (B): Conditional distribution related to σ −2
The conditional distribution p(σ −2 |β, b, Y, X, Z) can be derived as which can be simplified as Clearly, it is straightforward and efficient to draw observations for σ −2 from the Gamma distribution via any statistical software.

Block Gibbs Sampler (C): Conditional distribution related to θ b
Let θ b denote all unknown parameters associated with distribution of random effects b i , i = 1, . . . , n. θ b can be iteratively sampled by using the following nine steps: Step (a). Conditional distribution of ξ given (µ * , Ψ, b) is given Step (b). For  = 1, . . . , q, the diagonal elements of Ψ is conditionally distributed as where µ * g  is the jth element of µ * g and ξ  is the jth element of ξ.
Step (d). Following Ishwaran and Zarepour [26], the conditional distribution of τ|π can be expressed as where ν * g is a random weight sampled from the beta distribution and it is sampled with step (e).
Step (h). The conditional distribution of L i |π, µ, Ω, b is given by where π * ig is proportional to (π g p(b i |µ g , Ω g )) with b i |µ g , Ω g ∼ N q (µ g , Ω g ), and π g (g = 1, . . . , G) are sampled from step (e). Given L i , µ and Ω, the prior of b i is distributed as N q (µ L i , Ω L i ), with µ L i and Ω L i being the L i elements of sets µ and Ω, respectively.
Step (i). The conditional distribution for b = {b i : i = 1, . . . , n} The conditional distribution p(b i |β, σ 2 , Y, X, Z) is non-standard and cannot be derived directly via Gibbs sampling for i = 1, . . . , n. Specifically, specified by Equation (1) and µ by Equation (2). The Metropolis-Hastings algorithm used to sample observation b i is implemented as follows. At the th iteration with a current value b The variance, σ 2 b , can be chosen such that the average acceptance rate is approximately 0.25 or more.
Then, we can obtain a series of sample observations-{(β (l) , σ 2 (l) , b (l) ) : l = 1, 2, . . . , L} -via the above iterative process. Then, Bayesian estimates of β, σ 2 and b i for given i can be obtained by sample mean as follows: Similarly, the consistent estimates of the posterior covariance matrices of β and σ 2 can be obtained via the sample covariance matrices.

Numerical Examples
To investigate the behavior of our proposed model and the BLasso method under the Bayesian framework, we conduct four simulation studies and a real example related to a prospective ophthalmology study.

Simulation Study
In the first simulation study, we assume that, given the random effects b i = (b i1 , b i2 ) T , the longitudinal percentage responses, y ij , are conditionally independent and each y ij |b i (i = 1, . . . , 100, j = 1, . . . , 6) follows the simplex distribution-that is, y ij |b i ∼ S − (µ ij , σ 2 ). The conditional mean parameter µ ij = E(y ij |b i ) is specified as follows: where x 1ij randomly takes 1 or -1 with equal probability-x 2ij and x 3ij i.i.d ∼ N(0, 1), t ij = 0.2j for j = 0, . . . , 5. Moreover, the true values of the parameters are specified as follows: β = (β 0 , . . . , β 4 ) T = (−0.45, 0.00, 0.45, 0.00, 0.45) T . This implies that a covariate corresponding to 0 is unimportant, and that σ 2 = 1. The true distribution of random effect, b i , is assumed to be b i1 where the random effects cover the symmetric and skewed features with mean 0. A total of 500 Monte Carlo replications were conducted on the basis of the above-simulated setup.
In the second simulation study, 500 simulated datasets were generated by using the same setup as specified in the first simulation study except for the distribution of random effects. That is, random effects are distributed as where random effects have bimodal features with 0.
Fore each dataset generated from the abovementioned two simulation studies, the hybrid algorithm combining the block Gibbs sampler and the Metropolis-Hastings algorithm in conjunction with the BLasso method and the stick-break prior of CDPMM was used to produce Bayesian estimates of parameters and random effects as well as simultaneously select the important covariates. To investigate the convergence for these Bayesian algorithms, we computed the estimated potential scale reduction (EPSR, proposed by Gelman et al. [27]) of parameters via three parallel sequences of observations based on three different initial values. It can be seen from Figure 1 that the EPSR values were less than 1.2 after about 7000 iterations in both simulations for all the test runs. Therefore, L = 5000 observations collected after 7000 iterations were used to compute the simulation results for all replications. Results obtained under simulations 1 and 2 are reported in Table 1, where 'Bias' is the difference between the true value and the mean of the estimates based on 500 replications; 'RMS' is the root mean square of differences between the true values and their corresponding estimates based on 500 replications. Compared with the Lasso from the frequentist view, the BLasso would not shrink the non-significant elements of β exactly toward 0 since the sampling-based method is involved. Thus, as suggested by Tang et al. [20], the criterion for variable selection is that a coefficient is viewed as 0 if its 95% confidence interval includes zeros. In Table 1, 'F0' denotes the proportion that the number of 95% confidence interval for regression parameter including zero in 500 replications is divided by 500. The larger the values of F0 corresponding to non-significant regression parameters, and the smaller the values of F0 corresponding to significant parameters, the better the performance of the posited model.  Note: Bias denotes the difference between the true value and the mean of the estimates based on 500 replications; RMS denotes the root mean square of differences between the true values and their corresponding estimates based on 500 replications; F0 denotes the proportion of the 95% confidence interval for regression parameter including zero in 500 replications.
Examination of Table 1 indicated that (i) the Bayesian estimates of the unknown parameters β and σ 2 were reasonably accurate under the two abovementioned simulation studies since their absolute biases were less than 0.10 and their RMS values were less than 0.16; (ii) BLasso could correctly identify the zero and nonzero coefficients in most cases because the F0 values corresponding to important covariates were less than 10%, whilst the F0 values corresponding to unimportant covariates were near to 90%. On the other hand, to investigate the effectiveness of using the CDPMM prior for the random effects, we introduced the following RMSE (root of mean squared error) criterion in term of random effects, where p m (·) andp m (·) denote, respectively, the true density function for random effect b im and kernel density estimation for the estimated values of random effect {b im : i = 1, . . . , n}; h l is chosen to be the l L th quantile of the dataset {b im : i = 1, . . . , n}. The sample quantiles for the estimated values of RMSE are reported in Table 2. Furthermore, we chose a typical replication whose RMSE value is equal to the median in the 500 replications. Therefore, on the basis of the selected replication, the estimated densities of b i1 and b i2 based on the CDPMM prior against their corresponding true densities are plotted in Figures 2 and 3, which indicated that the finite mixture of normal distributions can flexibly capture the symmetric, skewed and bimodal shapes of random effect b i . From Table 2, based on the results of 500 replication in both simulations, the estimated means and standard deviation (SD) of random effects b i1 and b i2 is approximate to their corresponding true values, the 25%, 50% and 75% quantiles of RMSE(b i ) values are small and close enough, which indicated that it is robust to apply CDPMM method to estimate random effects. All these findings indicated that (i) our proposed Bayesian procedure could capture the true information of b i well, regardless of their true distributions and forms, and (ii) BLasso could identify the true model with a high probability.   To compare the performance of the CDPMM prior for the random effects with the discrete DP given by Ishwaran and Zarepour [26] and Yang et al. [25], we conducted the following third simulation study. In this simulation study, 500 simulated datasets were generated by using the same setup as specified in the first simulation study, and fitted by the model with the discrete DP for the random effects. In the fourth simulation, we reanalyzed the aforementioned 500 datasets generated in the second simulation by using a parametric Bayesian approach with the random effects distribution specified by a multivariate normal distribution. The aim of this simulation was to compare the semiparametric approach based on the CDPMM prior with the parametric approach based on the Gaussian prior from the Bayesian perspective. Results obtained under the third and fourth simulations are reported in Table 3. Our programs were written in Matlab. It roughly took 119.3 s and 186.9 s in an Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz (Intel, Santa Clara, CA, USA) serve to run 12,000 iterations for our proposed CDPMM and discrete DP, respectively; this indicates that the CDPMM method is much more efficient than the discrete DP in our considered simulations. It can be seen from Tables 2 and 3 that (i) our proposed CDPMM and discrete DP methods have the same performance in term of the 'Bias' and 'RMS' values, but the F0 values corresponding to the non-significant parameters under the proposed CDPMM prior are higher than those under the discrete DP prior; (ii) the RMS values and the correct rates of variable selection based on the F0 values under the semiparametric CDPMM prior are better than those under the parametric Gaussian prior.

Real Example
In this section, the application of our proposed approach to the skewed longitudinal proportional data is illustrated by the analysis of a prospective ophthalmology study [28] from the Bayesian perspective. The prospective ophthalmology data used in the study were obtained from the Supplementary Materials of a paper by Song and Tan [29] and were available from https://biometrics.biometricsociety.org/home/archive/supplementarymaterials, accessed on 5 september 2022. This prospective ophthalmology study described that the eyes of 31 patients before surgery were injected by three gas concentration levels of C 3 F 8 , and all patients after surgery were followed up three-eight times over a three-month period. The outcome variable was the percentage of remaining gas volume relative to the initial volume of gas injected. These longitudinal proportional data from a prospective ophthalmology study were analyzed by Qiu et al. [1], Song and Tan [29] and Song et al. [30], respectively. However, these authors did not consider to conduct variable selection in the analysis of this dataset. Our scientific interest in this study is to investigate the effect of three initial gas concentration levels of C 3 F 8 and time on the percentage of remaining gas volume, while accounting for selecting the important covariates based on the BLasso method. Let the response y ij denote the percentage of gas left in the eye for patient i at the jth follow-up day, t ij , and y ij |b i ∼ S − (µ ij , σ 2 ). Thus, the conditional mean in our proposed semiparametric mixed-effects model is given by where t ij is the time covariate for days after the gas injection; w ij is the covariate of gas concentration levels equal to −1, 0 and 1, corresponding to the concentration levels of 15%, 20% and 25%, respectively; and random effects b i1 and b i2 are specified by CDPMM in (4), which characterize the effect fluctuations of interception and logarithmic time among patients.
The abovementioned MCMC algorithm was used to produce the joint Bayesian estimates of parameters and random effects in this real example. In the implementation of MCMC process, the hyperparameter values were taken to be the same as those given in simulation. Similarly, we used the EPSR method given in simulation to investigate the convergence for these algorithms. The EPSR values of all parameters against the iteration numbers was plotted in Figure 4, which indicated that the MCMC algorithm converged after 4000 iterations since their EPSR values were less than 1.2 after about 4000 iterations. Hence, L = 4000 observations collected after 4000 iterations were used to calculate Bayesian estimates of parameters and random effects. Results are reported in Table 4 and Figure 5, which indicated that (i) the estimated densities of random effects b i1 and b i2 were bimodal and skewed, which indicated that traditional normality assumption for random effects is inappropriate in this real example; (ii) the square of logarithmic time (log 2 (t ij )) was detected to be an important covariate with a significantly negative effect on the percentage of gas left in the eye, since its corresponding 95% confidence interval did not include zero. The gas concentration levels (w ij ) and the logarithmic time (log 2 (t ij )) were insignificant at significance level 0.05 because their 95% confidence interval included zero.

Conclusions
In this paper, we introduced a new semiparametric simplex mixed-effects models with the random effects following the centered Dirichlet process mixture model (CDPMM). The advantages of the proposed model based on CDPMM are the following: (i) it can capture the features of skewed and bimodal longitudinal proportional data; (ii) it can characterize absolutely continuous distributions for random effects. The novelty of our approach is that we adopted the BLasso procedure to simultaneously estimate parameters of interest, provide credible intervals (CIs) for parameters and conduct both shrinkage and variable selection for our considered models. A hybrid algorithm combining the Gibbs sampler and the MH algorithm was used to simultaneously obtain Bayesian estimates of unknown parameters, random effects and their standard errors and credible intervals. Empirical results show that (i) the proposed semiparametric Bayesian method provides quite accurate estimates of parameters (see Table 1); (ii) the average frequencies of correctly identifying unimportant predictors were near to 90%; (iii) CDPMM can effectively capture the potential features of normal, gamma and mixture normal distributions (see Table 2 and

Conflicts of Interest:
The authors declare no conflict of interest.