Article

Variational Bayesian Quantile Regression with Non-Ignorable Missing Response Data

1 School of Digital Economy and Trade, Guangzhou Huashang College, Guangzhou 511300, China
2 School of Mathematics and Statistics, Guangxi Normal University, Guilin 541006, China
3 School of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi 830012, China
4 Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing 100872, China
* Author to whom correspondence should be addressed.
Axioms 2025, 14(6), 408; https://doi.org/10.3390/axioms14060408
Submission received: 22 April 2025 / Revised: 21 May 2025 / Accepted: 23 May 2025 / Published: 27 May 2025

Abstract

For non-ignorable missing response variables, the mechanism determining whether the response variable is missing can be modeled through logistic regression. In Bayesian computation, the lack of a conjugate prior for the logistic function poses a significant challenge. Introducing a Pólya-Gamma latent variable and employing a lower-bound approximation are two common methods for conjugate Bayesian inference in logistic regression. It can be observed that, in the variational Bayesian computation, these two methods yield essentially the same variational posterior. This paper applies the popular Bayesian spike-and-slab LASSO prior for variable selection in quantile regression with non-ignorable missing response variables, and the resulting method demonstrates good performance in both simulations and practical applications.

1. Introduction

In many application fields, such as economics, sociology, and biomedicine, some subjects may have missing responses or predictors due to various reasons, including study dropout, unwillingness of study participants to answer certain questions in the questionnaire, and information loss caused by uncontrollable factors. Statistical inference for missing data problems is quite challenging.
Rubin [1] categorized missing data into three mechanisms: missing completely at random (MCAR), where the missingness process is independent of the observed and missing quantities; missing at random (MAR), where the missingness process depends on the observed quantities but not on the missing quantities; and non-ignorable missingness or not missing at random (NMAR), where the missingness process depends on both the observed and missing quantities. In missing data analysis, the NMAR assumption may be more reasonable than the classical MAR assumption.
Bayesian parameter estimation methods are often used to address the estimation problem of parametric models involving missing data for several reasons. First, Markov chain Monte Carlo (MCMC) methods widely used in statistical computing, such as the Gibbs algorithm [2] and the Metropolis–Hastings (MH) algorithm [3,4], can be employed to estimate the posterior distributions of parameters, nonparametric functions, and missing data. Second, compared to the setting without missing data, Bayesian methods with missing data only require an additional step in the Gibbs sampler. Therefore, Bayesian methods can easily handle missing data without the need for new statistical inference techniques [5]. Third, some prior information can be directly incorporated into the analysis, resulting in more accurate parameter estimation when good prior assumptions are available. Fourth, sampling-based Bayesian methods do not rely on asymptotic theory and may provide more reliable statistical inference even in small sample situations. In recent years, there have been many studies on NMAR data analysis, such as those by Lee and Tang [6], Tang and Zhao [7], and Xu and Tang [8].
Furthermore, variable selection can be viewed as a special case of model selection, which is achieved through spike-and-slab priors in Bayesian variable selection. This paper chooses the spike-and-slab LASSO prior [9] for parameter estimation and variable selection. The missingness mechanism of the response variable can be obtained through logistic regression, and variational Bayesian algorithms are considered for model parameter estimation. In the calculation of variational posteriors, due to the lack of a conjugate prior for logistic regression, we cannot compute the variational posteriors by specifying a reasonable variational family. Introducing Pólya-Gamma latent variables [10] and employing lower-bound approximation [11] are two methods that can yield conjugate posteriors for Bayesian logistic regression. In the posterior computation process of variational Bayesian methods, these two methods will have the same variational posteriors. Considering the characteristics of the spike-and-slab prior, it is unnecessary to calculate the complex variational lower bound, and the algorithm can still converge.
In this study, we propose a variational Bayesian quantile regression algorithm to address the challenges posed by non-ignorable response missing data. Unlike traditional methods, the variational Bayesian approach offers an efficient and scalable solution by transforming the posterior inference problem into an optimization task, ensuring faster convergence and reducing computational burden. The quantile regression framework provides a robust alternative to mean-based models, capturing the conditional distribution of the response variable across different quantiles, which is particularly valuable when the data exhibit heteroscedasticity or skewness. While Li’s paper [12] also considers missing covariates and response variables, it does not address variable selection. In our work, we employ a prior that enables variable selection, allowing for effective variable screening alongside parameter estimation. Additionally, the convergence criteria of our algorithm differ from Li’s. Li uses the minimal change in the variational lower bound as the stopping condition, which involves more complex computations. In contrast, our approach is more computationally efficient. Moreover, the proposed method incorporates variable selection through a Bayesian shrinkage prior, effectively identifying significant predictors while accounting for the missing data mechanism. This combination of variational inference, quantile regression, and variable selection not only enhances estimation accuracy but also offers a flexible and computationally efficient tool for analyzing complex missing data structures. These features highlight the novelty and practical relevance of our approach.
This article is organized as follows: Section 2 introduces the model, prior, and variational Bayesian logistic regression used in this paper; Section 3 proposes the corresponding variational Bayesian algorithm for data with non-ignorable missing responses; Section 4 conducts simulation studies for the proposed algorithm; Section 5 applies the algorithm to real data analysis; and relevant conclusions are presented in Section 6.

2. Model, Prior, and Variational Bayesian Logistic Regression

In this paper, the response variable $y_i$, $i = 1, 2, \dots, n$, may be missing, while all covariates (explanatory variables) $X_i = (x_{i1}, \dots, x_{ip})^T$ are completely observable. The incomplete observations are as follows:
$$(X_i, y_i, r_i), \quad i = 1, 2, \dots, n,$$
where $r_i \in \{0, 1\}$ indicates whether $y_i$ is missing: when $r_i = 1$, $y_i$ is missing. Let $y = \{y_o, y_m\}$ and $r = (r_1, r_2, \dots, r_n)$, where $y_o$ and $y_m$ represent the observed and missing response variables, respectively. Let $p(r_i \mid y_i, X_i, \varphi)$ denote the conditional distribution of $r_i$ given $y_i$, $X_i$, and $\varphi$, where $\varphi$ is the unknown parameter vector of this conditional probability function. The missingness mechanism of the data is completely determined by this conditional distribution.
We consider the following non-ignorable missingness mechanism:
$$p(r_i \mid y_i, X_i, \varphi) = \left[ p(r_i = 1 \mid y_i, X_i, \varphi) \right]^{r_i} \left[ 1 - p(r_i = 1 \mid y_i, X_i, \varphi) \right]^{1 - r_i},$$
where $p(r_i = 1 \mid y_i, X_i, \varphi)$ can be modeled through logistic regression:
$$\operatorname{logit} p(r_i = 1 \mid y_i, X_i, \varphi) = \varphi_{01} + \varphi_{11} y_i + \varphi_{21} x_{i1} + \dots + \varphi_{2p} x_{ip} = W_i^T \varphi,$$
$$p(r_i = 1 \mid y_i, X_i, \varphi) = \frac{e^{W_i^T \varphi}}{1 + e^{W_i^T \varphi}} = \sigma(W_i^T \varphi),$$
where $\varphi = (\varphi_{01}, \varphi_{11}, \varphi_{21}, \dots, \varphi_{2p})^T$ are the logistic regression parameters and $W_i = (1, y_i, x_{i1}, \dots, x_{ip})^T$ are the covariates of the logistic regression model.
We now turn to the quantile regression model with non-ignorable missing responses. For $0 < \tau < 1$, the model error is constrained so that its $\tau$-th quantile equals zero, and the $\tau$-th quantile regression model is expressed as follows:
$$Q_\tau(y_i \mid X_i) = X_i^T \beta, \quad i = 1, \dots, n,$$
where $\beta = (\beta_1, \dots, \beta_p)^T$ is the vector of quantile regression parameters to be estimated.

2.1. Spike-and-Slab LASSO Prior

The spike-and-slab LASSO (SSL) prior [9] is represented as follows:
$$p(\beta_j \mid \gamma_j) = \gamma_j \varphi_1(\beta_j) + (1 - \gamma_j) \varphi_0(\beta_j), \quad j = 1, 2, \dots, p,$$
$$\varphi_1(\beta_j) = \frac{\lambda_1}{2} \exp(-\lambda_1 |\beta_j|), \qquad \varphi_0(\beta_j) = \frac{\lambda_0}{2} \exp(-\lambda_0 |\beta_j|),$$
where $\lambda_1$ is chosen to be a small value, while $\lambda_0$ should be chosen to be a large value. In the Bayesian framework, the Laplace distribution is not conjugate, but it can be represented hierarchically using the normal distribution $N(\cdot, \cdot)$ and the exponential distribution $\mathrm{Exp}(\cdot)$:
$$\beta_j \mid \tau_{1j}^2, \gamma_j = 1 \sim N(0, \tau_{1j}^2), \qquad \tau_{1j}^2 \mid \lambda_1^2 \sim \mathrm{Exp}(\lambda_1^2 / 2),$$
$$\beta_j \mid \tau_{0j}^2, \gamma_j = 0 \sim N(0, \tau_{0j}^2), \qquad \tau_{0j}^2 \mid \lambda_0^2 \sim \mathrm{Exp}(\lambda_0^2 / 2).$$
Figure 1 describes four types of spike-and-slab priors with a mixing proportion of $\gamma_j = 1/2$: the Normal mixture (both $\varphi_1(\beta_j)$ and $\varphi_0(\beta_j)$ are normal densities), the Normal and Point-mass mixture ($\varphi_0(\beta_j)$ is a point mass at 0 and $\varphi_1(\beta_j)$ is normal), the Laplace and Point-mass mixture ($\varphi_0(\beta_j)$ is a point mass at 0 and $\varphi_1(\beta_j)$ is Laplace), and the SSL prior (both are Laplace densities). The Normal mixture cannot adequately penalize small coefficients, making variable selection difficult. On the other hand, the point-mass spike-and-slab prior suffers from over-shrinkage and may miss important variables. The SSL prior can therefore be viewed as a compromise between the two.
Penalized priors (such as the LASSO) and spike-and-slab priors are common choices in Bayesian variable selection. When $\lambda_1 = \lambda_0 = \lambda$, the SSL prior degenerates to the LASSO prior. When $\lambda_0 \to \infty$, $\varphi_0(\beta_j)$ converges to a point mass at 0; that is, in the limit, SSL recovers the "gold standard" point-mass spike-and-slab prior. Thus, SSL integrates the penalized likelihood (LASSO) and the spike-and-slab prior. SSL is also adaptive, so a separate spike-and-slab adaptive LASSO is unnecessary; a proof of its adaptivity can be found in the discussion by Ročková and George [9]. SSL uses the spike component to encourage sparsity by shrinking many regression coefficients to zero, while the slab component captures larger signals, enabling simultaneous variable selection and parameter estimation. Unlike the traditional LASSO with a fixed regularization parameter, SSL automatically adjusts the sparsity parameter based on the data, reducing the need for manual tuning and providing a multiplicity correction that lowers the false positive rate.
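To make the shape of the SSL prior concrete, the following minimal R sketch (our own illustration, with arbitrary values λ1 = 1 and λ0 = 20 that are not taken from the paper) evaluates the mixture of the two Laplace densities on a grid; a small λ1 keeps the slab heavy-tailed while a large λ0 concentrates the spike at zero.

# Minimal sketch: SSL prior density as a two-component Laplace mixture.
# lambda1 (slab) small, lambda0 (spike) large; values here are illustrative only.
ssl_density <- function(beta, lambda1 = 1, lambda0 = 20, gamma = 0.5) {
  slab  <- lambda1 / 2 * exp(-lambda1 * abs(beta))   # phi_1: heavy-tailed slab
  spike <- lambda0 / 2 * exp(-lambda0 * abs(beta))   # phi_0: sharp spike at zero
  gamma * slab + (1 - gamma) * spike
}

beta_grid <- seq(-3, 3, length.out = 601)
plot(beta_grid, ssl_density(beta_grid), type = "l",
     xlab = expression(beta), ylab = "SSL prior density")  # mixing proportion 1/2, as in Figure 1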

2.2. Bayesian Logistic Regression Based on Pólya-Gamma Latent Variables

The logistic function does not have a conjugate prior, which poses a challenge for Bayesian inference in logistic regression. Polson et al. [10] proposed a new data augmentation strategy for Bayesian inference in logistic regression models.
If $\omega \sim PG(b, c)$, where $PG(b, c)$ denotes the Pólya-Gamma distribution with parameters $(b, c)$, its expectation is
$$E[\omega] = \frac{b}{2c} \tanh\!\left(\frac{c}{2}\right) = \frac{b}{2c} \cdot \frac{e^c - 1}{e^c + 1}.$$
The probability density of $\omega$ has the following property:
$$p(\omega \mid b, c) \propto \exp\!\left(-\frac{c^2}{2} \omega\right) p(\omega \mid b, 0).$$
If $\omega \sim PG(b, 0)$ and $p(\omega)$ denotes its probability density, then
$$\frac{(e^{\psi})^a}{(1 + e^{\psi})^b} = 2^{-b} e^{\kappa \psi} \int_0^{\infty} e^{-\omega \psi^2 / 2}\, p(\omega)\, d\omega,$$
where $b > 0$ and $\kappa = a - b/2$. Applying this result to (5) (see Section 3.1 of Polson et al. [10]), we obtain
$$L(\varphi, \omega) = \prod_{i=1}^{n} \frac{(e^{W_i^T \varphi})^{r_i}}{1 + e^{W_i^T \varphi}} \propto \prod_{i=1}^{n} \exp\!\left\{ \left(r_i - \frac{1}{2}\right) W_i^T \varphi - \frac{\omega_i}{2} (W_i^T \varphi)^2 \right\} p(\omega_i \mid 1, 0),$$
where $\omega = (\omega_1, \dots, \omega_n)$, $\omega_i \sim PG(1, 0)$, and its probability density is denoted $p(\omega_i \mid 1, 0)$.
Assuming a prior $p(\varphi)$ for $\varphi$, the posterior of $\varphi$ is as follows:
$$q(\varphi \mid \omega, y) \propto p(\varphi) \prod_{i=1}^{n} \exp\!\left\{ \kappa_i W_i^T \varphi - \omega_i (W_i^T \varphi)^2 / 2 \right\},$$
where $\kappa_i = r_i - 1/2$. If $p(\varphi)$ is a Gaussian prior, then the posterior $q(\varphi \mid \omega, y)$ is conjugate with the prior. The posterior density of the variable $\omega_i$ is
$$q(\omega_i \mid \varphi) \propto \exp\!\left\{ -\frac{\omega_i}{2} (W_i^T \varphi)^2 \right\} p(\omega_i \mid 1, 0).$$
Then, the posterior of $\omega_i$ is $PG(1, W_i^T \varphi)$, and $E[\omega_i \mid \varphi] = \frac{1}{2 W_i^T \varphi} \tanh\!\left(\frac{W_i^T \varphi}{2}\right)$.

2.3. Bayesian Logistic Regression Based on Lower-Bound Approximation

The log-likelihood of (5) is as follows:
$$\ell(\varphi) = \log L(\varphi) = \log \prod_{i=1}^{n} \frac{(e^{W_i^T \varphi})^{r_i}}{1 + e^{W_i^T \varphi}} = \sum_{i=1}^{n} \left[ r_i W_i^T \varphi - g(W_i^T \varphi) \right],$$
where $g(t) = \log(1 + e^{t})$, $t \in \mathbb{R}$. Taking the logarithm of the logistic function,
$$\log \sigma(t) = -\log(1 + e^{-t}) = \frac{t}{2} - \log\!\left(e^{t/2} + e^{-t/2}\right).$$
Jaakkola and Jordan [11] approximate $\log(e^{t/2} + e^{-t/2})$ by a first-order Taylor expansion in $t^2$, which gives the lower bound
$$\log \sigma(t) \geq \frac{t}{2} - \log\!\left(e^{\eta/2} + e^{-\eta/2}\right) - \frac{1}{4\eta} \tanh\!\left(\frac{\eta}{2}\right)(t^2 - \eta^2) = \frac{t - \eta}{2} + \log \sigma(\eta) - \frac{1}{4\eta} \tanh\!\left(\frac{\eta}{2}\right)(t^2 - \eta^2).$$
Substituting $-g(W_i^T \varphi) = \log \sigma(-W_i^T \varphi)$ into $\ell(\varphi)$ yields
$$\ell(\varphi) \geq \sum_{i=1}^{n} \left[ \log \sigma(\eta_i) - \frac{\eta_i}{2} + \left(r_i - \frac{1}{2}\right) W_i^T \varphi - \frac{1}{4\eta_i} \tanh(\eta_i / 2) \left( (W_i^T \varphi)^2 - \eta_i^2 \right) \right] =: f(\varphi, \eta).$$
We want the lower bound $f(\varphi, \eta)$ of $\ell(\varphi)$ to be as large (i.e., as tight) as possible. For a given $\varphi$, the lower bound $f(\varphi, \eta) = f(\eta)$, so we maximize $f(\eta)$. Define the function $f_a: \mathbb{R} \to \mathbb{R}$, $a \geq 0$,
$$f_a(x) = \log \sigma(x) - \frac{x}{2} - \frac{1}{4x} \tanh(x/2)\,(a^2 - x^2).$$
Then $f_a(x)$ is symmetric about $x = 0$ and attains its maximum at $x = \pm a$; a proof of this result can be found in Ray et al. [13]. Therefore, when $\eta_i = W_i^T \varphi$, the lower bound $f(\eta)$ reaches its maximum value.
The posterior of $\varphi$ is as follows:
$$q(\varphi) = c\, p(\varphi) L(\varphi) \geq c\, p(\varphi) \exp\{ f(\varphi, \eta) \} \propto p(\varphi) \prod_{i=1}^{n} \exp\!\left\{ \left(r_i - \frac{1}{2}\right) W_i^T \varphi - \frac{1}{4\eta_i} \tanh(\eta_i / 2) (W_i^T \varphi)^2 \right\} = p(\varphi) \prod_{i=1}^{n} \exp\!\left\{ \left(r_i - \frac{1}{2}\right) W_i^T \varphi - E[\omega_i \mid \varphi]\, (W_i^T \varphi)^2 / 2 \right\}.$$
Comparing the posterior $q(\varphi)$ under the two methods, we see that the lower-bound approximation and the Pólya-Gamma augmentation lead to essentially the same variational posterior for $\varphi$.
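As a quick numerical illustration of this equivalence (our own sketch, not code from the paper), the following R snippet checks that the Jaakkola-Jordan expression is a genuine lower bound of $\log\sigma(t)$ and that it is tight at $t = \pm\eta$; its quadratic weight $\tanh(\eta/2)/(4\eta)$ equals $E[\omega \mid 1, \eta]/2$, the Pólya-Gamma posterior mean that appears in the augmented update for $\varphi$.

# Sketch: numerical check of the Jaakkola-Jordan bound on log sigma(t).
log_sigmoid <- function(t) -log1p(exp(-t))
jj_bound <- function(t, eta) {
  log_sigmoid(eta) + (t - eta) / 2 - tanh(eta / 2) / (4 * eta) * (t^2 - eta^2)
}

t_grid <- seq(-6, 6, by = 0.01)
eta <- 2.5
all(jj_bound(t_grid, eta) <= log_sigmoid(t_grid) + 1e-12)  # TRUE: valid lower bound
jj_bound(c(-eta, eta), eta) - log_sigmoid(c(-eta, eta))    # both ~0: tight at t = +/- eta
tanh(eta / 2) / (4 * eta)                                   # equals E[omega | 1, eta] / 2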

3. Variational Bayesian Algorithm for Bayesian Quantile Regression with Non-Ignorable Missing Responses

3.1. Hierarchical Model and Prior

In Bayesian quantile regression, it is often assumed that $y_i$ follows an asymmetric Laplace (AL) distribution,
$$f(y_i \mid \beta, \sigma) = \frac{\tau(1 - \tau)}{\sigma} \exp\!\left\{ -\rho_\tau\!\left( \frac{y_i - X_i^T \beta}{\sigma} \right) \right\},$$
where $\rho_\tau(u) = u(\tau - I(u < 0))$ and $I(\cdot)$ is the indicator function. According to the findings of Kozumi and Kobayashi [14], the asymmetric Laplace (AL) distribution can be expressed as a mixture of a normal distribution and an exponential distribution:
$$y_i \mid \beta, \sigma, e_i \sim N(X_i^T \beta + k_1 e_i,\; \sigma k_2 e_i), \qquad e_i \mid \sigma \sim \mathrm{Exp}(1/\sigma),$$
where $k_1 = (1 - 2\tau) / (\tau(1 - \tau))$ and $k_2 = 2 / (\tau(1 - \tau))$.
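As a quick check of this representation (a sketch of our own, not the authors' code), draws generated from the normal-exponential mixture with location zero should have their τ-th empirical quantile near zero:

# Sketch: sample from the AL(0, sigma, tau) distribution via its mixture form.
set.seed(1)
tau <- 0.3; sigma <- 1
k1 <- (1 - 2 * tau) / (tau * (1 - tau))
k2 <- 2 / (tau * (1 - tau))

e <- rexp(1e5, rate = 1 / sigma)                   # e_i | sigma ~ Exp(1/sigma)
y <- rnorm(1e5, mean = k1 * e, sd = sqrt(sigma * k2 * e))
quantile(y, tau)                                    # close to 0, the tau-th quantile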
The Bayesian quantile regression model with non-ignorable missing responses has the following hierarchical representation:
$$y_i \sim N(X_i^T \beta + k_1 e_i,\; k_2 \sigma e_i), \quad \text{where } y_i \text{ is missing when } r_i = 1,$$
$$r_i \mid \omega_i \propto \exp\!\left\{ \left(r_i - \frac{1}{2}\right) W_i^T \varphi - \frac{\omega_i}{2} (W_i^T \varphi)^2 \right\}, \qquad \omega_i \sim PG(1, 0), \quad i = 1, 2, \dots, n,$$
$$e_i \mid \sigma \sim \mathrm{Exp}(1/\sigma).$$
The SSL prior can be represented hierarchically using the normal distribution and the exponential distribution:
$$\beta_j \mid \tau_{1j}^2, \gamma_j = 1 \sim N(0, \tau_{1j}^2), \qquad \tau_{1j}^2 \mid \lambda_1^2 \sim \mathrm{Exp}(\lambda_1^2 / 2), \quad j = 1, 2, \dots, p,$$
$$\beta_j \mid \tau_{0j}^2, \gamma_j = 0 \sim N(0, \tau_{0j}^2), \qquad \tau_{0j}^2 \mid \lambda_0^2 \sim \mathrm{Exp}(\lambda_0^2 / 2),$$
$$\gamma_j \mid \pi \sim \mathrm{Bernoulli}(\pi).$$
The priors of the hyperparameters are
$$\sigma \sim IG(a_{\sigma 0}, b_{\sigma 0}), \qquad \pi \sim \mathrm{Beta}(a_0, b_0), \qquad \lambda_1^2 \sim Ga(c_1, d_1), \qquad \lambda_0^2 \sim Ga(c_0, d_0),$$
where $Ga(a, b)$ and $IG(a, b)$ denote the gamma and inverse gamma distributions with shape parameter $a$ and scale parameter $b$, and $\mathrm{Beta}(\cdot, \cdot)$ denotes the beta distribution. The prior $p(\varphi)$ is taken to be Gaussian:
$$\varphi \sim N(\mu_{\varphi 0}, \Sigma_{\varphi 0}).$$
In the incomplete data setting, the missing data $y_m$ introduce additional complexity because the posterior distribution of the model parameters $(\beta, \sigma, \varphi)$ is a marginal distribution:
$$q(\beta, \sigma, \varphi \mid y_o, X, r) = \int q(\beta, \sigma, \varphi, y_m \mid y_o, X, r)\, d y_m.$$
This marginalization often results in a complicated and intractable posterior distribution, since the integral rarely has a closed-form solution. By augmenting the data with $y_m$, the posterior becomes a complete-data posterior,
$$q(\beta, \sigma, \varphi, y_m \mid y_o, X, r),$$
which is typically easier to work with because, conditionally on the imputed $y_m$, the model often reduces to standard conjugate forms, making Bayesian updating more straightforward.

3.2. Coordinate Ascent Variational Inference Algorithms

Let $\theta \in \Theta$, let $D$ denote the observed data, and let $p(\theta \mid D)$ be the posterior distribution of $\theta$. Variational Bayesian methods aim to find a distribution $q(\theta)$ within a given family of distributions $\mathcal{F}$ that minimizes the Kullback-Leibler (KL) divergence between $q(\theta)$ and $p(\theta \mid D)$. Denoting the prior distribution of the parameters by $p(\theta)$, we have
$$\log p(D) = \int q(\theta) \log \frac{p(\theta)\, p(D \mid \theta)}{q(\theta)}\, d\theta + \int q(\theta) \log \frac{q(\theta)}{p(\theta \mid D)}\, d\theta.$$
In this equation, the first term is the evidence lower bound (ELBO), and the second term is the KL divergence between $q(\theta)$ and $p(\theta \mid D)$, denoted $KL(q \| p)$. The log-likelihood of the observed data, $\log p(D)$, does not depend on the choice of $q(\theta)$.
Therefore, minimizing $KL(q \| p)$ is equivalent to maximizing the evidence lower bound:
$$\hat{q}(\theta) = \arg\max_{q(\theta) \in \mathcal{F}} \left\{ E_{q(\theta)}[\log p(D, \theta)] - E_{q(\theta)}[\log q(\theta)] \right\}.$$
To make $\hat{q}(\theta)$ tractable, we consider mean-field variational Bayes (MFVB). Coordinate ascent variational inference (CAVI) is one of the most commonly used VB algorithms: it iteratively optimizes each factor of the mean-field variational density while keeping the other factors fixed, thereby increasing the ELBO until a local optimum is reached. Let $\hat{q}(\theta) = \prod_{t=1}^{T} q_t(\theta_t)$ and denote $\theta_{-t} = (\theta_1, \dots, \theta_{t-1}, \theta_{t+1}, \dots, \theta_T)$. According to the results of Blei et al. [15], the optimal variational posterior is given by
$$q^{*}(\theta_t) \propto \exp\!\left\{ E_{\theta_{-t}}\!\left[ \log p(\theta_t \mid D, \theta_{-t}) \right] \right\}.$$
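To fix ideas before turning to the full model, here is a toy CAVI loop in R (our own illustration on a conjugate Gaussian model with unknown mean and precision, not the algorithm of this paper); each factor is updated in turn while the other is held fixed, exactly the coordinate-ascent scheme described above.

# Toy CAVI sketch: x_i ~ N(mu, 1/lambda), mu | lambda ~ N(mu0, 1/(kappa0*lambda)),
# lambda ~ Gamma(a_prior, b_prior). q(mu) and q(lambda) are updated alternately.
set.seed(1)
x <- rnorm(50, mean = 2, sd = 1.5)
n <- length(x); xbar <- mean(x)
mu0 <- 0; kappa0 <- 1; a_prior <- 1; b_prior <- 1

E_lambda <- a_prior / b_prior                      # initial guess for E[lambda]
for (iter in 1:50) {
  # q(mu) = N(m_n, s2_n), holding q(lambda) fixed
  m_n  <- (kappa0 * mu0 + n * xbar) / (kappa0 + n)
  s2_n <- 1 / ((kappa0 + n) * E_lambda)
  # q(lambda) = Gamma(a_n, b_n), using E_q(mu)[(mu - c)^2] = (m_n - c)^2 + s2_n
  a_n <- a_prior + (n + 1) / 2
  b_n <- b_prior + 0.5 * (kappa0 * ((m_n - mu0)^2 + s2_n) +
                          sum((x - m_n)^2) + n * s2_n)
  E_lambda <- a_n / b_n                            # fed back into the q(mu) update
}
c(posterior_mean_mu = m_n, posterior_mean_lambda = E_lambda)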

3.3. Variational Posterior

In this paper, the parameters under consideration are
$$\theta = \left\{ \beta,\; \{\tau_{0j}^2, \tau_{1j}^2, \gamma_j\}_{j=1}^{p},\; \pi,\; \lambda_1^2,\; \lambda_0^2,\; \{e_i, \omega_i\}_{i=1}^{n},\; \sigma,\; \varphi,\; y_m \right\}.$$
The derivation of the variational posteriors is provided in Appendix A. The variational posterior of $\beta$ is
$$q(\beta) \sim N(\mu, \Sigma).$$
The variational posterior of $\tau_{1j}^2$ is
$$q(\tau_{1j}^2) \sim GIG\!\left(\tfrac{1}{2}, a_{1j}, b_{1j}\right).$$
The variational posterior of $\tau_{0j}^2$ is
$$q(\tau_{0j}^2) \sim GIG\!\left(\tfrac{1}{2}, a_{0j}, b_{0j}\right).$$
The variational posterior of $\gamma_j$ is
$$q(\gamma_j) \sim \mathrm{Bernoulli}(\phi_j).$$
The variational posterior of $\pi$ is
$$q(\pi) \sim \mathrm{Beta}(a, b).$$
The variational posterior of $\lambda_1^2$ is
$$q(\lambda_1^2) \sim \Gamma(\tilde{c}_1, \tilde{d}_1).$$
The variational posterior of $\lambda_0^2$ is
$$q(\lambda_0^2) \sim \Gamma(\tilde{c}_0, \tilde{d}_0).$$
The variational posterior of $e_i$ is
$$q(e_i) \sim GIG\!\left(\tfrac{1}{2},\; a_e = \frac{(y_i - X_i^T E[\beta])^2}{\sigma k_2},\; b_e = \frac{k_1^2 + 2 k_2}{\sigma k_2}\right).$$
The variational posterior of $\sigma$ is
$$q(\sigma) \sim IG\!\left(a_\sigma = \frac{3n}{2} + a_{\sigma 0},\; b_\sigma\right).$$
The variational posterior of $\varphi$ is
$$q(\varphi) \sim N(\mu_\varphi, \Sigma_\varphi).$$
The variational posterior of $\omega_i$ is
$$q(\omega_i) \sim PG(1, \hat{c}_i).$$
For a missing individual $i$, with response denoted $y_{i_m}$, and supposing there are $n_m$ missing individuals in total, the variational posterior is
$$q(y_{i_m}) \sim N(\mu_y, \sigma_y).$$
The CAVI algorithm is used to update the parameters of the variational posteriors until convergence. A common convergence criterion is that the ELBO no longer changes (or changes by a negligible amount). Since we focus on variable selection, and the ELBO involves many complicated expectations and entropy calculations, we instead monitor the entropy of $\phi = (\phi_1, \dots, \phi_p)$,
$$\mathrm{Ent}(\phi) = -\sum_{j=1}^{p} \left[ \phi_j \log \phi_j + (1 - \phi_j) \log(1 - \phi_j) \right],$$
and stop when it no longer changes (or the change is smaller than a given threshold). As each $\phi_j$ approaches 0 or 1, the entropy tends to 0 and the algorithm converges. Please refer to Algorithm 1.
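In code, the stopping quantity is simply the summed binary entropy of the inclusion probabilities (a sketch of our own, mirroring Algorithm 1 below):

# Sketch of the entropy-based stopping rule.
phi_entropy <- function(phi, eps = 1e-12) {
  phi <- pmin(pmax(phi, eps), 1 - eps)         # guard against log(0)
  -sum(phi * log(phi) + (1 - phi) * log(1 - phi))
}

phi_entropy(rep(0.5, 4))                        # maximal: no selection decisions yet
phi_entropy(c(0.999, 0.998, 0.001, 0.002))      # near 0: decisions have stabilized
# Declare convergence when abs(Ent_t - Ent_{t-1}) < delta, with delta = 1e-3 in Algorithm 1.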
Algorithm 1: Variational Bayesian algorithm with non-ignorable missing responses
Initialize the variational parameters;
Set initial values: $\delta = 10^{-3}$, $J = 1000$, $t = 1$, $\mathrm{Ent}(\phi)^{(0)} = 0$;
while $\left| \mathrm{Ent}(\phi)^{(t)} - \mathrm{Ent}(\phi)^{(t-1)} \right| \geq \delta$ and $1 \leq t \leq J$ do
  $\Sigma^{(t)} \leftarrow \left( \frac{E^{(t-1)}[1/\sigma]}{k_2} \sum_{i=1}^{n} X_i X_i^T E^{(t-1)}\!\left[\frac{1}{e_i}\right] + E^{(t-1)}[D_\tau^{-1}] \right)^{-1}$;
  $\mu^{(t)} \leftarrow \Sigma^{(t)} \frac{E^{(t-1)}[1/\sigma]}{k_2} \sum_{i=1}^{n} X_i \left( E^{(t-1)}\!\left[\frac{1}{e_i}\right] y_i - k_1 \right)$;
  $a_{1j}^{(t)} \leftarrow E^{(t-1)}[\gamma_j]\, E^{(t-1)}[\beta_j^2]$, $\quad b_{1j}^{(t)} \leftarrow E^{(t-1)}[\gamma_j]\, E^{(t-1)}[\lambda_1^2]$;
  $a_{0j}^{(t)} \leftarrow E^{(t-1)}[1 - \gamma_j]\, E^{(t-1)}[\beta_j^2]$, $\quad b_{0j}^{(t)} \leftarrow E^{(t-1)}[1 - \gamma_j]\, E^{(t-1)}[\lambda_0^2]$;
  $\operatorname{logit} \phi_j^{(t)} \leftarrow E^{(t-1)}\!\left[ \frac{1}{2} \log\frac{\tau_{0j}^2}{\tau_{1j}^2} + \frac{\beta_j^2}{2}\left( \frac{1}{\tau_{0j}^2} - \frac{1}{\tau_{1j}^2} \right) + \log\frac{\lambda_1^2}{\lambda_0^2} + \frac{\lambda_0^2}{2}\tau_{0j}^2 - \frac{\lambda_1^2}{2}\tau_{1j}^2 + \log\frac{\pi}{1-\pi} \right]$;
  $a^{(t)} \leftarrow a_0 + \sum_{j=1}^{p} E^{(t-1)}[\gamma_j]$, $\quad b^{(t)} \leftarrow b_0 + p - \sum_{j=1}^{p} E^{(t-1)}[\gamma_j]$;
  $\tilde{c}_1^{(t)} \leftarrow c_1 + \sum_{j=1}^{p} E^{(t-1)}[\gamma_j]$, $\quad \tilde{d}_1^{(t)} \leftarrow d_1 + \sum_{j=1}^{p} E^{(t-1)}[\gamma_j]\, \frac{E^{(t-1)}[\tau_{1j}^2]}{2}$;
  $\tilde{c}_0^{(t)} \leftarrow c_0 + p - \sum_{j=1}^{p} E^{(t-1)}[\gamma_j]$, $\quad \tilde{d}_0^{(t)} \leftarrow d_0 + \sum_{j=1}^{p} E^{(t-1)}[1 - \gamma_j]\, \frac{E^{(t-1)}[\tau_{0j}^2]}{2}$;
  $a_e^{(t)} \leftarrow \frac{(y_i - X_i^T E^{(t-1)}[\beta])^2}{\sigma k_2}$, $\quad b_e^{(t)} \leftarrow \frac{k_1^2 + 2 k_2}{\sigma k_2}$;
  $a_\sigma^{(t)} \leftarrow \frac{3n}{2} + a_{\sigma 0}$, $\quad b_\sigma^{(t)} \leftarrow b_{\sigma 0} + \frac{1}{2 k_2} \sum_{i=1}^{n} \frac{\left( y_i - X_i^T E^{(t-1)}[\beta] - k_1 E^{(t-1)}[e_i] \right)^2}{E^{(t-1)}[e_i]} + \sum_{i=1}^{n} E^{(t-1)}[e_i]$;
  $\Sigma_\varphi^{(t)} \leftarrow \left( W E^{(t-1)}[\mathrm{diag}(\omega)] W^T + \Sigma_{\varphi 0}^{-1} \right)^{-1}$;
  $\mu_\varphi^{(t)} \leftarrow \Sigma_\varphi^{(t)} \left( W \left( r - \frac{1}{2} 1_n \right) + \Sigma_{\varphi 0}^{-1} \mu_{\varphi 0} \right)$;
  $\hat{c}_i^{(t)} \leftarrow \sqrt{ E^{(t-1)}\!\left[ (W_i^T \varphi)^2 \right] }$;
  $\sigma_y^{(t)} \leftarrow \left( \varphi_1^2 E^{(t-1)}[\omega_{i_m}] + E^{(t-1)}\!\left[ \frac{1}{2 \sigma k_2 e_{i_m}} \right] \right)^{-1}$;
  $\mu_y^{(t)} \leftarrow \sigma_y^{(t)} \left( \varphi_1 \left( r_{i_m} - \frac{1}{2} \right) + E^{(t-1)}\!\left[ \frac{X_{i_m}^T \beta + k_1 e_{i_m}}{\sigma k_2 e_{i_m}} \right] \right)$;
  Update $\mathrm{Ent}(\phi)^{(t)}$, $t \leftarrow t + 1$;
end while

4. Simulation Study

In this section, we generate simulated data as follows:
$$y_i = X_i^T \beta + \varepsilon_i, \quad i = 1, 2, \dots, n,$$
where $x_{ij}$, the $j$th element of $X_i$, is generated as $x_{ij} \sim N(0, 1)$, and the errors $\varepsilon_i$ are independent and identically distributed. We consider the following distributions for $\varepsilon_i$: (1) the normal distribution $\varepsilon_i \sim N(0, 1)$; (2) the Cauchy distribution $\varepsilon_i \sim C(0, 1)$; (3) the $t$-distribution with 3 degrees of freedom, $\varepsilon_i \sim t(3)$. The values of $\beta$ are set as follows:
Simulation 1: β = (3, 1.5, 0, 0, 2, 0, 0, 0), a sparse model;
Simulation 2: β = (5, 5, 5, 5, 5, 5, 5, 0), a dense model;
Simulation 3: β = (5, 0, 0, 0, 0, 0, 0, 0), an ultra-sparse model.
The missing data mechanism M0 sets $\varphi_{01} = \varphi_{11} = \varphi_{21} = \dots = \varphi_{28} = 0.1$, that is,
$$\operatorname{logit} p(r_i \mid y_i, X_i, \varphi) = 0.1 + 0.1 y_i + 0.1 x_{i1} + \dots + 0.1 x_{i8}.$$
We generate n = 50 data points. To reduce the influence of the priors on the results, the hyperparameters in the priors are set as a0 = b0 = 0.01, aσ0 = bσ0 = 0.01, c1 = d1 = 0.01, and c0 = d0 = 0.01; μφ0 is a vector consisting of the means of the non-missing data, and Σφ0 is a diagonal matrix of the variances of the non-missing data. We use the mean squared error (MSE) of the βj estimates and the running time (in seconds, T) to measure the estimation accuracy and computational efficiency of each method. The running time is obtained using the tic and toc functions of the "tictoc" package in R. For each setting, we conduct 100 simulations; the tables report the average results over the 100 simulations, with standard deviations in parentheses. The quantiles considered are τ = 0.2, 0.5, 0.8. In R (version 4.3.1), there are no publicly available software packages for methods that handle non-ignorable missing response variables, so we only present the parameter estimation and variable selection results of the algorithm proposed in this paper.
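For concreteness, the simulated data described above can be generated as follows (a sketch under the stated settings; this is our reconstruction, not the authors' script):

# Sketch: data generation for Simulation 1 under mechanism M0.
set.seed(2025)
n <- 50; p <- 8
beta <- c(3, 1.5, 0, 0, 2, 0, 0, 0)                    # Simulation 1: sparse model
X <- matrix(rnorm(n * p), n, p)                         # x_ij ~ N(0, 1)
eps <- rnorm(n)                                         # or rcauchy(n), rt(n, df = 3)
y <- drop(X %*% beta) + eps

phi <- rep(0.1, p + 2)                                  # mechanism M0: all coefficients 0.1
eta <- phi[1] + phi[2] * y + drop(X %*% phi[-(1:2)])    # logit of P(r_i = 1 | y_i, X_i)
r <- rbinom(n, 1, plogis(eta))                          # r_i = 1 means y_i is missing
y_obs <- ifelse(r == 1, NA, y)
mean(r)                                                 # realized missing rate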
Table 1, Table 2 and Table 3 show the average β estimates obtained with the proposed algorithm for Simulations 1, 2, and 3 when ε follows a normal distribution, a Cauchy distribution, and a t-distribution, respectively. The values in parentheses are the average MSEs of β over 100 simulations. The results indicate that our algorithm provides good estimates of β. When ε follows a Cauchy distribution, the estimation error of β is slightly larger, but the difference is not substantial; this is consistent with the heavy tails of the Cauchy distribution.
In the proposed algorithm, the values of ϕ = (ϕ1, ⋯, ϕp) determine the variable selection: when ϕj ≥ 1/2, the j-th variable is selected; otherwise, it is not. Table 4, Table 5 and Table 6 show the variable selection results when ε follows a normal distribution, a Cauchy distribution, and a t-distribution, respectively. A value of 1 indicates that the variable is selected, while 0 indicates that it is not selected. It can be observed that all important covariates are identified.
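The selection rule itself is a one-liner in R (with illustrative inclusion probabilities, not output from the study):

phi_hat <- c(0.99, 0.97, 0.03, 0.02, 0.98, 0.04, 0.05, 0.01)  # illustrative values of phi
as.integer(phi_hat >= 0.5)                                     # 1 1 0 0 1 0 0 0, the pattern reported for Simulation 1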
We stop the proposed algorithm when the entropy of ϕ no longer changes (or changes very little, less than a given threshold). Figure 2, Figure 3 and Figure 4 show the entropy calculation results of ϕ when ε follows a normal distribution, Cauchy distribution, and t-distribution, respectively. It can be seen that, in all simulations, our algorithm converges within 100 iterations, which is far fewer than the number of samples in sampling-based MCMC algorithms. Therefore, the algorithm we propose has high computational efficiency.
Variational Bayesian (VB) inference is often considered superior to MCMC in terms of speed, scalability, and efficiency. Unlike MCMC, which relies on iterative sampling and can be computationally expensive, VB transforms the inference problem into an optimization task, leading to faster and more predictable convergence. VB produces deterministic results, avoiding the Monte Carlo error inherent in MCMC, and is more memory-efficient as it does not require storing large numbers of posterior samples. It is particularly well-suited for large-scale models, where MCMC may struggle due to slow sampling and high computational costs. Additionally, VB tends to offer more interpretable posterior distributions by approximating them with simpler parametric families. However, while VB is faster and more scalable, it may underestimate uncertainty due to its reliance on approximations, whereas MCMC remains more flexible and accurate for complex posterior distributions.

5. Real Data Analysis

In this section, we analyze data from HIV-positive patients in the AIDS Clinical Trials Group (ACTG175) study [16], which can be obtained using the command data (ACTG175) in the R package “BART”. The ACTG175 dataset contains 27 variables, and 2139 HIV-positive patients were randomly assigned to the following four groups: (1) 532 received zidovudine treatment; (2) 522 received didanosine treatment; (3) 524 received a combination of zidovudine and didanosine; and (4) 561 received a combination of zidovudine and zalcitabine. These patients were monitored at weeks 2, 4, and 8 after the start of the experiment and then every 12 weeks thereafter. Monitoring ended when the CD4 T-cell count declined by 50% or more, or the patient died.
We are interested in the relationship between the dependent variable Y, the CD4 T-cell count at 96 ± 5 weeks, and the covariates age (X1), weight (X2), baseline CD4 T-cell count (X3), CD4 T-cell count at 20 ± 5 weeks (X4), baseline CD8 T-cell count (X5), and CD8 T-cell count at 20 ± 5 weeks (X6). Due to patient death or dropout, the dependent variable Y has missing records, with missing rates of 39.66%, 36.21%, 35.69%, and 37.43% for the four groups, respectively. Relevant medical research indicates that CD4 T-cell count is related to disease progression, and patients with lower CD4 T-cell counts are more likely to drop out of the study. This suggests that the missingness of Y (CD4 T-cell count at 96 ± 5 weeks) is related to the CD4 T-cell count. In summary, the missingness of Y is not random, and we can establish the following model:
$$Q_\tau(y_i \mid x_i) = \beta_1 x_{i1} + \dots + \beta_6 x_{i6}, \quad i = 1, \dots, n,$$
$$p(r_i = 1 \mid y_i, x_i, \varphi) = \frac{e^{\varphi_0 + \varphi_1 y_i + \varphi_{21} x_{i1} + \dots + \varphi_{26} x_{i6}}}{1 + e^{\varphi_0 + \varphi_1 y_i + \varphi_{21} x_{i1} + \dots + \varphi_{26} x_{i6}}}.$$
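The analysis data can be assembled along the following lines (a sketch assuming the column names cd496, cd420, cd40, cd80, cd820, age, wtkg, and arms used in the BART/speff2trial releases of ACTG175; check them against your installed copy):

# Sketch: load ACTG175 and compute the missing rate of the 96 +/- 5 week outcome by arm.
library(BART)
data(ACTG175)

y <- ACTG175$cd496                                   # CD4 T-cell count at 96 +/- 5 weeks (contains NAs)
X <- ACTG175[, c("age", "wtkg", "cd40", "cd420", "cd80", "cd820")]  # X1-X6 in the order used above
tapply(is.na(y), ACTG175$arms, mean)                 # missing rate of y within each treatment group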
Table 7 summarizes the coefficients and 95% confidence intervals for the quantile regression of the four treatment groups when τ = 0.5. The QR row shows the results of quantile regression estimation when there is no missing data. It can be seen that the proposed method effectively imputes missing data. Figure 5 shows that our algorithm achieves convergence in the analysis of the AIDS data. From the estimated quantile regression coefficients, age and weight do not significantly influence the CD4 T-cell count observed at 96 ± 5 weeks. The observed value at 20 ± 5 weeks has a significant positive effect on the CD4 T-cell count at 96 ± 5 weeks and is its main influencing factor. The influence of the baseline measurement, taken at a longer interval, on the 96 ± 5-week measurement is relatively small and not even significant in groups (1) and (2) (corresponding to variable selection with ϕ < 1/2). Interestingly, the CD8 T-cell count at 20 ± 5 weeks has a significant negative effect in groups (2) and (3), indicating that higher previous CD4 levels and lower CD8 levels contribute to an increase in the CD4 T-cell count. This finding is similar to the use of the CD4/CD8 ratio as an indicator of antiretroviral therapy efficacy in existing studies, suggesting that a higher baseline ratio favors CD4 T-cell count recovery and immune function reconstruction. The results of this study also suggest that, in the mid-term assessment of HIV infection treatment efficacy, attention should be paid not only to CD4 T-cell levels but also to CD8 T-cell levels. With unchanged baseline CD4 levels, a low CD4/CD8 ratio may negatively impact long-term treatment outcomes.

6. Conclusions

In this paper, we consider a Bayesian quantile regression model with non-ignorable missing response data, where the non-ignorable missingness mechanism is specified by a logistic function. The quantile regression parameters are assigned a spike-and-slab LASSO prior, which enables effective parameter estimation and variable selection without requiring further processing of the parameter posteriors. Bayesian variable selection based on spike-and-slab priors is computationally expensive with sampling-based methods, and variational Bayes is a popular Bayesian computational alternative. Since the logistic function does not have a conjugate prior, we introduce a Pólya-Gamma latent variable into the logistic function, making the parameter posteriors of the logistic model conjugate. Finally, we propose a variational algorithm for Bayesian quantile regression with non-ignorable missing response data, which performs well in both simulation studies and real data analysis.

Author Contributions

J.Z. and W.W. contributed equally to this work. They were both involved in the development of the methodology, data analysis, and drafting of the manuscript. M.T. served as the corresponding author, providing guidance on the research design, supervising the study, and revising the manuscript critically for important intellectual content. All authors have read and agreed to the published version of the manuscript.

Funding

The research is supported by Guangzhou Huashang College Project: 2023HSDS25 and Beijing Natural Science Foundation (1242005).

Data Availability Statement

The data can be obtained using the command data(ACTG175) in the R package "BART".

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Denote $D_\tau = \mathrm{diag}(\tau_1^2, \dots, \tau_p^2)$ with $\tau_j^2 = \gamma_j \tau_{1j}^2 + (1 - \gamma_j) \tau_{0j}^2$. The variational posterior of $\beta$ is
$$q(\beta) \propto \exp\!\left\{ E_{-\beta}\!\left[ -\frac{1}{2\sigma k_2} \sum_{i=1}^{n} \frac{(y_i - X_i^T \beta - k_1 e_i)^2}{e_i} - \frac{1}{2} \beta^T D_\tau^{-1} \beta \right] \right\}.$$
Then $q(\beta) \sim N(\mu, \Sigma)$, where
$$\Sigma = \left( \frac{E[1/\sigma]}{k_2} \sum_{i=1}^{n} X_i X_i^T E\!\left[\frac{1}{e_i}\right] + E[D_\tau^{-1}] \right)^{-1}, \qquad \mu = \Sigma\, \frac{E[1/\sigma]}{k_2} \sum_{i=1}^{n} X_i \left( E\!\left[\frac{1}{e_i}\right] y_i - k_1 \right).$$
The variational posterior of $\tau_{1j}^2$ is
$$q(\tau_{1j}^2) \propto \exp\!\left\{ E_{-\tau_{1j}^2}\!\left[ \gamma_j \left( -\frac{1}{2} \log \tau_{1j}^2 - \frac{\beta_j^2}{2 \tau_{1j}^2} - \frac{\lambda_1^2}{2} \tau_{1j}^2 \right) \right] \right\}.$$
Then $q(\tau_{1j}^2) \sim GIG(\tfrac{1}{2}, a_{1j}, b_{1j})$, where $GIG(\cdot, \cdot, \cdot)$ denotes the generalized inverse Gaussian distribution and
$$a_{1j} = E[\gamma_j]\, E[\beta_j^2], \qquad b_{1j} = E[\gamma_j]\, E[\lambda_1^2].$$
The variational posterior of $\tau_{0j}^2$ is
$$q(\tau_{0j}^2) \propto \exp\!\left\{ E_{-\tau_{0j}^2}\!\left[ (1 - \gamma_j) \left( -\frac{1}{2} \log \tau_{0j}^2 - \frac{\beta_j^2}{2 \tau_{0j}^2} - \frac{\lambda_0^2}{2} \tau_{0j}^2 \right) \right] \right\}.$$
Then $q(\tau_{0j}^2) \sim GIG(\tfrac{1}{2}, a_{0j}, b_{0j})$, where
$$a_{0j} = E[1 - \gamma_j]\, E[\beta_j^2], \qquad b_{0j} = E[1 - \gamma_j]\, E[\lambda_0^2].$$
The variational posterior of $\gamma_j$ is
$$q(\gamma_j) \propto \exp\!\left\{ E_{-\gamma_j}\!\left[ \gamma_j \left( \frac{1}{2} \log \frac{\tau_{0j}^2}{\tau_{1j}^2} + \frac{\beta_j^2}{2} \left( \frac{1}{\tau_{0j}^2} - \frac{1}{\tau_{1j}^2} \right) + \log \frac{\lambda_1^2}{\lambda_0^2} + \frac{\lambda_0^2}{2} \tau_{0j}^2 - \frac{\lambda_1^2}{2} \tau_{1j}^2 + \log \frac{\pi}{1 - \pi} \right) \right] \right\}.$$
Then $q(\gamma_j) \sim \mathrm{Bernoulli}(\phi_j)$, $E[\gamma_j] = \phi_j$, where
$$\phi_j = \sigma\!\left( E\!\left[ \frac{1}{2} \log \frac{\tau_{0j}^2}{\tau_{1j}^2} + \frac{\beta_j^2}{2} \left( \frac{1}{\tau_{0j}^2} - \frac{1}{\tau_{1j}^2} \right) + \log \frac{\lambda_1^2}{\lambda_0^2} + \frac{\lambda_0^2}{2} \tau_{0j}^2 - \frac{\lambda_1^2}{2} \tau_{1j}^2 + \log \frac{\pi}{1 - \pi} \right] \right).$$
The variational posterior of $\pi$ is
$$q(\pi) \propto \exp\!\left\{ E_{-\pi}\!\left[ \left( a_0 - 1 + \sum_{j=1}^{p} \gamma_j \right) \log \pi + \left( b_0 - 1 + p - \sum_{j=1}^{p} \gamma_j \right) \log(1 - \pi) \right] \right\}.$$
Then $q(\pi) \sim \mathrm{Beta}(a, b)$, where
$$a = a_0 + \sum_{j=1}^{p} E[\gamma_j], \qquad b = b_0 + p - \sum_{j=1}^{p} E[\gamma_j].$$
The variational posterior of $\lambda_1^2$ is
$$q(\lambda_1^2) \propto \exp\!\left\{ E_{-\lambda_1^2}\!\left[ \left( c_1 - 1 + \sum_{j=1}^{p} \gamma_j \right) \log \lambda_1^2 - \left( d_1 + \sum_{j=1}^{p} \frac{\gamma_j \tau_{1j}^2}{2} \right) \lambda_1^2 \right] \right\}.$$
Then $q(\lambda_1^2) \sim \Gamma(\tilde{c}_1, \tilde{d}_1)$, where
$$\tilde{c}_1 = c_1 + \sum_{j=1}^{p} E[\gamma_j], \qquad \tilde{d}_1 = d_1 + \sum_{j=1}^{p} E[\gamma_j]\, \frac{E[\tau_{1j}^2]}{2}.$$
The variational posterior of $\lambda_0^2$ is
$$q(\lambda_0^2) \propto \exp\!\left\{ E_{-\lambda_0^2}\!\left[ \left( c_0 - 1 + p - \sum_{j=1}^{p} \gamma_j \right) \log \lambda_0^2 - \left( d_0 + \sum_{j=1}^{p} \frac{(1 - \gamma_j) \tau_{0j}^2}{2} \right) \lambda_0^2 \right] \right\}.$$
Then $q(\lambda_0^2) \sim \Gamma(\tilde{c}_0, \tilde{d}_0)$, where
$$\tilde{c}_0 = c_0 + p - \sum_{j=1}^{p} E[\gamma_j], \qquad \tilde{d}_0 = d_0 + \sum_{j=1}^{p} E[1 - \gamma_j]\, \frac{E[\tau_{0j}^2]}{2}.$$
The variational posterior of $e_i$ is
$$q(e_i) \propto \exp\!\left\{ E_{-e_i}\!\left[ -\frac{1}{2} \log e_i - \frac{(y_i - X_i^T \beta - k_1 e_i)^2}{2 \sigma k_2 e_i} - \frac{e_i}{\sigma} \right] \right\}.$$
Then $q(e_i) \sim GIG\!\left(\tfrac{1}{2},\; a_e = \frac{(y_i - X_i^T E[\beta])^2}{\sigma k_2},\; b_e = \frac{k_1^2 + 2 k_2}{\sigma k_2}\right)$.
The variational posterior of $\sigma$ is
$$q(\sigma) \propto \exp\!\left\{ E_{-\sigma}\!\left[ -\frac{n}{2} \log \sigma - \frac{1}{2 \sigma k_2} \sum_{i=1}^{n} \frac{(y_i - X_i^T \beta - k_1 e_i)^2}{e_i} - n \log \sigma - \sum_{i=1}^{n} \frac{e_i}{\sigma} - (a_{\sigma 0} + 1) \log \sigma - \frac{b_{\sigma 0}}{\sigma} \right] \right\}.$$
Then $q(\sigma) \sim IG\!\left(a_\sigma = \frac{3n}{2} + a_{\sigma 0},\; b_\sigma\right)$, where
$$b_\sigma = b_{\sigma 0} + \frac{1}{2 k_2} \sum_{i=1}^{n} \frac{\left( y_i - X_i^T E[\beta] - k_1 E[e_i] \right)^2}{E[e_i]} + \sum_{i=1}^{n} E[e_i].$$
The variational posterior of $\varphi$ is
$$q(\varphi) \propto \exp\!\left\{ E_{-\varphi}\!\left[ \sum_{i=1}^{n} \left( \left(r_i - \frac{1}{2}\right) W_i^T \varphi - \frac{\omega_i}{2} (W_i^T \varphi)^2 \right) - \frac{1}{2} (\varphi - \mu_{\varphi 0})^T \Sigma_{\varphi 0}^{-1} (\varphi - \mu_{\varphi 0}) \right] \right\}.$$
Then $q(\varphi) \sim N(\mu_\varphi, \Sigma_\varphi)$, where
$$\Sigma_\varphi = \left( W E[\mathrm{diag}(\omega)] W^T + \Sigma_{\varphi 0}^{-1} \right)^{-1}, \qquad \mu_\varphi = \Sigma_\varphi \left( W \left( r - \frac{1}{2} 1_n \right) + \Sigma_{\varphi 0}^{-1} \mu_{\varphi 0} \right),$$
with $r = (r_1, \dots, r_n)^T$, $1_n = (1, 1, \dots, 1)^T$ a vector of $n$ ones, and $W = (W_1, \dots, W_n)$.
The variational posterior of $\omega_i$ is
$$q(\omega_i) \propto \exp\!\left\{ E_{-\omega_i}\!\left[ -\frac{\omega_i}{2} (W_i^T \varphi)^2 \right] \right\} p(\omega_i \mid 1, 0).$$
Then $q(\omega_i) \sim PG(1, \hat{c}_i)$ with $\hat{c}_i = \sqrt{E[(W_i^T \varphi)^2]}$.
For a missing individual $i$, with response denoted $y_{i_m}$, and supposing there are $n_m$ missing individuals in total, the variational posterior is
$$q(y_{i_m}) \propto \exp\!\left\{ E_{-y_{i_m}}\!\left[ \left( r_{i_m} - \frac{1}{2} \right) y_{i_m} \varphi_1 - \frac{\omega_{i_m}}{2} (y_{i_m} \varphi_1)^2 - \frac{(y_{i_m} - X_{i_m}^T \beta - k_1 e_{i_m})^2}{2 \sigma k_2 e_{i_m}} \right] \right\}.$$
Then $q(y_{i_m}) \sim N(\mu_y, \sigma_y)$, where
$$\sigma_y = \left( \varphi_1^2 E[\omega_{i_m}] + E\!\left[ \frac{1}{2 \sigma k_2 e_{i_m}} \right] \right)^{-1}, \qquad \mu_y = \sigma_y \left( \varphi_1 \left( r_{i_m} - \frac{1}{2} \right) + E\!\left[ \frac{X_{i_m}^T \beta + k_1 e_{i_m}}{\sigma k_2 e_{i_m}} \right] \right).$$

References

  1. Rubin, D.B. Inference and missing data. Biometrika 1976, 63, 581–592. [Google Scholar] [CrossRef]
  2. Geman, S.; Geman, D. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Trans. Pattern Anal. Mach. Intell. 1984, 6, 721–741. [Google Scholar] [CrossRef] [PubMed]
  3. Hastings, W.K. Monte Carlo Sampling Methods Using Markov Chains and Their Applications; Oxford University Press: Oxford, UK, 1970; pp. 97–103. [Google Scholar]
  4. Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. Equation of State Calculations by Fast Computing Machines. J. Chem. Phys. 1953, 21, 1087–1092. [Google Scholar] [CrossRef]
  5. Ibrahim, J.G.; Chen, M.H.; Lipsitz, S.R.; Herring, A.H. Missing-Data Methods for Generalized Linear Models: A Comparative Review. J. Am. Stat. Assoc. 2005, 100, 332–346. [Google Scholar] [CrossRef]
  6. Lee, S.Y. Bayesian analysis of nonlinear structural equation models with nonignorable missing data. Psychometrika 2006, 71, 541–564. [Google Scholar] [CrossRef]
  7. Tang, N.S.; Zhao, H. Bayesian analysis of nonlinear reproductive dispersion mixed models for longitudinal data with nonignorable missing covariates. Commun. Stat.-Simul. Comput. 2014, 43, 1265–1287. [Google Scholar] [CrossRef]
  8. Xu, D.; Tang, N. Bayesian adaptive Lasso for quantile regression models with nonignorably missing response data. Commun. Stat.-Simul. Comput. 2019, 48, 2727–2742. [Google Scholar] [CrossRef]
  9. Ročková, V.; George, E.I. The spike-and-slab lasso. J. Am. Stat. Assoc. 2018, 113, 431–444. [Google Scholar] [CrossRef]
  10. Polson, N.G.; Scott, J.G.; Windle, J. Bayesian inference for logistic models using Pólya–Gamma latent variables. J. Am. Stat. Assoc. 2013, 108, 1339–1349. [Google Scholar] [CrossRef]
  11. Jaakkola, T.S.; Jordan, M.I. Bayesian parameter estimation via variational methods. Stat. Comput. 2000, 10, 25–37. [Google Scholar] [CrossRef]
  12. Li, X.; Tuerde, M.; Hu, X. Variational Bayesian Inference for Quantile Regression Models with Nonignorable Missing Data. Mathematics 2023, 11, 3926. [Google Scholar] [CrossRef]
  13. Ray, K.; Szabó, B.; Clara, G. Spike and slab variational Bayes for high dimensional logistic regression. Adv. Neural Inf. Process. Syst. 2020, 33, 14423–14434. [Google Scholar]
  14. Kozumi, H.; Kobayashi, G. Gibbs sampling methods for Bayesian quantile regression. J. Stat. Comput. Simul. 2011, 81, 1565–1578. [Google Scholar] [CrossRef]
  15. Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational Bayesian inference: A review for statisticians. J. Am. Stat. Assoc. 2017, 112, 859–877. [Google Scholar] [CrossRef]
  16. Hammer, S.M.; Katzenstein, D.A.; Hughes, M.D.; Gundacker, H.; Schooley, R.T.; Haubrich, R.H.; Henry, W.K.; Lederman, M.M.; Phair, J.P.; Niu, M.; et al. A Trial Comparing Nucleoside Monotherapy with Combination Therapy in HIV-Infected Adults with CD4 Cell Counts from 200 to 500 per Cubic Millimeter. N. Engl. J. Med. 1996, 335, 1081–1090. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Four types of spike-and-slab priors.
Figure 2. Convergence iterations of the VB algorithm for 3 simulations when ε ∼ N(0, 1). (a) Number of iterations in Simulation 1; (b) Number of iterations in Simulation 2; (c) Number of iterations in Simulation 3.
Figure 3. Convergence iterations of the VB algorithm for 3 simulations when ε ∼ C(0, 1). (a) Number of iterations in Simulation 1; (b) Number of iterations in Simulation 2; (c) Number of iterations in Simulation 3.
Figure 4. Convergence iterations of the VB algorithm for 3 simulations when ε ∼ t(3). (a) Number of iterations in Simulation 1; (b) Number of iterations in Simulation 2; (c) Number of iterations in Simulation 3.
Figure 5. Convergence iterations of the VB algorithm for the AIDS data. (a) Number of iterations for Group (1); (b) Number of iterations for Group (2); (c) Number of iterations for Group (3); (d) Number of iterations for Group (4).
Table 1. Comparison of parameter estimation and running time for 3 simulations when ε ∼ N(0, 1).
Simulation | τ | β̂1 | β̂2 | β̂3 | β̂4 | β̂5 | β̂6 | β̂7 | β̂8 | T
Simulation 1 | β | 3.000 | 1.500 | 0.000 | 0.000 | 2.000 | 0.000 | 0.000 | 0.000 |
 | τ = 0.2 | 3.492 | 1.036 | 0.153 | 0.159 | 2.374 | −0.460 | −0.246 | −0.374 | 0.041
 |  | (0.242) | (0.214) | (0.023) | (0.025) | (0.140) | (0.211) | (0.060) | (0.139) |
 | τ = 0.5 | 2.986 | 1.478 | −0.008 | 0.002 | 2.058 | 0.047 | −0.028 | 0.042 | 0.039
 |  | (0.059) | (0.022) | (0.034) | (0.051) | (0.033) | (0.071) | (0.045) | (0.027) |
 | τ = 0.8 | 2.696 | 1.294 | −0.004 | 0.308 | 1.735 | −0.075 | −0.174 | −0.068 | 0.040
 |  | (0.517) | (0.728) | (0.611) | (0.615) | (0.429) | (0.390) | (0.381) | (0.533) |
Simulation 2 | β | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 0.000 |
 | τ = 0.2 | 5.015 | 4.808 | 5.044 | 4.985 | 5.005 | 4.904 | 4.977 | −0.001 | 0.012
 |  | (0.466) | (0.564) | (0.238) | (0.317) | (0.384) | (0.502) | (0.433) | (0.205) |
 | τ = 0.5 | 4.978 | 4.947 | 4.995 | 4.956 | 4.993 | 4.979 | 5.013 | 0.006 | 0.011
 |  | (0.019) | (0.022) | (0.019) | (0.029) | (0.019) | (0.024) | (0.033) | (0.027) |
 | τ = 0.8 | 5.046 | 4.864 | 4.899 | 4.774 | 5.002 | 4.971 | 4.874 | 0.107 | 0.012
 |  | (0.503) | (0.389) | (0.325) | (0.346) | (0.360) | (0.364) | (0.428) | (0.285) |
Simulation 3 | β | 5.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
 | τ = 0.2 | 5.312 | −0.169 | −0.045 | −0.144 | 0.445 | 0.030 | −0.215 | 0.073 | 0.019
 |  | (0.526) | (0.410) | (0.413) | (0.285) | (0.437) | (0.109) | (0.320) | (0.209) |
 | τ = 0.5 | 4.993 | 0.009 | −0.054 | −0.015 | −0.115 | −0.063 | −0.008 | −0.019 | 0.012
 |  | (0.010) | (0.008) | (0.013) | (0.023) | (0.016) | (0.019) | (0.022) | (0.010) |
 | τ = 0.8 | 5.025 | −0.242 | 0.080 | −0.038 | 0.042 | −0.058 | −0.109 | −0.241 | 0.019
 |  | (0.326) | (0.582) | (0.142) | (0.324) | (0.153) | (0.430) | (0.611) | (0.510) |
Table 2. Comparison of parameter estimation and running time for 3 simulations when ε ∼ C(0, 1).
Simulation | τ | β̂1 | β̂2 | β̂3 | β̂4 | β̂5 | β̂6 | β̂7 | β̂8 | T
Simulation 1 | β | 3.000 | 1.500 | 0.000 | 0.000 | 2.000 | 0.000 | 0.000 | 0.000 |
 | τ = 0.2 | 2.890 | 1.360 | 0.186 | −0.010 | 2.157 | −0.029 | −0.255 | 0.178 | 0.024
 |  | (0.621) | (0.210) | (0.343) | (0.267) | (0.137) | (0.199) | (0.170) | (0.219) |
 | τ = 0.5 | 3.032 | 1.591 | −0.224 | −0.071 | 2.255 | −0.304 | 0.284 | −0.246 | 0.029
 |  | (0.291) | (0.317) | (0.131) | (0.299) | (0.137) | (0.129) | (0.331) | (0.229) |
 | τ = 0.8 | 3.037 | 1.872 | −0.404 | −0.182 | 2.122 | 0.242 | −0.194 | 0.099 | 0.023
 |  | (0.259) | (0.262) | (0.433) | (0.425) | (0.137) | (0.129) | (0.323) | (0.219) |
Simulation 2 | β | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 0.000 |
 | τ = 0.2 | 4.752 | 4.825 | 4.778 | 5.298 | 4.587 | 4.818 | 4.742 | 0.321 | 0.016
 |  | (0.225) | (0.310) | (0.399) | (0.255) | (0.324) | (0.119) | (0.390) | (0.219) |
 | τ = 0.5 | 4.645 | 4.769 | 4.544 | 4.642 | 4.774 | 5.281 | 4.908 | 4.945 | 0.018
 |  | (0.276) | (0.218) | (0.233) | (0.295) | (0.247) | (0.112) | (0.184) | (0.105) |
 | τ = 0.8 | 5.182 | 5.543 | 4.750 | 5.552 | 4.931 | 5.339 | 4.718 | 0.280 | 0.018
 |  | (0.326) | (0.433) | (0.447) | (0.215) | (0.337) | (0.179) | (0.290) | (0.219) |
Simulation 3 | β | 5.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
 | τ = 0.2 | 4.970 | 0.018 | −0.243 | −0.154 | 0.230 | 0.244 | −0.238 | 0.251 | 0.021
 |  | (0.373) | (0.220) | (0.383) | (0.275) | (0.267) | (0.119) | (0.391) | (0.119) |
 | τ = 0.5 | 5.411 | −0.117 | 0.185 | 0.238 | 0.355 | 0.163 | 0.193 | 0.201 | 0.020
 |  | (0.284) | (0.310) | (0.217) | (0.195) | (0.197) | (0.199) | (0.110) | (0.199) |
 | τ = 0.8 | 4.879 | 0.119 | 0.295 | 0.423 | 0.148 | 0.253 | 0.165 | 0.114 | 0.026
 |  | (0.371) | (0.388) | (0.313) | (0.385) | (0.237) | (0.169) | (0.333) | (0.288) |
Table 3. Comparison of MSE and running time for 3 simulations when ε ∼ t(3).
Simulation | τ | β̂1 | β̂2 | β̂3 | β̂4 | β̂5 | β̂6 | β̂7 | β̂8 | T
Simulation 1 | β | 3.000 | 1.500 | 0.000 | 0.000 | 2.000 | 0.000 | 0.000 | 0.000 |
 | τ = 0.2 | 2.923 | 1.449 | 0.145 | 0.210 | 1.762 | 0.324 | 0.181 | 0.149 | 0.037
 |  | (0.164) | (0.120) | (0.213) | (0.185) | (0.125) | (0.129) | (0.410) | (0.189) |
 | τ = 0.5 | 3.018 | 1.409 | 0.012 | −0.222 | 2.020 | −0.036 | 0.056 | 0.161 | 0.036
 |  | (0.186) | (0.220) | (0.198) | (0.117) | (0.237) | (0.119) | (0.298) | (0.119) |
 | τ = 0.8 | 3.245 | 1.418 | 0.129 | −0.176 | 2.195 | 0.281 | −0.124 | 0.115 | 0.022
 |  | (0.121) | (0.210) | (0.213) | (0.285) | (0.237) | (0.109) | (0.320) | (0.209) |
Simulation 2 | β | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 0.000 |
 | τ = 0.2 | 4.850 | 5.362 | 4.987 | 4.863 | 5.081 | 5.079 | 4.552 | 0.171 | 0.007
 |  | (0.123) | (0.245) | (0.265) | (0.134) | (0.239) | (0.221) | (0.179) | (0.238) |
 | τ = 0.5 | 4.997 | 4.857 | 4.952 | 4.930 | 5.007 | 5.028 | 5.028 | −0.074 | 0.011
 |  | (0.298) | (0.210) | (0.256) | (0.235) | (0.277) | (0.219) | (0.335) | (0.349) |
 | τ = 0.8 | 4.812 | 4.977 | 4.577 | 4.737 | 4.793 | 5.020 | 4.746 | 0.210 | 0.012
 |  | (0.253) | (0.203) | (0.323) | (0.335) | (0.177) | (0.187) | (0.289) | (0.219) |
Simulation 3 | β | 5.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
 | τ = 0.2 | 5.151 | −0.127 | −0.029 | 0.022 | 0.039 | −0.129 | 0.014 | −0.134 | 0.016
 |  | (0.253) | (0.137) | (0.128) | (0.134) | (0.267) | (0.214) | (0.197) | (0.298) |
 | τ = 0.5 | 5.171 | 0.029 | −0.086 | 0.084 | −0.097 | −0.201 | −0.474 | 0.277 | 0.012
 |  | (0.203) | (0.110) | (0.278) | (0.235) | (0.117) | (0.279) | (0.220) | (0.289) |
 | τ = 0.8 | 5.256 | 0.306 | −0.200 | −0.012 | −0.005 | 0.254 | −0.288 | 0.136 | 0.017
 |  | (0.184) | (0.212) | (0.196) | (0.187) | (0.117) | (0.109) | (0.120) | (0.109) |
Table 4. Variable selection results for 3 simulations when ε ∼ N(0, 1).
Simulation | τ | β1 | β2 | β3 | β4 | β5 | β6 | β7 | β8
Simulation 1 | β | 3.000 | 1.500 | 0.000 | 0.000 | 2.000 | 0.000 | 0.000 | 0.000
 | τ = 0.2 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0
 | τ = 0.5 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0
 | τ = 0.8 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0
Simulation 2 | β | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 0.000
 | τ = 0.2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0
 | τ = 0.5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0
 | τ = 0.8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0
Simulation 3 | β | 5.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
 | τ = 0.2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
 | τ = 0.5 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
 | τ = 0.8 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Table 5. Variable selection results for 3 simulations when ε ∼ C(0, 1).
Simulation | τ | β1 | β2 | β3 | β4 | β5 | β6 | β7 | β8
Simulation 1 | β | 3.000 | 1.500 | 0.000 | 0.000 | 2.000 | 0.000 | 0.000 | 0.000
 | τ = 0.2 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0
 | τ = 0.5 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0
 | τ = 0.8 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0
Simulation 2 | β | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 0.000
 | τ = 0.2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0
 | τ = 0.5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0
 | τ = 0.8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0
Simulation 3 | β | 5.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
 | τ = 0.2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
 | τ = 0.5 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
 | τ = 0.8 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Table 6. Variable selection results for 3 simulations when ε ∼ t(3).
Simulation | τ | β1 | β2 | β3 | β4 | β5 | β6 | β7 | β8
Simulation 1 | β | 3.000 | 1.500 | 0.000 | 0.000 | 2.000 | 0.000 | 0.000 | 0.000
 | τ = 0.2 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0
 | τ = 0.5 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0
 | τ = 0.8 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0
Simulation 2 | β | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 5.000 | 0.000
 | τ = 0.2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0
 | τ = 0.5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0
 | τ = 0.8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0
Simulation 3 | β | 5.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
 | τ = 0.2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
 | τ = 0.5 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
 | τ = 0.8 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Table 7. Results of the AIDS data analysis.
Group |  | β1 | β2 | β3 | β4 | β5 | β6
(1) | QR | 0.050 | −0.028 | 0.128 | 0.531 | −0.008 | −0.107
 | Est | 0.052 | −0.030 | 0.134 | 0.550 | −0.010 | −0.111
 | CI | (−0.060, 0.134) | (−0.158, 0.045) | (−0.040, 0.204) | (0.453, 0.612) | (−0.135, 0.090) | (−0.195, 0.019)
(2) | QR | 0.048 | −0.035 | 0.125 | 0.652 | 0.122 | −0.166
 | Est | 0.055 | −0.031 | 0.129 | 0.620 | 0.125 | −0.169
 | CI | (−0.041, 0.141) | (−0.105, 0.043) | (−0.002, 0.200) | (0.519, 0.650) | (0.005, 0.274) | (−0.301, −0.139)
(3) | QR | 0.040 | 0.081 | 0.201 | 0.513 | 0.060 | −0.211
 | Est | 0.036 | 0.079 | 0.196 | 0.507 | 0.062 | −0.209
 | CI | (−0.054, 0.121) | (−0.003, 0.159) | (0.112, 0.309) | (0.398, 0.614) | (−0.070, 0.227) | (−0.337, −0.098)
(4) | QR | 0.059 | 0.064 | 0.271 | 0.515 | −0.095 | −0.063
 | Est | 0.065 | 0.062 | 0.262 | 0.511 | −0.098 | −0.061
 | CI | (−0.030, 0.149) | (−0.038, 0.152) | (0.131, 0.465) | (0.398, 0.615) | (−0.188, 0.037) | (−0.187, 0.069)
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
