Article

Research on Quantile Regression Method for Longitudinal Interval-Censored Data Based on Bayesian Double Penalty

School of Science, Hubei University of Technology, Wuhan 430068, China
*
Author to whom correspondence should be addressed.
Mathematics 2024, 12(12), 1782; https://doi.org/10.3390/math12121782
Submission received: 30 April 2024 / Revised: 29 May 2024 / Accepted: 5 June 2024 / Published: 7 June 2024

Abstract

Censored data arise in many fields, and parameter estimation and variable selection in censored mixed-effects models have become an active research topic. In this paper, for a response variable restricted by both a lower and an upper limit, a double-penalty Bayesian Tobit quantile regression model is constructed to carry out parameter estimation and variable selection in the interval-censored mixed-effects model. The fixed-effects and random-effects coefficients of the Tobit mixed-effects model are shrunk simultaneously, so that the estimation error of the model is reduced while variable selection is carried out. The posterior distribution of each unknown parameter is derived using the conditional Laplace prior and the mixed truncated normal distribution of interval-censored data, and a Gibbs sampling algorithm for estimating the unknown parameters is then constructed. Monte Carlo simulations show that the new method outperforms the classical method in variable selection and parameter estimation accuracy under different model sparsity, different data censoring ratios, and different random error distributions, and that the model achieves automatic variable selection. Finally, the new method is used to analyze the correlation between the crime rate and various economic indicators in China.

1. Introduction

In practical research in statistics, the problem of censored data due to various factors is becoming more and more prominent, and studying how to estimate parameters and select variables in censored mixed-effects models has become a hot research topic. Interval-censored data challenge traditional least squares estimation methods due to their boundedness, asymmetry, and bias. Earlier studies, such as Song et al. [1] and Ferrari [2], have pointed out that general linear regression methods lead to significant estimation bias when dealing with interval-censored data. Although Lesaffre et al. [3] attempted regression analysis through parameter transformation, the results were not satisfactory. To cope with the asymmetric problem, Espinheira [4] proposed a regression model based on the Beta distribution, but the method was limited to proportionate data that conformed to the Beta distribution. Zhao [5] investigated the variable selection problem combined with the penalty function in the context of interval censoring. On the other hand, Ying et al. [6] proposed a resampling algorithm applicable to the nature of large samples that significantly improved the computational efficiency. However, the problem of parameter estimation and variable selection for interval-censored data in high-dimensional contexts still requires in-depth research.
As early as 1958, Tobin proposed the Tobit model with restricted response variables [7]. Since this type of model and traditional linear regression models rely mainly on the mean to estimate the regression coefficients [8], they often fail to reveal the full extent of regression information. To improve on this, Powell investigated quantile regression estimation for the Tobit model in 1986 [9], but its asymptotic covariance matrix depends on the error density function, which affects estimation reliability [10]. When applied to high-dimensional longitudinal data, the complexity of censored data, random effects, and random errors further exacerbates the difficulty of parameter estimation [11]. Therefore, developing novel and efficient sampling algorithms for parameter estimation and variable selection in Tobit quantile regression models is important for improving model accuracy and providing reliable statistical tools for related fields.
In Tobit quantile regression models, although mixed-effects models can comprehensively consider the covariates affecting the response variables [12], they are computationally intensive and may affect the accuracy of parameter estimation [13]. Parameter estimation and variable selection of mixed-effects models with censored data using different penalization methods can effectively make up for the shortcomings of traditional methods [14]. Facing the complexity of variable selection and estimation [15], the Bayesian method avoids the difficulty of penalty parameter selection by treating the parameters as distributions [16]. Incorporating the penalty function into the Bayesian method to construct a hierarchical quantile regression model provides a new idea for parameter estimation of the Tobit quantile regression model. In this paper, based on the Bayesian framework, a double-penalty quantile regression method is proposed to deepen the application and effect study of the Tobit quantile regression model in censored data.
In recent years, the use of penalized methods for dimension reduction and modeling of high-dimensional data has attracted much attention. Alhamzawi et al. [17] and Alhamzawi [18] proposed Tobit quantile regression methods with adaptive Lasso and adaptive elastic net penalties from a Bayesian perspective, which achieved variable selection through gamma priors and Gibbs sampling algorithms. Alhusseini [19], on the other hand, introduced Lasso penalties into a Bayesian Tobit quantile regression model, with coefficients estimated using scale-mixture uniform priors. However, applying these methods directly in mixed-effects models with latent variables may lead to biased regression coefficient estimates. For this reason, Alhamzawi and Ali [20] used a mixture representation of the asymmetric Laplace distribution for regularization to avoid the non-convex minimization problem. Abbas [21] introduced ridge regression parameters in the covariance matrix to deal with the multicollinearity problem. Although most current research focuses on fixed-effects models, inference for censored data in mixed-effects models, especially from the perspective of Bayesian methods with penalties, still needs to be explored further.
This paper focuses on longitudinal data Tobit models containing latent variables, which are transformed into Tobit models for interval-censored data by adjusting the constraints on the response variables. Based on the Bayesian approach and combined with the penalty function, the Bayesian double-penalty Tobit quantile regression model with interval censoring of the response variable is constructed. We will explore the parameter estimation and variable selection problems under different penalty function methods, different censoring proportions, and different random error distributions, respectively, with a view to providing new perspectives and ideas for research in this field.

2. Model Building and Estimation Methods

2.1. Bayesian Tobit Hierarchical Quantile Regression Model for Longitudinal Interval-Censored Data

Since the response variable has both lower and upper bounds for interval-censored data, the estimates obtained by directly building the censored regression model of Powell [9] may fall outside the upper and lower bounds. In the mixed-effects model, the Tobit model for interval-censored data is developed from the Tobit linear model by introducing latent variables as follows:
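$$
\begin{cases}
y_{ij}^{*} = x_{ij}'\beta + z_{ij}'\alpha_i + \varepsilon_{ij} \\
y_{ij} = y_{ij}^{*}\, I(\tilde{N} \le y_{ij}^{*} \le \tilde{M}) + \tilde{M}\, I(y_{ij}^{*} > \tilde{M}) + \tilde{N}\, I(y_{ij}^{*} < \tilde{N})
\end{cases}
\tag{1}
$$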
Here $x_{ij}'$ is the value of the explanatory variables for the $i$-th individual at the $j$-th observation time, $z_{ij}'$ is the covariate vector corresponding to the random effects, and $\beta$ and $\alpha_i$ are the fixed-effect and random-effect coefficient vectors, respectively. The distribution of the perturbation term $\varepsilon_{ij}$ is unknown; $y_{ij}$ is the observed value of the response variable for the $i$-th individual at the $j$-th time point, $y_{ij}^{*}$ is the corresponding unobserved latent dependent variable, and $I(\cdot)$ is the indicator function. Specifically,
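$$
I(\tilde{N} \le y_{ij}^{*} \le \tilde{M}) =
\begin{cases}
1, & \tilde{N} \le y_{ij}^{*} \le \tilde{M} \\
0, & y_{ij}^{*} > \tilde{M} \ \text{or} \ y_{ij}^{*} < \tilde{N}
\end{cases}
\tag{2}
$$
$$
I(y_{ij}^{*} > \tilde{M}) =
\begin{cases}
1, & y_{ij}^{*} > \tilde{M} \\
0, & y_{ij}^{*} \le \tilde{M}
\end{cases}
\tag{3}
$$
$$
I(y_{ij}^{*} < \tilde{N}) =
\begin{cases}
1, & y_{ij}^{*} < \tilde{N} \\
0, & y_{ij}^{*} \ge \tilde{N}
\end{cases}
\tag{4}
$$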
For the estimation of the quantile regression coefficients, β and α in the above model can be obtained by minimizing the following Equation (5).
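$$
\min_{\beta,\,\alpha} \sum_{i=1}^{n} \sum_{j=1}^{m} \rho_\tau\!\left( y_{ij} - \max\!\left( \tilde{N}, \min\!\left( \tilde{M},\, x_{ij}'\beta + z_{ij}'\alpha_i \right) \right) \right)
\tag{5}
$$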
Here $\rho_\tau(u) = u\,(\tau - \mathbf{1}(u \le 0))$ is the quantile loss function. According to the traditional Bayesian quantile regression method, assuming $y_{ij} \sim ALD(\mu, \sigma, \tau)$, the likelihood function of the sample is as follows:
$$
L(y_{ij} \mid \beta, \alpha_i, \sigma, \tau) = \left( \frac{\tau(1-\tau)}{\sigma} \right)^{nm} \exp\left\{ -\sum_{i=1}^{n} \sum_{j=1}^{m} \rho_\tau\!\left( \frac{y_{ij} - \mu_{ij}}{\sigma} \right) \right\}
\tag{6}
$$
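The objective in Equation (5) can be written down in a few lines. The following is a minimal sketch, not the authors' code; the function names `check_loss`, `censored_fit`, and `tobit_quantile_objective` are hypothetical, and the linear predictors $x_{ij}'\beta + z_{ij}'\alpha_i$ are assumed to be precomputed and passed in as a flat list.

```python
def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1(u <= 0))."""
    indicator = 1.0 if u <= 0 else 0.0
    return u * (tau - indicator)


def censored_fit(linear_predictor, lower, upper):
    """Clamp the linear predictor to [lower, upper]: max(N, min(M, eta))."""
    return max(lower, min(upper, linear_predictor))


def tobit_quantile_objective(y, linear_predictors, tau, lower, upper):
    """Sum of check losses between observed responses and clamped fits."""
    return sum(
        check_loss(y_obs - censored_fit(eta, lower, upper), tau)
        for y_obs, eta in zip(y, linear_predictors)
    )
```

Observations whose linear predictor lies beyond a bound contribute no loss when the response is censored at that bound, which is what makes the objective appropriate for interval-censored responses.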
Kottas and Krnjajić [22] found that the asymmetric Laplace distribution can be decomposed into a mixture of normal and exponential distributions. The ALD distribution was further decomposed into $N(0,1)$ and $E(1/\sigma)$. Then, $y_{ij}^{*}$ can be expressed as [23]:
$$
y_{ij}^{*} = x_{ij}'\beta + z_{ij}'\alpha_i + k_1 \nu_{ij} + \sqrt{k_2 \sigma \nu_{ij}}\;\theta_{ij}, \quad i = 1, \ldots, n;\ j = 1, \ldots, m
\tag{7}
$$
Here $\theta \sim N(0,1)$, $\nu \sim E(1/\sigma)$, $k_1 = \dfrac{1-2\tau}{\tau(1-\tau)}$, and $k_2 = \dfrac{2}{\tau(1-\tau)}$. So, $y_{ij}^{*} \mid \nu_{ij}, \sigma, \beta, \alpha_i \sim N\!\left( x_{ij}'\beta + z_{ij}'\alpha_i + k_1 \nu_{ij},\ k_2 \sigma \nu_{ij} \right)$.
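The mixture representation can be checked by simulation: if the error term $k_1\nu + \sqrt{k_2\sigma\nu}\,\theta$ really follows an asymmetric Laplace distribution located at zero, then a share $\tau$ of the draws should be non-positive. The sketch below uses only the Python standard library; the function names are hypothetical and $E(1/\sigma)$ is read as an exponential distribution with rate $1/\sigma$.

```python
import math
import random


def mixture_constants(tau):
    """Return (k1, k2) of the normal-exponential mixture at quantile level tau."""
    k1 = (1.0 - 2.0 * tau) / (tau * (1.0 - tau))
    k2 = 2.0 / (tau * (1.0 - tau))
    return k1, k2


def draw_ald_error(tau, sigma, rng):
    """One draw of k1*nu + sqrt(k2*sigma*nu)*theta."""
    k1, k2 = mixture_constants(tau)
    nu = rng.expovariate(1.0 / sigma)  # exponential with rate 1/sigma (mean sigma)
    theta = rng.gauss(0.0, 1.0)
    return k1 * nu + math.sqrt(k2 * sigma * nu) * theta


def share_nonpositive(tau, sigma=1.0, n_draws=100_000, seed=2024):
    """Monte Carlo estimate of P(error <= 0); should be close to tau."""
    rng = random.Random(seed)
    hits = sum(draw_ald_error(tau, sigma, rng) <= 0.0 for _ in range(n_draws))
    return hits / n_draws
```

Calling `share_nonpositive(0.25)` or `share_nonpositive(0.75)` should return values close to the requested quantile level, up to Monte Carlo error.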
Assuming that each parameter has a prior distribution, a Bayesian Tobit hierarchical quantile regression model for interval-censored data can be built (P-BTQR):
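$$
\begin{cases}
y_{ij} = y_{ij}^{*}\, I(\tilde{N} \le y_{ij}^{*} \le \tilde{M}) + \tilde{M}\, I(y_{ij}^{*} > \tilde{M}) + \tilde{N}\, I(y_{ij}^{*} < \tilde{N}) \\
y_{ij}^{*} = x_{ij}'\beta + z_{ij}'\alpha_i + k_1 \nu_{ij} + \sqrt{k_2 \sigma \nu_{ij}}\;\theta_{ij} \\
\theta_{ij} \sim N(0,1), \quad \nu_{ij} \mid \sigma \sim E\!\left( \tfrac{1}{\sigma} \right) \\
\beta \sim \pi(\beta), \quad \alpha_i \sim \pi(\alpha_i), \quad \sigma \sim \pi(\sigma)
\end{cases}
\tag{8}
$$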

2.2. Bayesian Double Lasso Penalized Quantile Regression Method for Tobit Model

Lasso penalties and adaptive Lasso penalties are applied to fixed-effect β and random effect αi, respectively, and the dual Lasso Bayesian Tobit quantile regression method (PDL-BTQR) and dual-adaptive Lasso Bayesian Tobit quantile regression method (PDAL-BTQR) for interval-censored data are proposed.
Firstly, in the mixed-effects model, both fixed-effects β and random effects αi are assumed to have conditional Laplace priors as follows:
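$$
\pi(\beta \mid \sigma, \lambda_1) = \prod_{l=1}^{k} \frac{\lambda_1}{2\sigma} \exp\left\{ -\frac{\lambda_1}{\sigma} |\beta_l| \right\}, \quad l = 1, \ldots, k
\tag{9}
$$
$$
\pi(\alpha_i \mid \sigma, \lambda_2) = \prod_{i=1}^{n} \prod_{t=1}^{q} \frac{\lambda_2}{2\sigma} \exp\left\{ -\frac{\lambda_2}{\sigma} |\alpha_{it}| \right\}, \quad t = 1, \ldots, q
\tag{10}
$$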
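Using the integral identity proposed by Mallows and Andrews [24]:
$$
\frac{a}{2} e^{-a|z|} = \int_{0}^{+\infty} \frac{1}{\sqrt{2\pi s}} \exp\left( -\frac{z^2}{2s} \right) \frac{a^2}{2} \exp\left( -\frac{a^2 s}{2} \right) ds
$$
Let $\eta_1 = \lambda_1/\sigma$, $\eta_2 = \lambda_2/\sigma$, $S = (s_1, \ldots, s_k)$, and $R = (r_{11}, \ldots, r_{nq})$; we can obtain:
$$
\pi(\beta, S \mid \eta_1^2) = \prod_{l=1}^{k} \frac{1}{\sqrt{2\pi s_l}} \exp\left( -\frac{\beta_l^2}{2 s_l} \right) \frac{\eta_1^2}{2} \exp\left( -\frac{\eta_1^2 s_l}{2} \right)
\tag{11}
$$
$$
\pi(\alpha_i, R \mid \eta_2^2) = \prod_{i=1}^{n} \prod_{t=1}^{q} \frac{1}{\sqrt{2\pi r_{it}}} \exp\left( -\frac{\alpha_{it}^2}{2 r_{it}} \right) \frac{\eta_2^2}{2} \exp\left( -\frac{\eta_2^2}{2} r_{it} \right)
\tag{12}
$$
Then, $\beta_l \mid s_l \sim N(0, s_l)$, $s_l \mid \eta_1^2 \sim E\!\left( \eta_1^2/2 \right)$; $\alpha_{it} \mid r_{it} \sim N(0, r_{it})$, $r_{it} \mid \eta_2^2 \sim E\!\left( \eta_2^2/2 \right)$.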
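The scale-mixture hierarchy can be verified numerically: drawing a variance from an exponential distribution with rate $\eta^2/2$ and then a normal variate with that variance should reproduce a Laplace variable with density $\frac{\eta}{2}e^{-\eta|\beta|}$, whose mean absolute value is $1/\eta$. This is a small standard-library sketch with hypothetical function names, not part of the original derivation.

```python
import random


def draw_laplace_via_mixture(eta, rng):
    """Draw beta with s ~ Exp(rate eta^2 / 2) and beta | s ~ N(0, s)."""
    s = rng.expovariate(eta * eta / 2.0)
    return rng.gauss(0.0, s ** 0.5)


def mean_abs(eta, n_draws=200_000, seed=7):
    """Monte Carlo estimate of E|beta|; the Laplace value is 1 / eta."""
    rng = random.Random(seed)
    total = sum(abs(draw_laplace_via_mixture(eta, rng)) for _ in range(n_draws))
    return total / n_draws
```

For example, `mean_abs(2.0)` should be close to 0.5.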
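Additionally, assume Gamma priors $\eta_1^2 \sim G(e_0, f_0)$ and $\eta_2^2 \sim G(g_0, h_0)$, which are the prior kernels that appear in the conditional posteriors of $\eta_1^2$ and $\eta_2^2$ derived below. From Equations (8), (11) and (12), the posterior distribution of the fixed effects $\beta$ and random effects $\alpha_i$ can be derived as: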
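$$
\pi(\beta, \alpha_i \mid y_{ij}, \sigma, \eta_1, \eta_2) \propto \exp\left\{ -\frac{1}{\sigma} \sum_{i=1}^{n} \sum_{j=1}^{m} \rho_\tau\!\left( y_{ij} - \max\!\left( \tilde{N}, \min\!\left( \tilde{M},\, x_{ij}'\beta + z_{ij}'\alpha_i \right) \right) \right) - \sum_{l=1}^{k} \eta_1 |\beta_l| - \sum_{i=1}^{n} \sum_{t=1}^{q} \eta_2 |\alpha_{it}| \right\}
\tag{13}
$$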
In Bayesian statistics, maximizing the conditional posterior density is equivalent to minimizing the negative log posterior density, because the logarithm is monotonically increasing and turns products into sums, which simplifies the optimization. For Bayesian Tobit quantile regression with a double Lasso penalty, the posterior combines the data likelihood with the priors that induce the Lasso penalty terms, and the parameters are then estimated by minimizing the negative log posterior. Maximizing the conditional posterior density in Equation (13) is therefore equivalent to minimizing the following Bayesian Tobit quantile regression objective with a double Lasso penalty, Equation (14):
$$
\sum_{i=1}^{n} \sum_{j=1}^{m} \rho_\tau\!\left( y_{ij} - \max\!\left( \tilde{N}, \min\!\left( \tilde{M},\, x_{ij}'\beta + z_{ij}'\alpha_i \right) \right) \right) + \sum_{l=1}^{k} \eta_1 |\beta_l| + \sum_{i=1}^{n} \sum_{t=1}^{q} \eta_2 |\alpha_{it}|
\tag{14}
$$
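Next, assume that $\sigma$ follows the inverse Gamma distribution $\sigma \sim IG(c_0, d_0)$, with prior density $\pi(\sigma) \propto \sigma^{-c_0 - 1} \exp\left\{ -\dfrac{d_0}{\sigma} \right\}$. Based on the prior distribution and prior density function, the likelihood function of the resulting model can be rewritten as:
$$
L(y_{ij}^{*} \mid \beta, \alpha_i, \nu, \sigma) = (2\pi k_2 \sigma)^{-\frac{nm}{2}} \left( \prod_{i=1}^{n} \prod_{j=1}^{m} \nu_{ij} \right)^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2 k_2 \sigma} \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{\left( y_{ij}^{*} - \mu_{ij} \right)^2}{\nu_{ij}} \right\}
\tag{15}
$$
where $\mu_{ij} = x_{ij}'\beta + z_{ij}'\alpha_i + k_1 \nu_{ij}$ is the conditional mean of $y_{ij}^{*}$.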
The conditional posterior distribution of the latent variable $y_{ij}^{*}$ is an interval-truncated normal distribution as follows:
$$
\pi(y_{ij}^{*} \mid y_{ij}, \beta, \alpha_i, \nu, \sigma) \sim y_{ij}\, I(\tilde{N} \le y_{ij} \le \tilde{M}) + TN_{(-\infty, \tilde{N}]}\!\left( x_{ij}'\beta + z_{ij}'\alpha_i + k_1 \nu_{ij},\ k_2 \sigma \nu_{ij} \right) I(y_{ij} < \tilde{N}) + TN_{[\tilde{M}, +\infty)}\!\left( x_{ij}'\beta + z_{ij}'\alpha_i + k_1 \nu_{ij},\ k_2 \sigma \nu_{ij} \right) I(y_{ij} > \tilde{M})
\tag{16}
$$
where $TN_{(-\infty, \tilde{N}]}(\cdot)$ and $TN_{[\tilde{M}, +\infty)}(\cdot)$ denote the right-truncated and left-truncated normal distributions, respectively. $y_{ij}$ is only related to the distribution of $y_{ij}^{*}$, so $y_{ij}$ is also normally truncated. The conditional posterior densities of the other unknown parameters in the model are derived below.
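The latent-variable draw can be sketched with the inverse-CDF method for truncated normals. This is an illustration under stated assumptions rather than the authors' implementation: censored responses are assumed to be recorded exactly at the bounds, so an observation is treated as left-censored when $y_{ij} \le \tilde{N}$ and right-censored when $y_{ij} \ge \tilde{M}$; `sd` stands for $\sqrt{k_2\sigma\nu_{ij}}$; the function names are hypothetical.

```python
import random
from statistics import NormalDist


def draw_truncated_normal(mean, sd, low, high, rng):
    """Inverse-CDF draw from N(mean, sd^2) restricted to [low, high] (None = open)."""
    dist = NormalDist(mean, sd)
    p_low = 0.0 if low is None else dist.cdf(low)
    p_high = 1.0 if high is None else dist.cdf(high)
    u = rng.uniform(p_low, p_high)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf requires 0 < u < 1
    x = dist.inv_cdf(u)
    # Guard against round-off in extreme tails pushing x past a bound.
    if low is not None:
        x = max(x, low)
    if high is not None:
        x = min(x, high)
    return x


def draw_latent(y_obs, mean, sd, lower, upper, rng):
    """Latent response: observed value if uncensored, truncated-normal draw otherwise."""
    if y_obs <= lower:   # left-censored: latent value lies in (-inf, lower]
        return draw_truncated_normal(mean, sd, None, lower, rng)
    if y_obs >= upper:   # right-censored: latent value lies in [upper, +inf)
        return draw_truncated_normal(mean, sd, upper, None, rng)
    return y_obs
```

The inverse-CDF approach is simple but loses accuracy when the truncation region is many standard deviations from the mean; a rejection sampler designed for tails would be a more robust choice in that case.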
For $s_l$, $l = 1, \ldots, k$:
$$
\pi(s_l \mid y_{ij}, \beta, \eta_1^2, \alpha_i) \propto s_l^{\frac{1}{2} - 1} \exp\left( -\frac{1}{2} \left( \beta_l^2 s_l^{-1} + \eta_1^2 s_l \right) \right)
\tag{17}
$$
The posterior distribution of the mixing parameter is $\pi(s_l \mid y_{ij}, \beta, \eta_1^2, \alpha_i) \sim GIG(1/2, |\beta_l|, |\eta_1|)$, where $GIG(\xi, \psi, \zeta)$ denotes the generalized inverse Gaussian distribution with the following probability density function, in which $K_\xi(\cdot)$ is the modified Bessel function of the third kind:
$$
f(x \mid \xi, \psi, \zeta) = \frac{(\zeta/\psi)^{\xi}}{2 K_\xi(\psi\zeta)}\, x^{\xi - 1} \exp\left\{ -\frac{1}{2} \left( \psi^2 x^{-1} + \zeta^2 x \right) \right\}, \quad x > 0,\ -\infty < \xi < \infty,\ \psi, \zeta \ge 0
\tag{18}
$$
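For $\xi = 1/2$ the generalized inverse Gaussian draw reduces to an ordinary inverse Gaussian draw: if $X \sim GIG(1/2, \psi, \zeta)$, then $1/X$ is inverse Gaussian with mean $\zeta/\psi$ and shape $\zeta^2$. The sketch below uses this fact together with the Michael, Schucany and Haas transformation method; this is one possible way to sample, not necessarily the one used by the authors, and the function names are hypothetical. It requires $\psi > 0$, so a coefficient that is exactly zero would need separate handling.

```python
import math
import random


def draw_inverse_gaussian(mu, lam, rng):
    """Inverse Gaussian(mean mu, shape lam) via the Michael-Schucany-Haas method."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = mu + mu * mu * y / (2.0 * lam) \
        - (mu / (2.0 * lam)) * math.sqrt(4.0 * mu * lam * y + (mu * y) ** 2)
    x = max(x, 1e-300)  # guard against cancellation producing a non-positive root
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x


def draw_gig_half(psi, zeta, rng):
    """Draw X ~ GIG(1/2, psi, zeta) using 1/X ~ InverseGaussian(zeta/psi, zeta^2)."""
    return 1.0 / draw_inverse_gaussian(zeta / psi, zeta * zeta, rng)
```

In the sampler, `draw_gig_half(abs(beta_l), abs(eta_1), rng)` would correspond to the update of $s_l$, and the same function with $(\varphi_{ij}, \gamma_{ij})$ to the update of $\nu_{ij}$.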
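For $\eta_1^2$, there is:
$$
\pi(\eta_1^2 \mid y_{ij}, R, \beta, \alpha_i) \propto (\eta_1^2)^{k} \exp\left( -\frac{\eta_1^2}{2} \sum_{l=1}^{k} s_l \right) (\eta_1^2)^{e_0 - 1} \exp\left( -f_0 \eta_1^2 \right)
\tag{19}
$$
Collecting the powers of $\eta_1^2$ gives the kernel $(\eta_1^2)^{k + e_0 - 1} \exp\{ -(f_0 + \sum_{l} s_l/2)\,\eta_1^2 \}$, so the penalty parameter $\eta_1^2$ follows a Gamma distribution with shape parameter $k + e_0$ and rate parameter $f_0 + \sum_{l} s_l/2$, as used in step (5) of the sampler in Section 2.4.1:
$$
\pi(\eta_1^2 \mid y_{ij}, R, \beta, \alpha_i) \sim G\!\left( k + e_0,\ f_0 + \sum_{l=1}^{k} \frac{s_l}{2} \right)
\tag{20}
$$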
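Read as a function of $\eta_1^2$, the kernel above has the form $(\eta_1^2)^{k+e_0-1}\exp\{-(f_0+\sum_l s_l/2)\,\eta_1^2\}$, which is a Gamma kernel with shape $k+e_0$ and rate $f_0+\sum_l s_l/2$; this is also the form used in step (5) of the sampler in Section 2.4.1. A minimal sketch of that draw under this reading follows (hypothetical function names; note that Python's `gammavariate` takes a scale, i.e. the reciprocal of the rate).

```python
import random


def eta_sq_posterior_params(s, e0, f0):
    """Shape and rate of the Gamma conditional posterior of eta_1^2."""
    shape = len(s) + e0
    rate = f0 + 0.5 * sum(s)
    return shape, rate


def draw_eta_sq(s, e0, f0, rng):
    """Draw eta_1^2 ~ Gamma(shape, rate); gammavariate expects scale = 1 / rate."""
    shape, rate = eta_sq_posterior_params(s, e0, f0)
    return rng.gammavariate(shape, 1.0 / rate)
```

Larger mixing variances $s_l$ increase the rate and therefore pull $\eta_1^2$ down, which relaxes the shrinkage when the data support large coefficients.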
For fixed effects β:
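$$
\pi(\beta \mid y_{ij}, \alpha_i, \nu, \sigma, S) \propto L(y_{ij}^{*} \mid \beta, \alpha_i, \nu, \sigma)\, \pi(\beta, S \mid \eta_1^2) \propto \exp\left\{ -\frac{1}{2 k_2 \sigma} \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{\left( y_{ij}^{*} - x_{ij}'\beta - z_{ij}'\alpha_i - k_1 \nu_{ij} \right)^2}{\nu_{ij}} \right\} \prod_{l=1}^{k} \exp\left( -\frac{\beta_l^2}{2 s_l} \right)
\tag{21}
$$
This yields a normal posterior distribution for the fixed effects, $\pi(\beta \mid y_{ij}, \alpha_i, \nu, \sigma, S) \sim N(\Theta, \Delta)$, where $\Delta = (X' D X + H)^{-1}$, $\Theta = \Delta X' D\, (y_{ij}^{*} - z_{ij}'\alpha_i - k_1 \nu_{ij})$ with the residuals stacked over $i$ and $j$, $H = \mathrm{diag}(s_1^{-1}, s_2^{-1}, \ldots, s_k^{-1})$, and $D = \mathrm{diag}(1 / (k_2 \sigma \nu_{ij}))$.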
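The block update of $\beta$ is a weighted ridge-type draw. The sketch below is a dependency-free illustration under the assumption that the posterior mean is $\Delta X'D\tilde{r}$ with $\tilde{r} = y^{*} - z'\alpha - k_1\nu$ stacked over all observations; the helper names are hypothetical. It works with the precision matrix $X'DX + H$ through its Cholesky factor, so no explicit inverse is formed.

```python
import math
import random


def cholesky(a):
    """Lower-triangular factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(partial) if i == j else partial / low[j][j]
    return low


def solve_lower(low, b):
    """Solve low @ x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def solve_upper_transposed(low, b):
    """Solve low.T @ x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def draw_beta(X, resid, nu, s, sigma, k2, rng):
    """Return (draw, mean) from N(Theta, Delta) with Delta = (X'DX + H)^-1."""
    k = len(s)
    w = [1.0 / (k2 * sigma * v) for v in nu]  # diagonal of D
    rows = range(len(X))
    precision = [[sum(w[r] * X[r][a] * X[r][b] for r in rows)
                  + (1.0 / s[a] if a == b else 0.0)
                  for b in range(k)] for a in range(k)]
    rhs = [sum(w[r] * X[r][a] * resid[r] for r in rows) for a in range(k)]
    low = cholesky(precision)
    mean = solve_upper_transposed(low, solve_lower(low, rhs))
    # low.T^-1 @ z has covariance (low @ low.T)^-1 = Delta
    noise = solve_upper_transposed(low, [rng.gauss(0.0, 1.0) for _ in range(k)])
    return [m + e for m, e in zip(mean, noise)], mean
```

The same routine applies to the random-effects update by substituting $Z$, $Q = \mathrm{diag}(r_{it}^{-1})$, and the residual $y^{*} - x'\beta - k_1\nu$.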
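For $r_{it}$, $i = 1, \ldots, n$, $t = 1, \ldots, q$, there is:
$$
\pi(r_{it} \mid y_{ij}, \alpha_i, \eta_2^2) \propto r_{it}^{\frac{1}{2} - 1} \exp\left( -\frac{1}{2} \left( \alpha_{it}^2 r_{it}^{-1} + \eta_2^2 r_{it} \right) \right)
\tag{22}
$$
This means that the mixing parameter $r_{it}$ follows the generalized inverse Gaussian distribution, written as $\pi(r_{it} \mid y_{ij}, \alpha_i, \eta_2^2) \sim GIG(1/2, |\alpha_{it}|, |\eta_2|)$. For $\eta_2^2$, there is:
$$
\pi(\eta_2^2 \mid y_{ij}, R, \beta, \alpha_i) \propto (\eta_2^2)^{(nq + g_0) - 1} \exp\left( -\left( h_0 + \frac{1}{2} \sum_{i=1}^{n} \sum_{t=1}^{q} r_{it} \right) \eta_2^2 \right)
\tag{23}
$$
The posterior distribution of the penalty parameter $\eta_2^2$ is:
$$
\pi(\eta_2^2 \mid y_{ij}, R, \beta, \alpha_i) \sim G\!\left( nq + g_0,\ h_0 + \frac{1}{2} \sum_{i=1}^{n} \sum_{t=1}^{q} r_{it} \right)
\tag{24}
$$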
For random effects αi:
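$$
\pi(\alpha_i \mid y_{ij}, \beta, \sigma, \nu, R) \propto L(y_{ij}^{*} \mid \beta, \alpha_i, \nu, \sigma)\, \pi(\alpha_i, R \mid \eta_2^2) \propto \exp\left( -\frac{1}{2 k_2 \sigma} \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{\left( y_{ij}^{*} - x_{ij}'\beta - z_{ij}'\alpha_i - k_1 \nu_{ij} \right)^2}{\nu_{ij}} \right) \prod_{i=1}^{n} \prod_{t=1}^{q} \exp\left( -\frac{\alpha_{it}^2}{2 r_{it}} \right)
\tag{25}
$$
The random effects follow a normal distribution with mean $\Lambda$ and variance $\Gamma$, written as $\pi(\alpha_i \mid y_{ij}, \beta, \nu, \sigma, R) \sim N(\Lambda, \Gamma)$, where $\Gamma = (Z' D Z + Q)^{-1}$, $\Lambda = \Gamma Z' D\, (y_{ij}^{*} - x_{ij}'\beta - k_1 \nu_{ij})$, $Q = \mathrm{diag}(r_{it}^{-1})$, and $D = \mathrm{diag}(1 / (k_2 \sigma \nu_{ij}))$.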
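The posterior density function of $\nu_{ij}$ is:
$$
\pi(\nu_{ij} \mid y_{ij}, \beta, \alpha_i, \sigma) \propto \nu_{ij}^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2} \left( \varphi_{ij}^2 \nu_{ij}^{-1} + \gamma_{ij}^2 \nu_{ij} \right) \right\}
\tag{26}
$$
where $\varphi_{ij}^2 = \dfrac{\left( y_{ij}^{*} - x_{ij}'\beta - z_{ij}'\alpha_i \right)^2}{k_2 \sigma}$, $\gamma_{ij}^2 = \dfrac{k_1^2}{k_2 \sigma} + \dfrac{2}{\sigma}$, and $\pi(\nu_{ij} \mid y_{ij}, \beta, \alpha_i, \sigma) \sim GIG(1/2, \varphi_{ij}, \gamma_{ij})$.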
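The expressions for $\varphi_{ij}$ and $\gamma_{ij}$ follow from collecting the terms in $\nu_{ij}$. A short worked derivation, assuming the hierarchy stated earlier ($y_{ij}^{*} \mid \nu_{ij} \sim N(x_{ij}'\beta + z_{ij}'\alpha_i + k_1\nu_{ij},\, k_2\sigma\nu_{ij})$ and $\nu_{ij} \mid \sigma \sim E(1/\sigma)$) and writing $r_{ij} = y_{ij}^{*} - x_{ij}'\beta - z_{ij}'\alpha_i$ as a shorthand introduced only for this derivation:

```latex
\begin{aligned}
\pi(\nu_{ij} \mid \cdot)
&\propto \nu_{ij}^{-1/2}
   \exp\left\{ -\frac{(r_{ij} - k_1 \nu_{ij})^2}{2 k_2 \sigma \nu_{ij}} \right\}
   \exp\left\{ -\frac{\nu_{ij}}{\sigma} \right\} \\
&= \nu_{ij}^{-1/2}
   \exp\left\{ -\frac{r_{ij}^2}{2 k_2 \sigma}\,\nu_{ij}^{-1}
              + \frac{r_{ij} k_1}{k_2 \sigma}
              - \frac{k_1^2}{2 k_2 \sigma}\,\nu_{ij}
              - \frac{\nu_{ij}}{\sigma} \right\} \\
&\propto \nu_{ij}^{-1/2}
   \exp\left\{ -\frac{1}{2}\left[ \frac{r_{ij}^2}{k_2 \sigma}\,\nu_{ij}^{-1}
              + \left( \frac{k_1^2}{k_2 \sigma} + \frac{2}{\sigma} \right)\nu_{ij} \right] \right\}
\end{aligned}
```

The cross term $r_{ij}k_1/(k_2\sigma)$ does not depend on $\nu_{ij}$ and is absorbed into the normalizing constant, which leaves the $GIG(1/2, \varphi_{ij}, \gamma_{ij})$ kernel.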
Finally, for σ:
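$$
\pi(\sigma \mid y_{ij}, \beta, \alpha_i, \nu_{ij}) \propto L(y_{ij}^{*} \mid \beta, \alpha_i, \nu_{ij}, \sigma)\, \pi(\nu_{ij} \mid \sigma)\, \pi(\sigma) \sim IG(\kappa, \iota)
\tag{27}
$$
where $\kappa = \dfrac{3nm}{2} + c_0$ and $\iota = d_0 + \displaystyle\sum_{i=1}^{n} \sum_{j=1}^{m} \left[ \frac{\left( y_{ij}^{*} - x_{ij}'\beta - z_{ij}'\alpha_i - k_1 \nu_{ij} \right)^2}{2 k_2 \nu_{ij}} + \nu_{ij} \right]$.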
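The update of $\sigma$ can be sketched as follows. The scale used here is my reading of the derivation and should be treated as an assumption: the normal kernel contributes $(r_{ij} - k_1\nu_{ij})^2/(2k_2\nu_{ij})$ per observation, with $r_{ij} = y_{ij}^{*} - x_{ij}'\beta - z_{ij}'\alpha_i$, and the exponential prior of $\nu_{ij}$ contributes $\nu_{ij}$. Function names are hypothetical, and the inputs are flat lists over all $nm$ observations.

```python
import random


def sigma_posterior_params(resid, nu, k1, k2, c0, d0):
    """Shape and scale of the inverse Gamma conditional posterior of sigma."""
    shape = 1.5 * len(nu) + c0  # 3nm/2 + c0
    scale = d0 + sum((r - k1 * v) ** 2 / (2.0 * k2 * v) + v
                     for r, v in zip(resid, nu))
    return shape, scale


def draw_sigma(resid, nu, k1, k2, c0, d0, rng):
    """If G ~ Gamma(shape, rate 1), then scale / G ~ IG(shape, scale)."""
    shape, scale = sigma_posterior_params(resid, nu, k1, k2, c0, d0)
    return scale / rng.gammavariate(shape, 1.0)
```

The exponent $3nm/2$ arises from $\sigma^{-nm/2}$ in the normal likelihood plus $\sigma^{-nm}$ from the $nm$ exponential densities of $\nu_{ij}$.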

2.3. Bayesian Dual Adaptive Lasso Penalized Quantile Regression for Tobit Models

The adaptive Lasso assigns different penalties to different coefficients, which improves estimation accuracy and yields the oracle property [25]. Under the Lasso penalty studied by Alhamzawi et al. [26], the fixed effects β and the random effects αi in the mixed-effects model each depend on a single conditional parameter, η1 and η2, respectively, whereas the amount of shrinkage should differ across the individual components of β and αi. Therefore, this section proposes a Tobit quantile regression model with a dual adaptive Lasso penalty from a Bayesian perspective. First, assume that the fixed effects β and the random effects αi both have conditional Laplace priors as follows:
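$$
\pi(\beta \mid \sigma, \lambda_l) = \prod_{l=1}^{k} \frac{\lambda_l}{2\sigma} \exp\left\{ -\frac{\lambda_l}{\sigma} |\beta_l| \right\}, \quad l = 1, \ldots, k
\tag{28}
$$
$$
\pi(\alpha_i \mid \sigma, \lambda_q) = \prod_{i=1}^{n} \prod_{t=1}^{q} \frac{\lambda_q}{2\sigma} \exp\left\{ -\frac{\lambda_q}{\sigma} |\alpha_{it}| \right\}, \quad t = 1, \ldots, q
\tag{29}
$$
Here $\lambda_l = (\lambda_1, \lambda_2, \ldots, \lambda_k)$ and $\lambda_q = (\lambda_1, \lambda_2, \ldots, \lambda_q)$. Letting $\eta_l = \lambda_l/\sigma$ and $\eta_q = \lambda_q/\sigma$ for β and αi, Equations (28) and (29) can be rewritten as:
$$
\pi(\beta \mid \eta_l) = \prod_{l=1}^{k} \frac{\eta_l}{2} \exp\left\{ -\eta_l |\beta_l| \right\}, \quad l = 1, \ldots, k
\tag{30}
$$
$$
\pi(\alpha_i \mid \eta_q) = \prod_{i=1}^{n} \prod_{t=1}^{q} \frac{\eta_q}{2} \exp\left\{ -\eta_q |\alpha_{it}| \right\}, \quad t = 1, \ldots, q
\tag{31}
$$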
So, there is:
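$$
\pi(\beta, \alpha_i \mid y_{ij}, \sigma, \eta_l, \eta_q) \propto \exp\left\{ -\frac{1}{\sigma} \sum_{i=1}^{n} \sum_{j=1}^{m} \rho_\tau\!\left( y_{ij} - \max\!\left( \tilde{N}, \min\!\left( \tilde{M},\, x_{ij}'\beta + z_{ij}'\alpha_i \right) \right) \right) - \sum_{l=1}^{k} \eta_l |\beta_l| - \sum_{i=1}^{n} \sum_{t=1}^{q} \eta_q |\alpha_{it}| \right\}
\tag{32}
$$
Maximizing the conditional posterior density of β and αi in Equation (32) is equivalent to minimizing the following Bayesian Tobit quantile regression objective with the double adaptive Lasso penalty:
$$
\sum_{i=1}^{n} \sum_{j=1}^{m} \rho_\tau\!\left( y_{ij} - \max\!\left( \tilde{N}, \min\!\left( \tilde{M},\, x_{ij}'\beta + z_{ij}'\alpha_i \right) \right) \right) + \sum_{l=1}^{k} \eta_l |\beta_l| + \sum_{i=1}^{n} \sum_{t=1}^{q} \eta_q |\alpha_{it}|
\tag{33}
$$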
The double adaptive Lasso has the oracle property [27], and its Bayesian formulation is theoretically equivalent. Because the Laplace priors in Equations (28) and (29) are not conjugate, the conditional prior distributions are rewritten in integral form. Using the integral identity again, the conditional prior distribution of the fixed effects in Equation (30) can be rewritten as:
$$
\pi(\beta, R \mid \eta_l^2) = \prod_{l=1}^{k} \frac{1}{\sqrt{2\pi r_l}} \exp\left( -\frac{\beta_l^2}{2 r_l} \right) \frac{\eta_l^2}{2} \exp\left( -\frac{\eta_l^2 r_l}{2} \right)
\tag{34}
$$
By introducing the auxiliary variables $R = (r_1, \ldots, r_k)$, we obtain $\beta_l \mid r_l \sim N(0, r_l)$ and $r_l \mid \eta_l^2 \sim E(\eta_l^2/2)$, and the prior distribution of $\eta_l^2$ is taken to be the Gamma distribution $\eta_l^2 \sim G(e_0, f_0)$, which is the prior kernel that appears in the conditional posterior of $\eta_l^2$ below.
Similarly, for the random effects αi, the conditional prior distribution is:
$$
\pi(\alpha_i, S \mid \eta_q^2) = \prod_{i=1}^{n} \prod_{t=1}^{q} \frac{1}{\sqrt{2\pi s_{it}}} \exp\left( -\frac{\alpha_{it}^2}{2 s_{it}} \right) \frac{\eta_q^2}{2} \exp\left( -\frac{\eta_q^2}{2} s_{it} \right)
\tag{35}
$$
where $S = (s_{11}, \ldots, s_{1q}, \ldots, s_{nq})$ are auxiliary variables. The joint prior of $(\alpha_i, S)$ can be viewed as a mixture of normal and exponential distributions, i.e., $\alpha_{it} \mid s_{it} \sim N(0, s_{it})$ and $s_{it} \mid \eta_q^2 \sim E(\eta_q^2/2)$, with $\eta_q^2 \sim G(g_0, h_0)$.
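Next, assuming $\sigma \sim IG(c_0, d_0)$, the likelihood function of the model is:
$$
L(y_{ij}^{*} \mid \beta, \alpha_i, \nu, \sigma) = (2\pi k_2 \sigma)^{-\frac{nm}{2}} \left( \prod_{i=1}^{n} \prod_{j=1}^{m} \nu_{ij} \right)^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2 k_2 \sigma} \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{\left( y_{ij}^{*} - \mu_{ij} \right)^2}{\nu_{ij}} \right\}
\tag{36}
$$
where $\mu_{ij} = x_{ij}'\beta + z_{ij}'\alpha_i + k_1 \nu_{ij}$ is the conditional mean of $y_{ij}^{*}$.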
In contrast to the dual Lasso Bayesian Tobit quantile regression method, here it is sufficient to change the conditional posterior distributions of $R$, $\eta_l^2$, $S$, and $\eta_q^2$. The derivation is as follows.
For $r_l$, $l = 1, \ldots, k$, there is:
$$
\pi(r_l \mid y_{ij}, \beta, \eta_l^2, \alpha_i) \propto r_l^{\frac{1}{2} - 1} \exp\left( -\frac{1}{2} \left( \beta_l^2 r_l^{-1} + \eta_l^2 r_l \right) \right)
\tag{37}
$$
The posterior distribution of the mixing parameter $r_l$ is $\pi(r_l \mid y_{ij}, \beta, \eta_l^2, \alpha_i) \sim GIG(1/2, |\beta_l|, |\eta_l|)$.
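For $\eta_l^2$, $l = 1, \ldots, k$, there is:
$$
\pi(\eta_l^2 \mid y_{ij}, R, \beta, \alpha_i) \propto (\eta_l^2) \exp\left( -\frac{\eta_l^2}{2} r_l \right) (\eta_l^2)^{e_0 - 1} \exp\left( -f_0 \eta_l^2 \right)
\tag{38}
$$
Collecting the powers of $\eta_l^2$ gives the kernel $(\eta_l^2)^{(1 + e_0) - 1} \exp\{ -(f_0 + r_l/2)\,\eta_l^2 \}$, so the adaptive penalty parameter $\eta_l^2$ follows a Gamma distribution with shape parameter $1 + e_0$ and rate parameter $f_0 + r_l/2$, written as $\pi(\eta_l^2 \mid y_{ij}, R, \beta, \alpha_i) \sim G\!\left( 1 + e_0,\ f_0 + \dfrac{r_l}{2} \right)$. For $s_{it}$, $i = 1, \ldots, n$, $t = 1, \ldots, q$, there is: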
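$$
\pi(s_{it} \mid y_{ij}, \beta, \alpha_i, \eta_q^2) \propto s_{it}^{\frac{1}{2} - 1} \exp\left( -\frac{1}{2} \left( \alpha_{it}^2 s_{it}^{-1} + \eta_q^2 s_{it} \right) \right)
\tag{39}
$$
The mixing parameters $s_{it}$ follow the generalized inverse Gaussian distribution, written as:
$$
\pi(s_{it} \mid y_{ij}, \beta, \alpha_i, \eta_q^2) \sim GIG\!\left( \tfrac{1}{2}, |\alpha_{it}|, |\eta_q| \right)
\tag{40}
$$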
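For $\eta_q^2$, there is:
$$
\pi(\eta_q^2 \mid y_{ij}, S, \beta, \alpha_i) \propto (\eta_q^2)^{(nq + g_0) - 1} \exp\left( -\left( h_0 + \frac{1}{2} \sum_{i=1}^{n} \sum_{t=1}^{q} s_{it} \right) \eta_q^2 \right)
\tag{41}
$$
The posterior distribution of the penalty parameter $\eta_q^2$ is:
$$
\pi(\eta_q^2 \mid y_{ij}, S, \beta, \alpha_i) \sim G\!\left( nq + g_0,\ h_0 + \frac{1}{2} \sum_{i=1}^{n} \sum_{t=1}^{q} s_{it} \right)
\tag{42}
$$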
Bayesian approaches, which utilize prior distributions of regression coefficients and regularization parameters, allow for a Bayesian treatment of the adaptive Lasso that quantifies uncertainty by introducing prior knowledge and providing posterior distributions [28].

2.4. Gibbs Sampling Algorithm for Parameter Estimation and Variable Selection

2.4.1. Gibbs Sampling Algorithm for DL-BTQR

The Gibbs sampling algorithm for the dual Lasso Bayesian Tobit quantile regression method (DL-BTQR) is as follows.
(1) Given the initial values $\alpha^{(0)}$, $\beta^{(0)}$, $\sigma^{(0)}$, generate the unobserved latent variables $y_{ij}^{*}$ from the truncated normal distributions $\pi(y_{ij}^{*} \mid y_{ij}, \beta, \alpha_i, \nu, \sigma)$;
(2) Generate $\nu_{ij}$ from the conditional posterior distribution $\pi(\nu_{ij} \mid y_{ij}, \beta, \alpha_i, \sigma) \sim GIG(1/2, \varphi_{ij}, \gamma_{ij})$;
(3) Generate $\sigma$ from the conditional posterior distribution $\pi(\sigma \mid y_{ij}, \beta, \alpha_i, \nu_{ij}) \sim IG(\kappa, \iota)$;
(4) Generate $s_l$ from the conditional posterior distribution $\pi(s_l \mid y_{ij}, \beta, \eta_1^2, \alpha_i) \sim GIG(1/2, |\beta_l|, |\eta_1|)$;
(5) Generate $\eta_1^2$ from the conditional posterior distribution $\pi(\eta_1^2 \mid y_{ij}, R, \beta, \alpha_i) \sim G\!\left( k + e_0,\ f_0 + \sum_{l=1}^{k} s_l/2 \right)$;
(6) Update the fixed-effects coefficients $\beta$ from the conditional posterior distribution $\pi(\beta \mid y_{ij}, \alpha_i, \nu, \sigma, S) \sim N(\Theta, \Delta)$;
(7) Generate $r_{it}$ from the conditional posterior distribution $\pi(r_{it} \mid y_{ij}, \alpha_i, \eta_2^2) \sim GIG(1/2, |\alpha_{it}|, |\eta_2|)$;
(8) Generate $\eta_2^2$ from the conditional posterior distribution $\pi(\eta_2^2 \mid y_{ij}, R, \beta, \alpha_i) \sim G\!\left( nq + g_0,\ h_0 + \frac{1}{2} \sum_{i=1}^{n} \sum_{t=1}^{q} r_{it} \right)$;
(9) Update the random-effects coefficients $\alpha_i$ from the conditional posterior distribution $\pi(\alpha_i \mid y_{ij}, \beta, \nu, \sigma, R) \sim N(\Lambda, \Gamma)$;
Repeat (2)–(9) until convergence.
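The sweep structure of such a sampler (apply each conditional update in a fixed order, discard a burn-in period, keep the remaining states) can be sketched with a generic driver. Everything below is illustrative: `run_gibbs` is a hypothetical helper, and the two update functions sample a toy bivariate normal with correlation 0.5 rather than the nine conditional distributions listed above, which would take their place in an actual implementation.

```python
import random


def run_gibbs(state, updates, n_iter, burn_in, rng):
    """Apply every update in order on each sweep; keep states after burn-in."""
    draws = []
    for sweep in range(n_iter):
        for update in updates:
            update(state, rng)
        if sweep >= burn_in:
            draws.append(dict(state))  # shallow copy; deep-copy if values are lists
    return draws


# Toy target: standard bivariate normal with correlation RHO.
RHO = 0.5
COND_SD = (1.0 - RHO ** 2) ** 0.5


def update_x(state, rng):
    """x | y ~ N(RHO * y, 1 - RHO^2)."""
    state["x"] = rng.gauss(RHO * state["y"], COND_SD)


def update_y(state, rng):
    """y | x ~ N(RHO * x, 1 - RHO^2)."""
    state["y"] = rng.gauss(RHO * state["x"], COND_SD)
```

Posterior summaries such as means and 95% intervals are then computed from the retained draws; in practice convergence would be assessed with trace plots or similar diagnostics rather than a fixed burn-in length.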

2.4.2. Gibbs Sampling Algorithm for DAL-BTQR

A main advantage of the Bayesian double-penalized adaptive Lasso with Gibbs sampling is that it does not require consistent initial estimates of the regression coefficients. In high-dimensional data, the number of features may be much larger than the number of samples, causing traditional regression methods to fail. By introducing prior distributions and obtaining parameter values directly by sampling from the posterior, the Bayesian adaptive Lasso combined with the Gibbs sampling algorithm can handle such high-dimensional situations and allows for efficient variable selection and parameter estimation. The Gibbs sampling algorithm for the dual adaptive Lasso Bayesian Tobit quantile regression method (DAL-BTQR) is as follows.
(1) Given the initial values $\alpha^{(0)}$, $\beta^{(0)}$, $\tau$, $\sigma$;
(2) Generate $\nu_{ij}$ from the conditional posterior distribution $\pi(\nu_{ij} \mid y_{ij}, \beta, \alpha_i, \sigma) \sim GIG(1/2, \varphi_{ij}, \gamma_{ij})$, and generate the unobserved latent variables $y_{ij}^{*}$ from the truncated normal distributions $\pi(y_{ij}^{*} \mid y_{ij}, \beta, \alpha_i, \nu, \sigma)$;
(3) Generate $\sigma$ from the conditional posterior distribution $\pi(\sigma \mid y_{ij}, \beta, \alpha_i, \nu_{ij}) \sim IG(\kappa, \iota)$;
(4) Generate $r_l$ from the conditional posterior distribution $\pi(r_l \mid y_{ij}, \beta, \eta_l^2, \alpha_i) \sim GIG(1/2, |\beta_l|, |\eta_l|)$;
(5) Generate $\eta_l^2$ from the conditional posterior distribution $\pi(\eta_l^2 \mid y_{ij}, R, \beta, \alpha_i) \sim G\!\left( 1 + e_0,\ f_0 + r_l/2 \right)$, the Gamma form implied by the kernel $(\eta_l^2)^{(1 + e_0) - 1} \exp\{ -(f_0 + r_l/2)\,\eta_l^2 \}$ of this conditional posterior;
(6) Update the fixed-effects coefficients $\beta$ from the conditional posterior distribution $\pi(\beta \mid y_{ij}, \alpha_i, \nu, \sigma, R) \sim N(\Theta, \Delta)$;
(7) Generate $s_{it}$ from the conditional posterior distribution $\pi(s_{it} \mid y_{ij}, \beta, \alpha_i, \eta_q^2) \sim GIG(1/2, |\alpha_{it}|, |\eta_q|)$;
(8) Generate $\eta_q^2$ from the conditional posterior distribution $\pi(\eta_q^2 \mid y_{ij}, S, \beta, \alpha_i) \sim G\!\left( nq + g_0,\ h_0 + \frac{1}{2} \sum_{i=1}^{n} \sum_{t=1}^{q} s_{it} \right)$;
(9) Update the random-effects coefficients $\alpha_i$ from the conditional posterior distribution $\pi(\alpha_i \mid y_{ij}, \beta, \nu, \sigma, S) \sim N(\Lambda, \Gamma)$;
Repeat (2)–(9) until convergence.

3. Comparative Analysis of Monte Carlo Simulations

The simulated data are generated from:
$$
\begin{cases}
y_{ij}^{*} = x_{ij}'\beta + z_{ij}'\alpha_i + \varepsilon_{ij} \\
y_{ij} = y_{ij}^{*}\, I(\tilde{N} \le y_{ij}^{*} \le \tilde{M}) + \tilde{M}\, I(y_{ij}^{*} > \tilde{M}) + \tilde{N}\, I(y_{ij}^{*} < \tilde{N})
\end{cases}
\tag{43}
$$
where each component of $x_{ij} = (x_{ij1}, x_{ij2}, x_{ij3}, x_{ij4}, x_{ij5}, x_{ij6}, x_{ij7}, x_{ij8})'$ follows a standard normal distribution. In some practical applications, neighboring variables are strongly correlated while variables further apart are only weakly correlated, so the correlation coefficient between any two explanatory variables $x_l$ and $x_k$ is set to $\rho^{|l-k|}$, with $\rho = 0.5$ as a reasonable approximation. Let $\tilde{N} = 0$ and $\tilde{M} = 6$. Two sets of explanatory variable coefficients are considered: sparse longitudinal data with $\beta = (1, 0, 0, 1, 0, 0, 0, 0)'$, and dense longitudinal data with $\beta = (1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5)'$. It is assumed that $\alpha_i = (\alpha_{i1}, \alpha_{i2}, \alpha_{i3})' \overset{i.i.d.}{\sim} N_3(0, D)$ with $D = \mathrm{diag}(1, 1, 0)$, $z_{ij} = (1, x_{ij1}, x_{ij2})'$, and $\varepsilon_{ij} \sim N(0, 1)$. Each simulation is repeated 100 times, with the following weak prior information: $\sigma \sim IG(10^{-6}, 10^{-6})$, $\eta_1^2 \sim IG(10^{-6}, 10^{-6})$, $\eta_2^2 \sim IG(10^{-6}, 10^{-6})$, $\eta_l^2 \sim IG(10^{-6}, 10^{-6})$, $\eta_q^2 \sim IG(10^{-6}, 10^{-6})$.
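The data-generating process can be sketched in a few lines. The number of individuals and of observation times is not stated in this section, so `n_times` and the number of calls are left to the user; the function name is hypothetical. The covariates are generated with an AR(1) recursion, which is one standard way to obtain unit variances and correlation $\rho^{|l-k|}$ between components $l$ and $k$.

```python
import math
import random


def simulate_subject(beta, n_times, rho, lower, upper, rng):
    """Simulate one individual's rows (x, z, y_star, y) under interval censoring."""
    # D = diag(1, 1, 0): the third random effect has zero variance.
    alpha = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), 0.0]
    rows = []
    for _ in range(n_times):
        # AR(1) recursion: corr(x_l, x_k) = rho^|l-k| with unit variances.
        x = [rng.gauss(0.0, 1.0)]
        for _ in range(len(beta) - 1):
            x.append(rho * x[-1] + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0))
        z = [1.0, x[0], x[1]]
        y_star = (sum(b * v for b, v in zip(beta, x))
                  + sum(a * v for a, v in zip(alpha, z))
                  + rng.gauss(0.0, 1.0))
        y = min(upper, max(lower, y_star))  # censor the latent response to [lower, upper]
        rows.append((x, z, y_star, y))
    return rows
```

For the sparse design, `simulate_subject((1, 0, 0, 1, 0, 0, 0, 0), n_times, 0.5, 0.0, 6.0, rng)` would be called once per individual.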
This section first compares the estimation results of three methods for interval-censored data at different quantile levels and for the two sets of explanatory variable coefficients: unpenalized Bayesian Tobit quantile regression (P-BTQR) [19], double Lasso penalized Bayesian Tobit quantile regression (PDL-BTQR) [21], and double adaptive Lasso penalized Bayesian Tobit quantile regression (PDAL-BTQR) [17]. The censoring ratio is then varied and the three methods are compared for both sets of coefficients. Finally, the distribution of the random errors is changed and the parameter estimation is simulated again. To evaluate the accuracy of model estimation, the mean square error (MSE) is used as the evaluation index. The reported mean is the mean of the MSE and of the estimated coefficients over the 100 simulations, the reported standard deviation is the corresponding standard deviation over the 100 simulations, and the interval estimate of each explanatory variable coefficient has a confidence level of 95%.

3.1. Comparative Analysis of Simulation Results at Different Quantiles

Keeping $\rho = 0.5$, $D = \mathrm{diag}(1, 1, 0)$, and $\varepsilon_{ij} \sim N(0, 1)$ fixed, this section investigates the estimation results for sparse and dense longitudinal data under different quantiles and methods, reporting the mean and standard deviation of the MSE together with the estimated coefficients of each explanatory variable and their corresponding interval estimates. The results are presented in Table 1 and Table 2.

3.1.1. Simulation Results under Different Quantiles of Sparse Longitudinal Data

In the mixed-effects model, the double-penalized quantile regression method differs from existing studies in that both fixed effects and random effects have corresponding compression coefficients, thus eliminating irrelevant variables to a greater extent. In addition, this paper breaks with the existing literature by considering the estimation and selection of variables under the influence of different random effects only.
Figure 1, Figure 2 and Figure 3 show the results for sparse longitudinal data, with the interval censoring range [0, 6]. At the lower quantile τ = 0.25, the double adaptive Lasso penalty gives the lowest mean and standard deviation of the MSE, i.e., the PDAL-BTQR method has the best estimation, and the redundant variables are almost all shrunk to 0. At the middle quantile τ = 0.5, comparing the P-BTQR, PDL-BTQR, and PDAL-BTQR methods, the standard deviation of the MSE for the unpenalized P-BTQR method is twice that of the PDL-BTQR method and nearly three times that of the PDAL-BTQR method, which demonstrates the superiority of the double-penalty methods for parameter estimation. At the upper-middle quantile τ = 0.75, the mean MSEs of the PDL-BTQR and PDAL-BTQR methods are not significantly different, the former being slightly higher than the latter, but both are smaller than the mean MSE of the P-BTQR method. This means that both double-penalized methods obtain accurate estimation results. At the high quantile τ = 0.95, the mean MSEs of the three methods are noticeably higher, but the PDAL-BTQR method obtains the smallest MSE mean and standard deviation and the most accurate interval estimates over the 100 simulations. In addition, in the model setting the explanatory variable coefficients β1 and β2 are disturbed by random effects, and their fluctuations are more pronounced at the extreme quantiles, indicating that these disturbances have the greatest impact on estimation at τ = 0.25 and τ = 0.95.

3.1.2. Simulation Results under Different Quantiles of Dense Longitudinal Data

According to Figure 4, Figure 5 and Figure 6, for dense longitudinal data the PDAL-BTQR method estimates better than the PDL-BTQR and P-BTQR methods at the lower quantile τ = 0.25, with the smallest mean MSE of 0.062. At the middle quantile τ = 0.5, the mean MSE of the PDL-BTQR method is slightly lower than that of the PDAL-BTQR and P-BTQR methods. At τ = 0.75, the mean MSEs of the double-penalized PDAL-BTQR and PDL-BTQR methods are smaller than that of the unpenalized P-BTQR method, so the parameter estimation and variable selection of the unpenalized method under this condition are not as good as those of the double-penalized methods. In terms of the mean and standard deviation of the MSE, the PDAL-BTQR and PDL-BTQR methods do not differ significantly, which is similar to the results in Table 1. At the high quantile τ = 0.95, the mean square error of all three methods becomes relatively large: the mean MSEs of the P-BTQR and PDL-BTQR methods are 0.245 and 0.201, respectively, while that of the PDAL-BTQR method is 0.193, indicating that the dual adaptive Lasso gives the best estimation at the high quantile.
In summary, it can be concluded that the double-penalized Bayesian Tobit quantile regression method can obtain more accurate parameter estimates for two different sets of explanatory variable coefficients at both low and high quantile points, and its performance is more advantageous than that of the no-penalty method, although the performance is comparable at the middle quantile point, but the accuracy of the double-penalized method is higher at the extreme quantile point.
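The comparisons in this section rest on summarizing the mean squared error (MSE) of the coefficient estimates over repeated simulations. The following is a minimal sketch of such a summary, assuming that the MSE of one replication is the average squared difference between estimated and true coefficients; the exact definition used in the simulations may differ. The function names and the toy numbers are illustrative only and are not results from this study.

```python
from statistics import mean, stdev

def replication_mse(beta_hat, beta_true):
    """MSE of one replication: average squared error over the coefficients."""
    return mean((b_hat - b) ** 2 for b_hat, b in zip(beta_hat, beta_true))

def summarize_mse(estimates, beta_true):
    """Mean and sample standard deviation of the MSE across replications."""
    mses = [replication_mse(b_hat, beta_true) for b_hat in estimates]
    return mean(mses), stdev(mses)

# Toy example: three replications of a two-coefficient model (made-up values)
beta_true = [1.0, 0.0]
estimates = [[1.1, 0.0], [0.9, 0.1], [1.0, -0.1]]
mse_mean, mse_sd = summarize_mse(estimates, beta_true)
```

A smaller mean indicates more accurate estimates on average, and a smaller standard deviation indicates that the accuracy is more stable across replications.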

3.2. Comparative Analysis of Simulation Results under Different Censoring Ratios

The estimation results of each method under different censoring ratios were compared by setting the censoring ratios to 10%, 20%, and 30%, respectively, keeping $\rho = 0.5$, $\varepsilon_{ij} \sim N(0, 1)$, and $D_1 = \mathrm{diag}(1, 1, 0)$ and taking $\tilde{N} = 0$ and $\tilde{M} = 3$ as constant. Since the estimation results at each quantile are similar, Table 3 and Table 4 only show the simulation results at the 0.5 quantile.

3.2.1. Simulation Results under Different Censoring Ratios of Sparse Longitudinal Data

According to Figure 7, Figure 8 and Figure 9, in the sparse longitudinal data model, as the censoring ratio increases from 10% to 20% and 30%, the mean MSE values for the P-BTQR method are 0.029, 0.030, and 0.033, respectively; those for the PDL-BTQR method are 0.026, 0.027, and 0.032; and those for the PDAL-BTQR method are 0.025, 0.026, and 0.031. At each censoring ratio, the mean MSE decreases from P-BTQR to PDL-BTQR to PDAL-BTQR. Compared with plain quantile regression, Lasso quantile regression imposes a Lasso penalty on each explanatory variable, which can improve the speed of model calculation and reduce the bias of parameter estimation, so the results of the PDL-BTQR method are more accurate than those of the P-BTQR method. The Lasso, however, applies the same penalty level to every coefficient; the adaptive Lasso penalty function breaks through this limitation by allowing coefficient-specific penalties. From the simulation results, the PDAL-BTQR method is more effective than the PDL-BTQR and P-BTQR methods for parameter estimation and variable selection under different censoring ratios.
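The difference between the Lasso and the adaptive Lasso penalty can be illustrated with a small sketch. It assumes the classical frequentist form of the adaptive Lasso [25], with weights 1/|β_init|^γ computed from an initial estimate; in the Bayesian formulation of this paper the coefficient-specific penalties arise from the priors instead. All names (`lasso_penalty`, `adaptive_lasso_penalty`, `beta_init`, `gamma`, `eps`) are hypothetical.

```python
def lasso_penalty(beta, lam):
    """Lasso: one common penalty level lam for every coefficient."""
    return lam * sum(abs(b) for b in beta)

def adaptive_lasso_penalty(beta, lam, beta_init, gamma=1.0, eps=1e-8):
    """Adaptive Lasso: coefficient-specific weights 1 / |beta_init|^gamma.

    eps only guards against division by zero in this illustration.
    """
    weights = [1.0 / (abs(b0) + eps) ** gamma for b0 in beta_init]
    return lam * sum(w * abs(b) for w, b in zip(weights, beta))

beta = [2.0, 0.1]        # one large and one small coefficient (made-up values)
beta_init = [2.0, 0.1]   # initial estimates used to build the weights
common = lasso_penalty(beta, lam=1.0)
adaptive = adaptive_lasso_penalty(beta, lam=1.0, beta_init=beta_init)
```

With these weights a coefficient with a large initial estimate is penalized lightly and a coefficient with a small initial estimate is penalized heavily, which is why the adaptive version shrinks redundant variables toward zero while distorting important ones less.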

3.2.2. Simulation Results under Different Censoring Ratios for Dense Longitudinal Data

According to Figure 10, Figure 11 and Figure 12, for dense longitudinal data, the mean MSEs of the P-BTQR, PDL-BTQR, and PDAL-BTQR methods are 0.042, 0.040, and 0.041, respectively, at a 10% censoring ratio, and the estimation method with a double Lasso penalty is more accurate. As the censoring ratio increases to 20%, the mean MSE of the P-BTQR method becomes larger while the mean MSEs of the PDL-BTQR and PDAL-BTQR methods decrease, which also indicates that the parameter estimation performance of the double-penalty methods is better than that of the P-BTQR method. When the censoring ratio is further increased to 30%, the MSEs of all three methods increase, but the MSEs of the double-penalized PDL-BTQR and PDAL-BTQR methods are 0.046 and 0.047, respectively, which are still smaller than the MSE of 0.048 for the P-BTQR method. This indicates that imposing double penalties on the fixed-effects and random-effects coefficients in the interval-censored mixed-effects model yields more accurate estimates of the model parameters. The PDL-BTQR method outperforms the other two methods for dense longitudinal data as the censoring ratio becomes larger, whereas the PDAL-BTQR method has the best estimation for sparse longitudinal data.
The MSE mean value increases with the censoring proportion, implying that the estimation accuracy of the fixed effects decreases, especially for β1 and β2, which results from assuming in this section that the first two variables are subject to random effects in the simulation. Combining the simulation results for the coefficients of the two sets of explanatory variables, the PDL-BTQR and PDAL-BTQR methods have better estimation results, and their advantages in variable selection and estimation over the P-BTQR method are prominent.

3.3. Comparative Analysis of Simulation Results under Different Random Error Distributions

Keeping $\rho = 0.5$ and $D = \mathrm{diag}(1, 1, 0)$ and taking $\tilde{N} = 0$ and $\tilde{M} = 6$ as constant, this paper considers the variable selection and estimation results of the three methods, P-BTQR, PDL-BTQR, and PDAL-BTQR, simulated under random errors obeying a standard normal distribution, a t(3) distribution, and an ALD(0,0.5,1) distribution, respectively. The advantage of the Bayesian framework is that the unknown parameters can be viewed as obeying a certain prior conditional distribution. After the prior conditional distributions of the different parameters to be estimated are given, the Gibbs sampling algorithm is used for parameter estimation and variable selection, and the estimation results at the 0.5 quantile are shown in Table 5 and Table 6.
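For readers who wish to reproduce the three error settings, the following sketch draws errors from each distribution using only the Python standard library. It assumes that ALD(0,0.5,1) is read as (location, quantile level, scale), which is not stated explicitly here, and it uses the well-known normal-exponential mixture representation of the asymmetric Laplace distribution. The function names are hypothetical.

```python
import math
import random

def sample_ald(mu, tau, sigma, rng):
    """Draw from ALD(mu, tau, sigma) via the normal-exponential mixture."""
    theta = (1 - 2 * tau) / (tau * (1 - tau))
    psi = math.sqrt(2 / (tau * (1 - tau)))
    z = rng.expovariate(1.0)   # z ~ Exp(1)
    u = rng.gauss(0.0, 1.0)    # u ~ N(0, 1)
    return mu + sigma * (theta * z + psi * math.sqrt(z) * u)

def sample_t3(rng):
    """Draw from Student's t with 3 degrees of freedom: N(0,1) / sqrt(chi2_3 / 3)."""
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / 3)

rng = random.Random(2024)  # fixed seed so the draws are reproducible
normal_errors = [rng.gauss(0.0, 1.0) for _ in range(5)]
t3_errors = [sample_t3(rng) for _ in range(5)]
ald_errors = [sample_ald(0.0, 0.5, 1.0, rng) for _ in range(5)]
```

Under this parameterization the τ-th quantile of ALD(μ, τ, σ) equals μ, so for τ = 0.5 the errors are symmetric about zero with heavier tails than the normal distribution.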

3.3.1. Simulation Results under Different Random Error Distributions for Sparse Longitudinal Data

According to Figure 13, Figure 14 and Figure 15, for sparse longitudinal data with different random error distributions, when the random errors obey the standard normal distribution, the mean MSEs of the three methods P-BTQR, PDL-BTQR, and PDAL-BTQR are 0.054, 0.041, and 0.039, respectively. All three methods obtain parameter estimates with small deviation, but the PDL-BTQR method performs better than the P-BTQR method. Comparing the two double-penalty methods, the PDAL-BTQR method is superior, and its MSE standard deviation is the smallest at 0.052, indicating that the 100 simulation results fluctuate less and that its accuracy is more stable. When the random errors obey the t(3) distribution, the mean MSE of the unpenalized P-BTQR method increases significantly to 0.077, indicating that the parameter estimates obtained by the P-BTQR method under the t(3) distribution are not as accurate as those under the standard normal distribution, while the corresponding mean MSEs of the PDL-BTQR and PDAL-BTQR methods are 0.057 and 0.056, respectively. The difference between the latter two is not significant, which indicates that the double-penalty methods have an obvious advantage in estimation. When the random errors obey the ALD(0,0.5,1) distribution, the estimation effects of the PDL-BTQR and PDAL-BTQR methods are almost the same, and their mean MSE values are lower than that of the P-BTQR method. From the simulation results, the PDL-BTQR and PDAL-BTQR methods perform better than the P-BTQR method regardless of the random error distribution.

3.3.2. Simulation Results under Different Random Error Distributions for Dense Longitudinal Data

According to Figure 16, Figure 17 and Figure 18, in the dense longitudinal data model, when the random errors obey the standard normal distribution, the MSEs obtained by the P-BTQR and PDAL-BTQR methods have the same mean value and both yield accurate estimates, while the PDL-BTQR method performs better still. When the random errors obey the t(3) distribution, similar to the results in Table 5, the mean MSE values of the three methods also increase significantly; the P-BTQR method has the highest mean MSE, 0.074, indicating that the no-penalty method estimates poorly, while the mean MSE values of the PDL-BTQR and PDAL-BTQR methods are 0.065 and 0.068, respectively, indicating that the double-penalty methods yield less biased parameter estimates. Under the t(3) distribution, the mean MSEs of the PDL-BTQR and PDAL-BTQR methods are lower for the sparse longitudinal data than for the dense longitudinal data, indicating that the double-penalty methods are more advantageous in handling sparse longitudinal data. When the random errors obey the ALD(0,0.5,1) distribution, the mean MSE values obtained by the P-BTQR, PDL-BTQR, and PDAL-BTQR methods are 0.046, 0.043, and 0.044, respectively, and the double-penalty methods continue to perform better than the no-penalty method. In addition, the simulation results show that the PDL-BTQR method has the best parameter estimation effect for dense longitudinal data under the different random error distributions.
In summary, the estimation error of the double-penalty Bayesian Tobit quantile regression method is the smallest whether the random errors obey the standard normal distribution, the t(3) distribution, or the ALD(0,0.5,1) distribution. For both types of longitudinal data, the PDL-BTQR and PDAL-BTQR methods yield better parameter estimation and variable selection results for mixed-effects models with interval-censored response variables.

3.4. Time Consumption for the Methods

One important topic in modeling analysis is the time required for computation. Although advances in computer technology mean that most ordinary data can be handled comfortably, the increasing requirements for model accuracy and the emergence of high-dimensional, massive and complex data make computation time a serious concern even on the most advanced computers. The double-penalized Bayesian quantile regression method proposed in this study also involves large-scale operations. Below we use the sparse longitudinal data mixed-effects model from Section 3 to compare the computing time of the methods considered in this paper. These methods include:
(1) Unpenalized Bayesian Tobit quantile regression for interval-censored data (P-BTQR);
(2) Single-Lasso penalized Bayesian Tobit quantile regression for interval-censored data (PL-BTQR);
(3) Single-Adaptive Lasso penalized Bayesian Tobit quantile regression for interval-censored data (PAL-BTQR);
(4) Double-Lasso penalized Bayesian Tobit quantile regression for interval-censored data (PDL-BTQR);
(5) Double-Adaptive Lasso penalized Bayesian Tobit quantile regression for interval-censored data (PDAL-BTQR).
The prior settings in the Bayesian approach and the parameter settings in the double-penalized quantile regression are the same as in the previous simulations.
The number of iterations for all Bayesian methods was 20,000. The computer configuration is: Intel(R) Core(TM) 2 Duo CPU, 2.10 GHz, 2 GB RAM; the running platform is R version 4.4.2, and the Bayesian methods all use BUGS 1.4.
Table 7 gives the average user time, system time and elapsed time of the above methods over 50 repetitions of the simulation, all in seconds. Because the Bayesian methods call the BUGS software, the user time and system time do not include the actual sampling time, so the total elapsed time is the more appropriate basis for comparison.
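The distinction between processor time and elapsed time can be illustrated with a short sketch. It is written in Python rather than R, and a `time.sleep` call stands in for waiting on an external sampler; both the helper `timed` and the stand-in function are hypothetical and do not reproduce the timing code used for Table 7.

```python
import time

def timed(fn, *args):
    """Return (result, cpu_seconds, elapsed_seconds) for one call of fn."""
    cpu_start = time.process_time()    # CPU time of this process only
    wall_start = time.perf_counter()   # wall-clock time
    result = fn(*args)
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return result, cpu, elapsed

def wait_for_external_sampler(seconds):
    """Stand-in for waiting on an external program; uses almost no CPU here."""
    time.sleep(seconds)
    return "done"

result, cpu, elapsed = timed(wait_for_external_sampler, 0.2)
```

The CPU time recorded by the calling process stays close to zero while the elapsed time covers the full wait, which is why elapsed time is the fairer measure when the sampling is delegated to another program.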
The double-penalized Bayesian method is slightly shorter in running time than the single-penalized method and it offers greater advantages in practical applications. Since the dual-penalty method is able to consider the effects of both fixed and random effects and penalize them appropriately, it can provide more accurate parameter estimates and more reliable prediction results. In addition, the dual-penalty method is capable of automatic variable selection and parameter compression, which further improves the generalization ability and interpretability of the model.
In summary, the dual-penalty method provides more accurate parameter estimates and more reliable prediction results while maintaining a similar runtime as the single-penalty method. This makes the double-penalized Bayesian Tobit quantile regression method an attractive option, especially when dealing with complex data and constructing high-precision models.

4. Interprovincial Longitudinal Crime Rate Data Analysis

The Bayesian double-penalty quantile regression method for longitudinal interval-censored data studied in this paper may be more suitable when the dependent variable is continuous rather than categorical, in which case it achieves higher model accuracy and better variable selection and model estimation.
Many scholars at home and abroad have conducted in-depth empirical studies on the relationship between crime rates and conventional economic indicators, such as regional income disparity, urbanization, and the unemployment rate, which are important reasons for the rise in crime rates [29]. Building on the Monte Carlo simulation analysis in the previous section, this section uses the two new methods, PDL-BTQR and PDAL-BTQR, to discuss the relationship between crime rates and economic indicators for 31 provinces across the country from 2010 to 2016; the data contain 1302 observations. The 31 provinces were classified into eastern, central and western regions. Crime rates were found to be higher in the eastern and western regions than in the central region, suggesting a correlation with regional income disparities at the country's current stage of development.
Since the crime rate is expressed as the number of crimes with approved arrests per 10,000 people, it is bounded below by 0 when used as a response variable. For the upper bound, we select the 10 regions with the highest crime rates in the eastern and western regions and use their average crime rate, 9.35, as the upper limit of the response variable. The crime rate is therefore a bilaterally constrained response variable taking values in [0, 9.35], and the censoring rate is 12.9%.
Previous studies have examined the main causes of rising crime rates from several perspectives, including the total economy, urban population, education, wealth gap, and employment. Therefore, this section takes the following explanatory variables: GDP per capita, urbanization rate, regional income gap, education level, and unemployment rate. The crime rate data are obtained from the Chinese Prosecution Yearbook and the Chinese Law Yearbook, and the economic indicators are obtained from the Chinese Statistical Yearbook. The specific variable definitions and descriptive statistics are shown in Table 8. To provide insight into the relationship between the variables in the dataset, an analysis of correlation coefficients and covariances was carried out, and the results are displayed in Figure 19 and Figure 20.
The interval-censored model is

$$
\begin{cases}
y_{ij}^{*} = x_{ij}\beta + z_{ij}\alpha_i + \varepsilon_{ij}, \\
y_{ij} = y_{ij}^{*}\, I(\tilde{N} \le y_{ij}^{*} \le \tilde{M}) + \tilde{M}\, I(y_{ij}^{*} > \tilde{M}) + \tilde{N}\, I(y_{ij}^{*} < \tilde{N}),
\end{cases}
$$

where the response variable $y_{ij}$ denotes the crime rate of the i-th province in the j-th year, $i = 1, \ldots, 31$ and $j = 1, \ldots, 7$, and $y_{ij}^{*}$ is a latent variable. $x_{ij} = (1, x_{1ij}, x_{2ij}, x_{3ij}, x_{4ij}, x_{5ij})$ contains the intercept term and the five explanatory variables, where $x_{1ij}, x_{2ij}, x_{3ij}, x_{4ij}, x_{5ij}$ are the observed values of the explanatory variables for the i-th province in the j-th year. $\beta = (\beta_0, \beta_1, \ldots, \beta_5)'$ are the coefficients of the explanatory variables; $\alpha_i$ is the random effects coefficient; $z_{ij}$ denotes the explanatory variables that produce random effects, with $z_{ij} \subseteq x_{ij}$, and here we assume $z_{ij} = x_{ij}$.
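The mapping from the latent response to the observed, interval-censored response can be sketched as follows, using the bounds 0 and 9.35 of the crime rate data. The latent values in the example are made up for illustration, and the function names are hypothetical.

```python
def censor(y_star, lower, upper):
    """Observed response: y* inside [lower, upper], otherwise the nearest bound."""
    if y_star < lower:
        return lower
    if y_star > upper:
        return upper
    return y_star

def censoring_rate(latent, lower, upper):
    """Share of latent values that fall outside [lower, upper]."""
    outside = sum(1 for y in latent if y < lower or y > upper)
    return outside / len(latent)

LOWER, UPPER = 0.0, 9.35
latent = [-0.4, 3.2, 8.9, 12.1, 5.0]   # made-up latent values
observed = [censor(y, LOWER, UPPER) for y in latent]
rate = censoring_rate(latent, LOWER, UPPER)
```

In real data only the observed values are available, so the latent values outside the bounds have to be treated as unknown quantities in the estimation.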
We are interested in the degree of influence of each explanatory variable on the response variable at different quantile points, and Table 9 shows the estimation results of both the PDL-BTQR and PDAL-BTQR methods at each quantile point.
Table 9 shows that GDP per capita, education level, and unemployment rate are inversely related to the crime rate at each quantile, i.e., increases in regional GDP per capita, education level, and unemployment rate are associated with a lower crime rate; this holds especially for education level, whose estimated coefficients at each quantile are larger in absolute value than those of GDP per capita and unemployment rate. The urbanization rate and regional income disparity act as positive shocks to crime rates, i.e., an increase in the urbanization rate and an increase in income disparity both lead to an increase in crime rates. Both estimation methods indicate that the effect of the urbanization rate reaches a maximum at the 0.5 quantile and that income disparity has the greatest impact at the 0.7 quantile. A high urbanization rate reflects, to a certain extent, the high mobility of the migrant population, which is more likely to breed crime, while an increase in regional income disparity and a widening gap between rich and poor can easily cause class conflicts, trigger social unrest, and generate delinquent behavior. In the real data analysis, model estimation and variable selection are performed simultaneously, and the penalty part enables automatic variable selection.

5. Discussion

In this paper, considering the situation in which the response variable is restricted by bilateral limits, we construct a double-penalized Bayesian Tobit quantile regression model for interval-censored data. Penalty functions are added to the fixed-effect and random-effect coefficients at the same time to carry out parameter estimation and variable selection in the interval-censored mixed-effects model. Monte Carlo simulation is used to obtain the estimation results for the two sets of longitudinal data under different estimation methods, different censoring ratios, and different random error distributions, and the new method is used to analyze the correlation between the crime rate and various economic indicators in China.
In the mixed-effects model with censored data, the general Tobit quantile regression method cannot obtain effective estimates of the parameters [1]. On the one hand, because random effects are added to the general linear model to form the mixed-effects model [30], there are a large number of unknown parameters and the distribution of the random errors is unknown; random errors under different distributions increase the complexity of the model computation, which makes parameter estimation very difficult. On the other hand, because the restricted response variable generates a large number of censored data, the mixed-effects model contains latent variables that make the Markov chain Monte Carlo (MCMC) sampling algorithm for parameter estimation extremely complex, resulting in low computational efficiency and a large bias in the estimation results. In recent years, parameter estimation and variable selection based on the idea of a penalty function under the Bayesian framework has been one of the hot topics of academic discussion [1]. Therefore, building on existing research and in order to solve the above problems, this paper constructs a Bayesian double-penalty Tobit quantile regression model for censored data, so as to provide a new approach to parameter estimation and variable selection for censored mixed-effects models [31].
For censored data, David Cox proposed the Cox proportional hazards (PH) model in 1972 [32], which is mainly used to study the relationship between multiple independent variables and the dependent variable (survival time) and can handle censored data, which makes it highly practical. The Cox proportional hazards model may not be the best choice, however, when the data are truncated and an explicit concept of survival time does not exist. This paper discusses parameter estimation and variable selection when the response variable is subject to bilateral restrictions. Due to the boundedness and bias of interval-censored data [33], simple regression methods cannot effectively screen important variables and exclude redundant variables [34]. The main reason is that the fitted and estimated values obtained by traditional regression methods may exceed the upper and lower bounds of the response variable, and the model interpretation is relatively weak [35]. The Tobit quantile regression method provides a new way of parameter estimation for the mixed-effects model with interval-censored response variables [36]. Therefore, this paper first combines the Bayesian method and constructs the Bayesian empirical likelihood function under interval-censored data [37]. Second, the penalty function is introduced, and a more efficient Gibbs sampling algorithm is constructed using the asymmetric Laplace prior and the truncated normal distribution [38]. Finally, Monte Carlo simulation experiments and real data analysis are carried out, which illustrate the advantages of the Bayesian double-penalty Tobit quantile regression model, such as high estimation efficiency and robustness.
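In Tobit-type Gibbs samplers, the latent response of a censored observation is commonly redrawn from a normal distribution truncated to the region beyond the censoring bound. The sketch below shows only this elementary step with an inverse-CDF draw from the Python standard library. It assumes that the conditional mean and standard deviation of the latent response are already available, and it is not the full conditional distribution derived in this paper. The function name and the numerical values are hypothetical, and the approach is only suitable for moderate truncation, since it loses accuracy far in the tails.

```python
import random
from statistics import NormalDist

def draw_truncated_normal(mean, sd, lower, upper, rng):
    """Inverse-CDF draw from N(mean, sd^2) restricted to (lower, upper)."""
    dist = NormalDist(mean, sd)
    p_lo = dist.cdf(lower)
    p_hi = dist.cdf(upper)
    u = rng.uniform(p_lo, p_hi)
    # Keep u strictly inside (0, 1), as required by inv_cdf
    u = min(max(u, 1e-12), 1 - 1e-12)
    return dist.inv_cdf(u)

rng = random.Random(0)
# Observation censored at the upper bound 6: the latent value must exceed 6
above = draw_truncated_normal(5.2, 1.0, 6.0, float("inf"), rng)
# Observation censored at the lower bound 0: the latent value must lie below 0
below = draw_truncated_normal(-0.3, 1.0, float("-inf"), 0.0, rng)
```

Uncensored observations need no such draw, because their latent and observed responses coincide.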
Although this paper proposes the double-penalized Bayesian Tobit quantile regression method for the mixed-effects model with censored data and constructs the Gibbs sampling algorithm for parameter estimation, and the simulation results confirm that the new method estimates better than the traditional method, some shortcomings remain. First, among the commonly used variable selection methods, this paper only analyzes the Lasso penalty and the adaptive Lasso penalty; future work can use SCAD, elastic net, adaptive elastic net, and other penalty methods for parameter estimation and variable selection. Second, this paper centers on linear models; future work can construct a Bayesian Tobit quantile regression model for censored data under a nonlinear model and explore the variable selection problem in that setting.

6. Conclusions

This paper proposes a Bayesian double-penalized Tobit quantile regression method for interval-censored data in mixed-effects models. The method compresses the fixed and random effects parameters using a conditional Laplace prior to improve the estimation accuracy. The posterior distributions are derived from a mixture of truncated normal distributions, and a Gibbs sampling algorithm is constructed for parameter estimation. Both simulation and real data analysis show that the method outperforms traditional methods in parameter estimation and variable selection and is particularly suitable for dealing with censored data.
(1) Significantly improved model accuracy and efficiency
In complex mixed-effects models, the double-penalty approach of PDL-BTQR and PDAL-BTQR effectively reduces the estimation error of the model and improves the accuracy of parameter estimation by compressing the random effects coefficients. This approach significantly improves the predictive power and interpretability of the model by simultaneous parameter estimation and variable selection when dealing with longitudinal data, regardless of whether the data are sparse or dense. In addition, this dual-penalty strategy helps to identify and exclude redundant variables, thus further optimizing the model structure.
(2) Demonstrated robustness in handling complex and variable datasets
In practical applications, data often have different censoring ratios and complex random error distributions. In this case, the dual-penalty method shows good robustness. Whether facing high censoring ratios or different random error distributions, the dual-penalty approach provides stable parameter estimation and accurate variable selection. In particular, the PDL-BTQR method excels when dealing with dense longitudinal data, while the PDAL-BTQR method is even better when dealing with sparse longitudinal data. This robustness makes the dual-penalty method widely applicable and flexible in practical applications.
(3) New and effective tool for dealing with interval-censored data
In statistics and data analysis, interval-censored data is a common and complex data type. Traditional treatments often make it difficult to accurately estimate parameters and make effective variable selection. However, the model proposed in this study is particularly suitable for dealing with interval-censored data, and its superiority in parameter estimation and variable selection is verified by setting a bilateral truncation of the response variable and conducting a simulation study, and the model is able to realize automatic variable selection. A new effective method is provided for dealing with data with censored characteristics.

Author Contributions

Conceptualization, K.Z. and T.S.; methodology, T.S.; software, K.Z. and T.S.; validation, K.Z., Y.L. and C.H.; formal analysis, K.Z., T.S. and Y.L.; investigation, K.Z. and T.S.; resources, K.Z.; data curation, T.S.; writing—original draft preparation, T.S. and Y.L.; writing—review and editing, K.Z., Y.L. and C.H.; visualization, K.Z.; supervision, Y.L.; project administration, C.H.; funding acquisition, Y.L. and C.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 11701161; the National Social Science Fund of China, grant number 17BJY210; the Key Humanities and Social Science Fund of the Hubei Provincial Department of Education, grant number 20D043; and the Humanities and Social Science Fund of the Hubei Provincial Department of Education, grant number 22Y059.

Data Availability Statement

The data will be made available by the authors on request.

Acknowledgments

We would like to sincerely thank the editor-in-chief, the editor, and the anonymous reviewers for their useful feedback and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Song, X.K.; Ming, T. Marginal Models for Longitudinal Continuous Proportional Data. Biometrics 2000, 56, 496–502.
  2. Ferrari, S.; Cribari-Neto, F. Beta Regression for Modelling Rates and Proportions. J. Appl. Stat. 2004, 31, 799–815.
  3. Lesaffre, E.; Rizopoulos, D.; Tsonaka, R. The logistic transform for bounded outcomes scores. Biostatistics 2007, 8, 72–85.
  4. Espinheira, P.L.; Ferrari, S.L.P.; Cribari-Neto, F. Influence diagnostics in beta regression. Comput. Stat. Data Anal. 2008, 52, 4417–4431.
  5. Zhao, W.; Zhang, R.; Lv, Y.; Liu, J. Variable selection for varying dispersion beta regression model. J. Appl. Stat. 2014, 41, 95–108.
  6. Ying, Z.L.; Yu, W.; Zhao, Z.Q.; Zheng, M. Regression Analysis of Doubly Truncated Data. J. Am. Stat. Assoc. 2019, 115, 810–821.
  7. Tobin, J. Estimation of relationships for limited dependent variables. Econometrica 1958, 26, 24–36.
  8. Cunha Danúbia, R.; Angelo, J.D.; Helton, S. On a log-symmetric quantile Tobit model applied to female labor supply data. J. Appl. Stat. 2022, 49, 4225–4253.
  9. Powell, J.L. Censored regression quantiles. J. Econom. 1986, 32, 143–155.
  10. Frumento, P. A quantile regression estimator for interval-censored data. Int. J. Biostat. 2023, 19, 81–96.
  11. Li, L.; Hao, R.; Yang, X. Data Augmentation Based Quantile Regression Estimation for Censored Partially Linear Additive Model. Comput. Econ. 2023, 1–30.
  12. Hao, R.; Weng, C.; Liu, X.; Yang, X. Data augmentation based estimation for the censored quantile regression neural network model. Expert Syst. Appl. 2023, 214, 119097.
  13. Yu, R.; Long, X.; Quddus, M.; Wang, J.H. A Bayesian Tobit quantile regression approach for naturalistic longitudinal driving capability assessment. Accid. Anal. Prev. 2020, 147, 105779.
  14. Tibshirani, R.J. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288.
  15. Fan, J.; Li, R. Variable Selection via Non-concave Penalized Likelihood and Its Oracle Properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
  16. Kim, S.M.; Lee, S.B.; Lee, S.H.; Kim, W. Robust estimation of outage costs in South Korea using a machine learning technique: Bayesian Tobit quantile regression. Appl. Energy 2020, 278, 115702.
  17. Alhamzawi, R.; Keming, Y.; Dries, F.B. Bayesian adaptive Lasso quantile regression. Stat. Model. 2012, 12, 279–297.
  18. Alhamzawi, R. Bayesian Elastic Net Tobit Quantile Regression. Commun. Stat. Simul. Comput. 2016, 45, 2409–2427.
  19. Alhusseini, F. New Bayesian Lasso in Tobit Quantile Regression. Rom. Statal Rev. Suppl. 2017, 65, 213–229.
  20. Alhamzawi, R.; Ali, M.T.H. Bayesian tobit quantile regression with penalty. Commun. Stat. Simul. Comput. 2018, 47, 1739–1750.
  21. Abbas, H.K. Bayesian Lasso Tobit regression. J. Al-Qadisiyah Comput. Sci. Math. 2019, 11, 1–13.
  22. Kottas, A.; Krnjajić, M. Bayesian Semiparametric Modelling in Quantile Regression. Scand. J. Stat. 2009, 36, 297–319.
  23. Narjes, G.; Reza, P. The likelihood and Bayesian analyses for asymmetric Laplace nonlinear regression model. Comput. Appl. Math. 2024, 43, 21.
  24. Andrews, D.F.; Mallows, C.L. Scale Mixtures of Normal Distributions. J. R. Stat. Soc. Ser. B Methodol. 1974, 36, 99–102.
  25. Zou, H. The Adaptive Lasso and Its Oracle Properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429.
  26. Alhamzawi, R.; Yu, K. Bayesian Lasso-mixed Quantile Regression. J. Stat. Comput. Simul. 2012, 84, 868–880.
  27. Luo, Y.X.; Li, H.F. The Research of Double Adaptive Lasso Quantile Regression Model with Random Effects. J. Quant. Technol. Econ. 2017, 34, 136–148.
  28. Alhamzawi, R.; Ali, H.T.M. The Bayesian adaptive lasso regression. Math. Biosci. 2018, 303, 75–82.
  29. Bhattacharya, A. Analysis of the Factors Affecting Violent Crime Rates in the US. Int. J. Eng. Manag. Res. 2020, 10, 106–109.
  30. Shen, P.S. Median regression model with left truncated and interval-censored data. J. Korean Stat. Soc. 2013, 42, 469–479.
  31. Zhou, X.; Feng, Y.; Du, X. Quantile regression for interval censored data. Commun. Stat. Theory Methods 2017, 46, 3848–3863.
  32. Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Ser. B 1972, 34, 187–202.
  33. Angelov, A.G.; Magnus, E.; Klarizze, P.; Arcenas, A.; Bengt, K. Quantile regression with interval-censored data in questionnaire-based studies. Comput. Stat. 2022, 39, 583–603.
  34. Dengluan, D.; Anmin, T.; Jinli, Y. High-Dimensional Variable Selection for Quantile Regression Based on Variational Bayesian Method. Mathematics 2023, 11, 2232.
  35. Wang, Z.F.; Li, T.; Xiao, L.Q.; Tu, D.S. A threshold longitudinal Tobit quantile regression model for identification of treatment-sensitive subgroups based on interval-bounded longitudinal measurements and a continuous covariate. Stat. Med. 2023, 42, 4618–4631.
  36. Wang, Z.Q.; Wu, Y.; Cheng, W.L. Variational inference on a Bayesian adaptive lasso Tobit quantile regression model. Stat 2023, 12, 13.
  37. Kobayashi, G. Bayesian Endogenous Tobit Quantile Regression. Bayesian Anal. 2017, 12, 161–191.
  38. Alhusseini, H.H.F.; Georgescu, V. Bayesian composite Tobit quantile regression. J. Appl. Stat. 2017, 45, 727–739.
Figure 1. Estimated means of the three methods at different quantiles under sparse longitudinal data.
Figure 2. Estimated standard deviations of the three methods at different quantiles under sparse longitudinal data.
Figure 3. Estimated confidence intervals of the three methods at different quantiles under sparse longitudinal data.
Figure 4. Estimated means of the three methods at different quantiles under dense longitudinal data.
Figure 5. Estimated standard deviations of the three methods at different quantiles under dense longitudinal data.
Figure 6. Estimated confidence intervals of the three methods at different quantiles under dense longitudinal data.
Figure 7. Estimated means for sparse longitudinal data with different censoring ratios.
Figure 8. Estimated standard deviations for sparse longitudinal data with different censoring ratios.
Figure 9. Estimated confidence intervals for sparse longitudinal data with different censoring ratios.
Figure 10. Estimated means for dense longitudinal data with different censoring ratios.
Figure 11. Estimated standard deviations for dense longitudinal data with different censoring ratios.
Figure 12. Estimated confidence intervals for dense longitudinal data with different censoring ratios.
Figure 13. Estimated means for sparse longitudinal data with different distributions.
Figure 14. Estimated standard deviations for sparse longitudinal data with different distributions.
Figure 15. Estimated confidence intervals for sparse longitudinal data with different distributions.
Figure 16. Estimated means for dense longitudinal data with different distributions.
Figure 17. Estimated standard deviations for dense longitudinal data with different distributions.
Figure 18. Estimated confidence intervals for dense longitudinal data with different distributions.
Figure 19. Heat map between variables.
Figure 20. Network diagram of covariance between variables.
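Table 1. Estimation results for sparse longitudinal data at different quantiles.

| τ | Method | Estimation | MSE | β1 | β2 | β3 | β4 | β5 | β6 | β7 | β8 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | True value | | | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 0.25 | P-BTQR | Mean | 0.070 | 0.635 | 0.002 | −0.011 | 0.584 | 0.012 | −0.016 | −0.003 | −0.008 |
| 0.25 | P-BTQR | Sd | 0.152 | 0.227 | 0.251 | 0.107 | 0.241 | 0.190 | 0.139 | 0.108 | 0.105 |
| 0.25 | P-BTQR | Confidence interval | - | [0.591, 0.679] | [−0.047, 0.051] | [−0.032, 0.010] | [0.537, 0.631] | [−0.027, 0.051] | [−0.043, 0.011] | [−0.024, 0.018] | [−0.029, 0.013] |
| 0.25 | PDL-BTQR | Mean | 0.079 | 0.574 | 0.062 | 0.001 | 0.493 | 0.014 | 0.001 | −0.005 | −0.010 |
| 0.25 | PDL-BTQR | Sd | 0.137 | 0.198 | 0.195 | 0.081 | 0.244 | 0.170 | 0.999 | 0.083 | 0.078 |
| 0.25 | PDL-BTQR | Confidence interval | - | [0.536, 0.612] | [0.024, 0.099] | [−0.015, 0.017] | [0.445, 0.541] | [−0.019, 0.047] | [−0.019, 0.021] | [−0.021, 0.011] | [−0.026, 0.006] |
| 0.25 | PDAL-BTQR | Mean | 0.067 | 0.595 | 0.040 | −0.006 | 0.530 | −0.004 | 0.002 | −0.007 | −0.003 |
| 0.25 | PDAL-BTQR | Sd | 0.061 | 0.209 | 0.216 | 0.085 | 0.163 | 0.096 | 0.082 | 0.086 | 0.072 |
| 0.25 | PDAL-BTQR | Confidence interval | - | [0.555, 0.635] | [−0.003, 0.083] | [−0.022, 0.010] | [0.498, 0.562] | [−0.023, 0.015] | [−0.014, 0.018] | [−0.024, 0.010] | [−0.017, 0.011] |
| 0.5 | P-BTQR | Mean | 0.054 | 0.896 | 0.052 | −0.015 | 0.740 | 0.016 | 0.007 | −0.023 | 0.011 |
| 0.5 | P-BTQR | Sd | 0.141 | 0.311 | 0.330 | 0.125 | 0.246 | 0.172 | 0.137 | 0.116 | 0.117 |
| 0.5 | P-BTQR | Confidence interval | - | [0.834, 0.958] | [−0.012, 0.116] | [−0.038, 0.008] | [0.690, 0.790] | [−0.017, 0.049] | [−0.021, 0.035] | [−0.044, −0.002] | [−0.011, 0.033] |
| 0.5 | PDL-BTQR | Mean | 0.041 | 0.912 | 0.128 | −0.002 | 0.727 | 0.000 | 0.003 | −0.001 | 0.006 |
| 0.5 | PDL-BTQR | Sd | 0.070 | 0.263 | 0.273 | 0.101 | 0.204 | 0.106 | 0.102 | 0.092 | 0.098 |
| 0.5 | PDL-BTQR | Confidence interval | - | [0.859, 0.965] | [0.075, 0.181] | [−0.021, 0.017] | [0.688, 0.766] | [−0.021, 0.021] | [−0.017, 0.023] | [−0.019, 0.017] | [−0.013, 0.025] |
| 0.5 | PDAL-BTQR | Mean | 0.039 | 0.946 | 0.097 | −0.002 | 0.726 | 0.002 | −0.002 | 0.001 | 0.005 |
| 0.5 | PDAL-BTQR | Sd | 0.052 | 0.283 | 0.248 | 0.104 | 0.207 | 0.100 | 0.099 | 0.092 | 0.095 |
| 0.5 | PDAL-BTQR | Confidence interval | - | [0.890, 1.002] | [0.049, 0.146] | [−0.023, 0.019] | [0.685, 0.767] | [−0.017, 0.021] | [−0.021, 0.017] | [−0.017, 0.019] | [−0.014, 0.024] |
| 0.75 | P-BTQR | Mean | 0.071 | 1.241 | 0.052 | −0.008 | 0.915 | −0.001 | −0.011 | 0.011 | −0.010 |
| 0.75 | P-BTQR | Sd | 0.125 | 0.329 | 0.395 | 0.167 | 0.301 | 0.204 | 0.181 | 0.143 | 0.161 |
| 0.75 | P-BTQR | Confidence interval | - | [1.177, 1.305] | [−0.024, 0.128] | [−0.041, 0.025] | [0.853, 0.977] | [−0.041, 0.039] | [−0.047, 0.025] | [−0.018, 0.040] | [−0.040, 0.020] |
| 0.75 | PDL-BTQR | Mean | 0.051 | 1.200 | 0.149 | 0.008 | 0.874 | 0.008 | 0.008 | 0.007 | −0.016 |
| 0.75 | PDL-BTQR | Sd | 0.058 | 0.330 | 0.304 | 0.130 | 0.236 | 0.129 | 0.148 | 0.116 | 0.122 |
| 0.75 | PDL-BTQR | Confidence interval | - | [1.138, 1.262] | [0.088, 0.210] | [−0.018, 0.034] | [0.827, 0.921] | [−0.017, 0.033] | [−0.022, 0.038] | [−0.015, 0.029] | [−0.040, 0.008] |
| 0.75 | PDAL-BTQR | Mean | 0.050 | 1.207 | 0.141 | 0.006 | 0.830 | 0.005 | 0.000 | 0.004 | 0.002 |
| 0.75 | PDAL-BTQR | Sd | 0.046 | 0.333 | 0.287 | 0.125 | 0.215 | 0.114 | 0.133 | 0.117 | 0.112 |
| 0.75 | PDAL-BTQR | Confidence interval | - | [1.139, 1.275] | [0.086, 0.196] | [−0.018, 0.030] | [0.787, 0.873] | [−0.017, 0.027] | [−0.026, 0.026] | [−0.019, 0.027] | [−0.020, 0.024] |
| 0.95 | P-BTQR | Mean | 0.225 | 1.629 | 0.034 | −0.011 | 1.285 | −0.037 | 0.015 | 0.006 | −0.050 |
| 0.95 | P-BTQR | Sd | 0.239 | 0.438 | 0.600 | 0.325 | 0.500 | 0.358 | 0.357 | 0.292 | 0.275 |
| 0.95 | P-BTQR | Confidence interval | - | [1.544, 1.714] | [−0.082, 0.150] | [−0.072, 0.050] | [1.188, 1.382] | [−0.106, 0.032] | [−0.054, 0.084] | [−0.052, 0.064] | [−0.104, 0.004] |
| 0.95 | PDL-BTQR | Mean | 0.132 | 1.500 | 0.170 | 0.015 | 1.178 | −0.009 | 0.031 | 0.006 | −0.031 |
| 0.95 | PDL-BTQR | Sd | 0.096 | 0.452 | 0.438 | 0.224 | 0.365 | 0.218 | 0.242 | 0.188 | 0.190 |
| 0.95 | PDL-BTQR | Confidence interval | - | [1.413, 1.587] | [0.084, 0.256] | [−0.029, 0.059] | [1.105, 1.251] | [−0.051, 0.033] | [−0.016, 0.078] | [−0.030, 0.042] | [−0.068, 0.006] |
| 0.95 | PDAL-BTQR | Mean | 0.108 | 1.416 | 0.202 | 0.012 | 1.027 | 0.000 | 0.012 | −0.004 | −0.010 |
| 0.95 | PDAL-BTQR | Sd | 0.086 | 0.469 | 0.408 | 0.197 | 0.316 | 0.170 | 0.201 | 0.150 | 0.148 |
| 0.95 | PDAL-BTQR | Confidence interval | - | [1.325, 1.507] | [0.125, 0.279] | [−0.027, 0.051] | [0.965, 1.089] | [−0.032, 0.033] | [−0.028, 0.052] | [−0.032, 0.025] | [−0.039, 0.019] |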
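The confidence intervals in Table 1 appear to be consistent with a normal approximation of the form mean ± 1.96 · Sd / √R. The sketch below checks this against two entries of the table. The replication count R = 100 is an assumption inferred from the tabulated numbers, not a value stated in this section, and the function name `normal_ci` is hypothetical.

```python
import math


def normal_ci(mean, sd, n_rep, z=1.96):
    """Normal-approximation interval for a Monte Carlo mean.

    n_rep is the number of simulation replications (assumed here, not
    taken from the source); z = 1.96 corresponds to a 95% level.
    """
    half_width = z * sd / math.sqrt(n_rep)
    return (round(mean - half_width, 3), round(mean + half_width, 3))


# P-BTQR, tau = 0.25: beta1 (Mean 0.635, Sd 0.227) and beta4 (Mean 0.584, Sd 0.241)
print(normal_ci(0.635, 0.227, 100))  # compare with [0.591, 0.679] in Table 1
print(normal_ci(0.584, 0.241, 100))  # compare with [0.537, 0.631] in Table 1
```

If the assumption holds, an interval that excludes the true coefficient value signals systematic bias in the estimated mean rather than Monte Carlo noise alone.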
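Table 2. Estimation results of the three methods with dense longitudinal data.

| τ | Method | Estimation | MSE | β1 | β2 | β3 | β4 | β5 | β6 | β7 | β8 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | True value | | | 1 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| 0.25 | P-BTQR | Mean | 0.074 | 0.642 | 0.704 | 0.314 | 0.349 | 0.315 | 0.316 | 0.304 | 0.329 |
| 0.25 | P-BTQR | Sd | 0.021 | 0.204 | 0.207 | 0.127 | 0.151 | 0.143 | 0.107 | 0.151 | 0.129 |
| 0.25 | P-BTQR | Confidence interval | - | [0.602, 0.682] | [0.665, 0.743] | [0.289, 0.339] | [0.318, 0.380] | [0.286, 0.344] | [0.296, 0.336] | [0.275, 0.333] | [0.303, 0.355] |
| 0.25 | PDL-BTQR | Mean | 0.070 | 0.663 | 0.755 | 0.302 | 0.328 | 0.318 | 0.294 | 0.319 | 0.319 |
| 0.25 | PDL-BTQR | Sd | 0.025 | 0.187 | 0.207 | 0.118 | 0.148 | 0.137 | 0.114 | 0.127 | 0.119 |
| 0.25 | PDL-BTQR | Confidence interval | - | [0.626, 0.700] | [0.714, 0.796] | [0.279, 0.325] | [0.300, 0.356] | [0.291, 0.345] | [0.271, 0.317] | [0.295, 0.343] | [0.296, 0.342] |
| 0.25 | PDAL-BTQR | Mean | 0.062 | 0.728 | 0.761 | 0.294 | 0.346 | 0.324 | 0.298 | 0.323 | 0.325 |
| 0.25 | PDAL-BTQR | Sd | 0.017 | 0.179 | 0.191 | 0.127 | 0.162 | 0.133 | 0.110 | 0.126 | 0.118 |
| 0.25 | PDAL-BTQR | Confidence interval | - | [0.693, 0.763] | [0.723, 0.799] | [0.269, 0.319] | [0.312, 0.380] | [0.300, 0.349] | [0.278, 0.319] | [0.298, 0.348] | [0.303, 0.347] |
| 0.5 | P-BTQR | Mean | 0.040 | 0.899 | 1.058 | 0.358 | 0.402 | 0.366 | 0.359 | 0.366 | 0.398 |
| 0.5 | P-BTQR | Sd | 0.018 | 0.237 | 0.243 | 0.126 | 0.146 | 0.143 | 0.136 | 0.127 | 0.126 |
| 0.5 | P-BTQR | Confidence interval | - | [0.851, 0.947] | [1.011, 1.105] | [0.333, 0.383] | [0.374, 0.430] | [0.338, 0.394] | [0.332, 0.386] | [0.342, 0.390] | [0.373, 0.423] |
| 0.5 | PDL-BTQR | Mean | 0.039 | 0.902 | 1.040 | 0.365 | 0.404 | 0.359 | 0.376 | 0.370 | 0.390 |
| 0.5 | PDL-BTQR | Sd | 0.018 | 0.227 | 0.230 | 0.129 | 0.138 | 0.138 | 0.138 | 0.131 | 0.129 |
| 0.5 | PDL-BTQR | Confidence interval | - | [0.860, 0.944] | [0.996, 1.084] | [0.340, 0.390] | [0.377, 0.431] | [0.331, 0.387] | [0.349, 0.403] | [0.344, 0.396] | [0.365, 0.415] |
| 0.5 | PDAL-BTQR | Mean | 0.040 | 0.905 | 1.013 | 0.352 | 0.395 | 0.367 | 0.365 | 0.362 | 0.382 |
| 0.5 | PDAL-BTQR | Sd | 0.018 | 0.224 | 0.240 | 0.131 | 0.142 | 0.142 | 0.134 | 0.131 | 0.129 |
| 0.5 | PDAL-BTQR | Confidence interval | - | [0.861, 0.949] | [0.965, 1.061] | [0.326, 0.378] | [0.366, 0.424] | [0.339, 0.395] | [0.339, 0.391] | [0.336, 0.388] | [0.357, 0.407] |
| 0.75 | P-BTQR | Mean | 0.059 | 1.085 | 1.290 | 0.372 | 0.466 | 0.399 | 0.404 | 0.425 | 0.426 |
| 0.75 | P-BTQR | Sd | 0.033 | 0.284 | 0.302 | 0.172 | 0.169 | 0.155 | 0.160 | 0.171 | 0.175 |
| 0.75 | P-BTQR | Confidence interval | - | [1.027, 1.143] | [1.230, 1.350] | [0.337, 0.407] | [0.433, 0.499] | [0.369, 0.429] | [0.373, 0.435] | [0.393, 0.457] | [0.392, 0.460] |
| 0.75 | PDL-BTQR | Mean | 0.056 | 1.099 | 1.297 | 0.383 | 0.460 | 0.389 | 0.430 | 0.421 | 0.405 |
| 0.75 | PDL-BTQR | Sd | 0.033 | 0.269 | 0.298 | 0.149 | 0.148 | 0.146 | 0.164 | 0.148 | 0.171 |
| 0.75 | PDL-BTQR | Confidence interval | - | [1.044, 1.154] | [1.237, 1.357] | [0.353, 0.413] | [0.431, 0.489] | [0.361, 0.417] | [0.398, 0.462] | [0.393, 0.449] | [0.372, 0.438] |
| 0.75 | PDAL-BTQR | Mean | 0.058 | 1.087 | 1.272 | 0.347 | 0.428 | 0.367 | 0.396 | 0.396 | 0.375 |
| 0.75 | PDAL-BTQR | Sd | 0.031 | 0.264 | 0.300 | 0.151 | 0.150 | 0.154 | 0.168 | 0.154 | 0.173 |
| 0.75 | PDAL-BTQR | Confidence interval | - | [1.034, 1.140] | [1.214, 1.330] | [0.318, 0.376] | [0.399, 0.457] | [0.336, 0.398] | [0.363, 0.429] | [0.366, 0.426] | [0.341, 0.409] |
| 0.95 | P-BTQR | Mean | 0.245 | 1.469 | 1.806 | 0.524 | 0.645 | 0.576 | 0.546 | 0.641 | 0.572 |
| 0.95 | P-BTQR | Sd | 0.124 | 0.438 | 0.478 | 0.316 | 0.339 | 0.327 | 0.329 | 0.330 | 0.320 |
| 0.95 | P-BTQR | Confidence interval | - | [1.380, 1.558] | [1.710, 1.902] | [0.462, 0.586] | [0.578, 0.712] | [0.511, 0.641] | [0.481, 0.611] | [0.580, 0.703] | [0.510, 0.634] |
| 0.95 | PDL-BTQR | Mean | 0.201 | 1.399 | 1.765 | 0.498 | 0.590 | 0.495 | 0.532 | 0.544 | 0.493 |
| 0.95 | PDL-BTQR | Sd | 0.116 | 0.410 | 0.449 | 0.265 | 0.287 | 0.264 | 0.300 | 0.288 | 0.307 |
| 0.95 | PDL-BTQR | Confidence interval | - | [1.320, 1.478] | [1.677, 1.853] | [0.446, 0.550] | [0.533, 0.646] | [0.445, 0.545] | [0.475, 0.589] | [0.489, 0.599] | [0.433, 0.553] |
| 0.95 | PDAL-BTQR | Mean | 0.193 | 1.295 | 1.753 | 0.371 | 0.506 | 0.431 | 0.419 | 0.479 | 0.406 |
| 0.95 | PDAL-BTQR | Sd | 0.117 | 0.409 | 0.504 | 0.255 | 0.284 | 0.230 | 0.282 | 0.271 | 0.285 |
| 0.95 | PDAL-BTQR | Confidence interval | - | [1.216, 1.374] | [1.655, 1.851] | [0.323, 0.419] | [0.451, 0.561] | [0.387, 0.475] | [0.365, 0.473] | [0.428, 0.530] | [0.350, 0.462] |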
Table 3. Estimation results for sparse longitudinal data with different censoring ratios.

| Censoring ratio | Method | Estimation | MSE | β1 | β2 | β3 | β4 | β5 | β6 | β7 | β8 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | True value | | | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 10% | P-BTQR | Mean | 0.029 | 0.846 | −0.071 | −0.008 | 0.931 | −0.016 | −0.006 | 0.013 | −0.006 |
| 10% | P-BTQR | Sd | 0.026 | 0.232 | 0.242 | 0.127 | 0.138 | 0.116 | 0.129 | 0.120 | 0.115 |
| 10% | P-BTQR | Confidence interval | - | [0.801, 0.891] | [−0.119, −0.023] | [−0.032, 0.016] | [0.904, 0.958] | [−0.039, 0.007] | [−0.031, 0.019] | [−0.011, 0.037] | [−0.028, 0.016] |
| 10% | PDL-BTQR | Mean | 0.026 | 0.786 | 0.003 | 0.007 | 0.873 | 0.004 | 0.010 | 0.006 | −0.003 |
| 10% | PDL-BTQR | Sd | 0.016 | 0.215 | 0.183 | 0.108 | 0.132 | 0.092 | 0.105 | 0.100 | 0.096 |
| 10% | PDL-BTQR | Confidence interval | - | [0.744, 0.828] | [−0.032, 0.038] | [−0.014, 0.028] | [0.846, 0.900] | [−0.014, 0.022] | [−0.011, 0.031] | [−0.013, 0.025] | [−0.022, 0.016] |
| 10% | PDAL-BTQR | Mean | 0.025 | 0.798 | −0.024 | −0.001 | 0.893 | −0.004 | 0.003 | 0.006 | 0.007 |
| 10% | PDAL-BTQR | Sd | 0.015 | 0.224 | 0.197 | 0.111 | 0.134 | 0.101 | 0.106 | 0.098 | 0.101 |
| 10% | PDAL-BTQR | Confidence interval | - | [0.755, 0.841] | [−0.063, 0.015] | [−0.023, 0.021] | [0.866, 0.920] | [−0.024, 0.016] | [−0.018, 0.024] | [−0.013, 0.025] | [−0.013, 0.027] |
| 20% | P-BTQR | Mean | 0.030 | 0.866 | −0.048 | −0.013 | 0.901 | −0.006 | 0.003 | 0.002 | 0.010 |
| 20% | P-BTQR | Sd | 0.018 | 0.228 | 0.248 | 0.123 | 0.162 | 0.129 | 0.124 | 0.126 | 0.118 |
| 20% | P-BTQR | Confidence interval | - | [0.821, 0.911] | [−0.098, 0.002] | [−0.037, 0.011] | [0.868, 0.934] | [−0.031, 0.019] | [−0.022, 0.028] | [−0.022, 0.026] | [−0.013, 0.033] |
| 20% | PDL-BTQR | Mean | 0.027 | 0.804 | 0.038 | −0.001 | 0.840 | 0.000 | 0.011 | 0.005 | 0.006 |
| 20% | PDL-BTQR | Sd | 0.017 | 0.212 | 0.188 | 0.106 | 0.160 | 0.102 | 0.101 | 0.104 | 0.102 |
| 20% | PDL-BTQR | Confidence interval | - | [0.763, 0.845] | [0.001, 0.075] | [−0.023, 0.021] | [0.808, 0.872] | [−0.020, 0.020] | [−0.009, 0.031] | [−0.014, 0.024] | [−0.014, 0.026] |
| 20% | PDAL-BTQR | Mean | 0.026 | 0.840 | −0.003 | −0.007 | 0.864 | −0.008 | 0.012 | −0.001 | 0.009 |
| 20% | PDAL-BTQR | Sd | 0.016 | 0.223 | 0.199 | 0.112 | 0.158 | 0.103 | 0.108 | 0.105 | 0.106 |
| 20% | PDAL-BTQR | Confidence interval | - | [0.797, 0.883] | [−0.042, 0.036] | [−0.029, 0.015] | [0.832, 0.896] | [−0.028, 0.012] | [−0.009, 0.033] | [−0.022, 0.020] | [−0.012, 0.030] |
| 40% | P-BTQR | Mean | 0.033 | 0.867 | −0.016 | 0.004 | 0.784 | −0.014 | −0.007 | 0.013 | −0.007 |
| 40% | P-BTQR | Sd | 0.018 | 0.234 | 0.253 | 0.124 | 0.148 | 0.115 | 0.108 | 0.107 | 0.107 |
| 40% | P-BTQR | Confidence interval | - | [0.821, 0.913] | [−0.068, 0.036] | [−0.019, 0.027] | [0.755, 0.813] | [−0.039, 0.008] | [−0.028, 0.014] | [−0.008, 0.034] | [−0.028, 0.014] |
| 40% | PDL-BTQR | Mean | 0.032 | 0.811 | 0.074 | 0.007 | 0.729 | −0.011 | 0.001 | 0.007 | −0.006 |
| 40% | PDL-BTQR | Sd | 0.019 | 0.219 | 0.197 | 0.102 | 0.145 | 0.090 | 0.092 | 0.096 | 0.092 |
| 40% | PDL-BTQR | Confidence interval | - | [0.769, 0.853] | [0.036, 0.112] | [−0.013, 0.027] | [0.700, 0.758] | [−0.028, 0.006] | [−0.017, 0.019] | [−0.011, 0.025] | [−0.024, 0.012] |
| 40% | PDAL-BTQR | Mean | 0.031 | 0.833 | 0.033 | 0.003 | 0.743 | −0.005 | 0.001 | 0.006 | −0.001 |
| 40% | PDAL-BTQR | Sd | 0.018 | 0.225 | 0.200 | 0.103 | 0.147 | 0.094 | 0.092 | 0.099 | 0.095 |
| 40% | PDAL-BTQR | Confidence interval | - | [0.789, 0.877] | [−0.006, 0.072] | [−0.017, 0.023] | [0.714, 0.772] | [−0.023, 0.013] | [−0.017, 0.019] | [−0.013, 0.025] | [−0.020, 0.018] |
Table 4. Estimation results for dense longitudinal data with different censoring ratios.

| Censoring ratio | Method | Estimation | MSE | β1 | β2 | β3 | β4 | β5 | β6 | β7 | β8 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | True value | | | 1 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| 10% | P-BTQR | Mean | 0.042 | 0.810 | 0.787 | 0.441 | 0.492 | 0.431 | 0.464 | 0.474 | 0.459 |
| 10% | P-BTQR | Sd | 0.015 | 0.255 | 0.257 | 0.147 | 0.143 | 0.164 | 0.143 | 0.119 | 0.134 |
| 10% | P-BTQR | Confidence interval | - | [0.759, 0.861] | [0.736, 0.838] | [0.413, 0.469] | [0.464, 0.520] | [0.399, 0.463] | [0.437, 0.491] | [0.450, 0.498] | [0.434, 0.484] |
| 10% | PDL-BTQR | Mean | 0.040 | 0.793 | 0.826 | 0.444 | 0.466 | 0.428 | 0.435 | 0.459 | 0.441 |
| 10% | PDL-BTQR | Sd | 0.020 | 0.226 | 0.248 | 0.137 | 0.144 | 0.149 | 0.151 | 0.140 | 0.131 |
| 10% | PDL-BTQR | Confidence interval | - | [0.748, 0.838] | [0.778, 0.874] | [0.417, 0.471] | [0.438, 0.494] | [0.399, 0.457] | [0.405, 0.465] | [0.433, 0.485] | [0.416, 0.466] |
| 10% | PDAL-BTQR | Mean | 0.041 | 0.793 | 0.817 | 0.436 | 0.465 | 0.430 | 0.443 | 0.451 | 0.440 |
| 10% | PDAL-BTQR | Sd | 0.021 | 0.235 | 0.256 | 0.140 | 0.150 | 0.160 | 0.149 | 0.140 | 0.139 |
| 10% | PDAL-BTQR | Confidence interval | - | [0.748, 0.838] | [0.766, 0.868] | [0.408, 0.464] | [0.435, 0.495] | [0.399, 0.461] | [0.413, 0.473] | [0.424, 0.478] | [0.413, 0.467] |
| 20% | P-BTQR | Mean | 0.044 | 0.848 | 0.845 | 0.425 | 0.482 | 0.434 | 0.417 | 0.427 | 0.455 |
| 20% | P-BTQR | Sd | 0.019 | 0.269 | 0.286 | 0.160 | 0.171 | 0.155 | 0.136 | 0.151 | 0.127 |
| 20% | P-BTQR | Confidence interval | - | [0.796, 0.900] | [0.789, 0.901] | [0.393, 0.457] | [0.448, 0.516] | [0.403, 0.465] | [0.391, 0.443] | [0.398, 0.456] | [0.430, 0.480] |
| 20% | PDL-BTQR | Mean | 0.038 | 0.840 | 0.904 | 0.420 | 0.456 | 0.418 | 0.431 | 0.430 | 0.421 |
| 20% | PDL-BTQR | Sd | 0.020 | 0.233 | 0.258 | 0.143 | 0.157 | 0.145 | 0.138 | 0.151 | 0.131 |
| 20% | PDL-BTQR | Confidence interval | - | [0.795, 0.885] | [0.852, 0.956] | [0.392, 0.448] | [0.426, 0.486] | [0.390, 0.446] | [0.402, 0.460] | [0.403, 0.457] | [0.395, 0.447] |
| 20% | PDAL-BTQR | Mean | 0.039 | 0.841 | 0.890 | 0.418 | 0.453 | 0.422 | 0.421 | 0.434 | 0.439 |
| 20% | PDAL-BTQR | Sd | 0.020 | 0.234 | 0.266 | 0.144 | 0.165 | 0.149 | 0.137 | 0.154 | 0.138 |
| 20% | PDAL-BTQR | Confidence interval | - | [0.796, 0.886] | [0.838, 0.942] | [0.389, 0.447] | [0.420, 0.486] | [0.393, 0.451] | [0.393, 0.449] | [0.404, 0.464] | [0.412, 0.466] |
| 40% | P-BTQR | Mean | 0.048 | 0.793 | 0.830 | 0.375 | 0.388 | 0.362 | 0.355 | 0.371 | 0.382 |
| 40% | P-BTQR | Sd | 0.018 | 0.233 | 0.237 | 0.145 | 0.147 | 0.152 | 0.137 | 0.129 | 0.130 |
| 40% | P-BTQR | Confidence interval | - | [0.748, 0.838] | [0.784, 0.876] | [0.346, 0.404] | [0.360, 0.416] | [0.332, 0.392] | [0.328, 0.382] | [0.345, 0.397] | [0.357, 0.407] |
| 40% | PDL-BTQR | Mean | 0.046 | 0.794 | 0.876 | 0.373 | 0.382 | 0.356 | 0.374 | 0.371 | 0.374 |
| 40% | PDL-BTQR | Sd | 0.019 | 0.214 | 0.228 | 0.139 | 0.147 | 0.134 | 0.135 | 0.139 | 0.134 |
| 40% | PDL-BTQR | Confidence interval | - | [0.752, 0.836] | [0.830, 0.922] | [0.346, 0.400] | [0.354, 0.410] | [0.329, 0.383] | [0.347, 0.401] | [0.345, 0.397] | [0.348, 0.400] |
| 40% | PDAL-BTQR | Mean | 0.047 | 0.800 | 0.861 | 0.358 | 0.372 | 0.356 | 0.368 | 0.366 | 0.384 |
| 40% | PDAL-BTQR | Sd | 0.021 | 0.209 | 0.237 | 0.145 | 0.148 | 0.145 | 0.131 | 0.140 | 0.135 |
| 40% | PDAL-BTQR | Confidence interval | - | [0.761, 0.839] | [0.815, 0.907] | [0.329, 0.387] | [0.343, 0.401] | [0.327, 0.385] | [0.342, 0.394] | [0.339, 0.393] | [0.358, 0.410] |
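Table 5. Estimation results for sparse longitudinal data with different distributions.

| Distribution | Method | Estimation | MSE | β1 | β2 | β3 | β4 | β5 | β6 | β7 | β8 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | True value | | | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| N(0,1) | P-BTQR | Mean | 0.054 | 0.896 | 0.052 | −0.015 | 0.740 | 0.016 | 0.007 | −0.023 | 0.011 |
| N(0,1) | P-BTQR | Sd | 0.141 | 0.311 | 0.330 | 0.125 | 0.246 | 0.172 | 0.137 | 0.116 | 0.117 |
| N(0,1) | P-BTQR | Confidence interval | - | [0.834, 0.958] | [−0.012, 0.116] | [−0.038, 0.008] | [0.690, 0.790] | [−0.017, 0.049] | [−0.021, 0.035] | [−0.044, −0.002] | [−0.011, 0.033] |
| N(0,1) | PDL-BTQR | Mean | 0.041 | 0.912 | 0.128 | −0.002 | 0.727 | 0.000 | 0.003 | −0.001 | 0.006 |
| N(0,1) | PDL-BTQR | Sd | 0.070 | 0.263 | 0.273 | 0.101 | 0.204 | 0.106 | 0.102 | 0.092 | 0.098 |
| N(0,1) | PDL-BTQR | Confidence interval | - | [0.859, 0.965] | [0.075, 0.181] | [−0.021, 0.017] | [0.688, 0.766] | [−0.021, 0.021] | [−0.017, 0.023] | [−0.019, 0.017] | [−0.013, 0.025] |
| N(0,1) | PDAL-BTQR | Mean | 0.039 | 0.946 | 0.097 | −0.002 | 0.726 | 0.002 | −0.002 | 0.001 | 0.005 |
| N(0,1) | PDAL-BTQR | Sd | 0.052 | 0.283 | 0.248 | 0.104 | 0.207 | 0.100 | 0.099 | 0.092 | 0.095 |
| N(0,1) | PDAL-BTQR | Confidence interval | - | [0.890, 1.002] | [0.049, 0.146] | [−0.023, 0.019] | [0.685, 0.767] | [−0.017, 0.021] | [−0.021, 0.017] | [−0.017, 0.019] | [−0.014, 0.024] |
| t(3) | P-BTQR | Mean | 0.077 | 1.248 | 0.133 | −0.029 | 0.944 | −0.016 | 0.058 | −0.075 | 0.006 |
| t(3) | P-BTQR | Sd | 0.041 | 0.324 | 0.347 | 0.229 | 0.207 | 0.237 | 0.227 | 0.231 | 0.227 |
| t(3) | P-BTQR | Confidence interval | - | [1.186, 1.310] | [0.064, 0.202] | [−0.072, 0.015] | [0.903, 0.985] | [−0.061, 0.029] | [0.013, 0.103] | [−0.121, −0.029] | [−0.037, 0.049] |
| t(3) | PDL-BTQR | Mean | 0.057 | 1.178 | 0.250 | 0.010 | 0.918 | 0.022 | 0.051 | −0.047 | −0.003 |
| t(3) | PDL-BTQR | Sd | 0.034 | 0.308 | 0.254 | 0.172 | 0.215 | 0.201 | 0.179 | 0.162 | 0.175 |
| t(3) | PDL-BTQR | Confidence interval | - | [1.120, 1.236] | [0.200, 0.300] | [−0.022, 0.042] | [0.877, 0.959] | [−0.017, 0.061] | [0.017, 0.085] | [−0.079, −0.015] | [−0.035, 0.029] |
| t(3) | PDAL-BTQR | Mean | 0.056 | 1.190 | 0.208 | −0.009 | 0.901 | 0.008 | 0.044 | −0.043 | 0.006 |
| t(3) | PDAL-BTQR | Sd | 0.033 | 0.305 | 0.257 | 0.168 | 0.201 | 0.201 | 0.189 | 0.178 | 0.178 |
| t(3) | PDAL-BTQR | Confidence interval | - | [1.129, 1.251] | [0.158, 0.258] | [−0.042, 0.024] | [0.862, 0.940] | [−0.031, 0.047] | [0.008, 0.080] | [−0.077, −0.009] | [−0.027, 0.039] |
| ALD | P-BTQR | Mean | 0.044 | 1.118 | 0.054 | 0.006 | 0.852 | 0.007 | −0.022 | −0.006 | 0.016 |
| ALD | P-BTQR | Sd | 0.028 | 0.327 | 0.306 | 0.123 | 0.173 | 0.128 | 0.152 | 0.138 | 0.134 |
| ALD | P-BTQR | Confidence interval | - | [1.057, 1.179] | [−0.005, 0.113] | [−0.019, 0.031] | [0.818, 0.886] | [−0.018, 0.032] | [−0.051, 0.007] | [−0.033, 0.021] | [−0.010, 0.042] |
| ALD | PDL-BTQR | Mean | 0.037 | 1.059 | 0.138 | 0.024 | 0.837 | 0.010 | −0.012 | 0.007 | 0.004 |
| ALD | PDL-BTQR | Sd | 0.025 | 0.328 | 0.246 | 0.112 | 0.176 | 0.099 | 0.122 | 0.110 | 0.112 |
| ALD | PDL-BTQR | Confidence interval | - | [0.997, 1.121] | [0.090, 0.186] | [0.002, 0.046] | [0.802, 0.872] | [−0.009, 0.029] | [−0.036, 0.012] | [−0.014, 0.028] | [−0.018, 0.026] |
| ALD | PDAL-BTQR | Mean | 0.037 | 1.091 | 0.110 | 0.010 | 0.836 | 0.008 | −0.022 | 0.007 | 0.001 |
| ALD | PDAL-BTQR | Sd | 0.025 | 0.320 | 0.253 | 0.113 | 0.171 | 0.100 | 0.125 | 0.106 | 0.111 |
| ALD | PDAL-BTQR | Confidence interval | - | [1.029, 1.153] | [0.061, 0.159] | [−0.013, 0.033] | [0.802, 0.870] | [−0.012, 0.028] | [−0.046, 0.002] | [−0.013, 0.027] | [−0.021, 0.023] |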
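Table 6. Estimation results for dense longitudinal data with different distributions.

| Distribution | Method | Estimation | MSE | β1 | β2 | β3 | β4 | β5 | β6 | β7 | β8 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | True value | | | 1 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| N(0,1) | P-BTQR | Mean | 0.040 | 0.899 | 1.058 | 0.358 | 0.402 | 0.366 | 0.359 | 0.366 | 0.398 |
| N(0,1) | P-BTQR | Sd | 0.018 | 0.237 | 0.243 | 0.126 | 0.146 | 0.143 | 0.136 | 0.127 | 0.126 |
| N(0,1) | P-BTQR | Confidence interval | - | [0.851, 0.947] | [1.011, 1.105] | [0.333, 0.383] | [0.374, 0.430] | [0.338, 0.394] | [0.332, 0.386] | [0.342, 0.390] | [0.373, 0.423] |
| N(0,1) | PDL-BTQR | Mean | 0.039 | 0.902 | 1.040 | 0.365 | 0.404 | 0.359 | 0.376 | 0.370 | 0.390 |
| N(0,1) | PDL-BTQR | Sd | 0.018 | 0.227 | 0.230 | 0.129 | 0.138 | 0.138 | 0.138 | 0.131 | 0.129 |
| N(0,1) | PDL-BTQR | Confidence interval | - | [0.860, 0.944] | [0.996, 1.084] | [0.340, 0.390] | [0.377, 0.431] | [0.331, 0.387] | [0.349, 0.403] | [0.344, 0.396] | [0.365, 0.415] |
| N(0,1) | PDAL-BTQR | Mean | 0.040 | 0.905 | 1.013 | 0.352 | 0.395 | 0.367 | 0.365 | 0.362 | 0.382 |
| N(0,1) | PDAL-BTQR | Sd | 0.018 | 0.224 | 0.240 | 0.131 | 0.142 | 0.142 | 0.134 | 0.131 | 0.129 |
| N(0,1) | PDAL-BTQR | Confidence interval | - | [0.861, 0.949] | [0.965, 1.061] | [0.326, 0.378] | [0.366, 0.424] | [0.339, 0.395] | [0.339, 0.391] | [0.336, 0.388] | [0.357, 0.407] |
| t(3) | P-BTQR | Mean | 0.074 | 1.056 | 1.231 | 0.370 | 0.438 | 0.362 | 0.434 | 0.355 | 0.376 |
| t(3) | P-BTQR | Sd | 0.041 | 0.293 | 0.323 | 0.215 | 0.214 | 0.213 | 0.218 | 0.218 | 0.214 |
| t(3) | P-BTQR | Confidence interval | - | [0.999, 1.113] | [1.171, 1.291] | [0.330, 0.410] | [0.397, 0.479] | [0.320, 0.404] | [0.391, 0.477] | [0.312, 0.398] | [0.335, 0.417] |
| t(3) | PDL-BTQR | Mean | 0.065 | 1.061 | 1.208 | 0.381 | 0.447 | 0.380 | 0.453 | 0.357 | 0.375 |
| t(3) | PDL-BTQR | Sd | 0.034 | 0.280 | 0.293 | 0.199 | 0.205 | 0.200 | 0.208 | 0.208 | 0.204 |
| t(3) | PDL-BTQR | Confidence interval | - | [1.006, 1.116] | [1.152, 1.264] | [0.343, 0.419] | [0.407, 0.487] | [0.341, 0.419] | [0.413, 0.493] | [0.316, 0.398] | [0.336, 0.414] |
| t(3) | PDAL-BTQR | Mean | 0.068 | 1.039 | 1.184 | 0.350 | 0.425 | 0.346 | 0.424 | 0.335 | 0.354 |
| t(3) | PDAL-BTQR | Sd | 0.035 | 0.278 | 0.296 | 0.199 | 0.213 | 0.207 | 0.201 | 0.199 | 0.207 |
| t(3) | PDAL-BTQR | Confidence interval | - | [0.984, 1.094] | [1.128, 1.240] | [0.312, 0.388] | [0.385, 0.465] | [0.306, 0.386] | [0.385, 0.463] | [0.300, 0.373] | [0.315, 0.393] |
| ALD | P-BTQR | Mean | 0.046 | 1.011 | 1.155 | 0.400 | 0.411 | 0.419 | 0.374 | 0.414 | 0.410 |
| ALD | P-BTQR | Sd | 0.030 | 0.296 | 0.297 | 0.144 | 0.142 | 0.133 | 0.146 | 0.137 | 0.145 |
| ALD | P-BTQR | Confidence interval | - | [0.953, 1.069] | [1.097, 1.213] | [0.372, 0.428] | [0.383, 0.439] | [0.393, 0.445] | [0.346, 0.402] | [0.387, 0.441] | [0.382, 0.438] |
| ALD | PDL-BTQR | Mean | 0.043 | 1.022 | 1.133 | 0.407 | 0.422 | 0.416 | 0.402 | 0.406 | 0.397 |
| ALD | PDL-BTQR | Sd | 0.026 | 0.283 | 0.287 | 0.137 | 0.143 | 0.132 | 0.144 | 0.138 | 0.147 |
| ALD | PDL-BTQR | Confidence interval | - | [0.968, 1.076] | [1.076, 1.190] | [0.381, 0.433] | [0.395, 0.449] | [0.391, 0.441] | [0.373, 0.431] | [0.379, 0.433] | [0.368, 0.426] |
| ALD | PDAL-BTQR | Mean | 0.044 | 1.015 | 1.112 | 0.393 | 0.404 | 0.413 | 0.383 | 0.406 | 0.396 |
| ALD | PDAL-BTQR | Sd | 0.025 | 0.282 | 0.279 | 0.142 | 0.148 | 0.139 | 0.153 | 0.148 | 0.151 |
| ALD | PDAL-BTQR | Confidence interval | - | [0.961, 1.069] | [1.056, 1.168] | [0.366, 0.420] | [0.375, 0.433] | [0.386, 0.440] | [0.354, 0.412] | [0.376, 0.436] | [0.367, 0.424] |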
Table 7. Comparison of computational runtimes for different simulation methods based on Gibbs sampling.

| Method | User Time | System Time | Elapsed Time |
|---|---|---|---|
| P-BTQR | 0.224 | 0.069 | 54.840 |
| PL-BTQR | 0.247 | 0.084 | 102.377 |
| PAL-BTQR | 0.219 | 0.043 | 104.179 |
| PDL-BTQR | 0.241 | 0.028 | 95.354 |
| PDAL-BTQR | 0.238 | 0.033 | 98.353 |
Table 8. Variable definitions and descriptive statistics.

| Variable | Name | Definition | Mean | Sd | Max | Min |
|---|---|---|---|---|---|---|
| Y | Crime rate | Number of criminal suspects arrested by the Public Prosecutor’s Office per 10,000 population | 6.684 | 2.301 | 14.840 | 3.549 |
| X1 | Per capita GDP | GDP output per unit of population | 4.277 | 2.088 | 12.319 | 1.299 |
| X2 | Urbanization rate | Ratio of regional urban population to total population | 0.543 | 0.137 | 0.896 | 0.222 |
| X3 | Regional income gap | Difference between per capita disposable income of regional residents and per capita disposable income of national residents | 0.545 | 0.543 | 3.048 | 0.002 |
| X4 | Educational level | Average number of students enrolled in higher education per 100,000 population | 0.247 | 0.086 | 0.620 | 0.108 |
| X5 | Unemployment rate | Urban registered unemployment rate | 3.369 | 0.654 | 4.500 | 1.200 |
Table 9. Estimates of the two methods at different quantiles.

| Method | Variable | τ = 0.1 | τ = 0.2 | τ = 0.3 | τ = 0.4 | τ = 0.5 | τ = 0.6 | τ = 0.7 | τ = 0.8 | τ = 0.9 |
|---|---|---|---|---|---|---|---|---|---|---|
| PDL-BTQR | Per capita GDP | −0.097 | −0.079 | −0.060 | −0.068 | −0.099 | −0.134 | −0.153 | −0.117 | −0.010 |
| PDL-BTQR | Urbanization rate | 0.456 | 0.423 | 0.406 | 0.424 | 0.449 | 0.442 | 0.380 | 0.334 | 0.264 |
| PDL-BTQR | Regional income disparities | 0.161 | 0.178 | 0.175 | 0.182 | 0.201 | 0.232 | 0.273 | 0.229 | 0.125 |
| PDL-BTQR | Educational level | −0.399 | −0.366 | −0.359 | −0.392 | −0.428 | −0.430 | −0.413 | −0.358 | −0.298 |
| PDL-BTQR | Unemployment rate | −0.102 | −0.117 | −0.135 | −0.171 | −0.211 | −0.240 | −0.249 | −0.212 | −0.151 |
| PDAL-BTQR | Per capita GDP | −0.162 | −0.129 | −0.111 | −0.119 | −0.153 | −0.195 | −0.230 | −0.208 | −0.046 |
| PDAL-BTQR | Urbanization rate | 0.610 | 0.581 | 0.536 | 0.546 | 0.572 | 0.563 | 0.510 | 0.473 | 0.356 |
| PDAL-BTQR | Regional income disparities | 0.187 | 0.199 | 0.206 | 0.212 | 0.240 | 0.281 | 0.319 | 0.277 | 0.169 |
| PDAL-BTQR | Educational level | −0.592 | −0.592 | −0.585 | −0.602 | −0.646 | −0.643 | −0.630 | −0.556 | −0.475 |
| PDAL-BTQR | Unemployment rate | −0.126 | −0.149 | −0.166 | −0.190 | −0.239 | −0.257 | −0.259 | −0.218 | −0.163 |

Zhao, K.; Shu, T.; Hu, C.; Luo, Y. Research on Quantile Regression Method for Longitudinal Interval-Censored Data Based on Bayesian Double Penalty. Mathematics 2024, 12, 1782. https://doi.org/10.3390/math12121782
