1. Introduction
Quantile regression has become a powerful tool in statistical theory, with extensive methodological developments, and it has found wide application in engineering and related fields [1]. In recent years, statistical inference and variable selection in high-dimensional quantile regression have become an important research focus [2,3]. Tan et al. proposed a convolution-smoothed quantile regression method with an iteratively reweighted $\ell_1$ algorithm to enhance robustness [4]. Wang et al. combined frequentist model averaging with high-dimensional quantile regression to improve model construction and selection [5]. Research on high-dimensional quantile regression has also been extended to a wider range of models [6,7].
Compared with mean regression, quantile regression offers greater robustness to outliers and skewed distributions [8], and it is particularly useful for capturing heterogeneous effects in panel data models [9]. Traditional panel data models often neglect individual heterogeneity, whereas linear mixed-effects models address this by capturing both data heterogeneity and potential correlations among observations within individuals. They incorporate random effects to model unobservable individual-level variation, such as behavioral preferences and baseline differences. The combination of linear mixed-effects models and quantile regression has attracted growing attention, and variable selection for these models has emerged as a critical research area.
Variable selection and dimensionality reduction have attracted extensive theoretical and practical attention in the past decade. Traditional methods often fail to perform effectively in high-dimensional settings, and several regularization techniques have been developed to overcome this difficulty. Tibshirani introduced the Lasso in [10], which achieves coefficient estimation and variable selection simultaneously through $\ell_1$ regularization. Fan and Li proposed the non-convex SCAD penalty with the oracle property in [11]. Building on these foundations, many regularization methods and extensions have been developed to handle more complex models.
In quantile regression for linear mixed-effects models, neglecting random effects can lead to biased estimates by failing to capture individual heterogeneity, while including too many random-effect covariates may cause overfitting, resulting in unstable estimates and reduced interpretability and generalizability. Effective variable selection is therefore crucial for simplifying models, eliminating redundancy, and improving computational efficiency. However, estimating the coefficients and selecting the variables is more challenging because of the interaction between fixed and random effects in the linear mixed-effects model. Koenker proposed a quantile regression approach for random-effects models and applied the Lasso penalty to the random effects to eliminate redundant effects, but this method does not allow selection among the fixed-effect variables [12]. Li et al. proposed using $\ell_1$ penalties to select fixed and random effects simultaneously in quantile regression for linear mixed-effects models, but the estimates obtained with the Lasso penalty are biased [13]. Bondell et al. used a modified Cholesky decomposition for fixed- and random-effects estimation and variable selection in linear mixed-effects models [14]. Qi et al. proposed a double-penalized likelihood method for finite mixture regression models, imposing penalties on both the mixing proportions and the regression coefficients, which enables simultaneous parameter estimation and variable selection [15]. Inspired by these works, we propose a Doubly penalized ERror Function regularized Quantile Regression (DERF-QR) method for linear mixed-effects models in this paper.
The innovations of this paper are as follows. First, we improve the existing doubly penalized quantile regression method by incorporating a novel regularization term based on the error function into a linear mixed-effects model [16]. This penalty function approximates the $\ell_0$ penalty under certain conditions and substantially enhances the performance of coefficient estimation and variable selection. Second, we propose an efficient algorithm for this method by combining a two-step iterative scheme with the iteratively reweighted $\ell_1$ proximal alternating direction method of multipliers algorithm (IRW-pADMM) [17].
The structure of this paper is as follows. In Section 2, we introduce the properties of the penalty function and present an estimation method for the doubly penalized model. In Section 3, we report Monte Carlo simulations that validate the performance of the method in coefficient estimation and variable selection. Section 4 shows that the proposed method is effective on financial data from multiple companies.
2. Model and Algorithm
Consider a linear mixed-effects model with random-effect coefficients as follows:
$$y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \mathbf{z}_{ij}^{\top}\boldsymbol{\gamma}_i + \varepsilon_{ij}, \qquad i = 1, \dots, n, \;\; j = 1, \dots, m_i, \tag{1}$$
where $y_{ij}$ and $\varepsilon_{ij}$ denote the response variable and the error at the $j$-th observation of the $i$-th individual, respectively. The vector $\mathbf{x}_{ij}$ contains the fixed-effect variables, and $\boldsymbol{\beta}$ denotes the corresponding $p$-dimensional coefficients. Similarly, $\mathbf{z}_{ij}$ and $\boldsymbol{\gamma}_i$ are the random-effect covariates and their corresponding $q$-dimensional coefficients, respectively.
In this linear mixed-effects model, an excessive number of random-effect covariates may lead to overfitting and hinder accurate parameter estimation. To address this, we study a doubly penalized quantile regression method for the linear mixed-effects model, with regularization terms based on the error function applied to both the fixed- and random-effect coefficients. For a given response variable $y_{ij}$ and quantile $\tau \in (0, 1)$, the conditional quantile regression function is expressed as follows:
$$Q_{y_{ij}}(\tau \mid \mathbf{x}_{ij}, \mathbf{z}_{ij}) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}(\tau) + \mathbf{z}_{ij}^{\top}\boldsymbol{\gamma}_i(\tau), \tag{2}$$
where $Q_{y_{ij}}(\tau \mid \mathbf{x}_{ij}, \mathbf{z}_{ij})$ is the $\tau$-quantile of $y_{ij}$ given the covariates. The model can be transformed into the following minimization problem:
$$\min_{\boldsymbol{\beta}, \boldsymbol{\gamma}} \; \sum_{i=1}^{n}\sum_{j=1}^{m_i} \rho_\tau\big(y_{ij} - \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} - \mathbf{z}_{ij}^{\top}\boldsymbol{\gamma}_i\big) + \lambda_1 \sum_{k=1}^{p} P_\sigma(|\beta_k|) + \lambda_2 \sum_{i=1}^{n}\sum_{l=1}^{q} P_\sigma(|\gamma_{il}|), \tag{3}$$
where $\rho_\tau(u) = u\{\tau - \mathbb{I}(u < 0)\}$ is the quantile loss function, and $\lambda_1$, $\lambda_2$ are the penalty parameters associated with the fixed-effect and random-effect coefficients, respectively. Here
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2}\, dt$$
is the error function, and we define
$$P_\sigma(x) = \int_{0}^{|x|} e^{-t^2/\sigma^2}\, dt = \frac{\sqrt{\pi}\,\sigma}{2}\, \operatorname{erf}\!\left(\frac{|x|}{\sigma}\right)$$
as the ERF (ERror Function) regularization term [16]. The parameter $\sigma$ controls the intensity of the penalty. For clarity of discussion, we reformulate Equation (3) in stacked matrix form as follows:
$$\min_{\boldsymbol{\beta}, \boldsymbol{\gamma}} \; \rho_\tau\big(\mathbf{y} - X\boldsymbol{\beta} - Z\boldsymbol{\gamma}\big) + \lambda_1 \sum_{k=1}^{p} P_\sigma(|\beta_k|) + \lambda_2 \sum_{l=1}^{nq} P_\sigma(|\gamma_l|), \tag{4}$$
where $\mathbf{y}$ stacks all $N = \sum_{i=1}^{n} m_i$ observations, $X$ is the $N \times p$ fixed-effect design matrix, $Z$ is the $N \times nq$ block-diagonal random-effect design matrix with $\boldsymbol{\gamma} = (\boldsymbol{\gamma}_1^{\top}, \dots, \boldsymbol{\gamma}_n^{\top})^{\top}$, and $\rho_\tau(\cdot)$ is applied componentwise and summed.
By imposing penalties on both the fixed- and random-effect coefficients in the linear mixed-effects model, one can not only perform variable selection for the fixed effects but also eliminate redundant information in the random effects. Li et al. proved that, in doubly penalized quantile regression for linear mixed-effects models, the estimator under double Lasso penalties converges in distribution asymptotically, and the bias in estimating the non-zero coefficients is controlled by the penalty parameters [13]. We implement doubly penalized regression for linear mixed-effects models by applying ERF regularization to both the fixed- and random-effect coefficients.
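To make the penalty concrete, the following minimal Python sketch evaluates the ERF regularization term through its closed form $P_\sigma(x) = \frac{\sqrt{\pi}\sigma}{2}\operatorname{erf}(|x|/\sigma)$ and illustrates numerically how shrinking $\sigma$ pushes the rescaled penalty toward the $\ell_0$ indicator; the function name is ours, not from [16].

```python
import numpy as np
from scipy.special import erf

def erf_penalty(x, sigma):
    """ERF penalty P_sigma(x) = integral_0^{|x|} exp(-t^2/sigma^2) dt,
    evaluated via the closed form (sqrt(pi)*sigma/2) * erf(|x|/sigma)."""
    return 0.5 * np.sqrt(np.pi) * sigma * erf(np.abs(x) / sigma)

# Rescaled by 2/(sqrt(pi)*sigma), the penalty approaches the l0 indicator
# 1{x != 0} as sigma -> 0+.
x = np.array([0.0, 0.01, 0.1, 1.0, 10.0])
for sigma in (1.0, 0.1, 0.01):
    print(sigma, np.round(erf_penalty(x, sigma) / (0.5 * np.sqrt(np.pi) * sigma), 4))
```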
The $\ell_1$ penalty is the most commonly used penalty function in quantile regression. However, it introduces estimation bias, particularly for large coefficients. To address this issue, several non-convex penalties such as SCAD and MCP have been proposed [11,18]. These penalties adaptively apply varying degrees of shrinkage to coefficients of different magnitudes, thereby reducing the bias introduced by $\ell_1$. Nevertheless, their non-convex and non-smooth nature poses significant challenges for the associated optimization problems. ERF regularization is a smooth non-convex penalty function, which increases flexibility in the optimization process and improves computational efficiency. Existing work has shown that this function reduces bias more effectively than most other non-convex penalties [16]. With an appropriate parameter $\sigma$, the ERF regularization term closely approximates the $\ell_0$ penalty, which directly controls the sparsity of the model and yields an optimal sparse solution. For any $x \neq 0$, we have
$$\lim_{\sigma \to 0^{+}} \frac{2}{\sqrt{\pi}\,\sigma}\, P_\sigma(|x|) = 1, \qquad \text{while} \qquad \frac{2}{\sqrt{\pi}\,\sigma}\, P_\sigma(0) = 0, \tag{5}$$
so the rescaled ERF penalty converges pointwise to the $\ell_0$ indicator.
Moreover, Guo provided the following four properties and their proofs [16].
(1) The error function $P_\sigma$ is smooth, and its derivative for $x \ge 0$ is
$$P_\sigma'(x) = e^{-x^2/\sigma^2}. \tag{6}$$
(2) For any $x$, we have
$$C_1 \min(|x|, \sigma) \le P_\sigma(|x|) \le C_2 \min(|x|, \sigma), \tag{7}$$
where $C_1$ and $C_2$ are positive constants not depending on $x$.
(3) $P_\sigma$ is concave on $[0, \infty)$. For any $x$ and $y$, we have
$$P_\sigma(|x + y|) \le P_\sigma(|x|) + P_\sigma(|y|). \tag{8}$$
(4) For any $x$,
$$P_\sigma(|x|) \le \frac{\sqrt{\pi}\,\sigma}{2}.$$
These properties show that the error function is smooth and bounded. Equation (8) further confirms that the ERF regularization satisfies the triangle inequality.
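As a quick numerical sanity check, the short script below verifies properties (1), (3), and (4) on random inputs, comparing the closed-form derivative $e^{-x^2/\sigma^2}$ with finite differences; `erf_penalty` is the helper sketched earlier, and the tolerances are arbitrary.

```python
import numpy as np
from scipy.special import erf

def erf_penalty(x, sigma):
    return 0.5 * np.sqrt(np.pi) * sigma * erf(np.abs(x) / sigma)

rng = np.random.default_rng(0)
sigma = 0.5
x, y = rng.normal(size=1000), rng.normal(size=1000)

# (1) smoothness: finite differences match the closed-form derivative.
h = 1e-6
num_grad = (erf_penalty(np.abs(x) + h, sigma) - erf_penalty(np.abs(x), sigma)) / h
assert np.allclose(num_grad, np.exp(-x**2 / sigma**2), atol=1e-4)

# (3) triangle inequality: P(|x+y|) <= P(|x|) + P(|y|).
assert np.all(erf_penalty(x + y, sigma)
              <= erf_penalty(x, sigma) + erf_penalty(y, sigma) + 1e-12)

# (4) boundedness: P(|x|) <= sqrt(pi)*sigma/2.
assert np.all(erf_penalty(x, sigma) <= 0.5 * np.sqrt(np.pi) * sigma + 1e-12)
print("properties verified on random samples")
```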
We employ the proximal operator to measure the bias of a penalty function [19]. For a function $f$, the proximal operator $\operatorname{prox}_f$ of $f$ is defined as follows:
$$\operatorname{prox}_f(v) = \arg\min_{x} \left\{ f(x) + \frac{1}{2}\|x - v\|_2^2 \right\}.$$
In Figure 1, we plot the proximal operators for the $\ell_1$, $\ell_0$, SCAD, and ERF penalties with different $\sigma$ at a fixed penalty level. The proximity of each proximal operator to the identity line reflects the bias level; that is, a smaller deviation indicates lower bias. The bias of the ERF penalty increases as $\sigma$ increases, while for all values of $\sigma$ it remains smaller than that of $\ell_1$.
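To illustrate the bias comparison in Figure 1, the sketch below evaluates the proximal operator of the ERF penalty by a one-dimensional grid search and contrasts it with soft thresholding (the proximal operator of $\ell_1$); the grid-search evaluation is our own illustrative device, not the scheme used in [19].

```python
import numpy as np
from scipy.special import erf

def erf_penalty(x, sigma):
    return 0.5 * np.sqrt(np.pi) * sigma * erf(np.abs(x) / sigma)

def prox_erf(v, lam, sigma, grid=np.linspace(-10, 10, 200001)):
    """prox_{lam*P_sigma}(v): minimize lam*P_sigma(x) + 0.5*(x - v)^2 on a grid."""
    obj = lam * erf_penalty(grid, sigma) + 0.5 * (grid - v) ** 2
    return grid[np.argmin(obj)]

def prox_l1(v, lam):
    """Soft thresholding: the proximal operator of lam*|x|."""
    return np.sign(v) * max(abs(v) - lam, 0.0)

for v in (0.5, 1.0, 3.0):
    for sigma in (0.5, 2.0):
        print(f"v={v}, sigma={sigma}: ERF {prox_erf(v, 1.0, sigma):+.3f}, "
              f"l1 {prox_l1(v, 1.0):+.3f}")
# For large v the ERF prox stays close to the identity line (low bias),
# while soft thresholding always shifts v by lam.
```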
Considering the non-smoothness of the quantile loss function and the non-convexity of the error function, we can transform Equation (4) into the following form by the iteratively reweighted $\ell_1$ algorithm [17]:
$$\min_{\boldsymbol{\beta}, \boldsymbol{\gamma}} \; \sum_{i=1}^{n}\sum_{j=1}^{m_i} \rho_\tau\big(y_{ij} - \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} - \mathbf{z}_{ij}^{\top}\boldsymbol{\gamma}_i\big) + \lambda_1 \sum_{k=1}^{p} w_{1k}\,|\beta_k| + \lambda_2 \sum_{i=1}^{n}\sum_{l=1}^{q} w_{2il}\,|\gamma_{il}|, \tag{9}$$
where the corresponding weights are computed by differentiating $P_\sigma$ at the current estimate of each coefficient. Specifically, we have
$$w_{1k} = e^{-\beta_k^2/\sigma^2}, \qquad w_{2il} = e^{-\gamma_{il}^2/\sigma^2}.$$
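In code, one pass of this weight update is a single vectorized expression (a sketch in our notation):

```python
import numpy as np

def irw_weights(coef, sigma):
    """IRW-l1 weights from the ERF penalty: w_k = P'_sigma(|beta_k|) = exp(-beta_k^2/sigma^2).
    Large coefficients receive weights near 0 (little shrinkage); small ones near 1."""
    return np.exp(-np.asarray(coef) ** 2 / sigma ** 2)

print(irw_weights([0.0, 0.5, 3.0], sigma=1.0))  # [1.0, 0.7788..., 0.000123...]
```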
We iteratively compute the weights and solve the optimization problem in Equation (9) until convergence is achieved. Gu et al. proposed the proximal ADMM and sparse coordinate descent ADMM algorithms and proved their global convergence [20]. These two methods are particularly effective for solving high-dimensional sparse quantile regression problems, and they show high estimation accuracy and computational efficiency. In this paper, we utilize the IRW-pADMM algorithm, which combines the proximal ADMM algorithm with the iteratively reweighted $\ell_1$ algorithm. Consider the univariate quantile regression method with ERF regularization:
$$\min_{\boldsymbol{\beta}} \; \sum_{i=1}^{n} \rho_\tau\big(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}\big) + \lambda \sum_{k=1}^{p} P_\sigma(|\beta_k|),$$
where $\boldsymbol{\beta} \in \mathbb{R}^{p}$, $\mathbf{x}_i$ is the $i$-th row of the design matrix $X$, and $\mathbf{y} = (y_1, \dots, y_n)^{\top}$ is the response vector. The IRW-pADMM algorithm is summarized in Algorithm 1. Here, $S_\kappa(\cdot)$ is the soft thresholding operator, and we define
$$S_\kappa(x) = \operatorname{sign}(x)\max(|x| - \kappa, 0),$$
where the parameter $\eta$ is the penalty parameter in the augmented Lagrangian of the ADMM algorithm, which affects the convergence speed of the model, and $\alpha$ is the step size parameter that regulates the update of the dual variable $\mathbf{u}$. Based on the research conducted by Gu et al., $\delta = \eta\,\lambda_{\max}(X^{\top}X)$ is used in the update of $\boldsymbol{\beta}$ to ensure that the matrix in the proximal term, $S = \delta I_p - \eta X^{\top}X$, is positive semi-definite, where $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue of the matrix $X^{\top}X$ [20].
Algorithm 1 IRW-pADMM algorithm.
1: Input: $X$, $\mathbf{y}$, $\tau$, $\lambda$, $\sigma$, $\eta$, $\alpha$, maxouter, maxinner
2: Initialize: $\boldsymbol{\beta}^{(0)}$, $\mathbf{z}^{(0)}$, $\mathbf{u}^{(0)}$, $t = 0$
3: while not converged and $t \le$ maxouter do
4:   $w_k^{(t)} = e^{-(\beta_k^{(t)})^2/\sigma^2}$, $k = 1, \dots, p$
5:   $\boldsymbol{\beta}^{(t,0)} = \boldsymbol{\beta}^{(t)}$, $m = 0$
6:   while not converged and $m \le$ maxinner do
7:     $\boldsymbol{\beta}^{(t,m+1)} = S_{\lambda\mathbf{w}^{(t)}/\delta}\big(\boldsymbol{\beta}^{(t,m)} + (\eta/\delta)\,X^{\top}(\mathbf{y} - X\boldsymbol{\beta}^{(t,m)} - \mathbf{z}^{(t,m)} + \mathbf{u}^{(t,m)}/\eta)\big)$
8:     $\mathbf{z}^{(t,m+1)} = \operatorname{prox}_{\rho_\tau/\eta}\big(\mathbf{y} - X\boldsymbol{\beta}^{(t,m+1)} + \mathbf{u}^{(t,m)}/\eta\big)$
9:     $\mathbf{u}^{(t,m+1)} = \mathbf{u}^{(t,m)} + \alpha\eta\,\big(\mathbf{y} - X\boldsymbol{\beta}^{(t,m+1)} - \mathbf{z}^{(t,m+1)}\big)$
10:    $m \leftarrow m + 1$
11:  end while
12:  $\boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t,m)}$
13:  $t \leftarrow t + 1$
14: end while
15: return $\hat{\boldsymbol{\beta}} = \boldsymbol{\beta}^{(t)}$
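To fix ideas, the following self-contained Python sketch implements one plausible reading of Algorithm 1 for the univariate problem: the $\boldsymbol{\beta}$-update is a soft-thresholding step under the proximal term, the residual variable is updated through the proximal map of the quantile loss, and the dual variable takes an ascent step scaled by $\alpha$. All names, defaults, and stopping rules are ours; this is an illustration in the spirit of [20], not a reference implementation.

```python
import numpy as np

def soft_threshold(x, kappa):
    # Elementwise soft thresholding S_kappa(x) = sign(x) * max(|x| - kappa, 0).
    return np.sign(x) * np.maximum(np.abs(x) - kappa, 0.0)

def prox_quantile(v, tau, a):
    # Elementwise proximal map of a * rho_tau, where rho_tau(u) = u * (tau - 1{u<0}).
    return v - np.clip(v, -a * (1.0 - tau), a * tau)

def irw_padmm(X, y, tau, lam, sigma, eta=1.0, alpha=1.0,
              max_outer=20, max_inner=500, tol=1e-6):
    """Sketch of IRW-pADMM for weighted l1 quantile regression (assumed interface)."""
    n, p = X.shape
    delta = eta * np.linalg.eigvalsh(X.T @ X).max()  # keeps S = delta*I - eta*X'X PSD
    beta = np.zeros(p)
    for _ in range(max_outer):
        w = np.exp(-beta ** 2 / sigma ** 2)          # IRW-l1 weights from the ERF penalty
        b, z, u = beta.copy(), y - X @ beta, np.zeros(n)
        for _ in range(max_inner):
            r = y - X @ b - z + u / eta
            b_new = soft_threshold(b + (eta / delta) * (X.T @ r), lam * w / delta)
            z = prox_quantile(y - X @ b_new + u / eta, tau, 1.0 / eta)
            u += alpha * eta * (y - X @ b_new - z)   # dual ascent step
            if np.linalg.norm(b_new - b) < tol:
                b = b_new
                break
            b = b_new
        if np.linalg.norm(b - beta) < tol:
            beta = b
            break
        beta = b
    return beta
```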
We present the two-step DERF-QR algorithm in Algorithm 2 for the doubly penalized quantile regression problem in Equation (4). In Algorithm 2, we first initialize the fixed-effect coefficients as $\boldsymbol{\beta}^{(0)}$ and then compute the initial estimates of the random-effect coefficients $\boldsymbol{\gamma}^{(0)}$ using the IRW-pADMM algorithm. Subsequently, we alternately update the fixed- and random-effect coefficients by fixing one set and optimizing the other with the IRW-pADMM algorithm. Specifically, at each iteration $k$, the fixed-effect coefficients are updated by solving the univariate problem with the response adjusted as $\tilde{y}_{ij} = y_{ij} - \mathbf{z}_{ij}^{\top}\boldsymbol{\gamma}_i^{(k-1)}$. The updated $\boldsymbol{\beta}^{(k)}$ is then substituted into the model, the residuals are recalculated as $\tilde{y}_{ij} = y_{ij} - \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}^{(k)}$, and the random-effect coefficients are updated by solving the analogous problem. This alternating optimization is repeated until convergence, which is declared when $\|\boldsymbol{\beta}^{(k)} - \boldsymbol{\beta}^{(k-1)}\| + \|\boldsymbol{\gamma}^{(k)} - \boldsymbol{\gamma}^{(k-1)}\| < \epsilon$.
Algorithm 2 Two-step DERF-QR algorithm.
1: Input: $X$, $Z$, $\mathbf{y}$, $\tau$, $\lambda_1$, $\lambda_2$, $\sigma$, maxiter
2: Initialize: $\boldsymbol{\beta}^{(0)}$, $k = 1$
3: $\boldsymbol{\gamma}^{(0)} = \text{IRW-pADMM}\big(Z,\, \mathbf{y} - X\boldsymbol{\beta}^{(0)},\, \tau,\, \lambda_2,\, \sigma\big)$
4: while not converged and $k \le$ maxiter do
5:   $\boldsymbol{\beta}^{(k)} = \text{IRW-pADMM}\big(X,\, \mathbf{y} - Z\boldsymbol{\gamma}^{(k-1)},\, \tau,\, \lambda_1,\, \sigma\big)$
6:   $\boldsymbol{\gamma}^{(k)} = \text{IRW-pADMM}\big(Z,\, \mathbf{y} - X\boldsymbol{\beta}^{(k)},\, \tau,\, \lambda_2,\, \sigma\big)$
7:   $k \leftarrow k + 1$
8: end while
9: return $\hat{\boldsymbol{\beta}}$, $\hat{\boldsymbol{\gamma}}$
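Assuming the `irw_padmm` sketch above is in scope and that `Z` denotes the stacked block-diagonal random-effects design, the alternating scheme of Algorithm 2 might be coded as follows (interface and defaults are ours):

```python
import numpy as np

def derf_qr(X, Z, y, tau, lam1, lam2, sigma, max_iter=50, tol=1e-5):
    """Two-step DERF-QR sketch: alternate weighted quantile-regression fits for the
    fixed effects (beta) and the stacked random effects (gamma), each computed by
    irw_padmm on a partial residual."""
    beta = np.zeros(X.shape[1])
    gamma = irw_padmm(Z, y - X @ beta, tau, lam2, sigma)  # initial random effects
    for _ in range(max_iter):
        beta_new = irw_padmm(X, y - Z @ gamma, tau, lam1, sigma)
        gamma_new = irw_padmm(Z, y - X @ beta_new, tau, lam2, sigma)
        step = np.linalg.norm(beta_new - beta) + np.linalg.norm(gamma_new - gamma)
        beta, gamma = beta_new, gamma_new
        if step < tol:  # convergence criterion from the text
            break
    return beta, gamma
```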
To proceed further, it is critical to select appropriate penalty parameters. The Schwarz information criterion (SIC) and generalized approximate cross-validation (GACV) are common selection criteria for quantile regression [21,22,23]. In this paper, we utilize the extended Bayesian information criterion (EBIC) to select the penalty parameters [24]. In the criterion, $s$ denotes the number of nonzero parameters in the model and $p$ denotes the dimensionality of the variables. The criterion extends BIC by adding a penalty term on the dimensionality of the variables [24]: the additional regularization term with parameter $\zeta$, which restricts the dimensionality of the variables, improves EBIC in high-dimensional cases. Specifically, each iteration in the two-step DERF-QR algorithm can be viewed as a univariate penalized quantile regression problem, in which only one block of parameters is updated at a time, so the criterion can be applied to each subproblem. The value of $\zeta$ and the convergence tolerance used in Algorithms 1 and 2 are held fixed throughout the paper.
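Since the paper's exact criterion follows [24], we only sketch one commonly used EBIC variant for penalized quantile regression below; the specific fit term and the factor on the model-size penalty are our assumptions, not necessarily the formula used here.

```python
import numpy as np

def ebic_quantile(y, y_hat, tau, n_nonzero, p, zeta=1.0):
    """One common EBIC variant for penalized quantile regression (assumed form):
    a BIC-type fit term plus an extra zeta*log(p) penalty on the active set size."""
    n = len(y)
    u = y - y_hat
    check_loss = np.sum(u * (tau - (u < 0)))   # quantile (check) loss
    return np.log(check_loss / n) + n_nonzero * (np.log(n) + 2 * zeta * np.log(p)) / (2 * n)
```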
3. Monte Carlo Simulation
The simulated data are generated from the following model:
$$y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \mathbf{z}_{ij}^{\top}\boldsymbol{\gamma}_i + \varepsilon_{ij},$$
where the sample size $n$, the number of observations per individual, the dimensions $p$ and $q$, and the true coefficient vectors are fixed across replications. The covariate vector $\mathbf{x}_{ij}$ is drawn with correlation $\rho^{|k-l|}$ between its $k$-th and $l$-th components. For the random effects, nonzero random-effect coefficients are generated for the first three variables only, and the error terms are scaled to control the signal-to-noise ratio (SNR).
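Because the specific design constants are not recoverable from the text above, the following sketch uses placeholder values purely to illustrate the generating mechanism: AR(1)-correlated covariates with $\operatorname{Corr}(x_k, x_l) = \rho^{|k-l|}$, random effects on the first three variables only, and noise scaled to a target SNR (the SNR convention here is one common choice, not necessarily the paper's).

```python
import numpy as np

def simulate(n=50, m=5, p=10, rho=0.5, snr=1.0, seed=0):
    """Generate clustered data from the model above: n individuals, m observations
    each. All constants (n, m, p, beta, random-effect scale) are placeholders."""
    rng = np.random.default_rng(seed)
    Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # AR(1)
    beta = np.r_[np.ones(3), np.zeros(p - 3)]            # sparse placeholder truth
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n * m)
    groups = np.repeat(np.arange(n), m)
    gamma = np.c_[rng.normal(0.0, 0.5, size=(n, 3)), np.zeros((n, p - 3))]
    signal = X @ beta + np.sum(X * gamma[groups], axis=1)
    eps = rng.normal(size=n * m) * (np.std(signal) / np.sqrt(snr))
    return X, groups, signal + eps, beta

X, groups, y, beta_true = simulate()
```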
In the following, we introduce PE, TP, and TN to evaluate the variable selection and parameter estimation of the model.
(1) Prediction error (PE):
$$\mathrm{PE} = \frac{1}{N} \sum_{i=1}^{n}\sum_{j=1}^{m_i} \big(y_{ij} - \hat{y}_{ij}\big)^2,$$
where $\hat{y}_{ij}$ is the estimate of $y_{ij}$ and $N$ is the total number of observations.
(2) True positives (TP): the number of variables selected correctly.
(3) True negatives (TN): the number of variables excluded correctly.
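In code, the three metrics can be computed as follows; the squared-error form of PE is our reading of the definition above, and the zero-threshold is a placeholder.

```python
import numpy as np

def prediction_error(y, y_hat):
    """PE: average squared deviation between observed and fitted values."""
    return np.mean((y - y_hat) ** 2)

def tp_tn(coef_hat, coef_true, eps=1e-8):
    """TP: relevant variables correctly kept; TN: irrelevant variables correctly
    excluded. A coefficient counts as selected when |estimate| > eps."""
    act_hat = np.abs(np.asarray(coef_hat)) > eps
    act_true = np.abs(np.asarray(coef_true)) > eps
    return int(np.sum(act_hat & act_true)), int(np.sum(~act_hat & ~act_true))
```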
We examine the performance of the DERF-QR method for different correlation coefficients and different sample sizes; that is, there are four scenarios combining a smaller and a larger sample size $n$ with a weaker and a stronger correlation $\rho$, evaluated at the lower, median, and upper quantile levels. In addition, we discuss two choices of the ERF parameter, the larger being $\sigma = 2$. The true fixed- and random-effect coefficients are specified as in the data-generating model above. We select the penalty parameters by EBIC. An estimated fixed-effect coefficient is set to zero when its absolute value falls below a small threshold, and the same rule applies to the random-effect coefficients.
Table 1 reports the mean values of PE, standard deviation (SD), and the mean values of TP and TN based on 100 replications.
The simulation shows that DERF-QR selects the right variables under all experimental conditions in Table 1. Also, PE decreases and TN decreases slightly when the sample size $n$ increases. It is observed that stronger correlation $\rho$ yields smaller PE values compared with the weaker-correlation settings. The DERF-QR method shows robust performance in strong-correlation scenarios because of its adaptive penalty weights, which improve coefficient estimation accuracy under strong variable correlation. The parameter $\sigma$ significantly influences the results: the PE value for $\sigma = 2$ is greater than that for the smaller value of $\sigma$. Additionally, we observe that the model is less sensitive to variations of $\sigma$ at the median quantile, and median quantile regression also yields smaller PE values than the other quantiles. The simulation results are presented in Figure 2.
Next, we consider different SNR levels and define three different random-effect covariance matrices to investigate the effects of the SNR and the random effects on the model, keeping the sample size, correlation, and quantile level fixed. Table 2 reports the average values of PE, TP, and TN over 100 simulations.
In Table 2, as the SNR becomes greater and the influence of the random effects increases, PE and SD increase gradually, while TN decreases slightly. When the SNR is greater, the model is less accurate and loses stability, and it is more likely to select incorrect variables. However, we observe that even in the most challenging case, with the largest SNR and the strongest random effects, the model still attains relatively low errors, and most incorrect variables are excluded. In general, an increase in the SNR and in the random effects significantly impacts coefficient estimation and variable selection; nevertheless, the model remains robust and effective throughout.
To compare the performance of DERF-QR with other methods, we consider four scenarios with different error distributions, including heavy-tailed and mixed-normal distributions. We make comparisons among the following four methods: quantile regression without penalization (QR), doubly penalized Lasso regularized quantile regression (DLQR), doubly penalized SCAD regularized quantile regression (DSQR), and the DERF-QR method with a fixed $\sigma$ proposed in this paper. We consider two distinct settings, a sparse model and a dense model, repeat each experiment 100 times, and keep the sample size, correlation, and quantile level fixed.
Table 3 and Table 4 depict the results, and Figure 3 and Figure 4 show the boxplots of PE for these two models.
It can be observed from Table 3 that DLQR, DSQR, and DERF-QR all select the right variables and attain low PE in the sparse model. The DERF-QR method shows robust coefficient estimation performance; notably, it excludes up to 80% of the incorrect variables in all cases. Among the four error distributions in Table 3, the heavy-tailed Cauchy distribution produces a significant number of outliers. DERF-QR exhibits relatively greater PE under the Cauchy distribution but maintains the lowest PE among all methods for the other distributions. DSQR is unstable in excluding incorrect variables, even though it performs well in coefficient estimation and its PE is close to that of DERF-QR. The PE of DLQR is relatively greater, even though it clearly excludes incorrect variables. Overall, DERF-QR is the best of the four methods.
In the dense model, DERF-QR still performs the best among the four methods, and all four methods select the correct variables. In most cases, the Cauchy distribution shows smaller errors, while the mixed-normal distribution results in relatively greater errors. For the Cauchy distribution, QR and DSQR attain lower PE than DERF-QR. Under the other distributions, the PE of DSQR is similar to that of DERF-QR, and the PE of DLQR is slightly greater than the others in all cases.
4. Financial Data Analysis
We apply the DERF-QR method to the financial statements and balance sheets of 12 companies from 2009 to 2023 [25]. Due to missing data, we retain 10 companies from 2009 to 2022 for the analysis. Revenue is regarded as the response variable, and we aim to identify the factors influencing revenue. We consider 20 variables as explanatory variables, as listed in Table 5.
In Table 5, X20, X21, X22, X23, X24, and X25 denote indicator variables for the company's six categories. Specifically, each category is represented by a binary indicator, with applicable categories coded as 1 and non-applicable ones as 0. The other explanatory variables are continuous.
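For the categorical part of the design, the binary coding described above can be produced with a few lines; the category labels below are hypothetical placeholders.

```python
import numpy as np

def one_hot(categories):
    """Binary indicators for company categories: each column is 1 where the
    category applies and 0 otherwise."""
    levels = sorted(set(categories))
    return np.array([[int(c == lev) for lev in levels] for c in categories]), levels

dummies, levels = one_hot(["tech", "energy", "tech", "retail"])
print(levels)   # ['energy', 'retail', 'tech']
print(dummies)  # rows are companies, columns are category indicators
```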
Consider the following linear mixed-effects model:
$$y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \mathbf{z}_{ij}^{\top}\boldsymbol{\gamma}_i + \varepsilon_{ij},$$
where $y_{ij}$ denotes the revenue of company $i$ in year $j$. For the random-effect coefficients $\boldsymbol{\gamma}_i$, we consider that random effects exist in all variables, i.e., $\mathbf{z}_{ij} = \mathbf{x}_{ij}$.
We empirically evaluated DERF-QR under two choices of $\sigma$ and at three quantile levels. For comparison, we also included the DLQR method. The mean squared error,
$$\mathrm{MSE} = \frac{1}{N} \sum_{i,j} \big(y_{ij} - \hat{y}_{ij}\big)^2,$$
and the number of non-zero coefficients (model size, MS) are used to assess estimation and variable selection performance.
Figure 5 presents the estimated coefficients across quantiles. Both DERF-QR and DLQR successfully eliminate most irrelevant variables. Notably, all methods consistently selected X6 and X9, while the estimates of the other variables are very small or shrink to zero. This indicates that gross profit and EBITDA have a significant impact on revenue. We also find that variable selection differs across quantile levels; at one quantile level, all three fitted models tend to select X6, X9, and X14.
Table 6 shows the MSE and MS for each method. With the first choice of $\sigma$, DERF-QR achieves the lowest MSE at two of the three quantile levels; with the second, it attains the lowest MSE at the remaining level and also yields the smallest MS at two quantile levels. Taking the two choices together, DERF-QR yields the lowest MSE across all quantile levels. Overall, DERF-QR achieves superior estimation accuracy and variable selection across quantile levels.
5. Discussion
We conducted Monte Carlo simulations of the proposed method under various conditions, including different SNRs, sample sizes $n$, correlation coefficients $\rho$, and sparsity levels. The results demonstrate that DERF-QR consistently achieves the lowest PE and the highest TN in most scenarios, indicating its strong performance in both variable selection and parameter estimation.
Despite DERF-QR’s strong overall performance, its estimation accuracy declines under Cauchy-distributed errors. This can be attributed to the heavy-tailed nature of the Cauchy distribution, which introduces more extreme outliers and may compromise the robustness of coefficient estimation. Therefore, evaluating the robustness of DERF-QR and related methods under heavy-tailed error distributions remains an important direction for future research.
The simulation results show that $\sigma$ significantly affects DERF-QR's estimation performance. We conducted additional simulations to examine how different values of $\sigma$ affect the PE at the three quantile levels considered above, with the remaining settings held fixed. Figure 6 shows how the PE varies with increasing $\sigma$ at the different quantile levels. As $\sigma$ increases, the PE increases at all three quantile levels; at the median quantile, the impact of $\sigma$ on PE is relatively smaller than at the other quantiles. Based on these observations, we recommend selecting $\sigma$ within a relatively small range through cross-validation in practical applications.
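As one concrete version of this recommendation, the sketch below runs a small grid search over $\sigma$ using K-fold cross-validation on the out-of-fold quantile loss; it reuses the hypothetical `derf_qr` sketch above, the grid and fold count are arbitrary placeholders, and for clustered data one would preferably form folds by individual rather than by row.

```python
import numpy as np

def select_sigma(X, Z, y, tau, lam1, lam2,
                 sigmas=(0.25, 0.5, 1.0, 2.0), k=5, seed=0):
    """Pick sigma by K-fold CV on the out-of-fold quantile loss (illustrative)."""
    rng = np.random.default_rng(seed)
    folds = rng.integers(0, k, size=len(y))   # row-wise folds for brevity
    losses = []
    for sigma in sigmas:
        loss = 0.0
        for f in range(k):
            tr, te = folds != f, folds == f
            beta, gamma = derf_qr(X[tr], Z[tr], y[tr], tau, lam1, lam2, sigma)
            u = y[te] - X[te] @ beta - Z[te] @ gamma
            loss += np.sum(u * (tau - (u < 0)))   # held-out check loss
        losses.append(loss)
    return sigmas[int(np.argmin(losses))]
```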
Additionally, we focused exclusively on EBIC for selecting $\lambda_1$ and $\lambda_2$. Other parameter selection approaches, including the Schwarz information criterion (SIC), generalized approximate cross-validation (GACV), and cross-validation methods, were not evaluated in our comparative analysis. Future comparisons of these methods would strengthen the evaluation of tuning strategies.
6. Conclusions
In this paper, we established the DERF-QR method for both parameter estimation and variable selection in the linear mixed-effects model. We studied a non-convex yet smooth error-function regularization, which differs markedly from traditional regularization methods for variable selection in quantile regression for this model. We derived the asymptotic behavior of the penalty and viewed it as a close approximation to $\ell_0$ regularization as $\sigma \to 0^{+}$.
We conducted simulations under various scenarios. DERF-QR yielded lower PE across various signal-to-noise ratios and random-effect structures. Even under the most challenging setting, with the largest SNR and the strongest random effects, it retained low PE and successfully eliminated 80% of the irrelevant variables. With a larger sample size and stronger correlation, PE remained lower than in the smaller-sample, weaker-correlation settings, indicating strong stability. Compared with other penalized quantile regression methods, DERF-QR consistently achieved the lowest PE and the highest TN in both sparse and dense settings, demonstrating robust performance in coefficient estimation and variable selection. We also applied the method to financial data and showed that it is practically effective.
Our proposed DERF-QR method effectively extends penalized quantile regression for linear mixed-effects models, demonstrating excellent performance in both parameter estimation and variable selection. There are still some open issues for future research in this field. For instance, the IRW-pADMM algorithm involves parameter selection, and selecting inappropriate parameters could result in slow convergence during the iteration process. Therefore, improving the efficiency of the algorithm remains an important topic for future work. Additionally, this approach can be further applied and studied in other models, such as spatial mixed-effects models or generalized linear mixed models.