Variable Selection for Additive Quantile Regression with Nonlinear Interaction Structures

Bai, Yongxin; Jiang, Jiancheng; Tian, Maozai

doi:10.3390/math13091522

Open Access Article


by Yongxin Bai ¹, Jiancheng Jiang ² and Maozai Tian ³,*

¹ School of Science, Beijing Information Science and Technology University, Beijing 100872, China
² Department of Mathematics and Statistics & School of Data Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
³ Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing 100192, China
* Author to whom correspondence should be addressed.

Mathematics 2025, 13(9), 1522; https://doi.org/10.3390/math13091522

Submission received: 28 March 2025 / Revised: 30 April 2025 / Accepted: 3 May 2025 / Published: 5 May 2025

(This article belongs to the Section D1: Probability and Statistics)


Abstract

In high-dimensional data analysis, main effects and interaction effects often coexist, especially when complex nonlinear relationships are present. Effective variable selection is crucial for avoiding the curse of dimensionality and enhancing the predictive performance of a model. In this paper, we introduce a nonlinear interaction structure into the additive quantile regression model and propose an innovative penalization method. This method accounts for the complexity and smoothness of the additive model and incorporates heredity constraints on main effects and interaction effects through an improved regularization algorithm under the marginality principle. We also establish the asymptotic properties of the penalized estimator and derive the corresponding excess risk. Monte Carlo simulations illustrate the proposed model and method, which are then applied to the analysis of Parkinson’s disease (PD) rating scores and further verify the effectiveness of a novel PD treatment.

Keywords:

additive model; interaction structures; tensor product; quantile regression

MSC:

62G08

1. Introduction

Quantile regression [1] has become a widely used tool in both empirical studies and the theoretical analysis of conditional quantile functions. Additive models offer greater flexibility than linear models by expressing the predictor as a sum of nonparametric functions of the individual covariates, and they generally have lower variance than fully nonparametric models. However, many practical problems require modeling interactions between covariates, an aspect that has been extensively explored in linear and generalized linear models [2,3], where incorporating interactions has been shown to improve prediction accuracy. Despite this, there is limited research on additive quantile regression models that explicitly account for interaction structures, particularly in the context of variable selection.

Nonparametric additive quantile regression models have been extensively studied [4,5,6,7,8], with recent advances in the analysis of longitudinal data [9] and dynamic component modeling [10]. Specifically, Ref. [9] explored a partially linear additive model for longitudinal data within the quantile regression framework, using quadratic inference functions to account for within-subject correlations. In parallel, Ref. [10] proposed a quantile additive model with dynamic component functions, introducing a penalization-based approach to identify non-dynamic components. Despite these advances, interaction effects remain relatively underexplored in nonparametric additive quantile regression. Building on this literature, this paper considers an additive quantile regression model with a nonlinear interaction structure.

When the number of covariates p is large, the number of candidate main and interaction terms grows as $O(p^2)$, and many additive methods become infeasible because their implementation requires storing and manipulating the entire $O(p^2) \times n$ design matrix; variable selection therefore becomes necessary. Refs. [11,12] proposed the component selection and smoothing operator (COSSO) and its adaptive version (ACOSSO), respectively, to fit additive models with interaction structures. However, these methods do not respect the heredity condition. In contrast, [13] took the heredity constraint into account and proposed the penalty

λ (\sum_{j = 1}^{p} ∥ f_{j} ∥_{2} + \sum_{j = 1}^{p} \sum_{k = j + 1}^{p} ∥ f_{j k} ∥_{2})

on the empirical $L_2$ norms of the main effects and interaction effects. This method shrinks the interactions depending on the main effects that are already present in the model [9,10].

Several regularization penalties, especially the Lasso, have been widely used to shrink the coefficients of main effects and interaction effects and obtain a sparse model. However, existing methods treat interaction terms and main terms alike, so they may select an interaction term without its corresponding main terms, which makes the fitted model difficult to interpret in practice. Models with interaction structures have a natural hierarchy among their variables. Refs. [14,15] proposed a two-stage Lasso method to select important main and interaction terms. However, this approach is inefficient, because the solution path in the second stage depends heavily on the selection results from the first stage. To address this issue, several regularization methods [16,17,18] employ special reparametrizations of the regression coefficients and penalty functions under the hierarchical principle. Ref. [19] introduced strong and weak heredity constraints into models with interactions and proposed a Lasso penalty with convex constraints that produces sparse coefficients while ensuring that strong or weak heredity is satisfied. However, existing algorithms often converge slowly, even with a moderate number of predictors. Ref. [20] proposed a group-regularized estimation method under both strong and weak heredity constraints and developed a computational algorithm that guarantees the convergence of the iterates. Ref. [21] extended the linear interaction model to a quadratic model that includes all two-way interactions and proposed the Regularization Path Algorithm under the Marginality Principle (RAMP), which efficiently computes the regularization solution path while maintaining strong or weak heredity constraints.

Recent advances in Bayesian hierarchical modeling for interaction selection have been driven by new prior constructions. Ref. [22] developed mixture priors that link interaction inclusion probabilities to main effect strengths, automatically enforcing heredity constraints. Ref. [23] introduced structured shrinkage priors using hierarchical Laplace distributions to facilitate group-level interaction selection. Ref. [24] incorporated heredity principles into stochastic search variable selection (SSVS) frameworks through carefully designed spike-and-slab priors. These developments have improved the ability to capture complex dependency structures in interaction selection while preserving the advantages of the Bayesian approach, with applications in various domains, including spatiotemporal analysis.

However, all these works assume that the effects of all covariates can be captured in a simple linear form, which does not always hold in practice. To handle this issue, several authors have proposed nonparametric and semiparametric interaction models. For example, Refs. [25,26] considered variable selection and estimation for single-index models with interactions for functional data and longitudinal data. Other recent work on semiparametric single-index models with interactions includes [27,28], among others.

Motivated by both practical needs and theoretical considerations, this paper makes the following contributions: (1) we propose a variable selection method for quantile regression models with nonlinear interaction structures; (2) we modify the RAMP algorithm to extend it to additive models with nonlinear interactions, so that it can handle complex nonlinear interactions while preserving the hierarchical relationship between main effects and interaction effects throughout the variable selection process. Together, these contributions address the challenges posed by complex nonlinear interactions while maintaining the inherent hierarchical structure, providing a robust framework for accurate and interpretable modeling.

The rest of this paper is organized as follows. In Section 2, we present the additive quantile regression model with nonlinear interaction structures and discuss the properties of the oracle estimator. In Section 3, we present a sparsity-smoothness penalized method fitted by a regularization algorithm under the marginality principle and derive its oracle property. In Section 4, simulation studies demonstrate the advantages of the proposed method over existing approaches. We illustrate an application of the proposed method to a real dataset in Section 5, and Section 6 concludes.

2. Additive Quantile Regression with Nonlinear Interaction Structures

Suppose that $\{(Y_i, x_i) : i = 1, \dots, n\}$ is an independent and identically distributed (iid) sample, where $x_i = (x_{i1}, \dots, x_{ip})$ is a p-dimensional vector of covariates. The $\tau$th ($0 < \tau < 1$) conditional quantile of $Y_i$ given $x_i$ is defined as $Q_{Y_i|x_i}(\tau) = \inf\{t : F(t \mid x_i) \geq \tau\}$, where $F(\cdot \mid x_i)$ is the conditional distribution function of $Y_i$ given $x_i$. We consider the following additive nonlinear interaction model for the conditional quantile function:

Q_{Y_{i} | x_{i}} (τ) = \sum_{j = 1}^{p} f_{j} (x_{i j}) + \sum_{1 \leq j < k \leq p} f_{j k} (x_{i j}, x_{i k}),

(1)

where the unknown real-valued bivariate functions $f_{jk}(x_{ij}, x_{ik})$ represent the pairwise interaction terms between $x_{ij}$ and $x_{ik}$. We assume that $f_{jk}(a, b) = f_{kj}(b, a)$ for all $a$, $b$ and all $j \neq k$, and that $x_{ij} \in [0, 1]$ for all $i$ and $j$. Let $\epsilon_i = Y_i - Q_{Y_i|x_i}(\tau)$; then $\epsilon_i$ satisfies $P(\epsilon_i \leq 0 \mid x_i) = \tau$, and we may also write $Y_i = \sum_{j=1}^{p} f_j(x_{ij}) + \sum_{1 \leq j < k \leq p} f_{jk}(x_{ij}, x_{ik}) + \epsilon_i$, where $\epsilon_i$ is the random error. For identification, we assume that the $\tau$th quantiles of $f_j(x_{ij})$ and of $f_{jk}(x_{ij}, x_{ik})$, $1 \leq j < k \leq p$, are zero (see [29]).

Let $\{\varphi_1, \varphi_2, \dots\}$ denote a preselected orthonormal basis with respect to the Lebesgue measure on the unit interval. We center all candidate functions, so the basis functions are centered as well. We approximate each function $f_j(x_{ij})$ by a linear combination of the basis functions, i.e.,

f_{j} (x_{i j}) \approx \sum_{l = 1}^{L_{n}} φ_{j l} (x_{i j}) β_{j l},

where $\{\varphi_{jl}, l = 1, \dots, L_n\}$ are the centered orthonormal basis functions. Let $\Psi_{i,j} = (\varphi_{j1}(x_{ij}), \dots, \varphi_{jL_n}(x_{ij}))^\top$; the main terms can then be expressed as $f_j(x_{ij}) = \Psi_{i,j}^\top \beta_j$, where $\beta_j = (\beta_{j1}, \dots, \beta_{jL_n})^\top$ denotes the $L_n$-dimensional vector of basis coefficients for the $j$th main term. We treat each interaction term $f_{jk}$ as a surface and approximate it by a tensor-product expansion:

f_{j k} (x_{i j}, x_{i k}) \approx \sum_{s = 1}^{L_{n}} \sum_{d = 1}^{L_{n}} φ_{j s} (x_{i j}) φ_{k d} (x_{i k}) β_{j k, s d} .

For computational convenience, we re-express the surface in vector notation as $f_{jk}(x_{ij}, x_{ik}) = \Phi_{i,jk}^\top \beta_{jk}$, where $\Phi_{i,jk} = \Psi_{i,j} \otimes \Psi_{i,k}$ and $\beta_{jk} = \mathrm{vec}(\Gamma_{jk}^\top)$, with $\Gamma_{jk} = [\beta_{jk,sd}]_{s,d=1}^{L_n}$ the $L_n \times L_n$ matrix with elements $\beta_{jk,sd}$. Here, $\otimes$ denotes the Kronecker product. We can now express (1) as

Q_{Y_{i} | x_{i}} (τ) = \sum_{j = 1}^{p} Ψ_{i, j}^{⊤} β_{j} + \sum_{1 \leq j < k \leq p} Φ_{i, j k}^{⊤} β_{j k}

(2)
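A small NumPy sketch can make the tensor-product construction concrete. It assumes, purely for illustration, the cosine basis $\varphi_l(x) = \sqrt{2}\cos(\pi l x)$, which is orthonormal and mean-zero on $[0, 1]$ (the simulations in Section 4 use pseudo-splines instead); the helper names `cosine_basis` and `interaction_row` are hypothetical. The sketch checks that $\Phi_{i,jk}^\top \mathrm{vec}(\Gamma_{jk}^\top)$ reproduces the double sum over $s$ and $d$.

```python
import numpy as np

def cosine_basis(x, L):
    """Assumed basis phi_l(x) = sqrt(2) cos(pi l x), l = 1..L (orthonormal, mean zero on [0, 1]).
    Returns an array of shape (len(x), L)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    l = np.arange(1, L + 1)
    return np.sqrt(2.0) * np.cos(np.pi * np.outer(x, l))

def interaction_row(xj, xk, L):
    """Phi_{i,jk} = Psi_{i,j} (Kronecker) Psi_{i,k}: a vector of length L**2."""
    return np.kron(cosine_basis(xj, L)[0], cosine_basis(xk, L)[0])

rng = np.random.default_rng(0)
L = 4
Gamma = rng.normal(size=(L, L))   # Gamma[s, d] plays the role of beta_{jk,sd}
beta_jk = Gamma.reshape(-1)       # rows of Gamma stacked = vec(Gamma^T)
xj, xk = 0.3, 0.7

psi_j, psi_k = cosine_basis(xj, L)[0], cosine_basis(xk, L)[0]
vec_form = interaction_row(xj, xk, L) @ beta_jk
double_sum = sum(psi_j[s] * psi_k[d] * Gamma[s, d]
                 for s in range(L) for d in range(L))
print(np.isclose(vec_form, double_sum))  # True

# Midpoint-rule check that the assumed basis is orthonormal and centered on [0, 1].
N = 2000
grid = (np.arange(N) + 0.5) / N
B = cosine_basis(grid, L)
print(np.allclose(B.T @ B / N, np.eye(L)), np.allclose(B.mean(axis=0), 0.0))  # True True
```

Because `Gamma.reshape(-1)` concatenates the rows of $\Gamma_{jk}$, it equals $\mathrm{vec}(\Gamma_{jk}^\top)$, which is what makes the coefficient ordering agree with `np.kron`.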

2.1. Oracle Estimator

For high-dimensional inference, it is often assumed that p is large but only a small fraction of the main effects and interaction terms are present in the true model. Let $M = \{j : f_j \neq 0,\ 1 \leq j \leq p\}$ and $I = \{(j, k) : 1 \leq j < k \leq p,\ f_{jk} \neq 0\}$ be the index sets of nonzero main effects and interaction effects, and let $q = |M|$ and $s = |I|$ be their respective cardinalities. That is, we assume that the first q of $\beta_j$, $j = 1, \dots, p$, are nonzero and the remaining ones are zero. Hence, we can write $\beta_0 = (\beta_{01}^\top, 0_{(p-q)L_n}^\top)^\top$, where $\beta_{01} = (\beta_1^\top, \dots, \beta_q^\top)^\top$. Further, we assume that the s pairwise interaction vectors $\beta_{jk}$ associated with the active main effects are nonzero and the remaining ones are zero. Thus, we write $\beta_{00} = (\beta_{001}^\top, 0_{\{p(p-1)/2 - s\}L_n^2}^\top)^\top$, where $\beta_{001} \in R^{sL_n^2}$ stacks the vectors $\beta_{(jk)}$ in lexicographic order of $(j, k)$. Here, $\beta_{(jk)}$ denotes the component $\beta_{jk}$ with $(j, k) \in I$ and $j, k \in M$ (strong heredity), or with $(j, k) \in I$ and $j \in M$ or $k \in M$ (weak heredity).

Let $\Pi = (\Pi_1, \dots, \Pi_n)^\top$ be the $n \times \{pL_n + p(p-1)L_n^2/2\}$ design matrix with $\Pi_i = (\Psi_i^\top, \Phi_i^\top)^\top$, where $\Psi_i = (\Psi_{i,1}^\top, \dots, \Psi_{i,p}^\top)^\top \in R^{pL_n}$ and $\Phi_i = (\Phi_{i,12}^\top, \dots, \Phi_{i,1p}^\top, \Phi_{i,23}^\top, \dots, \Phi_{i,(p-1)p}^\top)^\top \in R^{p(p-1)L_n^2/2}$. Likewise, we write $\Pi_{A_i} = (\Psi_{M_i}^\top, \Phi_{I_i}^\top)^\top \in R^{qL_n + sL_n^2}$, where $\Psi_{M_i}$ is the subvector consisting of the first $qL_n$ elements of $\Psi_i$, corresponding to the active covariates, and $\Phi_{I_i}$ is the $sL_n^2$-dimensional subvector of $\Phi_i$ corresponding to the active interactions. We first study the estimator obtained when the index sets M and I are known in advance, which we refer to as the oracle estimator. That is, we consider quantile regression with the oracle information. Let

({\hat{β}}_{01}, {\hat{β}}_{001}) = arg min_{β_{01}, β_{001}} \frac{1}{n} \sum_{i = 1}^{n} ρ_{τ} (Y_{i} - Ψ_{M_{i}}^{⊤} β_{01} - Φ_{I_{i}}^{⊤} β_{001}) .

(3)

The oracle estimators for $\beta_0$ and $\beta_{00}$ are $(\hat\beta_{01}^\top, 0_{(p-q)L_n}^\top)^\top$ and $(\hat\beta_{001}^\top, 0_{\{p(p-1)/2 - s\}L_n^2}^\top)^\top$, respectively. Accordingly, the oracle estimators of the nonparametric functions $f_j(x_{ij})$ and $f_{jk}(x_{ij}, x_{ik})$ are $\hat f_j(x_{ij}) = \Psi_{i,j}^\top \hat\beta_j$, $j \in M$, and $\hat f_{jk}(x_{ij}, x_{ik}) = \Phi_{i,jk}^\top \hat\beta_{jk}$, $(j, k) \in I$, where $\hat\beta_j$ and $\hat\beta_{jk}$ are the corresponding blocks of $\hat\beta_{01}$ and $\hat\beta_{001}$, respectively.
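The oracle problem (3) is an ordinary linear quantile regression in the active basis columns, so it can be solved as a linear program. The sketch below is one possible implementation, not the authors' code: it uses SciPy's `linprog` with the HiGHS solver, splits each residual into nonnegative parts, and uses a random matrix `X_active` as a hypothetical stand-in for the stacked columns $(\Psi_{M_i}^\top, \Phi_{I_i}^\top)$.

```python
import numpy as np
from scipy.optimize import linprog

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))

def oracle_quantile_fit(X_active, y, tau):
    """Minimise (1/n) * sum_i rho_tau(y_i - x_i^T b) over b as a linear program.

    The residual is written as y - X b = u - v with u, v >= 0, so the objective
    becomes (tau * sum(u) + (1 - tau) * sum(v)) / n.
    """
    n, d = X_active.shape
    c = np.concatenate([np.zeros(d), np.full(n, tau), np.full(n, 1.0 - tau)]) / n
    A_eq = np.hstack([X_active, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * d + [(0, None)] * (2 * n)   # b free, u and v nonnegative
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:d]

# Toy data: X_active is a hypothetical stand-in for the active basis columns.
rng = np.random.default_rng(1)
n, d, tau = 150, 6, 0.25
X_active = rng.normal(size=(n, d))
b_true = np.array([1.0, -2.0, 0.5, 0.0, 0.0, 1.5])
y = X_active @ b_true + rng.standard_normal(n)

def empirical_risk(b):
    return check_loss(y - X_active @ b, tau).mean()

b_hat = oracle_quantile_fit(X_active, y, tau)
print(empirical_risk(b_hat) <= empirical_risk(b_true) + 1e-6)  # True
```

The LP formulation is exact for the check loss; for large $n$ a dedicated quantile regression solver may be preferable.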

2.2. Asymptotic Properties

We next present the asymptotic properties of the oracle estimators. As in [13], we assume that all true main effects belong to the Sobolev space of order two, $\sum_{l=1}^{\infty} \beta_{jl}^2 l^4 < C^2$, and impose the same requirement on each true interaction function, $\sum_{s=1}^{\infty} \beta_{jk,sd}^2 s^4 < C^2$ and $\sum_{d=1}^{\infty} \beta_{jk,sd}^2 d^4 < C^2$ for each $1 \leq j < k \leq p$, where C is some constant.

To establish the asymptotic properties of the estimators, the following regularity conditions are needed:

(A1)

The distribution of $\mathbf{x}$ is absolutely continuous with density $f(x)$, which is bounded away from 0;

(A2)

The conditional distribution $F_{\varepsilon|x}(\cdot \mid x)$ of the error $\varepsilon$, given $\mathbf{x} = x$, has a density $f_{\varepsilon|x}(\cdot \mid x)$ that satisfies the following two conditions:

(1): $\sup_{\varepsilon, x} f_{\varepsilon|x}(\varepsilon \mid x) < \infty$;
(2): There exist positive constants $b_1$ and $b_2$ such that $\inf_{x} \inf_{|\varepsilon| < b_1} f_{\varepsilon|x}(\varepsilon \mid x) \geq b_2$;

(A3)

The basis functions $\varphi_1, \dots, \varphi_{L_n}$ satisfy the following:

(1): $L_n \approx n^{1/(2r+1)}$, $r > 1/2$;
(2): $\sup_{x \in [0,1]^p} \|\Pi\|_2 = O(L_n)$, and the eigenvalues of $H_n = \frac{1}{n}\sum_{i=1}^{n} \Pi_i \Pi_i^\top$ are uniformly bounded away from 0 and ∞;
(3): There is a vector $\gamma = (\beta_0^\top, \beta_{00}^\top)^\top$ such that $\sup_{x \in [0,1]^p} |m(x) - \Pi^\top \gamma| = O(L_n^{-r})$, where $m(x) = \sum_{j=1}^{p} f_j(x_j) + \sum_{1 \leq j < k \leq p} f_{jk}(x_j, x_k)$ and $x_j = (x_{1j}, \dots, x_{nj})^\top$;

(A4)

$q = O(n^{C_1})$ for some $C_1 < 1/3$.

Note that (A1) and (A2) are typical assumptions in nonparametric quantile regression (see [30]). Condition (3) in (A3) requires that the true component functions $f_j(x_j)$ and $f_{jk}(x_j, x_k)$ can be uniformly well approximated by the basis functions $\varphi_{jl}(x_j)$, $l = 1, \dots, L_n$, and the tensor products $\varphi_{js}(x_j)\varphi_{kd}(x_k)$, $s, d = 1, \dots, L_n$, respectively. Finally, Condition (A4) controls the model size.

Theorem 1.

Let $\gamma_{01} = (\beta_{01}^\top, \beta_{001}^\top)^\top$. Assume that Conditions (A1)–(A4) hold. Then, the following are true:

(a): $\|\hat\gamma_{01} - \gamma_{01}\|_2 = o_p(n^{-1/2} L_n)$;
(b): $\|\hat m(x) - m(x)\|_{L_2} = O(n^{-(2r-1)/\{2(2r+1)\}})$, where $m(x_i) = \sum_{j=1}^{p} f_j(x_{ij}) + \sum_{1 \leq j < k \leq p} f_{jk}(x_{ij}, x_{ik})$.

Theorem 1 gives the rate of convergence of the oracle estimator, which, owing to the additive structure, matches the optimal convergence rate for additive models [30]. The proof of Theorem 1 can be found in Appendix A.1.

3. Penalized Estimation for Additive Quantile Regression with Nonlinear Interaction Structures

3.1. Penalized Estimator

Model (2) contains a large number of main effects $f_j(x_{ij})$ and interaction effects $f_{jk}(x_{ij}, x_{ik})$, which substantially increases the computational burden, especially when the number of covariates p is large. Penalization is therefore needed to keep the computation tractable and to prevent overfitting. Applying a sparsity penalty (such as the Lasso or group Lasso) to each $f_j$ automatically selects important variables and reduces model complexity.

However, when a large number of basis functions is used to fit complex relationships, a sparsity penalty alone may oversimplify the model, miss patterns in the data, and leave unstable fluctuations in the fitted functions. Therefore, in addition to the sparsity penalty, we introduce a smoothness penalty (such as a second-order derivative penalty) to avoid drastic fluctuations in the fitted functions and ensure smooth, stable fits. Combining the two penalties helps to control model complexity, reduce overfitting, and improve the model’s generalization ability.

Therefore, we minimize the following penalized objective function with respect to $(\beta_0, \beta_{00})$:

\frac{1}{n} \sum_{i = 1}^{n} ρ_{τ} (Y_{i} - \sum_{j = 1}^{p} Ψ_{i, j}^{⊤} β_{j} - \sum_{1 \leq j < k \leq p} Φ_{i, j k}^{⊤} β_{j k}) + P (f),

(4)

where

P (f) = λ_{1} [\sum_{j = 1}^{p} {\{∥ f_{j} (x_{j}) ∥_{n}^{2} + λ_{2} J_{2}^{1} (f_{j} (x_{j}))\}}^{1 / 2} + \sum_{1 \leq j < k \leq p} {\{∥ f_{j k} (x_{j}, x_{k}) ∥_{n}^{2} + λ_{2} J_{2}^{1} (f_{j k} (x_{j}, x_{k}))\}}^{1 / 2}],

is a sparsity-smoothness penalty function whose first and second terms penalize the main and interaction effects, respectively. We use $\|f\|_n^2 = \frac{1}{n}\sum_{i=1}^{n} f(x_i)^2$ as the sparsity penalty, which encourages sparsity at the function level, and $J_2^1(f) = \int \{f''(x)\}^2\,dx$ as the roughness penalty, which controls model complexity by keeping the estimated function smooth while fitting the data, thereby preventing high-frequency fluctuations caused by an excessive number of basis functions. The two tuning parameters $\lambda_1, \lambda_2 \geq 0$ control the amount of penalization.

We can write $J_2^1(f_j(x_j)) = \beta_j^\top \Omega_j \beta_j$, where $\Omega_j$ is a band-diagonal matrix of known coefficients whose $(l_1, l_2)$th element is $\Omega_{j, l_1 l_2} = \int \varphi_{jl_1}''(x)\,\varphi_{jl_2}''(x)\,dx$, $l_1, l_2 \in \{1, \dots, L_n\}$; see [31] for details. According to [32], the penalty $J_2^1(f_{jk}(x_j, x_k))$ can be represented as

β_{j k}^{⊤} (Λ_{j j} \otimes I_{L_{n}} + I_{L_{n}} \otimes Λ_{k k}) β_{j k} ≜ β_{j k}^{⊤} Λ_{j k} β_{j k},

where $\Lambda_{jj}$ and $\Lambda_{kk}$ are the $L_n \times L_n$ marginal penalty matrices associated with $\int (\partial^2 f_{jk}/\partial x_j^2)^2\,dx_j$ and $\int (\partial^2 f_{jk}/\partial x_k^2)^2\,dx_k$, respectively, i.e., with the squared second derivatives of $f_{jk}(x_j, x_k)$ with respect to $x_j$ and $x_k$. Hence, the penalty function can be rewritten as

P (f) = λ_{1} (\sum_{j = 1}^{p} \sqrt{β_{j}^{⊤} M_{j} β_{j}} + \sum_{1 \leq j < k \leq p} \sqrt{β_{j k}^{⊤} K_{j k} β_{j k}}),

where $M_j = \frac{1}{n}\psi_j^\top \psi_j + \lambda_2 \Omega_j$ and $K_{jk} = \frac{1}{n}\phi_{jk}^\top \phi_{jk} + \lambda_2 \Lambda_{jk}$, with $\psi_j$ denoting the $n \times L_n$ matrix whose $(i, l)$th entry is $\varphi_{jl}(x_{ij})$, and $\phi_{jk}$ denoting the $n \times L_n^2$ matrix whose $i$th row is $\Phi_{i,jk}^\top$, i.e., whose entries are the products $\varphi_{js}(x_{ij})\varphi_{kd}(x_{ik})$. Similar to [33], we decompose $M_j = R_j^\top R_j$ and $K_{jk} = Q_{jk}^\top Q_{jk}$ for some square $L_n \times L_n$ matrix $R_j$ and $L_n^2 \times L_n^2$ matrix $Q_{jk}$. Then, Model (4) can be represented as

\frac{1}{n} \sum_{i = 1}^{n} ρ_{τ} (Y_{i} - \sum_{j = 1}^{p} Ψ_{i, j}^{⊤} β_{j} - \sum_{1 \leq j < k \leq p} Φ_{i, j k}^{⊤} β_{j k}) + λ_{1} (\sum_{j = 1}^{p} ∥ R_{j} β_{j} ∥_{2} + \sum_{1 \leq j < k \leq p} {∥ Q_{j k} β_{j k} ∥}_{2}) .

(5)
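Continuing with the cosine basis assumed earlier for illustration, $\varphi_l''(x) = -\sqrt{2}(\pi l)^2\cos(\pi l x)$, so $\Omega_j$ is diagonal with entries $(\pi l)^4$ (a special case of the band-diagonal matrices obtained for spline bases). The sketch below, with hypothetical helper names and an arbitrary $\lambda_2$, builds $M_j$, takes $R_j$ from a Cholesky factorization (one valid choice of the square factor), checks that $\|R_j\beta_j\|_2 = \{\|f_j\|_n^2 + \lambda_2 J_2^1(f_j)\}^{1/2}$, and compares $\beta_{jk}^\top\Lambda_{jk}\beta_{jk}$ with a midpoint-rule integral of the squared second partial derivatives.

```python
import numpy as np

def cosine_basis(x, L, deriv=0):
    """Assumed basis phi_l(x) = sqrt(2) cos(pi l x); deriv=2 returns phi_l''(x)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    freq = np.pi * np.arange(1, L + 1)
    B = np.sqrt(2.0) * np.cos(np.outer(x, freq))
    return B if deriv == 0 else -(freq ** 2) * B

def roughness_matrix(L):
    """Omega[l1, l2] = int phi''_{l1} phi''_{l2} dx, which is diag((pi l)^4) for this basis."""
    return np.diag((np.pi * np.arange(1, L + 1)) ** 4)

rng = np.random.default_rng(2)
n, L, lam2 = 200, 5, 1e-4
x_j = rng.uniform(size=n)
psi_j = cosine_basis(x_j, L)                  # n x L block for the main effect f_j
Omega = roughness_matrix(L)
M_j = psi_j.T @ psi_j / n + lam2 * Omega      # M_j = psi_j^T psi_j / n + lambda_2 Omega_j
R_j = np.linalg.cholesky(M_j).T               # upper triangular with R_j^T R_j = M_j

beta_j = rng.normal(size=L)
f_j = psi_j @ beta_j
lhs = np.linalg.norm(R_j @ beta_j)
rhs = np.sqrt(np.mean(f_j ** 2) + lam2 * beta_j @ Omega @ beta_j)
print(np.isclose(lhs, rhs))  # True

# Interaction roughness: Lambda_jk = Omega kron I + I kron Omega, compared with
# a midpoint-rule integral of (d2f/dxj2)^2 + (d2f/dxk2)^2 over [0, 1]^2.
Gamma = rng.normal(size=(L, L))
beta_jk = Gamma.reshape(-1)                   # vec(Gamma^T)
I_L = np.eye(L)
Lambda_jk = np.kron(Omega, I_L) + np.kron(I_L, Omega)
N = 200
g = (np.arange(N) + 0.5) / N
B, B2 = cosine_basis(g, L), cosine_basis(g, L, deriv=2)
f_xx = B2 @ Gamma @ B.T                       # d2f/dxj2 on the grid
f_zz = B @ Gamma @ B2.T                       # d2f/dxk2 on the grid
print(np.isclose(beta_jk @ Lambda_jk @ beta_jk, np.mean(f_xx ** 2 + f_zz ** 2)))  # True
```

With $R_j$ in hand, each group term in (5) is just the Euclidean norm of $R_j\beta_j$, which is what standard group-Lasso solvers expect after reparametrizing $\beta_j \mapsto R_j\beta_j$.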

However, the above penalty treats the main and interaction effects alike. Moreover, adding an interaction to the model introduces $L_n^2$ basis coefficients, compared with $L_n$ for a main effect, which increases the computational cost and makes models with many interactions hard to interpret. To address this difficulty, we add a set of heredity restrictions that produce sparse interaction models in which an interaction is included only if one or both of its parent variables are marginally important.

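Naturally, under strong heredity the active interaction set satisfies $I \subseteq M^{\circ 2} = \{(j, k) : j < k;\ j, k \in M\}$, that is, every active interaction has both parents among the active main effects. Our target is to estimate M and I consistently from the data, in the spirit of Theorem 1 in [21].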

There are two types of heredity restrictions, called strong and weak hierarchy:

\text{Strong hierarchy:}\quad \hat{β}_{j k} \neq 0 \Rightarrow \min\{∥ \hat{β}_{j} ∥_{0}, ∥ \hat{β}_{k} ∥_{0}\} \neq 0

\text{Weak hierarchy:}\quad \hat{β}_{j k} \neq 0 \Rightarrow \max\{∥ \hat{β}_{j} ∥_{0}, ∥ \hat{β}_{k} ∥_{0}\} \neq 0 .

To enforce the strong and weak hierarchy conditions on $\hat\beta_{jk}$ and $\hat\beta_j, \hat\beta_k$, we construct the following penalized objective for the strong hierarchy:

\begin{matrix} Q^{P} (β_{j}, β_{j k}) = \frac{1}{n} \sum_{i = 1}^{n} ρ_{τ} (Y_{i} - \sum_{j = 1}^{p} Ψ_{i, j}^{⊤} β_{j} - \sum_{1 \leq j < k \leq p} Φ_{i, j k}^{⊤} β_{j k}) \\ + λ_{1} ({(\sum_{j = 1}^{p} ∥ R_{j} β_{j} ∥_{2} + \sum_{1 \leq j < k \leq p} I_{\{β_{j k} \neq 0\}} (∥ R_{j} β_{j} ∥_{2} + ∥ R_{k} β_{k} ∥_{2}))}^{1 / 2} + \sum_{1 \leq j < k \leq p} {∥ Q_{j k} β_{j k} ∥}_{2}) . \end{matrix}

(6)

and the following one for the weak hierarchy:

\begin{matrix} Q^{P} (β_{j}, β_{j k}) = \frac{1}{n} \sum_{i = 1}^{n} ρ_{τ} (Y_{i} - \sum_{j = 1}^{p} Ψ_{i, j}^{⊤} β_{j} - \sum_{1 \leq j < k \leq p} Φ_{i, j k}^{⊤} β_{j k}) \\ + λ_{1} ({(\sum_{j = 1}^{p} ∥ R_{j} β_{j} ∥_{2} + \sum_{1 \leq j < k \leq p} I_{\{β_{j k} \neq 0\}} \min (∥ R_{j} β_{j} ∥_{2}, ∥ R_{k} β_{k} ∥_{2}))}^{1 / 2} + \sum_{1 \leq j < k \leq p} {∥ Q_{j k} β_{j k} ∥}_{2}) . \end{matrix}

(7)

These structures enforce hierarchy by introducing an indicator penalty. Specifically, the indicator function $I_{\{\beta_{jk} \neq 0\}}$ activates the additional penalty on the parent main effects only when $\beta_{jk} \neq 0$, thus enforcing the strong condition: if an interaction term is nonzero, both corresponding main effects must be present. In the weak version, the term $\min(\|R_j\beta_j\|_2, \|R_k\beta_k\|_2)$ applies the penalty only to the smaller of the two norms, thereby enforcing the weaker condition that at least one of the corresponding main effects must be present for an interaction term to be included.
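The two hierarchy conditions can be checked mechanically on any selected model. The following plain-Python sketch (the function name and the set-based representation of a fitted model are illustrative assumptions) reports whether every selected interaction $(j, k)$ has both parents (strong) or at least one parent (weak) among the selected main effects.

```python
def satisfies_heredity(main, interactions, kind="strong"):
    """Check a selected model against the strong or weak heredity principle.

    main:         set of selected main-effect indices j (beta_j != 0)
    interactions: set of selected pairs (j, k) with j < k (beta_jk != 0)
    kind:         "strong" needs both parents in `main`; "weak" needs at least one
    """
    if kind not in ("strong", "weak"):
        raise ValueError("kind must be 'strong' or 'weak'")
    for j, k in interactions:
        parents_present = (j in main, k in main)
        ok = all(parents_present) if kind == "strong" else any(parents_present)
        if not ok:
            return False
    return True

main = {1, 2, 3}
inter = {(1, 2), (3, 7)}
print(satisfies_heredity(main, inter, "strong"))  # False: 7 is not a selected main effect
print(satisfies_heredity(main, inter, "weak"))    # True: (3, 7) still has parent 3
```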

3.2. Algorithm

The regularization path algorithm under the marginality principle (RAMP) [21] is a method for variable selection in high-dimensional quadratic regression models that maintains the hierarchical structure between main effects and interaction effects during the selection process. However, RAMP is designed for linear interaction terms and also includes interactions of variables with themselves (squared terms), which makes it unsuitable for handling nonlinear interaction terms directly.

To address this limitation, we modify the RAMP algorithm and extend it to additive models with nonlinear interaction terms. The modified algorithm handles complex nonlinear interactions while preserving the hierarchical structure between main effects and interaction effects during variable selection. Its detailed steps are as follows.

Let $P = \{1, 2, \dots, p\}$ and $Q = \{(j, k) : 1 \leq j < k \leq p\}$ be the index sets of main effects and interaction effects, respectively. For an index set $A \subset P$, define $A^{\circ 2} = A \circ A = \{(j, k) : j < k;\ j, k \in A\} \subset Q$ and $A \circ P = \{(j, k) : j < k;\ j \in A \text{ or } k \in A\} \subset Q$.
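The two candidate sets can be enumerated directly. This sketch follows model (1) in using only pairs with $j < k$, i.e., without the squared terms of the original RAMP; the function names are hypothetical.

```python
from itertools import combinations

def pairs_within(A):
    """A^{o2}: pairs (j, k), j < k, with both j and k in A (strong-heredity candidates)."""
    return set(combinations(sorted(A), 2))

def pairs_touching(A, p):
    """A o P: pairs (j, k), 1 <= j < k <= p, with j in A or k in A (weak-heredity candidates)."""
    return {(j, k) for j, k in combinations(range(1, p + 1), 2) if j in A or k in A}

A, p = {2, 5}, 6
print(sorted(pairs_within(A)))    # [(2, 5)]
print(len(pairs_touching(A, p)))  # 9 = 15 pairs in total minus the 6 pairs among {1, 3, 4, 6}
```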

We fix a sequence of values $\{\lambda_{1,s}\}_{s=1}^{S}$ between $\lambda_{1,\max}$ and $\lambda_{1,\min}$. Following [21], we set $\lambda_{1,\max} = n^{-1}\max|X^\top y|$ and $\lambda_{1,\min} = \zeta\lambda_{1,\max}$ for some small $\zeta > 0$. At step $s - 1$, we denote the current active main-effect set by $P_{s-1}$ and the current active interaction set by $Q_{s-1}$. Define $H_{s-1}$ as the parent set of $Q_{s-1}$, i.e., the set of main effects that appear in at least one interaction in $Q_{s-1}$, and set $H_{s-1}^c = P \setminus H_{s-1}$. The algorithm is then as follows:

Step 1. Generate a decreasing sequence $\lambda_{1,\max} = \lambda_{1,1} > \lambda_{1,2} > \dots > \lambda_{1,S} = \lambda_{1,\min}$, and set $\lambda_{2,s} = \lambda_{1,s}^2$ for $s = 1, \dots, S$.
We use a warm-start strategy, in which the solution for $s - 1$ is used as the starting value for $s$;
Step 2. Given $P_{s-1}$, $Q_{s-1}$, and $H_{s-1}$, add the candidate interactions among the main effects in $P_{s-1}$ to the current model. Then, minimize the penalized loss function

$\begin{matrix} \frac{1}{n} \sum_{i = 1}^{n} ρ_{τ} (Y_{i} - \sum_{j \in P} Ψ_{i, j}^{⊤} β_{j} - \sum_{(j, k) \in P_{s - 1}^{\circ 2}} Φ_{i, j k}^{⊤} β_{j k}) \\ + λ_{1, s} (\sum_{j} I_{j \in H_{s - 1}^{c}} ∥ R_{j} β_{j} ∥_{2} + \sum_{j, k} I_{(j, k) \in P_{s - 1}^{\circ 2}} ∥ Q_{j k} β_{j k} ∥_{2}) \end{matrix}$

where the penalty is imposed on the candidate interaction effects and on the main effects in $H_{s-1}^c$, i.e., those not protected by the strong heredity constraint. The penalty encourages smoothness and sparsity in the estimated functional components;
Step 3. Record $P_s$, $Q_s$, and $H_s$ from the above solution, and add the parent main effects of the interactions in $Q_s$ to $P_s$;
Step 4. Refit the unpenalized quantile regression on the current model:

$({\hat{β}}_{j}^{(s)}, {\hat{β}}_{j k}^{(s)}) = arg min \frac{1}{n} \sum_{i = 1}^{n} ρ_{τ} (Y_{i} - \sum_{j \in P_{s}} Ψ_{i, j}^{⊤} β_{j} - \sum_{(j, k) \in Q_{s}} Φ_{i, j k}^{⊤} β_{j k})$
Step 5. Repeat Steps 2–4 for $s = 1, \dots, S$, and choose the active sets $M_s$ and $I_s$ corresponding to the $\lambda_{1,s}$ that minimizes the GIC.
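The bookkeeping in Steps 1–3 can be sketched as follows. The penalized fit of Step 2 itself is not implemented here; the sketch only shows the $\lambda$ grid and how the candidate interactions and penalized main effects are determined under strong heredity. Geometric spacing of the grid, the defaults `S=50` and `zeta=1e-3`, the helper names, and the toy `X`, `y` are all assumptions for illustration.

```python
import numpy as np

def lambda_path(X, y, S=50, zeta=1e-3):
    """Step 1: decreasing lambda_1 grid from max|X^T y| / n down to zeta * lambda_max.
    Geometric spacing is an assumption; lambda_2 = lambda_1 ** 2 as in Step 1."""
    lam1_max = np.max(np.abs(X.T @ y)) / len(y)
    lam1 = np.geomspace(lam1_max, zeta * lam1_max, S)
    return lam1, lam1 ** 2

def parents(Q):
    """H: main effects that appear in at least one selected interaction."""
    return {j for pair in Q for j in pair}

def step2_sets(P_prev, Q_prev, p):
    """Strong heredity: candidate interactions are the pairs within P_prev, and only
    main effects outside H_prev (the parents of Q_prev) are penalized."""
    candidates = {(j, k) for j in P_prev for k in P_prev if j < k}
    penalized_main = set(range(1, p + 1)) - parents(Q_prev)
    return candidates, penalized_main

def step3_update(P_sel, Q_sel):
    """Step 3: add the parents of the selected interactions to the main-effect set."""
    return set(P_sel) | parents(Q_sel), set(Q_sel), parents(Q_sel)

rng = np.random.default_rng(3)
X, y = rng.normal(size=(100, 8)), rng.normal(size=100)
lam1, lam2 = lambda_path(X, y, S=5)
cand, pen = step2_sets(P_prev={1, 3, 4}, Q_prev={(1, 3)}, p=6)
print(sorted(cand))  # [(1, 3), (1, 4), (3, 4)]
print(sorted(pen))   # [2, 4, 5, 6]
```

For the weak-heredity variant, the candidate set in `step2_sets` would instead be all pairs with at least one index in `P_prev`, i.e., $P_{s-1} \circ P$.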

Ref. [34] proposed the following generalized information criterion (GIC):

\begin{matrix} G I C (λ_{1 s}) = log (\sum_{i = 1}^{n} ρ_{τ} (Y_{i} - \sum_{j \in P_{s}} Ψ_{i, j}^{⊤} {\hat{β}}_{j}^{(s)} - \sum_{(j, k) \in Q_{s}} Φ_{i, j k}^{⊤} {\hat{β}}_{j k}^{(s)})) + C_{n} κ_{s} d f_{λ}, \end{matrix}

where $\hat\beta_j^{(s)}$ and $\hat\beta_{jk}^{(s)}$ are the estimators obtained in Step 4, $\kappa_s = |P_s| + |Q_s|$ is the number of nonzero main effects and interaction effects, $df_\lambda$ represents the degrees of freedom of the model, and $C_n$ is a sequence of positive constants diverging to infinity as n increases. We take $C_n = \log(\log n)$ in our simulation studies and real data analysis.
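Once Step 4 has produced residuals along the path, model selection by GIC reduces to an argmin over $s$. In this sketch, `df` is passed in directly because this section does not specify how $df_\lambda$ is computed; the residual vectors and model sizes in the usage example are made up, and the helper names are hypothetical.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))

def gic(residuals, tau, kappa, df):
    """GIC(lambda) = log(sum_i rho_tau(r_i)) + C_n * kappa * df with C_n = log(log n)."""
    n = len(residuals)
    C_n = np.log(np.log(n))
    return np.log(np.sum(check_loss(residuals, tau))) + C_n * kappa * df

def select_by_gic(path, tau):
    """path: list of (residuals, kappa, df), one entry per lambda_{1,s}; returns (argmin, scores)."""
    scores = [gic(r, tau, k, d) for r, k, d in path]
    return int(np.argmin(scores)), scores

rng = np.random.default_rng(4)
n, tau = 300, 0.5
r_large_model = rng.normal(size=n)
r_small_model = 1.05 * r_large_model        # slightly worse fit but far fewer terms
path = [(r_small_model, 2, 2.0), (r_large_model, 10, 10.0)]
idx, scores = select_by_gic(path, tau)
print(idx)  # 0
```

Because $C_n$ diverges with n, the complexity term dominates small gains in fit, which is why the sparser model is chosen here.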

Following the same strategy, under the weak heredity condition we use the set $P_{s-1} \circ P$ instead of $P_{s-1}^{\circ 2}$ and solve the following optimization problem:

\begin{matrix} \frac{1}{n} \sum_{i = 1}^{n} ρ_{τ} (Y_{i} - \sum_{j \in P} Ψ_{i, j}^{⊤} β_{j} - \sum_{(j, k) \in P_{s - 1} \circ P} Φ_{i, j k}^{⊤} β_{j k}) \\ + λ_{1, s} (\sum_{j} I_{j \in H_{s - 1}^{c}} ∥ R_{j} β_{j} ∥_{2} + \sum_{(j, k)} I_{(j, k) \in P_{s - 1} \circ P} ∥ Q_{j k} β_{j k} ∥_{2}), \end{matrix}

with respect to $\beta_j$ and $\beta_{jk}$. That is, an interaction effect becomes a candidate for selection if at least one of its parents has been selected in a previous step.

3.3. Asymptotic Theory

Let $\Xi_n(\lambda)$ be the set of local minima of $Q^P(\beta_j, \beta_{jk})$. The following result shows that, with probability approaching one, the oracle estimator belongs to the set $\Xi_n(\lambda)$.

Theorem 2.

Assume that Conditions (A1)–(A4) are satisfied and consider the penalized objective (6) under strong heredity. Let $\hat\gamma = (\hat\beta_0^\top, \hat\beta_{00}^\top)^\top$ be the oracle estimator. If $\lambda_1 = o(1)$ and $n^{-1}L_n^3\lambda^{-1} \to 0$ as $n \to \infty$, then

P (\hat{γ} \in Ξ_{n} (λ)) \to 1, \quad \text{as } n \to \infty .

Remark 1.

The tuning parameter $\lambda_1$ decays to zero as $n \to \infty$, allowing the oracle estimator to dominate. The rate condition $n^{-1}L_n^3\lambda^{-1} \to 0$ ensures that the approximation error from the basis expansion of dimension $L_n$ is controlled by the penalty strength.

The proof of Theorem 2 can be found in Appendix A.2.

Define $R_n(m) = \frac{1}{n}\sum_{i=1}^{n} \rho_\tau(Y_i - m(x_i))$ and $R(m) = E\{\rho_\tau(Y - m(x))\}$ as the empirical risk and the predictive risk under the quantile loss, respectively, and let

\begin{matrix} M_{n} & = & \{m : m (x) = \sum_{j = 1}^{p} ψ_{j}^{⊤} (x_{j}) β_{j} + \sum_{1 \leq j < k \leq p} ϕ_{j k}^{⊤} (x_{j}, x_{k}) β_{j k} : \\ E (ψ_{j}) = 0, E (ϕ_{j k}) = 0, E (ψ_{j}^{2}) = 1, E (ϕ_{j k}^{2}) = 1\} \end{matrix}

be a functional class. Let $\tilde m = \arg\min_{m \in M_n} R(m)$ denote the predictive oracle, i.e., the minimizer of the predictive risk over $M_n$, and let $\hat m$ be the minimizer of Equation (6) over $M_n$. Following [35], we say that an estimator $\hat m$ is persistent (risk consistent) relative to a class of functions $M_n$ if $R(\hat m) - R(\tilde m) \xrightarrow{P} 0$. We then have the following result.

Theorem 3.

Under Conditions (A1)–(A4), if $p = o(n)$, then for some constant $t > 0$,

P (R (\hat{m}) - R (\tilde{m}) \leq (2 \sqrt{2 t^{2}} + 10 t^{2} / 3) n^{- (2 r - 1) / \{2 (2 r + 1)\}}) \geq 1 - \exp (- n t^{2}) .

The above theorem establishes the convergence rate in terms of the excess risk of the estimator, which shows the prediction accuracy and consistency of the proposed estimator. The proof of Theorem 3 can be found in Appendix A.3.
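To get a feel for the magnitudes involved, the sketch below evaluates the right-hand side of the bound in Theorem 3 together with its probability level, and the basis dimension implied by Condition (A3)(1). The choices $r = 2$, $t = 0.1$, and a unit constant in $L_n \approx n^{1/(2r+1)}$ are arbitrary illustrative assumptions, and the function names are hypothetical.

```python
import math

def excess_risk_bound(n, r, t):
    """Bound and probability level from Theorem 3:
    (2 sqrt(2 t^2) + 10 t^2 / 3) * n^{-(2r-1)/(2(2r+1))}, holding with prob >= 1 - exp(-n t^2)."""
    const = 2 * math.sqrt(2 * t ** 2) + 10 * t ** 2 / 3
    rate = n ** (-(2 * r - 1) / (2 * (2 * r + 1)))
    return const * rate, 1 - math.exp(-n * t ** 2)

def basis_size(n, r):
    """L_n of order n^{1/(2r+1)} from (A3)(1), rounded up; the unit constant is an assumption."""
    return math.ceil(n ** (1 / (2 * r + 1)))

for n in (100, 300, 1000):
    bound, prob = excess_risk_bound(n, r=2, t=0.1)
    print(n, basis_size(n, 2), round(bound, 4), round(prob, 4))
```

The bound shrinks slowly with n while the probability level approaches one quickly, reflecting the $\exp(-nt^2)$ tail.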

4. Simulation Studies

We conduct extensive Monte Carlo simulations to assess the performance of our proposed method in comparison with two established approaches, hirNet [19] and RAMP [21], across five specific quantile levels:

τ \in 0.1, 0.25, 0.5, 0.75, 0.90

. In each simulation scenario, we generate 500 independent training datasets with two sample sizes,

n = 300

and

n = 100

, and

p = 15

main effects, resulting in a total of

15 + 15 \times 14 / 2 = 121

potential terms (15 main effects and 105 pairwise interactions). All methods are implemented using pseudo-spline basis functions to ensure consistent comparison. For regularization parameters, we adopt the relationship

λ_{2} = λ_{1}^{2}

following [13], as this choice provides superior empirical performance.

The covariates

x_{i j}

are generated independently from a uniform distribution

U (0, 1)

. We specify five main-effect functions, one linear (f_1) and four nonlinear, each of which is standardized before model fitting:

f_{1} (x) = 10 x, f_{2} (x) = \frac{1}{1 + x} + 20 x,

f_{3} (x) = 20 \sin (x), f_{4} (x) = 20 \exp (x), f_{5} (x) = 10 x^{2} .

Each

f_{j}

is standardized, and the interaction functions are generated by multiplying the corresponding standardized main effects,

f_{j k} (x_{j}, x_{k}) = f_{j} (x_{j}) \times f_{k} (x_{k}), 1 \leq j < k \leq p .
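The covariate and effect construction can be sketched in Python as follows. This is a hedged sketch under stated assumptions, not the authors' simulation code: "standardized" is interpreted here as centering and scaling each f_j(x_ij) by its sample mean and population standard deviation, since the text does not say which moments are used, and the helper names `standardize` and `simulate_design` are hypothetical. Covariates x_6, …, x_15 enter only as inactive candidates. The last lines count the candidate terms: 15 main effects plus C(15, 2) = 105 pairwise interactions.

```python
import math
import random
from itertools import combinations
from statistics import mean, pstdev

# Main-effect functions f_1, ..., f_5 from the simulation design.
F = {
    1: lambda x: 10 * x,
    2: lambda x: 1 / (1 + x) + 20 * x,
    3: lambda x: 20 * math.sin(x),
    4: lambda x: 20 * math.exp(x),
    5: lambda x: 10 * x ** 2,
}


def standardize(values):
    """Center to mean 0 and scale to unit population SD (an assumption)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def simulate_design(n, p=15, seed=0):
    """Draw x_ij ~ U(0, 1); build standardized f_j(x_j), j = 1..5, and the
    product interactions f_jk = f_j * f_k for 1 <= j < k <= 5."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(p)] for _ in range(n)]
    main = {j: standardize([f(row[j - 1]) for row in X]) for j, f in F.items()}
    inter = {(j, k): [a * b for a, b in zip(main[j], main[k])]
             for j, k in combinations(sorted(main), 2)}
    return X, main, inter


p = 15
n_terms = p + len(list(combinations(range(1, p + 1), 2)))
print(n_terms)  # 120
X, main, inter = simulate_design(n=300)
```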

The response variables are generated through quantile-specific models that incorporate either strong or weak heredity constraints. Under strong heredity, an interaction is included only when both of its parent main effects are active, whereas weak heredity requires at least one active parent main effect.

Example 1.

Strong heredity. Building on the quantile-specific sparsity framework established by [36], we incorporate strong heredity constraints into our modeling approach. This integration ensures two key aspects: (1) the set of active predictors varies depending on the quantile, and (2) all selected variables preserve strict hierarchical relationships. The resulting nonlinear data-generating process is as follows:

$τ = 0.1$ (Sparse):

$Y_{i} = f_{1} (x_{i 1}) + f_{2} (x_{i 2}) + f_{3} (x_{i 3}) + ε_{i} .$
$τ = 0.25, 0.50, 0.75$ (Dense):

$Y_{i} = f_{1} (x_{i 1}) + f_{2} (x_{i 2}) + f_{3} (x_{i 3}) + f_{4} (x_{i 4}) + f_{5} (x_{i 5}) + f_{12} (x_{i 1}, x_{i 2}) + f_{13} (x_{i 1}, x_{i 3}) + ε_{i} .$
$τ = 0.9$ (Sparse):

$Y_{i} = f_{1} (x_{i 1}) + f_{3} (x_{i 3}) + f_{5} (x_{i 5}) + ε_{i} .$

Example 2.

Weak heredity. In contrast to strong heredity, we also incorporate weak heredity constraints into our modeling approach by extending the quantile-specific sparsity framework. The resulting nonlinear data-generating process is as follows:

$τ \in \{0.1, 0.9\}$ (Sparse):

$Y_{i} = f_{1} (x_{i 1}) + f_{2} (x_{i 2}) + f_{3} (x_{i 3}) + ε_{i} .$
$τ = 0.25, 0.50, 0.75$ (Dense):

$Y_{i} = f_{1} (x_{i 1}) + f_{2} (x_{i 2}) + f_{3} (x_{i 3}) + f_{14} (x_{i 1}, x_{i 4}) + f_{15} (x_{i 1}, x_{i 5}) + ε_{i} .$
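The heredity structure of the two data-generating processes can be checked mechanically. The sketch below is illustrative, and the dictionary layout and function names are hypothetical: it records the active main-effect and interaction sets (M, I) at each quantile level, as read from Examples 1 and 2, and tests them against the strong heredity rule (both parents active) and the weak heredity rule (at least one parent active).

```python
DENSE = (0.25, 0.50, 0.75)

# Active sets (M, I) at each quantile level, read off Examples 1 and 2.
EXAMPLE1 = {0.10: ({1, 2, 3}, set()), 0.90: ({1, 3, 5}, set())}
EXAMPLE1.update({tau: ({1, 2, 3, 4, 5}, {(1, 2), (1, 3)}) for tau in DENSE})

EXAMPLE2 = {0.10: ({1, 2, 3}, set()), 0.90: ({1, 2, 3}, set())}
EXAMPLE2.update({tau: ({1, 2, 3}, {(1, 4), (1, 5)}) for tau in DENSE})


def strong_heredity(main, inter):
    """Every active interaction (j, k) has both parents j and k active."""
    return all(j in main and k in main for j, k in inter)


def weak_heredity(main, inter):
    """Every active interaction (j, k) has at least one active parent."""
    return all(j in main or k in main for j, k in inter)


for name, dgp in (("Example 1", EXAMPLE1), ("Example 2", EXAMPLE2)):
    for tau in sorted(dgp):
        M, I = dgp[tau]
        print(name, tau, strong_heredity(M, I), weak_heredity(M, I))
```

The sparse levels contain no interactions, so both rules hold vacuously there; the dense levels of Example 2 satisfy weak but not strong heredity.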

To assess robustness across different error conditions, we consider four error distributions:

1.

Gaussian: The error terms

ε_{i}

follow a normal distribution with mean 0 and variance 1, i.e.,

ε_{i} \sim N (0, 1)

;

2.

Heavy-tailed: The error terms

ε_{i}

follow a Student’s t-distribution with 3 degrees of freedom, i.e.,

ε_{i} \sim t (3)

;

3.

Skewed: The error terms

ε_{i}

follow a chi-squared distribution with 2 degrees of freedom, i.e.,

ε_{i} \sim χ^{2} (2)

;

4.

Heteroscedasticity [37]: The error terms

ε_{i}

are heteroscedastic and modeled as follows:

$τ = 0.1 : ε_{i} = (x_{i 1} + x_{i 2}) e_{i}$ ;
$τ \in \{0.25, 0.50, 0.75\} : ε_{i} = (x_{i 1} + x_{i 5}) e_{i}$ ;
$τ = 0.9\ (\text{strong heredity}) : ε_{i} = (x_{i 1} + x_{i 3}) e_{i}$ ;
$τ = 0.9\ (\text{weak heredity}) : ε_{i} = (x_{i 1} + x_{i 4}) e_{i}$ .
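The four error settings can be drawn with Python's standard library, as in the sketch below. It is a hedged illustration with hypothetical function names: the base noise e_i in the heteroscedastic setting is not specified in the text and is assumed here to be standard normal; t(3) is generated as Z/√(V/3) with V ∼ χ²(3), and χ²(k) as a Gamma(k/2, scale 2) variable.

```python
import math
import random


def hetero_pair(tau, heredity):
    """Covariate indices (1-based) scaling e_i in the heteroscedastic setting."""
    if tau == 0.1:
        return 1, 2
    if tau in (0.25, 0.5, 0.75):
        return 1, 5
    return (1, 3) if heredity == "strong" else (1, 4)  # tau = 0.9


def draw_error(kind, rng, x_row=None, tau=0.5, heredity="strong"):
    """Draw one epsilon_i under the named error distribution."""
    if kind == "gaussian":
        return rng.gauss(0.0, 1.0)
    if kind == "t3":
        z = rng.gauss(0.0, 1.0)
        v = rng.gammavariate(1.5, 2.0)  # chi-squared with 3 df
        return z / math.sqrt(v / 3.0)
    if kind == "chi2":
        return rng.gammavariate(1.0, 2.0)  # chi-squared with 2 df
    if kind == "hetero":
        j, k = hetero_pair(tau, heredity)
        e = rng.gauss(0.0, 1.0)  # assumption: e_i ~ N(0, 1)
        return (x_row[j - 1] + x_row[k - 1]) * e
    raise ValueError(f"unknown error kind: {kind}")


rng = random.Random(42)
draws = [draw_error("chi2", rng) for _ in range(20000)]
print(sum(draws) / len(draws))  # sample mean; E[chi2(2)] = 2
```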

Recall that

M = \{j : f_{j} \neq 0\}

and

I = \{(j, k) : f_{j k} \neq 0\}

are the active sets of main and interaction effects. For each example, we run

R = 500

Monte Carlo simulations and denote the estimated subsets as

{\hat{M}}^{(r)}

and

{\hat{I}}^{(r)}

,

r = 1, \dots, R

. We evaluate the performance on variable selection based on the following criteria:

(1): True positive rate and false discovery rate of main effects (mTPR, mFDR);
(2): True positive rate and false discovery rate of interaction effects (iTPR, iFDR);
(3): Main effects coverage percentage ($P_{m}$): $R^{-1} \sum_{r=1}^{R} I(M \subseteq {\hat{M}}^{(r)})$;
(4): Interaction effects coverage percentage ($P_{i}$): $R^{-1} \sum_{r=1}^{R} I(I \subseteq {\hat{I}}^{(r)})$;
(5): Coverage rate of each individual variable in the main effects and interaction effects;
(6): Model size (size): $R^{-1} \sum_{r=1}^{R} (|{\hat{M}}^{(r)}| + |{\hat{I}}^{(r)}|)$;
(7): Root mean squared error (RMSE): $R^{-1} \sum_{r=1}^{R} \sqrt{\frac{1}{n} \sum_{i=1}^{n} (ρ_{τ}(Y_{i} - {\hat{Q}}_{τ}^{(r)}(Y_{i} | x_{i})))^{2}}$, where ${\hat{Q}}_{τ}^{(r)}(Y_{i} | x_{i})$ is obtained from the estimated coefficients ${\hat{β}}_{j}^{(r)}$, $j \in {\hat{M}}^{(r)}$, and ${\hat{β}}_{jk}^{(r)}$, $(j, k) \in {\hat{I}}^{(r)}$; R is the number of Monte Carlo replications.
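Criteria (1)–(4) and (6)–(7) can be computed from the true and estimated active sets as in the following sketch. It is illustrative only: the function names are hypothetical, and when a true set is empty (for instance, no true interactions at τ = 0.1) the undefined TPR is returned as NaN and skipped in the averages, while the FDR of an empty selection is taken as 0; these conventions are assumptions, not taken from the paper. `check_rmse` gives the per-replication quantity inside criterion (7), which is then averaged over the R replications.

```python
import math
from statistics import mean


def check_loss(u, tau):
    """rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def tpr_fdr(true_set, est_set):
    """Return (TPR, FDR) for one replication."""
    tp = len(true_set & est_set)
    tpr = tp / len(true_set) if true_set else math.nan
    fdr = (len(est_set) - tp) / len(est_set) if est_set else 0.0
    return tpr, fdr


def nanmean(values):
    """Mean over the non-NaN entries (NaN if none remain)."""
    kept = [v for v in values if not math.isnan(v)]
    return mean(kept) if kept else math.nan


def summarize(true_main, true_inter, est_mains, est_inters):
    """Average criteria over R replications of estimated active sets."""
    m = [tpr_fdr(true_main, s) for s in est_mains]
    i = [tpr_fdr(true_inter, s) for s in est_inters]
    return {
        "mTPR": nanmean([t for t, _ in m]), "mFDR": mean(f for _, f in m),
        "iTPR": nanmean([t for t, _ in i]), "iFDR": mean(f for _, f in i),
        "P_m": mean(1.0 if true_main <= s else 0.0 for s in est_mains),
        "P_i": mean(1.0 if true_inter <= s else 0.0 for s in est_inters),
        "size": mean(len(a) + len(b) for a, b in zip(est_mains, est_inters)),
    }


def check_rmse(ys, qhat, tau):
    """sqrt((1/n) * sum_i rho_tau(Y_i - Qhat_i)^2) for one replication."""
    return math.sqrt(mean(check_loss(y - q, tau) ** 2 for y, q in zip(ys, qhat)))


# Two hypothetical replications with true M = {1, 2, 3} and I = {(1, 2)}.
res = summarize({1, 2, 3}, {(1, 2)},
                [{1, 2, 3, 7}, {1, 2}], [{(1, 2)}, set()])
print(res["mTPR"], res["P_m"], res["size"])
```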

The results under the strong heredity for

n = 100

and

n = 300

are shown in Table 1 and Table 2, respectively. With small samples, the proposed method performs slightly worse than the other two methods in terms of true positive rates and the coverage percentages of individual variables in the main and interaction effects. This is likely because the high complexity and degrees of freedom of nonparametric additive interaction models make them more prone to overfitting, which leads to an inaccurate reflection of the underlying data structure and, consequently, a lower true positive rate (TPR) in variable selection. In contrast, hirNet and RAMP, which rely on regularization and relatively simple linear structures, are better at preventing overfitting, although possibly at the cost of some model interpretability.

In large sample scenarios, the proposed method not only achieves higher true positive rates and broader coverage of main and interaction effects but also markedly reduces false positive rates and produces a more parsimonious model. These results indicate that the method can effectively handle nonlinear additive interaction models, accurately capturing complex relationships in the data while keeping the false discovery rate low and the model simple.

The results under weak heredity are shown in Table 3 and Table 4. With small samples, hirNet is effective at capturing both main and interaction effects but suffers from a high rate of false selections, which increases model complexity. RAMP, on the other hand, selects fewer variables, reducing false selections but also under-selecting important variables, which hurts predictive accuracy. In contrast, the proposed method offers a better balance, with higher precision and stability in selecting both main and interaction effects and substantially fewer false selections. It yields a moderate model size and the lowest prediction error, reflecting a favorable trade-off between model complexity and predictive performance.

In large sample cases, the proposed method excels at identifying both main and interaction effects, with the main-effect true positive rate (mTPR) and the interaction-effect true positive rate (iTPR) approaching 1.000, indicating nearly perfect identification of the true effects. It also keeps mFDR and iFDR low, minimizing false positives. In contrast, although hirNet performs better in terms of mTPR and iTPR, it tends to include more noise variables in the model, resulting in higher RMSE values and lower predictive accuracy.

From the perspective of quantile-specific sparsity, the proposed method demonstrates varying levels of sparsity across the quantile levels

τ \in \{0.1, 0.25, 0.5, 0.75, 0.9\}

. At the

τ = 0.1

and

τ = 0.9

quantiles, fewer variables are selected, indicating higher sparsity, while, at

τ = 0.25, 0.50, 0.75

, more variables and interaction effects are included, showing that the model adapts its selection to the distribution of the response variable. Simulation results further show that, at

τ = 0.9

, where the DGP consists only of main effects, our method achieves near-perfect variable selection: the TPR for main effects approaches 1 (all true effects are retained) and the FDR is close to 0 (almost no false positives), leading to model sizes nearly identical to that of the true DGP and minimal prediction errors, as reflected in the low RMSE. These results are consistent with [38], which provides theoretical support for this outcome. Moreover, the RMSE patterns suggest that the method maintains strong predictive performance even when more variables are selected, which supports the effectiveness of the quantile-specific sparsity approach.

5. Applications

Parkinson’s disease (PD) is a common degenerative disease of the nervous system that leads to shaking, stiffness, and difficulty with walking, balance, and coordination. Monitoring PD symptoms with the Unified Parkinson’s Disease Rating Scale (UPDRS) is very important but costly and logistically inconvenient, because it requires patients to be present in clinic and to undergo time-consuming physical examinations by trained medical staff. A new treatment technology based on the rapid and remote replication of UPDRS assessments from more than a dozen biomedical voice measurements has recently been proposed. The dataset consists of 5875 UPDRS scores from patients with early-stage PD and 16 corresponding biomedical voice measurements. Reference [39] analyzed this PD dataset and mapped clinically relevant properties of the speech signals of PD patients to UPDRS using linear and nonlinear regression models, with the aim of verifying the feasibility of frequent, remote, and accurate UPDRS tracking and its effectiveness in telemonitoring frameworks that enable large-scale clinical trials of novel PD treatments. However, existing analyses that model the UPDRS scores with only the main effects of the 16 voice measures may fail to capture the relationship between the UPDRS and these measures. For example, fundamental frequency and amplitude jointly shape the voice and thus may have an interactive effect on UPDRS. In this situation, it is necessary to consider a model with pairwise interactions. Furthermore, the effects of these indicators on UPDRS may be complicated, possibly nonlinear or even nonparametric, for both the main effects and the interaction effects, and need further investigation. Figure 1 (left) shows the typical skewness of the UPDRS distribution and the asymmetry between its τ-quantile and (1 − τ)-quantile scores (0 < τ < 1); that is, the distribution of the UPDRS tends to be asymmetric and heavy-tailed. One should therefore examine how each quantile of the conditional distribution, including the median (rather than the mean), is affected by changes in the 16 voice measures before making a reliable decision about the new treatment technology. For this reason, the proposed additive quantile regression with nonlinear interaction structures is applied to analyze the data.

The 16 biomedical voice measures in the PD data include several measurements of variation in fundamental frequency (Jitter, Jitter (Abs), Jitter:RAP, Jitter:PPQ5, and Jitter:DDP), several measurements of variation in amplitude (Shimmer, Shimmer (dB), Shimmer:APQ3, Shimmer:APQ5, Shimmer:APQ11, and Shimmer:DDA), two measurements of the ratio of noise to tonal components in the voice (NHR and HNR), a nonlinear dynamical complexity measurement (RPDE), a signal fractal scaling exponent (DFA), and a nonlinear measure of fundamental frequency variation (PPE). Our aim is to identify the biomedical voice measurements, and their interactions, that affect the UPDRS in patients with Parkinson’s disease. Figure 1 (right) shows the correlations among the 16 biomedical voice measurements. As Figure 1 (right) shows, all covariates except DFA are strongly correlated with one another, which makes variable selection more challenging. The proposed additive quantile regression model with nonlinear interaction structures may therefore provide a more complete picture of how these factors and their interactions affect the distribution of the UPDRS.

We first randomly generated 100 partitions of the data into training and test sets. In each partition, 5475 observations form the training set and the remaining 400 observations form the test set. The data are normalized before fitting. Quantile regression is used to refit the model selected by the proposed method; for comparison, least squares is used to refit the models selected by hirNet and RAMP on the training set. Finally, we evaluate the performance of these models, with their active sets held fixed, on the corresponding test sets.
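The repeated random splitting can be sketched as follows. This is illustrative only; the seed and the function name `random_partitions` are hypothetical. Each of the 100 partitions draws 400 test indices without replacement from the 5875 observations and assigns the remaining 5475 to the training set.

```python
import random


def random_partitions(n_obs=5875, n_test=400, n_splits=100, seed=2025):
    """Yield (train_idx, test_idx) pairs for repeated random train/test splits."""
    rng = random.Random(seed)
    for _ in range(n_splits):
        test = set(rng.sample(range(n_obs), n_test))
        train = [i for i in range(n_obs) if i not in test]
        yield train, sorted(test)


splits = list(random_partitions())
train, test = splits[0]
print(len(splits), len(train), len(test))  # 100 5475 400
```

Each model is then fitted on `train`, and its active set is held fixed when it is evaluated on `test`.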

Table 5 summarizes the covariates selected by different methods. The proposed quantile-specific method reveals distinct covariate patterns across different UPDRS severity levels. At lower quantiles (

τ = 0.1

and

τ = 0.25

), the model highlights DFA and its interactions (e.g., HNR×DFA, DFA×PPE), suggesting that these acoustic features may be particularly relevant in the early stages of the disease. The middle quantile (

τ = 0.5

) incorporates additional speech markers (APQ11, RPDE), which align with conventional methods, while higher quantiles (

τ = 0.75

and

τ = 0.9

) progressively simplify to core features (HNR, dB), indicating reduced covariate dependence in advanced stages. This dynamic selection illustrates how quantile-specific modeling can capture stage-dependent biological mechanisms while maintaining model parsimony, selecting just 3–5 main effects and 1–2 interactions per quantile, compared with hirNet’s fixed 12-variable model. The shifting importance of HNR and DFA interactions across quantiles underscores their complex, nonlinear relationships with symptom progression.

Figure 2 and Figure 3 show the estimation results of main effects and interaction effects at

τ = 0.50

. The solid lines represent the estimated effects, while the dashed lines indicate the

95\%

confidence intervals obtained through a simulation-based approach. Specifically, for the chosen model, we conducted 100 repeated simulations on the dataset to derive point-wise confidence intervals for each covariate’s effect.
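The text does not detail how the 100 repeated simulations are generated (for example, whether the data are resampled). Assuming one has 100 replicate estimates of an effect curve evaluated on a common grid, a pointwise 95% band can be formed from the empirical 2.5% and 97.5% quantiles at each grid point, as in this hedged sketch. The function names are hypothetical, and the quantile uses linear interpolation between order statistics.

```python
import math


def empirical_quantile(sorted_vals, q):
    """Linearly interpolated q-quantile of already-sorted values."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def pointwise_band(curves, level=0.95):
    """curves: replicate curves, each a list of values on the same grid.
    Returns one (lower, upper) pair per grid point."""
    alpha = 1.0 - level
    band = []
    for values in zip(*curves):
        s = sorted(values)
        band.append((empirical_quantile(s, alpha / 2),
                     empirical_quantile(s, 1 - alpha / 2)))
    return band


# Hypothetical replicates: 101 curves on a two-point grid.
curves = [[float(r), 2.0 * r] for r in range(101)]
print(pointwise_band(curves))
```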

From Figure 2, it can be observed that, as APQ11 increases, the UPDRS score initially rises and then declines, peaking at an intermediate value. This inverted U-shaped relationship suggests that moderate amplitude variations are linked to more severe symptoms. For HNR, the UPDRS score decreases as the harmonics-to-noise ratio increases, indicating that relatively more noise in the voice (lower HNR) is associated with more severe disease. The confidence intervals around these trends indicate that they are stable across the simulated datasets. RPDE shows a U-shaped curve, meaning that intermediate complexity levels are associated with more severe symptoms, while extreme values (high or low) indicate milder symptoms. The confidence intervals further support this nonlinear relationship, demonstrating its robustness to the data. DFA is negatively associated with the UPDRS score: as the signal fractal scaling exponent increases, the UPDRS score decreases, suggesting that higher DFA values are linked to milder symptoms. The narrow confidence intervals around this trend highlight its consistency across the simulations.

Figure 3 illustrates the interaction effect between HNR and RPDE on the UPDRS score. As HNR increases, the interaction effect on the UPDRS score becomes more negative, especially when RPDE is low; equivalently, symptoms are worse when the noise-to-harmonics ratio is high (that is, when HNR is low). Conversely, for higher values of RPDE, the interaction effect stabilizes, suggesting less influence on the UPDRS score. The plot reveals a valley-like structure around the center, where both HNR and RPDE are near zero, signifying a minimal interaction effect. Notably, the strongest interactions occur at extreme values of either HNR or RPDE, highlighting the complex, nonlinear relationship between these variables and Parkinson’s disease symptom severity. The estimation results of the main effects and interaction effects at

τ = 0.1, 0.25, 0.75, 0.9

are presented in Appendix B.

From the results in Appendix B, it can be seen that, although the main effects selected at

τ = 0.25

and

τ = 0.75

are the same, their impacts on UPDRS scores differ because of nonlinear relationships or complex interactions between variables. This highlights an advantage of quantile regression, which captures not only the central trend of the response but also how the covariate effects vary across quantiles. It thus provides a more comprehensive understanding of the relationships between variables and helps to analyze the specific effects of the variables on UPDRS under different conditions.

From the perspective of quantile-specific sparsity, the proposed method demonstrates varying levels of covariate selection across different quantiles. At lower quantiles (

τ = 0.1

and

τ = 0.25

), the model tends to select fewer covariates, focusing primarily on HNR, DFA, and PPE, which suggests a higher degree of sparsity. As the quantile increases to

τ = 0.50

, the model selects more covariates, including APQ11 and RPDE, indicating a reduction in sparsity. At higher quantiles (

τ = 0.75

and

τ = 0.9

), the model again shows increased sparsity by selecting only a few key covariates such as HNR, DFA, and PPE. This pattern reflects the adaptability of the proposed method in capturing the specific characteristics of the data distribution at different quantiles, yielding a level of sparsity tailored to each quantile level.

Table 6 compares the average RMSE and model sizes across 100 datasets for three methods: the proposed quantile-based approach (evaluated at

τ = 0.1, 0.25, 0.5, 0.75, 0.9

), RAMP, and hirNet. The results show that our method outperforms the others in predictive accuracy, particularly at

τ = 0.75

, where it achieves the lowest RMSE of 0.858, outperforming RAMP (0.878) and hirNet (0.887). Additionally, the proposed method uses much smaller models, with only 2.3–5.0 variables, compared to hirNet’s larger model with 18.39 variables. This reduction in model size highlights the method’s efficiency without sacrificing predictive power. The method also performs consistently well across different quantiles, with a particularly strong result at

τ = 0.25

(RMSE = 0.873), though the RMSE at

τ = 0.5

is higher (1.00), showing some variation in performance across quantiles. Furthermore, models that include interaction terms perform better than those with only main effects, emphasizing the importance of interaction effects in the model. In conclusion, the proposed method strikes a favorable balance between prediction accuracy, model simplicity, and computational efficiency, making it well suited to applications that require both interpretability and strong predictive performance.

6. Conclusions

We study variable selection for additive quantile regression with interactions. We fit the model using a sparsity-smoothness penalty and incorporate a regularization algorithm under the marginality principle into additive quantile regression with interactions, so that main and interaction effects are selected simultaneously while either the strong or the weak heredity constraint is preserved. We establish the theoretical properties of the proposed method, and simulation studies demonstrate its good performance.

We also applied the proposed method to the Parkinson’s disease (PD) data; the results support the finding in the literature that frequent, remote, and accurate UPDRS tracking, as a novel PD treatment technology, could be effective in telemonitoring frameworks for PD symptom monitoring and large-scale clinical trials.

Author Contributions

Y.B.: Conceptualization, Formal analysis, Methodology, Software, Writing—original draft, Data curation; J.J.: Formal analysis, Methodology, Writing—review and editing; M.T.: Supervision, Funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

The work was partially supported by the Beijing Natural Science Foundation (No.1242005).

Data Availability Statement

The dataset used in this study is publicly available at https://www.worldbank.org/en/home (accessed on 1 June 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proofs of the Theorems

Appendix A.1. Proof of Theorem 1

Throughout the proof, if

v = {(v_{1}, \dots, v_{k})}^{⊤}

is a vector, we use the norms

{∥ v ∥}_{2} = \sqrt{\sum_{j = 1}^{k} v_{j}^{2}}

and

{∥ v ∥}_{\infty} = {max}_{j} | v_{j} |

. For a function f on

[0, 1]

, we denote its

L_{2} (P)

norm by

\|f\|_{L_{2}} = \sqrt{\int_{0}^{1} f^{2}(x)\, dP(x)} = \sqrt{E(f^{2})}

. We write

γ_{01} = {(β_{01}^{⊤}, β_{001}^{⊤})}^{⊤}

and write

{\tilde{γ}}_{01} = {({\tilde{β}}_{01}^{⊤}, {\tilde{β}}_{001}^{⊤})}^{⊤}

in the same fashion. Let

θ = \sqrt{n} ({\tilde{γ}}_{01} - γ_{01})

. Note that we can express

{\hat{Q}}_{Y_{i} | x_{i}} (τ) = Ψ_{M_{i}}^{⊤} {\hat{β}}_{01} + Φ_{I_{i}}^{⊤} {\hat{β}}_{001}

; alternatively, we can express it as

{\hat{Q}}_{Y_{i} | x_{i}} (τ) = Ψ_{M_{i}}^{⊤} {\tilde{β}}_{01} + Φ_{I_{i}}^{⊤} {\tilde{β}}_{001}

. By the identifiability of the model, we must have

{\hat{γ}}_{01} = {\tilde{γ}}_{01}

.

Notice that

\frac{1}{n} \sum_{i=1}^{n} ρ_{τ}(Y_{i} - Ψ_{M_{i}}^{⊤} {\tilde{β}}_{01} - Φ_{I_{i}}^{⊤} {\tilde{β}}_{001}) = \frac{1}{n} \sum_{i=1}^{n} ρ_{τ}(ϵ_{i} - {\tilde{Π}}_{A_{i}}^{⊤} θ - U_{ni})

where

{\tilde{Π}}_{A_{i}} = n^{- 1 / 2} Π_{A_{i}}

and

U_{ni} = Π_{A_{i}}^{⊤} γ_{01} - m(x_{i})

. Define the minimizers under the transformation as

\hat{θ} = arg min_{θ} \frac{1}{n} \sum_{i = 1}^{n} ρ_{τ} (ϵ_{i} - {\tilde{Π}}_{A_{i}}^{⊤} θ - U_{n i}) .

Let

a_{n}

be a sequence of positive numbers and define

Q_{i} (a_{n}) = ρ_{τ} (ϵ_{i} - a_{n} {\tilde{Π}}_{A_{i}}^{⊤} θ - U_{n i}) .

Define

D_{i} (θ, a_{n}) = Q_{i} (a_{n}) - Q_{i} (0) - E [Q_{i} (a_{n}) - Q_{i} (0) | x_{i}] + a_{n} {\tilde{Π}}_{A_{i}}^{⊤} θ φ_{τ} (ϵ_{i}),

and

{\tilde{Q}}_{i} (θ, a_{n}) = Q_{i} (a_{n}) - Q_{i} (0) + a_{n} {\tilde{Π}}_{A_{i}}^{⊤} θ φ_{τ} (ϵ_{i}),

where

φ_{τ} (ϵ_{i}) = τ - I (ϵ_{i} < 0)

.

Lemma A1.

Let

q_{n} = q L_{n} + s L_{n}^{2}

. If Conditions (A1)–(A4) are satisfied, then, for any positive constant L,

q_{n}^{-1} \sup_{\|θ\|_{2} \leq L} \left| \sum_{i=1}^{n} D_{i}(θ, \sqrt{q_{n}}) \right| = o_{p}(1) .

Proof.

Note that

\begin{matrix} D_{i} (θ, \sqrt{q_{n}}) = {\tilde{Q}}_{i} (θ, \sqrt{q_{n}}) - E [{\tilde{Q}}_{i} (θ, \sqrt{q_{n}})] . \end{matrix}

Using Knight’s identity

ρ_{τ}(r - s) - ρ_{τ}(r) = -s\{τ - I(r \leq 0)\} + \int_{0}^{s} \{I(r \leq t) - I(r \leq 0)\}\, dt,

we have

\begin{matrix} \begin{matrix} {\tilde{Q}}_{i} (θ, \sqrt{q_{n}}) & = ρ_{τ} (ϵ_{i} - \sqrt{q_{n}} {\tilde{Π}}_{A_{i}}^{⊤} θ - U_{n i}) - ρ_{τ} (ϵ_{i} - U_{n i}) + \sqrt{q_{n}} {\tilde{Π}}_{A_{i}}^{⊤} θ φ_{τ} (ϵ_{i}) \\ = \int_{0}^{\sqrt{q_{n}} {\tilde{Π}}_{A_{i}}^{⊤} θ} [I (ϵ_{i} - U_{n i} < t) - I (ϵ_{i} - U_{n i} < 0)] d t . \end{matrix} \end{matrix}

Therefore,

\mathrm{Var}(D_{i}(θ, \sqrt{q_{n}})) = \mathrm{Var}({\tilde{Q}}_{i}(θ, \sqrt{q_{n}})) \leq E({\tilde{Q}}_{i}^{2}(θ, \sqrt{q_{n}}))

. We have

\begin{aligned} \sum_{i=1}^{n} E({\tilde{Q}}_{i}^{2}(θ, \sqrt{q_{n}}) | x_{i}) & \leq C q_{n} n^{-1/2} \sum_{i=1}^{n} \int_{0}^{\sqrt{q_{n}} {\tilde{Π}}_{A_{i}}^{⊤} θ} [F_{i}(t + U_{ni}) - F_{i}(U_{ni})]\, dt \\ & \leq C q_{n} n^{-1/2} \sum_{i=1}^{n} \int_{0}^{\sqrt{q_{n}} {\tilde{Π}}_{A_{i}}^{⊤} θ} (f_{i}(0) + o(1))(t + o(t^{2}))\, dt \\ & \leq C q_{n}^{2} n^{-1/2} [θ^{⊤} (\sum_{i=1}^{n} f_{i}(0) {\tilde{Π}}_{A_{i}} {\tilde{Π}}_{A_{i}}^{⊤}) θ] (1 + o(1)) \\ & \leq C q_{n}^{2} n^{-1/2} \|θ\|_{2}^{2} λ_{max}(n^{-1} Π_{A}^{⊤} B_{n} Π_{A}) \\ & \leq C q_{n}^{2} n^{-1/2} (1 + o(1)) . \end{aligned}

for some positive constant C, where

B_{n} = \mathrm{diag}(f_{1}(0), \dots, f_{n}(0))

is an

n \times n

diagonal matrix with

f_{i} (0)

denoting the conditional density function of

ϵ_{i}

given

x_{i}

evaluated at zero. Therefore,

\sum_{i=1}^{n} \mathrm{Var}\{D_{i}(θ, \sqrt{q_{n}})\} \leq C q_{n}^{2} n^{-1/2}

for some positive constant C and all n sufficiently large. By Bernstein’s inequality, for all n sufficiently large,

\begin{matrix} \begin{matrix} P (q_{n}^{- 1} | \sum_{i = 1}^{n} D_{i} (θ, \sqrt{q_{n}}) | > ν | x_{i}) \\ \leq q_{n}^{- 1} exp (- \frac{ν^{2}}{C q_{n}^{2} n^{- 1 / 2} + C ν n^{- 1 / 2}}) \\ \leq q_{n}^{- 1} exp (- C n^{1 / 2} q_{n}^{- 2}) \end{matrix} \end{matrix}

which converges to 0 as

n \to \infty

by Conditions (A3) and (A4). Note that the upper bound does not depend on

x_{i}

, so the above bound also holds unconditionally. □
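Knight’s identity, used in the proof above, is easy to check numerically. The sketch below is illustrative and not part of the original proof: it compares both sides for a grid of (r, s, τ) values, approximating the integral by a midpoint rule; for negative s the integral is taken in the oriented sense, so that the integral from 0 to s equals minus the integral from s to 0.

```python
def rho(u, tau):
    """Check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def knight_rhs(r, s, tau, steps=20000):
    """-s*(tau - I(r <= 0)) + int_0^s [I(r <= t) - I(r <= 0)] dt (midpoint rule)."""
    ind0 = 1.0 if r <= 0 else 0.0
    h = s / steps  # negative when s < 0, giving the oriented integral
    integral = h * sum((1.0 if r <= (k + 0.5) * h else 0.0) - ind0
                       for k in range(steps))
    return -s * (tau - ind0) + integral


worst = 0.0
for r in (-1.5, -0.2, 0.0, 0.7, 2.0):
    for s in (-2.0, -0.5, 0.3, 1.0, 2.5):
        for tau in (0.1, 0.5, 0.9):
            lhs = rho(r - s, tau) - rho(r, tau)
            worst = max(worst, abs(lhs - knight_rhs(r, s, tau)))
print(worst)  # small: the midpoint error is at most the step size |s| / steps
```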

Lemma A2.

Suppose that Conditions (A1)–(A4) hold. Then, for any sequence

{b_{n}}

with

1 \leq b_{n} \leq L_{n}^{ξ / 10}, 0 < ξ < (r - 1 / 2) / (2 r + 1)

, we have

\begin{matrix} sup_{θ^{⊤} {\tilde{Π}}_{A}^{⊤} {\tilde{Π}}_{A} θ \leq b_{n}^{2} L_{n}} | \sum_{i = 1}^{n} & ρ (ϵ_{i} - {\tilde{Π}}_{A_{i}}^{⊤} θ - U_{n i}) - ρ (ϵ_{i} - U_{n i}) + {\tilde{Π}}_{A_{i}}^{⊤} θ (τ - I (ϵ_{i} < 0)) \\ - E_{ϵ_{i} | x_{i}} (ρ (ϵ_{i} - {\tilde{Π}}_{A_{i}}^{⊤} θ - U_{n i}) - ρ (ϵ_{i} - U_{n i})) | = o_{p} (L_{n}) . \end{matrix}

Lemma A2 can be proven using arguments similar to those in the proof of Lemma 3.2 in [40].

Proof.

For Theorem 1(a), we first prove that,

\forall η > 0

, there exists a

L > 0

such that

P\left(\inf_{\|θ\|_{2} = L} q_{n}^{-1} \sum_{i=1}^{n} (Q_{i}(\sqrt{q_{n}}) - Q_{i}(0)) > 0\right) \geq 1 - η .

Note that

\begin{matrix} \begin{matrix} q_{n}^{- 1} \sum_{i = 1}^{n} (Q_{i} (\sqrt{q_{n}}) - Q_{i} (0)) & = q_{n}^{- 1} \sum_{i = 1}^{n} D_{i} (θ, \sqrt{q_{n}}) + q_{n}^{- 1} \sum_{i = 1}^{n} E [Q_{i} (\sqrt{q_{n}}) - Q_{i} (0)] \\ - q_{n}^{- 1 / 2} \sum_{i = 1}^{n} {\tilde{Π}}_{A_{i}}^{⊤} θ φ_{τ} (ϵ_{i}) \\ ≜ G_{n 1} + G_{n 2} + G_{n 3}, \end{matrix} \end{matrix}

where the definition of

G_{n i}, i = 1, 2, 3

is clear from the context. First, we can see that

\sup_{\|θ\|_{2} \leq L} |G_{n1}| = o_{p}(1)

by Lemma A1.

For

G_{n 2}

, we have

\begin{matrix} q_{n}^{- 1} \sum_{i = 1}^{n} E {Q_{i} (\sqrt{q_{n}}) - Q_{i} (0)} \\ = & q_{n}^{- 1} \sum_{i = 1}^{n} E (ρ_{τ} (ϵ_{i} - \sqrt{q_{n}} {\tilde{Π}}_{A_{i}}^{⊤} θ - U_{n i}) - ρ_{τ} (ϵ_{i} - U_{n i})) \\ = & q_{n}^{- 1} \sum_{i = 1}^{n} E (- \sqrt{q_{n}} {\tilde{Π}}_{A_{i}}^{⊤} θ (τ - I (ϵ_{i} - U_{n i} \leq 0)) \\ + \int_{0}^{\sqrt{q_{n}} {\tilde{Π}}_{A_{i}}^{⊤} θ} \{I (ϵ_{i} - U_{n i} < s) - I (ϵ_{i} - U_{n i} < 0)\} d s | x_{i}) \\ = & - q_{n}^{- 1 / 2} \sum_{i = 1}^{n} E ({\tilde{Π}}_{A_{i}}^{⊤} θ (τ - I (ϵ_{i} - U_{n i} \leq 0))) \\ + q_{n}^{- 1} \sum_{i = 1}^{n} E (\int_{U_{n i}}^{\sqrt{q_{n}} {\tilde{Π}}_{A_{i}}^{⊤} θ + U_{n i}} \{I (ϵ_{i} < s) - I (ϵ_{i} < 0)\} d s | x_{i}) \\ ≜ & W_{n 1} (θ) + W_{n 2} (θ), \end{matrix}

where the definition of

W_{n i} (θ), i = 1, 2

is clear from the context. Note that

| F_{ϵ | x} (0 | x) - F_{ϵ | x} (U_{n i} | x) | \leq B | U_{n i} |

for all

x

, where B is the constant in the assumption (A2). Let

U_{n} = (U_{n1}, \dots, U_{nn})^{⊤}

. By Condition (A3), we have

∥ U_{n} ∥_{2} = O (L_{n}^{- r})

. Consequently, we can take a constant

M > 0

such that

\sup_{\|θ\|_{2} \leq L} |W_{n1}(θ)| \leq M q_{n}^{-1/2} \|n^{-1/2} θ^{⊤} Π_{A}^{⊤}\|_{2} \|U_{n}\|_{2} = O_{p}(q_{n}^{-1/2} n^{-1/2} L_{n}^{-r}) \|θ\|_{2} = O_{p}(\|θ\|_{2})

by Condition (A3) and Lemma A2. For

W_{n 2} (θ)

, we have

\begin{matrix} W_{n 2} (θ) & = & q_{n}^{- 1} \sum_{i = 1}^{n} \int_{U_{n i}}^{\sqrt{q_{n}} {\tilde{Π}}_{A_{i}}^{⊤} θ + U_{n i}} f_{i} (0) s d s (1 + o (1)) \\ = & q_{n}^{- 1} \sum_{i = 1}^{n} f_{i} (0) [\frac{1}{2} q_{n} {({\tilde{Π}}_{A_{i}}^{⊤} θ)}^{2} + U_{n i} \sqrt{q_{n}} {\tilde{Π}}_{A_{i}}^{⊤} θ] (1 + o (1)) \\ = & C θ^{⊤} (n^{- 1} \sum_{i = 1}^{n} f_{i} (0) Π_{A_{i}} Π_{A_{i}}^{⊤}) θ \times (1 + o (1)) + q_{n}^{- 1 / 2} \sum_{i = 1}^{n} f_{i} (0) U_{n i} {\tilde{Π}}_{A_{i}}^{⊤} θ \\ = & C θ^{⊤} K_{n} θ \times (1 + o (1)) + q_{n}^{- 1 / 2} \sum_{i = 1}^{n} f_{i} (0) U_{n i} {\tilde{Π}}_{A_{i}}^{⊤} θ \end{matrix}

where

K_{n} = \frac{1}{n} Π_{A}^{⊤} B_{n} Π_{A}

. Based on Condition (A3), there exists a finite constant

c > 0

, such that

C θ^{⊤} K_{n} θ \times (1 + o (1)) \geq c {∥ θ ∥}_{2}^{2}

with the probability approaching one. Combining Condition (A2) and the Cauchy–Schwarz inequality, we obtain

\begin{aligned} q_{n}^{-1/2} \sum_{i=1}^{n} f_{i}(0) U_{ni} {\tilde{Π}}_{A_{i}}^{⊤} θ & = q_{n}^{-1/2} n^{-1/2} θ^{⊤} Π_{A}^{⊤} B_{n} U_{n} \\ & \leq q_{n}^{-1/2} \|n^{-1/2} θ^{⊤} Π_{A}^{⊤}\|_{2} \cdot \|B_{n} U_{n}\|_{2} \\ & = O_{p}(q_{n}^{-1/2} n^{-1/2} L_{n}^{-r}) \|θ\|_{2} = O_{p}(\|θ\|_{2}) . \end{aligned}

We next evaluate

G_{n 3}

, noting that

E (G_{n 3}) = 0

and

E (G_{n 3}^{2}) \leq C q_{n}^{- 1} E (θ^{⊤} (n^{- 1} \sum_{i = 1}^{n} Π_{A_{i}} Π_{A_{i}}^{⊤}) θ) = O (q_{n}^{- 1} {∥ θ ∥}_{2}^{2}) .

Therefore,

G_{n3} = O_{p}(q_{n}^{-1/2} \|θ\|_{2})

. Hence, for L sufficiently large, the quadratic term will dominate and

q_{n}^{- 1} \sum_{i = 1}^{n} (Q_{i} (\sqrt{q_{n}}) - Q_{i} (0))

has asymptotically a lower bound

c L^{2}

. By convexity, this implies

∥ \hat{θ} ∥_{2} = O_{p} (\sqrt{q_{n}})

. From the definition of

\hat{θ}

, it follows that

∥ {\hat{γ}}_{01} - γ_{01} ∥_{2} = O_{p} (\sqrt{n^{- 1} q_{n}})

. This completes the proof of Theorem 1 (a).

The proof of Theorem 1 (b) is immediate since

\|\hat{m}(x) - m(x)\|_{L_{2}} = \|{\hat{γ}}_{01} - γ_{01}\|_{2}

and

{sup}_{x \in {[0, 1]}^{p}} {∥ Π ∥}_{2} = O (L_{n})

. □
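To give a feel for the rate in Theorem 1, the sketch below evaluates q_n = qL_n + sL_n² and √(q_n/n). The basis-size rule L_n = ⌈n^{1/(2r+1)}⌉ is a hypothetical choice made only for illustration (a common order for spline approximation; Conditions (A1)–(A4) are not reproduced here), and q = 5, s = 2 mirror the dense setting of Example 1.

```python
import math


def basis_size(n, r, c=1.0):
    """Hypothetical rule L_n = ceil(c * n^{1/(2r+1)})."""
    return math.ceil(c * n ** (1 / (2 * r + 1)))


def theorem1_rate(n, q, s, r, c=1.0):
    """Return (q_n, sqrt(q_n / n)) with q_n = q L_n + s L_n^2."""
    L = basis_size(n, r, c)
    q_n = q * L + s * L ** 2
    return q_n, math.sqrt(q_n / n)


for n in (300, 3000, 30000, 10 ** 6):
    print(n, theorem1_rate(n, q=5, s=2, r=2))
```

Under this rule q_n/n is of order n^{-(2r-1)/(2r+1)} when s > 0, so the estimation error shrinks as n grows.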

Appendix A.2. Proof of Theorem 2

Proof.

Note that

\hat{γ} = {({\hat{β}}_{0}^{⊤}, {\hat{β}}_{00}^{⊤})}^{⊤}

is the oracle estimator. Our goal is to prove that

\hat{γ}

is a local minimizer of

Q^{P} (β_{j}, β_{j k})

A gradient-based argument for

\sum_{i = 1}^{n} ρ_{τ} (Y_{i} - \sum_{j = 1}^{p} Ψ_{i, j}^{⊤} β_{j} - \sum_{1 \leq j < k \leq p} Φ_{i, j k}^{⊤} β_{j k})

cannot be used in the proof because the check loss function

ρ_{τ}

is not differentiable at zero. Instead, we work directly with a lower bound on the difference between two check loss functions.

Suppose that there exist indices

j_{0} \in M^{c},

k_{0} \in M^{c}

and

(j_{0}, k_{0}) \in I^{c}

, such that

{\hat{f}}_{j_{0}} \neq 0

and

{\hat{f}}_{j_{0} k_{0}} \neq 0

. That is,

{\hat{β}}_{j_{0}} \neq 0

and

{\hat{β}}_{j_{0} k_{0}} \neq 0

. Let

{\hat{γ}}^{*} = {({\hat{β}}_{0}^{* ⊤}, {\hat{β}}_{00}^{* ⊤})}^{⊤}

be the vector obtained by replacing

{\hat{β}}_{j_{0}}

and

{\hat{β}}_{j_{0} k_{0}}

with 0. Since

ρ_{τ} (u) - ρ_{τ} (v) \geq (τ - I (v \leq 0)) (u - v)

for any

u, v \in \mathbb{R}

, we have

\begin{matrix} Q^{P} ({\hat{β}}_{j}, {\hat{β}}_{j k}) - Q^{P} ({\hat{β}}_{j}^{*}, {\hat{β}}_{j k}^{*}) \\ \geq & - \frac{1}{n} \sum_{i = 1}^{n} (τ - I (Y_{i} \leq Π_{i}^{⊤} {\hat{γ}}^{*})) Π_{i}^{⊤} (\hat{γ} - {\hat{γ}}^{*}) \\ + λ_{1} ({(\sum_{j = 1}^{p} ∥ R_{j} ∥_{2} ∥ β_{j_{0}} ∥_{2} + \sum_{1 \leq k < j \leq p} I_{β_{j k \neq 0}} (∥ R_{j} ∥_{2} ∥ β_{j_{0}} ∥_{2} + ∥ R_{k} ∥_{2} ∥ β_{k_{0}} ∥_{2}))}^{1 / 2} \\ + \sum_{1 \leq k < j \leq p} ∥ Q_{j k} ∥_{2} {∥ β_{j_{0} k_{0}} ∥}_{2}) \\ = & - \frac{1}{n} \sum_{i = 1}^{n} (τ - I (ϵ_{i} \leq 0)) Π_{i}^{⊤} (\hat{γ} - {\hat{γ}}^{*}) - \frac{1}{n} \sum_{i = 1}^{n} (I (ϵ_{i} \leq 0) - I (ϵ_{i} \leq r_{n i})) Π_{i}^{⊤} (\hat{γ} - {\hat{γ}}^{*}) \\ + λ_{1} ({(\sum_{j = 1}^{p} ∥ R_{j} ∥_{2} ∥ β_{j_{0}} ∥_{2} + \sum_{1 \leq k < j \leq p} I_{β_{j k \neq 0}} (∥ R_{j} ∥_{2} ∥ β_{j_{0}} ∥_{2} + ∥ R_{k} ∥_{2} ∥ β_{k_{0}} ∥_{2}))}^{1 / 2} \\ + \sum_{1 \leq k < j \leq p} ∥ Q_{j k} ∥_{2} {∥ β_{j_{0} k_{0}} ∥}_{2}) \\ \geq & - \frac{1}{n} ∥ \sum_{i = 1}^{n} (τ - I (ϵ_{i} \leq 0)) Π_{i} ∥ ∥ \hat{γ} - {\hat{γ}}^{*} ∥ - \frac{1}{n} ∥ \sum_{i = 1}^{n} (I (ϵ_{i} \leq 0) - I (ϵ_{i} \leq r_{n i})) Π_{i} ∥ ∥ \hat{γ} - {\hat{γ}}^{*} ∥ \\ + λ_{1} ({(\sum_{j = 1}^{p} ∥ R_{j} ∥_{2} ∥ β_{j_{0}} ∥_{2} + \sum_{1 \leq k < j \leq p} I_{β_{j k \neq 0}} (∥ R_{j} ∥_{2} ∥ β_{j_{0}} ∥_{2} + ∥ R_{k} ∥_{2} ∥ β_{k_{0}} ∥_{2}))}^{1 / 2} \\ + \sum_{1 \leq k < j \leq p} ∥ Q_{j k} ∥_{2} {∥ β_{j_{0} k_{0}} ∥}_{2}) \\ \geq & - T_{n 1} - T_{n 2} + T_{n 3} \end{matrix}

where $r_{ni} = U_{ni} - Π_{i}^{⊤}(\hat{γ} - \hat{γ}^{*})$. A direct calculation gives $∥\sum_{i=1}^{n}(τ - I(ϵ_{i} \leq 0))Π_{i}∥_{2} = O_{p}(n^{1/2} L_{n})$. Therefore, $T_{n1} = O_{p}(n^{-1} L_{n}^{2})$. For $T_{n2}$, by Conditions (A2) and (A3), we have

\begin{aligned} E\Big[\sum_{i=1}^{n}(I(ϵ_{i} \leq 0) - I(ϵ_{i} \leq r_{ni}))Π_{i}\Big]^{2} & \leq n\sum_{i=1}^{n}E[(I(ϵ_{i} \leq 0) - I(ϵ_{i} \leq r_{ni}))Π_{i}]^{2} \\ & \leq n\sum_{i=1}^{n}E[Π_{i}^{⊤}Π_{i}|I(ϵ_{i} - r_{ni} \leq 0) - I(ϵ_{i} \leq 0)|] \\ & \leq n\sum_{i=1}^{n}E[s_{(n)}^{2}I(0 \leq |ϵ_{i}| \leq |r_{ni}|)] \\ & \leq L_{n}\sum_{i=1}^{n}\int_{-|r_{ni}|}^{|r_{ni}|}f_{i}(s)\,ds \\ & = O_{p}(n^{1/2} L_{n}^{3}), \end{aligned}

where $s_{(n)} = \max_{i}∥φ(x_{i})∥_{2} \leq c_{1} n^{-1/2} L_{n}$ for some positive constant $c_{1}$. The last equality follows from $\max_{1 \leq i \leq n}|r_{ni}| \leq O(L_{n}^{-r}) + L_{n}∥\hat{γ} - \hat{γ}^{*}∥_{2} = O_{p}(n^{-1/2} L_{n}^{2})$. This implies that $∥\sum_{i=1}^{n}(I(ϵ_{i} \leq 0) - I(ϵ_{i} \leq r_{ni}))Π_{i}∥_{2} = O_{p}(n^{1/2} L_{n}^{3})$. Therefore, $T_{n2} = O_{p}(n^{-1/2} L_{n}^{3})$. Combining the orders of $T_{n1}$ and $T_{n2}$ with the condition $n^{-1} L_{n}^{3} λ_{1}^{-1} \to 0$, we see that $T_{n3}$ dominates. Hence,

Q^{P}(\hat{β}_{j}, \hat{β}_{jk}) - Q^{P}(\hat{β}_{j}^{*}, \hat{β}_{jk}^{*}) > 0

with probability tending to one, which contradicts the fact that $\hat{γ}$ minimizes $Q^{P}(β_{j}, β_{jk})$. The same argument applies under weak heredity. This completes the proof of Theorem 2. □
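The first step of the proof rests on the fact that $τ - I(v \leq 0)$ is a subgradient of the check function $ρ_{τ}$ at $v$. As an informal sanity check (not part of the original argument), the Python sketch below evaluates both sides of $ρ_{τ}(u) - ρ_{τ}(v) \geq (τ - I(v \leq 0))(u - v)$ on random draws and at the kink $v = 0$. It assumes the usual definition $ρ_{τ}(u) = u(τ - I(u < 0))$; the helper names `check_loss`, `subgradient_gap` and `min_gap` are hypothetical.

```python
import random


def check_loss(u: float, tau: float) -> float:
    """Quantile check function rho_tau(u) = u * (tau - I(u < 0)) (assumed definition)."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def subgradient_gap(u: float, v: float, tau: float) -> float:
    """[rho_tau(u) - rho_tau(v)] - (tau - I(v <= 0)) * (u - v); never negative in exact arithmetic."""
    lhs = check_loss(u, tau) - check_loss(v, tau)
    rhs = (tau - (1.0 if v <= 0 else 0.0)) * (u - v)
    return lhs - rhs


def min_gap(trials: int = 100_000, seed: int = 0) -> float:
    """Smallest gap over random (u, v, tau), plus a few points at the kink v = 0."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        tau = rng.uniform(0.01, 0.99)
        u, v = rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)
        worst = min(worst, subgradient_gap(u, v, tau))
    for u in (-1.0, 0.0, 1.0):
        # At v = 0 the inequality uses the subgradient tau - 1, since I(0 <= 0) = 1.
        worst = min(worst, subgradient_gap(u, 0.0, 0.3))
    return worst


if __name__ == "__main__":
    print(min_gap())
```

When $u$ and $v$ have the same sign the two sides coincide, so the minimum is zero up to floating-point rounding of order $10^{-15}$.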

Appendix A.3. Proof of Theorem 3

Lemma A3.

Let $z_{1}, \dots, z_{n}$ be independent random variables with values in some space $\mathcal{Z}$, and let $Γ$ be a class of real-valued functions on $\mathcal{Z}$ satisfying, for some positive constants $η_{n}$ and $τ_{n}$,

∥γ∥_{\infty} \leq η_{n} \quad \text{and} \quad \frac{1}{n}\sum_{i=1}^{n}\mathrm{var}(γ(z_{i})) \leq τ_{n}^{2}, \quad \forall γ \in Γ.

Define $Z := \sup_{γ \in Γ}|\frac{1}{n}\sum_{i=1}^{n}(γ(z_{i}) - Eγ(z_{i}))|$. Then, for $t > 0$,

P\Big(Z \geq E(Z) + t\sqrt{2(τ_{n}^{2} + 2η_{n}E(Z))} + \frac{2η_{n}t^{2}}{3}\Big) \leq \exp(-nt^{2}).

For the details of this proof, see [41].
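To get a feel for how conservative Lemma A3 is, the sketch below uses a simple class that does not appear in the paper: indicators $γ_{a}(z) = 1\{z \leq a\}$ on a grid of $a$ values, with $z$ uniform on $[0, 1]$, so that $η_{n} = 1$ and $τ_{n}^{2} = 1/4$. It estimates $E(Z)$ by simulation, chooses $t$ so that $\exp(-nt^{2}) = α$, and reports how often $Z$ reaches the lemma's threshold, i.e., the tail probability the lemma controls. The function names, the grid and the settings `n`, `alpha`, `reps` are illustrative assumptions.

```python
import math
import random
from bisect import bisect_right


def sup_deviation(sample, grid):
    """Z = sup_a |P_n 1{z <= a} - P 1{z <= a}| for Uniform(0, 1) data, a ranging over grid."""
    s = sorted(sample)
    n = len(s)
    return max(abs(bisect_right(s, a) / n - a) for a in grid)


def lemma_a3_check(n=100, alpha=0.1, reps=2000, seed=1):
    """Monte Carlo frequency of Z exceeding the Lemma A3 threshold, with exp(-n t^2) = alpha."""
    rng = random.Random(seed)
    grid = [k / 50 for k in range(1, 50)]
    zs = [sup_deviation([rng.random() for _ in range(n)], grid) for _ in range(reps)]
    ez = sum(zs) / reps                       # Monte Carlo estimate of E(Z)
    t = math.sqrt(math.log(1.0 / alpha) / n)  # so that exp(-n t^2) = alpha
    eta, tau2 = 1.0, 0.25                     # |1{z <= a}| <= 1 and var(1{z <= a}) <= 1/4
    threshold = ez + t * math.sqrt(2 * (tau2 + 2 * eta * ez)) + 2 * eta * t**2 / 3
    freq = sum(z >= threshold for z in zs) / reps
    return freq, threshold


if __name__ == "__main__":
    freq, threshold = lemma_a3_check()
    print(f"threshold = {threshold:.3f}, exceedance frequency = {freq:.4f}, bound = 0.1")
```

For this class the observed exceedance frequency typically sits far below $α$, which is expected: the lemma is a worst-case bound over all classes satisfying its two conditions.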

Proof.

Define $Γ = \{γ(z) : γ(z) = ρ(Y - \hat{m}(x)) - ρ(Y - \tilde{m}(x))\}$. We can write

[R(\hat{m}) - R(\tilde{m})] - [R_{n}(\hat{m}) - R_{n}(\tilde{m})] = Eγ(z) - \frac{1}{n}\sum_{i=1}^{n}γ(z_{i}), \quad γ \in Γ.

By Lemma A3, for $t > 0$, with probability at least $1 - \exp(-nt^{2})$,

Z \leq E(Z) + \sqrt{2t^{2}(τ_{n}^{2} + 2η_{n}E(Z))} + \frac{2η_{n}t^{2}}{3}.

By the subadditivity of the square root and the inequality $\sqrt{xy} \leq (x + y)/2$ for $x, y \geq 0$, we have

\sqrt{2t^{2}(τ_{n}^{2} + 2η_{n}E(Z))} \leq \sqrt{2t^{2}τ_{n}^{2}} + 2\sqrt{t^{2}η_{n}E(Z)} \leq \sqrt{2t^{2}τ_{n}^{2}} + 2E(Z) + t^{2}η_{n}.
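The passage from Lemma A3 to (A1) uses only $\sqrt{a + b} \leq \sqrt{a} + \sqrt{b}$ and $2\sqrt{xy} \leq x + y$. A quick numerical illustration (not a proof) compares the two right-hand sides over arbitrary nonnegative values of $E(Z)$, $τ_{n}$, $η_{n}$ and $t$; the helper names are hypothetical.

```python
import math
import random


def lemma_rhs(ez, tau, eta, t):
    """Bound from Lemma A3: E(Z) + sqrt(2 t^2 (tau^2 + 2 eta E(Z))) + 2 eta t^2 / 3."""
    return ez + math.sqrt(2 * t**2 * (tau**2 + 2 * eta * ez)) + 2 * eta * t**2 / 3


def a1_rhs(ez, tau, eta, t):
    """Bound (A1): 3 E(Z) + sqrt(2 t^2 tau^2) + 5 t^2 eta / 3."""
    return 3 * ez + math.sqrt(2 * t**2 * tau**2) + 5 * t**2 * eta / 3


def worst_slack(trials=50_000, seed=2):
    """Smallest value of a1_rhs - lemma_rhs over random nonnegative inputs."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        ez, tau, eta, t = (rng.uniform(0.0, 3.0) for _ in range(4))
        worst = min(worst, a1_rhs(ez, tau, eta, t) - lemma_rhs(ez, tau, eta, t))
    return worst
```

In exact arithmetic the slack is at least $E(Z)$, so `worst_slack()` should never be negative beyond rounding error.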

Then, with probability at least $1 - \exp(-nt^{2})$,

Z \leq 3E(Z) + \sqrt{2t^{2}τ_{n}^{2}} + \frac{5t^{2}η_{n}}{3}. \qquad (A1)

Let $γ(x)$ denote the collection of all differences $ρ(Y - \hat{m}(x)) - ρ(Y - \tilde{m}(x))$. The bracketing number $N_{[]}(ε, M_{n})$ is the minimum number $k$ of $ε$-brackets $[l_{j}, u_{j}]$, $1 \leq j \leq k$, needed to cover $M_{n}$, where $∥u_{j} - l_{j}∥_{\infty} \leq ε$. We can construct $2ε$-brackets over $γ(x)$ by taking the differences $[l_{i} - u_{j}, u_{i} - l_{j}]$ as lower and upper bounds. Therefore, the bracketing numbers $N_{[]}(ε, γ(x))$ are bounded by the squares of the bracketing numbers $N_{[]}(ε/2, M_{n})$. For $δ > 0$, by Theorem 19.5 in [42], there exists a finite number $a(δ)$ such that

E\Big(\sup_{γ \in Γ}\Big|\frac{1}{n}\sum_{i=1}^{n}γ(z_{i}) - Eγ(z)\Big|\Big) \lesssim \frac{J_{[]}(δ, M_{n})}{\sqrt{n}} + P M 1\{M > a(δ)\sqrt{n}\},

where $J_{[]}(δ, M_{n}) = \int_{0}^{δ}\sqrt{\log N_{[]}(ε, M_{n})}\,dε$ is the bracketing integral. The envelope function $M$ can be taken as the supremum of the absolute values of the upper and lower bounds of finitely many brackets over $M_{n}$. By Theorem 19.5 in [42], the second term on the right is bounded by $a(δ)^{-1} P M^{2} 1\{M > a(δ)\sqrt{n}\}$ and hence converges to zero as $n \to \infty$. Moreover, there exists a constant $K > 0$ such that, for the Sobolev space $S_{C}^{2}([0, 1])$,

\log N_{[]}(δ, S_{C}^{2}([0, 1])) \leq K(1/δ)^{1/2}.

Note that $\sum_{j \in M}φ_{j}^{⊤}β_{j} \in M_{n}(L_{n})$ and $\sum_{(j,k) \in I}ϕ_{jk}^{⊤}β_{jk} \in M_{n}(L_{n})$, where $φ_{j} \in S_{C}^{2}([0, 1])$ and $ϕ_{jk} \in S_{C}^{2}([0, 1]^{2})$. Reference [43] implies that the bracketing integrals of the Sobolev spaces $S_{C}^{2}([0, 1])$ and $S_{C}^{2}([0, 1]^{2})$ are bounded. Then,

J_{[]}(δ, M_{n}) = O((\log p)^{1/2} + (2\log p)^{1/4}) = O(\sqrt{\log p}).
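The boundedness of the two bracketing integrals can be checked by integrating the entropy bounds directly. The first line uses the one-dimensional bound stated above; the second assumes the analogous bound $\log N_{[]}(δ, S_{C}^{2}([0, 1]^{2})) \leq Kδ^{-1}$, which is the usual $δ^{-d/α}$ rate with dimension $d = 2$ and smoothness $α = 2$ but is not written out in the text.

```latex
\begin{aligned}
J_{[]}(δ, S_{C}^{2}([0,1])) &\leq \int_{0}^{δ} \sqrt{K ε^{-1/2}}\, dε = K^{1/2} \int_{0}^{δ} ε^{-1/4}\, dε = \tfrac{4}{3} K^{1/2} δ^{3/4} < \infty, \\
J_{[]}(δ, S_{C}^{2}([0,1]^{2})) &\leq \int_{0}^{δ} \sqrt{K ε^{-1}}\, dε = K^{1/2} \int_{0}^{δ} ε^{-1/2}\, dε = 2 K^{1/2} δ^{1/2} < \infty.
\end{aligned}
```

Both integrals are finite because the integrands behave like $ε^{-d/(2α)}$ with $d/(2α) < 1$; an entropy exponent $d/α \geq 2$ would make the bracketing integral diverge at zero.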

By the Lipschitz continuity of $ρ_{τ}$, there exists $C > 0$ such that $E(γ(z))^{2} \leq 2∥\hat{m}(x) - \tilde{m}(x)∥_{L_{2}}^{2}$ and $∥γ∥_{\infty} \leq C∥\hat{m}(x) - \tilde{m}(x)∥_{\infty} \leq ∥\hat{m}(x) - \tilde{m}(x)∥_{L_{2}}$. Let

τ_{n}^{2} = ∥\hat{m}(x) - \tilde{m}(x)∥_{L_{2}}^{2} \leq 2∥\hat{m}(x) - m(x)∥_{L_{2}}^{2} + 2∥\tilde{m}(x) - m(x)∥_{L_{2}}^{2} = O(n^{-(2r-1)/(2r+1)})

and

η_{n} = ∥\hat{m}(x) - \tilde{m}(x)∥_{L_{2}} \leq ∥\hat{m}(x) - m(x)∥_{L_{2}} + ∥\tilde{m}(x) - m(x)∥_{L_{2}} = O(n^{-(2r-1)/(2(2r+1))}).

Then, Equation (A1) implies that

\begin{aligned} Z & \leq n^{-1/2}\sqrt{\log p} + \sqrt{2t^{2}}∥\hat{m}(x) - \tilde{m}(x)∥_{L_{2}} + \frac{5t^{2}}{3}∥\hat{m}(x) - \tilde{m}(x)∥_{L_{2}} \\ & \leq \sqrt{\log p}\,n^{-1/2} + (\sqrt{2t^{2}} + 5t^{2}/3)n^{-(2r-1)/(2(2r+1))} \\ & \leq (2\sqrt{2t^{2}} + 10t^{2}/3)n^{-(2r-1)/(2(2r+1))} \end{aligned}

with probability at least $1 - \exp(-nt^{2})$ as $n \to \infty$. Since $R(\hat{m}) - R(\tilde{m}) \leq R_{n}(\hat{m}) - R_{n}(\tilde{m}) + Z$, on the same event we have

R(\hat{m}) - R(\tilde{m}) \leq R_{n}(\hat{m}) - R_{n}(\tilde{m}) + (2\sqrt{2t^{2}} + 10t^{2}/3)n^{-(2r-1)/(2(2r+1))},

so that, for any $D_{n}$,

P(R(\hat{m}) - R(\tilde{m}) \leq D_{n}) \geq P(R_{n}(\hat{m}) - R_{n}(\tilde{m}) + (2\sqrt{2t^{2}} + 10t^{2}/3)n^{-(2r-1)/(2(2r+1))} \leq D_{n}) - \exp(-nt^{2}).

Under the assumption that the regression function is bounded, it follows, for $φ_{j} \in S_{C}^{2}([0, 1])$ and $ϕ_{jk} \in S_{C}^{2}([0, 1]^{2})$, that

\Big(\sum_{j=1}^{p}∥β_{j}∥_{2} + \sum_{1 \leq k < j \leq p}I(β_{jk} \neq 0)(∥β_{j}∥_{2} + ∥β_{k}∥_{2})\Big)^{1/2} + \sum_{1 \leq k < j \leq p}∥β_{jk}∥_{2} \leq L_{n},

where $L_{n} = o([n/\log(n)]^{1/4})$. Reference [13] also shows that the Lasso is persistent when $p$ grows polynomially in $n$. Furthermore, by the definition of $\hat{m}(x)$, we have $R_{n}(\hat{m}) - R_{n}(\tilde{m}) \leq λ_{1}L_{n}$ and, with probability at least $1 - \exp(-nt^{2})$,

R(\hat{m}) - R(\tilde{m}) \leq λ_{1}L_{n} + (2\sqrt{2t^{2}} + 10t^{2}/3)n^{-(2r-1)/(2(2r+1))}.

Let

D_{n} = λ_{1}L_{n} + (2\sqrt{2t^{2}} + 10t^{2}/3)n^{-(2r-1)/(2(2r+1))}.

Since $R_{n}(\hat{m}) - R_{n}(\tilde{m}) \leq λ_{1}L_{n}$ always holds,

P(R_{n}(\hat{m}) - R_{n}(\tilde{m}) + (2\sqrt{2t^{2}} + 10t^{2}/3)n^{-(2r-1)/(2(2r+1))} \leq λ_{1}L_{n} + (2\sqrt{2t^{2}} + 10t^{2}/3)n^{-(2r-1)/(2(2r+1))}) = 1,

and hence $P(R(\hat{m}) - R(\tilde{m}) \leq D_{n}) \geq 1 - \exp(-nt^{2})$. Note that

λ_{1}L_{n} + (2\sqrt{2t^{2}} + 10t^{2}/3)n^{-(2r-1)/(2(2r+1))} = O((2\sqrt{2t^{2}} + 10t^{2}/3)n^{-(2r-1)/(2(2r+1))}).

Therefore,

R(\hat{m}) - R(\tilde{m}) \lesssim (2\sqrt{2t^{2}} + 10t^{2}/3)n^{-(2r-1)/(2(2r+1))}

with probability at least $1 - \exp(-nt^{2})$. The same result holds under weak heredity by a similar argument. □
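Because the bound holds with probability at least $1 - \exp(-nt^{2})$, $t$ can be tied to a target confidence level $1 - α$ by setting $t = \sqrt{\log(1/α)/n}$. The sketch below evaluates only the stochastic part $(2\sqrt{2t^{2}} + 10t^{2}/3)n^{-(2r-1)/(2(2r+1))}$ of $D_{n}$ under that choice; $λ_{1}L_{n}$ is omitted because it depends on tuning choices not fixed here, and the function name and the values of `n`, `r` and `alpha` are illustrative assumptions.

```python
import math


def stochastic_term(n: int, r: float, alpha: float) -> float:
    """(2*sqrt(2 t^2) + 10 t^2 / 3) * n^(-(2r-1)/(2(2r+1))), with t chosen so that exp(-n t^2) = alpha."""
    t = math.sqrt(math.log(1.0 / alpha) / n)
    rate = n ** (-(2 * r - 1) / (2 * (2 * r + 1)))
    return (2 * math.sqrt(2 * t**2) + 10 * t**2 / 3) * rate


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n, round(stochastic_term(n, r=2.0, alpha=0.05), 5))
```

With $t$ chosen this way, $t$ itself shrinks like $n^{-1/2}$, so the term decays faster than the nominal rate $n^{-(2r-1)/(2(2r+1))}$ for a fixed confidence level.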

Appendix B. Estimated Results at τ = 0.1, 0.25, 0.75, 0.9 in Applications

Figure A1 illustrates the estimated effects of HNR, RPDE, and DFA on the UPDRS score at τ = 0.1. As HNR increases, the UPDRS score decreases markedly and nonlinearly, so higher harmonics-to-noise ratios are associated with milder symptoms. RPDE exhibits a clear U-shaped curve: extreme values correspond to more severe symptoms, while intermediate values are associated with milder symptoms. DFA shows a decreasing, again nonlinear, relationship with the UPDRS score, suggesting that a higher signal fractal scaling exponent is associated with milder symptoms.

Figure A2 shows the estimated interaction effect between HNR and DFA at τ = 0.1, which is clearly nonlinear. When HNR is low and DFA is high, the UPDRS score is higher, and it decreases as HNR increases; however, when HNR is high and DFA is low, the score increases again. Thus the effect of each variable on the UPDRS score depends on the level of the other, which a model with main effects only could not capture.

Figure A1. Estimated main effects for the HNR, RPDE, and DFA variables at τ = 0.10.


Figure A2. Estimated interaction between HNR and DFA at τ = 0.1.


Figure A3 illustrates the estimated main effects of three acoustic features (HNR, DFA, and PPE) at τ = 0.25. As HNR increases, its estimated effect first rises slightly and then drops rapidly, suggesting that greater purity of the speech signal is informative only up to a certain point. The effect of DFA peaks at an intermediate fluctuation level and declines beyond it, so DFA reflects health status best within that range. PPE, which captures the randomness and complexity of the speech signal, likewise has an effective range beyond which its effect weakens. Overall, these patterns suggest that such acoustic characteristics can support disease diagnosis and help improve its accuracy.

Figure A4 presents the estimated interaction between DFA and PPE at τ = 0.25 as a 3D surface plot. The joint effect changes substantially as DFA and PPE vary: when DFA is low, it first increases slowly and then drops rapidly as PPE increases, whereas when DFA is high, the same trend is more gradual. The influence of the interaction therefore differs across DFA and PPE combinations, and certain combinations may reflect changes in health status within speech signals more accurately, which is relevant for disease diagnosis and speech processing. Overall, the figure highlights the nonlinear interaction between DFA and PPE.

Figure A3. Estimated main effects for the HNR, DFA, and PPE variables at τ = 0.25.


Figure A4. Estimated interaction between DFA and PPE at τ = 0.25.


Figure A5 illustrates the estimated main effects of HNR, DFA, and PPE at τ = 0.75. As HNR increases, its effect first rises and then decreases rapidly. DFA follows a similar trend, but its decline is more gradual, while PPE exhibits a distinct peak before gradually decreasing. These patterns underline the relevance of these features for disease diagnosis and speech processing.

Figure A6 illustrates the estimated interaction effects between HNR and DFA and between HNR and PPE at τ = 0.75. The interaction between HNR and DFA is clearly nonlinear: the joint effect first increases and then decreases as HNR rises, showing a distinct fluctuation pattern. The interaction between HNR and PPE is also complex but varies more strongly, especially when PPE is high.

Figure A5. Estimated main effects for the HNR and DFA variables at τ = 0.75.


Figure A6. Estimated interaction effects between HNR and DFA and between HNR and PPE at τ = 0.75.


Figure A7 shows the estimated effects of dB and HNR on the UPDRS score at τ = 0.9. The main effect of dB is U-shaped: both very high and very low dB values are associated with higher UPDRS scores, and intermediate values with lower scores, so the effect of dB is clearly nonlinear. HNR shows an inverted U-shaped curve, with moderate HNR values associated with the most severe symptoms; this nonlinear relationship is particularly prominent at the upper quantile. These findings illustrate the complex nonlinear effects of the voice measurements on the UPDRS score and offer insight into Parkinson’s disease symptoms.

Figure A7. Estimated main effects for the dB and HNR variables at τ = 0.90.


References

Koenker, R.; Bassett, G. Regression quantiles. Econometrica 1978, 46, 33–50.
Nelder, J.A. A reformulation of linear models. J. R. Stat. Soc. Ser. A (Gen.) 1977, 140, 48–63.
McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Monographs on Statistics and Applied Probability; Chapman & Hall: London, UK, 1989.
De Gooijer, J.G.; Zerom, D. On additive conditional quantiles with high-dimensional covariates. J. Am. Stat. Assoc. 2003, 98, 135–146.
Cheng, Y.; De Gooijer, J.G.; Zerom, D. Efficient estimation of an additive quantile regression model. Scand. J. Stat. 2011, 38, 46–62.
Lee, Y.K.; Mammen, E.; Park, B.U. Backfitting and smooth backfitting for additive quantile models. Ann. Stat. 2010, 38, 2857–2883.
Horowitz, J.L.; Lee, S. Nonparametric estimation of an additive quantile regression model. J. Am. Stat. Assoc. 2005, 100, 1238–1249.
Sherwood, B.; Maidman, A. Additive nonlinear quantile regression in ultra-high dimension. J. Mach. Learn. Res. 2022, 23, 1–47.
Zhao, W.; Li, R.; Lian, H. Estimation and variable selection of quantile partially linear additive models for correlated data. J. Stat. Comput. Simul. 2024, 94, 315–345.
Cui, X.; Zhao, W. Pursuit of dynamic structure in quantile additive models with longitudinal data. Comput. Stat. Data Anal. 2019, 130, 42–60.
Lin, Y.; Zhang, H.H. Component selection and smoothing in multivariate nonparametric regression. Ann. Stat. 2006, 34, 2272–2297.
Storlie, C.B.; Bondell, H.D.; Reich, B.J.; Zhang, H.H. Surface estimation, variable selection, and the nonparametric oracle property. Stat. Sin. 2011, 21, 679–705.
Radchenko, P.; James, G.M. Variable selection using adaptive nonlinear interaction structures in high dimensions. J. Am. Stat. Assoc. 2010, 105, 1541–1553.
Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least angle regression. Ann. Stat. 2004, 32, 407–499.
Wu, T.T.; Chen, Y.F.; Hastie, T.; Sobel, E.; Lange, K. Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics 2009, 25, 714–721.
Zhao, P.; Rocha, G.; Yu, B. The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Stat. 2009, 37, 3468–3497.
Yuan, M.; Joseph, V.R.; Zou, H. Structured variable selection and estimation. Ann. Appl. Stat. 2009, 3, 1738–1757.
Choi, N.H.; Li, W.; Zhu, J. Variable selection with the strong heredity constraint and its oracle property. J. Am. Stat. Assoc. 2010, 105, 354–364.
Bien, J.; Taylor, J.; Tibshirani, R. A lasso for hierarchical interactions. Ann. Stat. 2013, 41, 1111–1141.
She, Y.; Wang, Z.F.; Jiang, H. Group regularized estimation under structural hierarchy. J. Am. Stat. Assoc. 2018, 113, 445–454.
Hao, N.; Feng, Y.; Zhang, H.H. Model Selection for High Dimensional Quadratic Regression via Regularization. J. Am. Stat. Assoc. 2018, 113, 615–625.
Liu, C.; Ma, J.; Amos, C.I. Bayesian variable selection for hierarchical gene–environment and gene–gene interactions. Hum. Genet. 2015, 134, 23–36.
Yang, Y.; Basu, S.; Zhang, L. A Bayesian hierarchical variable selection prior for pathway-based GWAS using summary statistics. Stat. Med. 2020, 39, 724–739.
Kim, J.E.A. Bayesian variable selection with strong heredity constraints. J. Korean Stat. Soc. 2018, 47, 314–329.
Li, Y.; Wang, N.; Carroll, R.J. Generalized functional linear models with semiparametric single-index interactions. J. Am. Stat. Assoc. 2010, 105, 621–633.
Li, Y.; Liu, J.S. Robust variable and interaction selection for logistic regression and multiple index models. J. Am. Stat. Assoc. 2019, 114, 271–286.
Liu, Y.; Li, Y.; Carroll, R.J. Predictive functional linear models with diverging number of semiparametric single-index interactions. J. Econom. 2021, 230, 221–239.
Liu, H.; You, J.; Cao, J. A Dynamic Interaction Semiparametric Function-on-Scalar Model. J. Am. Stat. Assoc. 2021, 118, 360–373.
Yu, K.; Lu, Z. Local linear additive quantile regression. Scand. J. Stat. 2004, 31, 333–346.
Noh, H.; Lee, E.R. Component selection in additive quantile regression models. J. Korean Stat. Soc. 2014, 43, 439–452.
Wood, S.N. P-splines with derivative based penalties and tensor product smoothing of unevenly distributed data. Stat. Comput. 2017, 27, 985–989.
Eilers, P.H.C.; Marx, B.D. Flexible smoothing with B-splines and penalties. Stat. Sci. 1996, 11, 89–102.
Meier, L.; van de Geer, S.; Buehlmann, P. High-dimensional additive modeling. Ann. Stat. 2009, 37, 3779–3821.
Fan, Y.; Tang, C.Y. Tuning parameter selection in high dimensional penalized likelihood. J. R. Stat. Soc. Ser. B 2013, 75, 531–552.
Greenshtein, E.; Ritov, Y. Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 2004, 10, 971–988.
Jiang, L.; Bondell, H.; Wang, H. Interquantile shrinkage and variable selection in quantile regression. Comput. Stat. Data Anal. 2014, 69, 208–219.
Zou, H.; Yuan, M. Composite quantile regression and the oracle model selection theory. Ann. Stat. 2008, 36, 1108–1126.
Kohns, D.; Szendrei, T. Horseshoe prior Bayesian quantile regression. J. R. Stat. Soc. Ser. C Appl. Stat. 2024, 73, 193–220.
Tsanas, A.; Little, M.A.; McSharry, P.E.; Ramig, L.O. Accurate telemonitoring of Parkinson’s disease progression by noninvasive speech tests. IEEE Trans. Bio-Med. Eng. 2010, 57, 884–893.
He, X.M.; Shi, P. Convergence rate of B-spline estimators of nonparametric conditional quantile functions. J. Nonparametr. Stat. 1994, 3, 299–308.
Bousquet, O. A Bennett concentration inequality and its application to suprema of empirical processes. Comptes Rendus Math. 2002, 334, 495–500.
van der Vaart, A.W. Asymptotic Statistics; Cambridge University Press: Cambridge, UK, 1998.
Birman, M.; Solomjak, M.Z. Piecewise-polynomial approximations of functions of the classes. Math. USSR-Sb. 1967, 2, 295–317.

Figure 1. (Left) Histogram and density curve of UPDRS; (right) Correlation between voice measurements.

Figure 2. Estimated main effects for the APQ11, HNR, RPDE, and DFA variables at τ = 0.5.


Figure 3. Estimated interaction term for the HNR and RPDE variables at τ = 0.5.


Table 1. Selection and estimation results for strong heredity with n = 100.


Method	Main								Inter
Method	$P_{m}$	$x_{1}$	$x_{2}$	$x_{3}$	$x_{4}$	$x_{5}$	mTPR	mFDR	$P_{i}$	$x_{1} x_{2}$	$x_{1} x_{3}$	iTPR	iFDR	Size	RMSE
$N(0, 1)$
hierNet	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.045	1.000	1.000	1.000	1.000	0.710	12.140	6.056
RAMP	0.920	1.000	1.000	1.000	1.000	0.920	0.984	0.046	1.000	1.000	1.000	1.000	0.382	8.400	7.200
Proposed (τ = 0.1)	1.000	1.000	1.000	1.000	-	-	1.000	0.038	-	-	-	-	-	3.420	1.030
Proposed (τ = 0.25)	0.400	1.000	1.000	1.000	0.920	0.400	0.864	0.068	0.340	0.920	0.920	0.350	0.426	7.580	4.251
Proposed (τ = 0.50)	0.700	1.000	1.000	1.000	0.980	0.700	0.936	0.052	0.940	0.980	0.940	0.960	0.200	7.360	2.919
Proposed (τ = 0.75)	0.640	1.000	1.000	1.000	1.000	0.640	0.928	0.021	0.980	0.980	1.000	0.990	0.232	7.320	3.216
Proposed (τ = 0.9)	1.000	1.000	-	1.000	-	1.000	1.000	0.044	-	-	-	-	-	3.920	1.042
$t(3)$
hierNet	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.110	1.000	1.000	1.000	1.000	0.727	12.960	7.392
RAMP	0.900	1.000	1.000	1.000	1.000	0.900	0.980	0.050	1.000	1.000	1.000	1.000	0.363	8.300	8.978
Proposed (τ = 0.1)	1.000	1.000	1.000	1.000	-	-	1.000	0.056	-	-	-	-	-	3.480	2.994
Proposed (τ = 0.25)	0.380	1.000	1.000	1.000	0.940	0.380	0.864	0.056	0.520	0.880	0.880	0.550	0.388	7.440	7.718
Proposed (τ = 0.50)	0.620	1.000	1.000	1.000	1.000	0.620	0.924	0.057	0.940	0.960	0.960	0.960	0.213	7.360	6.033
Proposed (τ = 0.75)	0.660	1.000	1.000	1.000	1.000	0.660	0.932	0.025	1.000	1.000	1.000	1.000	0.253	7.460	6.146
Proposed (τ = 0.9)	0.960	0.960	-	1.000	-	1.000	0.986	0.019	-	-	-	-	-	3.480	2.827
$χ^{2}(2)$
hierNet	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.128	1.000	1.000	1.000	1.000	0.735	13.300	8.663
RAMP	0.860	1.000	1.000	1.000	1.000	0.860	0.972	0.061	1.000	1.000	1.000	1.000	0.425	8.680	10.217
Proposed (τ = 0.1)	0.980	0.980	1.000	1.000	-	-	0.993	0.032	-	-	-	-	-	3.160	3.931
Proposed (τ = 0.25)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	0.100	0.820	0.820	0.100	0.411	7.180	7.638
Proposed (τ = 0.50)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	0.940	1.000	0.940	0.000	0.210	7.260	8.583
Proposed (τ = 0.75)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	0.180	0.307	7.640	8.942
Proposed (τ = 0.9)	0.860	0.860	-	1.000	-	1.000	0.953	0.046	-	-	-	-	-	3.900	6.049
$σ(x_{i}) e_{i}$
hierNet	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.031	1.000	1.000	1.000	1.000	0.717	12.240	6.199
RAMP	0.900	1.000	1.000	1.000	1.000	0.900	0.980	0.057	1.000	1.000	1.000	1.000	0.411	8.600	7.441
Proposed (τ = 0.1)	1.000	1.000	1.000	1.000	-	-	1.000	0.026	-	-	-	-	-	3.380	1.306
Proposed (τ = 0.25)	0.404	1.000	1.000	1.000	0.957	0.404	0.872	0.072	0.914	0.936	0.978	0.957	0.403	7.914	2.893
Proposed (τ = 0.50)	0.640	1.000	1.000	1.000	0.980	0.640	0.924	0.057	0.940	0.980	0.940	0.960	0.219	7.380	2.355
Proposed (τ = 0.75)	0.660	1.000	1.000	1.000	1.000	0.660	0.932	0.029	0.980	0.980	1.000	0.990	0.232	7.380	2.706
Proposed (τ = 0.9)	1.000	1.000	-	1.000	-	1.000	1.000	0.063	-	-	-	-	-	4.080	1.344

Table 2. Selection and estimation results for strong heredity with n = 300.


Method	Main								Inter
Method	$P_{m}$	$x_{1}$	$x_{2}$	$x_{3}$	$x_{4}$	$x_{5}$	mTPR	mFDR	$P_{i}$	$x_{1} x_{2}$	$x_{1} x_{3}$	iTPR	iFDR	Size	RMSE
$N(0, 1)$
hierNet	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.695	11.560	6.158
RAMP	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.629	10.840	2.058
Proposed (τ = 0.1)	1.000	1.000	1.000	1.000	-	-	1.000	0.000	-	-	-	-	-	3.000	0.978
Proposed (τ = 0.25)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.091	7.200	2.520
Proposed (τ = 0.50)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.000	7.000	2.472
Proposed (τ = 0.75)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.038	7.080	1.120
Proposed (τ = 0.9)	1.000	1.000	-	1.000	-	1.000	1.000	0.000	-	-	-	-	-	3.040	0.958
$t(3)$
hierNet	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.696	11.580	8.741
RAMP	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.635	11.320	4.641
Proposed (τ = 0.1)	1.000	1.000	1.000	1.000	-	-	1.000	0.000	-	-	-	-	-	3.020	3.765
Proposed (τ = 0.25)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.074	7.160	4.684
Proposed (τ = 0.50)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.000	7.000	4.664
Proposed (τ = 0.75)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.029	7.060	4.786
Proposed (τ = 0.9)	1.000	1.000	-	1.000	-	1.000	1.000	0.000	-	-	-	-	-	3.060	3.658
$χ^{2}(2)$
hierNet	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.698	11.660	9.979
RAMP	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.375	1.000	1.000	1.000	1.000	0.646	11.660	5.843
Proposed (τ = 0.1)	1.000	1.000	1.000	1.000	-	-	1.000	0.000	-	-	-	-	-	3.000	4.913
Proposed (τ = 0.25)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.056	7.120	5.702
Proposed (τ = 0.50)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.029	7.060	5.777
Proposed (τ = 0.75)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.056	7.120	5.787
Proposed (τ = 0.9)	0.940	0.940	-	1.000	-	1.000	0.980	0.000	-	-	-	-	-	3.100	6.053
$σ(x_{i}) e_{i}$
hierNet	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.696	11.660	6.282
RAMP	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.100	1.000	1.000	1.000	1.000	0.625	10.900	2.181
Proposed (τ = 0.1)	1.000	1.000	1.000	1.000	-	-	1.000	0.000	-	-	-	-	-	3.040	1.346
Proposed (τ = 0.25)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.090	7.200	2.777
Proposed (τ = 0.50)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.010	7.020	2.693
Proposed (τ = 0.75)	1.000	1.000	1.000	1.000	1.000	1.000	1.000	0.000	1.000	1.000	1.000	1.000	0.010	7.020	2.782
Proposed (τ = 0.9)	1.000	1.000	-	1.000	-	1.000	1.000	0.000	-	-	-	-	-	3.080	1.357

Table 3. Selection and estimation results for weak heredity with n = 100.


Method	Main						Inter
Method	$P_{m}$	$x_{1}$	$x_{2}$	$x_{3}$	mTPR	mFDR	$P_{i}$	$x_{1} x_{4}$	$x_{1} x_{5}$	iTPR	iFDR	Size	RMSE
$N(0, 1)$
hierNet	0.980	1.000	1.000	1.000	1.000	0.446	1.000	1.000	1.000	1.000	0.781	14.580	31.954
RAMP	0.060	1.000	0.260	0.240	0.500	0.050	1.000	1.000	1.000	1.000	0.619	7.140	67.840
Proposed (τ = 0.1)	1.000	1.000	1.000	1.000	1.000	0.047	-	-	-	-	-	3.500	1.067
Proposed (τ = 0.25)	0.340	1.000	0.580	0.480	0.686	0.507	0.720	1.000	0.720	0.860	0.505	7.860	48.135
Proposed (τ = 0.50)	0.320	1.000	0.520	0.500	0.673	0.504	0.740	1.000	0.740	0.870	0.462	7.480	42.688
Proposed (τ = 0.75)	0.340	1.000	0.520	0.520	0.680	0.507	0.780	1.000	0.780	0.890	0.476	7.780	41.641
Proposed (τ = 0.9)	1.000	1.000	1.000	1.000	1.000	0.062	-	-	-	-	-	4.000	1.058
$t(3)$
hierNet	1.000	1.000	1.000	1.000	1.000	0.450	1.000	1.000	1.000	1.000	0.788	14.920	32.742
RAMP	0.060	1.000	0.280	0.240	0.506	0.037	1.000	1.000	1.000	1.000	0.619	7.140	69.608
Proposed (τ = 0.1)	1.000	1.000	1.000	1.000	1.000	0.006	-	-	-	-	-	7.060	1.710
Proposed (τ = 0.25)	0.340	1.000	0.500	0.520	0.673	0.502	0.760	1.000	0.760	0.880	0.443	7.380	46.631
Proposed (τ = 0.50)	0.360	1.000	0.540	0.540	0.693	0.497	0.720	1.000	0.720	0.860	0.459	7.440	32.514
Proposed (τ = 0.75)	0.360	1.000	0.560	0.540	0.700	0.492	0.720	1.000	0.720	0.860	0.679	7.680	52.425
Proposed (τ = 0.9)	1.000	1.000	1.000	1.000	1.000	0.010	-	-	-	-	-	7.000	1.765
$χ^{2}(2)$
hierNet	0.980	1.000	1.000	1.000	1.000	0.468	1.000	1.000	1.000	1.000	0.794	15.360	33.515
RAMP	0.040	1.000	0.240	0.220	0.486	0.087	1.000	1.000	1.000	1.000	0.628	7.300	69.560
Proposed (τ = 0.1)	1.000	1.000	1.000	1.000	1.000	0.051	-	-	-	-	-	6.940	2.821
Proposed (τ = 0.25)	0.280	1.000	0.500	0.400	0.633	0.515	0.620	0.980	0.620	0.810	0.683	7.280	41.928
Proposed (τ = 0.50)	0.340	1.000	0.540	0.460	0.666	0.504	0.640	0.980	0.640	0.810	0.480	7.380	42.535
Proposed (τ = 0.75)	0.360	1.000	0.580	0.500	0.693	0.500	0.680	0.980	0.680	0.830	0.522	7.920	58.314
Proposed (τ = 0.9)	1.000	1.000	1.000	1.000	1.000	0.011	-	-	-	-	-	6.166	2.929
$σ(x_{i}) e_{i}$
hierNet	1.000	1.000	1.000	1.000	1.000	0.444	1.000	1.000	1.000	1.000	0.784	14.700	32.115
RAMP	0.000	0.300	0.380	0.120	0.266	0.259	1.000	1.000	0.300	0.650	0.821	8.360	13.755
Proposed (τ = 0.1)	1.000	1.000	1.000	1.000	1.000	0.124	-	-	-	-	-	7.775	0.681
Proposed (τ = 0.25)	0.333	1.000	0.562	0.500	0.687	0.505	0.750	1.000	0.750	0.875	0.478	7.687	24.665
Proposed (τ = 0.50)	0.354	1.000	0.604	0.479	0.694	0.502	0.750	1.000	0.750	0.875	0.481	7.750	26.570
Proposed (τ = 0.75)	0.354	1.000	0.625	0.520	0.715	0.500	0.791	1.000	0.791	0.895	0.500	8.083	29.956
Proposed (τ = 0.9)	1.000	1.000	1.000	1.000	1.000	0.013	-	-	-	-	-	6.560	0.760

Table 4. Selection and estimation results for weak heredity with n = 300.


Method	Main effects						Interaction effects
	$P_{m}$	$x_{1}$	$x_{2}$	$x_{3}$	mTPR	mFDR	$P_{i}$	$x_{1} x_{4}$	$x_{1} x_{5}$	iTPR	iFDR	Size	RMSE
$N(0, 1)$
hirNet	1.000	1.000	1.000	1.000	1.000	0.401	0.160	0.900	0.160	0.530	0.047	19.600	32.574
RAMP	1.000	1.000	0.700	0.720	1.000	0.000	1.000	1.000	1.000	1.000	0.695	9.360	68.139
Proposed (τ = 0.1)	1.000	1.000	1.000	1.000	1.000	0.000	-	-	-	-	-	3.620	0.908
Proposed (τ = 0.25)	1.000	1.000	1.000	1.000	1.000	0.400	1.000	1.000	1.000	1.000	0.107	7.280	3.944
Proposed (τ = 0.50)	1.000	1.000	1.000	1.000	1.000	0.402	1.000	1.000	1.000	1.000	0.065	7.160	3.288
Proposed (τ = 0.75)	1.000	1.000	1.000	1.000	1.000	0.400	1.000	1.000	1.000	1.000	0.137	7.360	3.912
Proposed (τ = 0.9)	1.000	1.000	1.000	1.000	1.000	0.000	-	-	-	-	-	3.033	0.921
$t(3)$
hirNet	1.000	1.000	1.000	1.000	1.000	0.400	1.000	1.000	1.000	1.000	0.676	11.200	32.903
RAMP	1.000	1.000	0.700	0.680	1.000	0.000	1.000	1.000	1.000	1.000	0.696	9.360	70.609
Proposed (τ = 0.1)	1.000	1.000	1.000	1.000	1.000	0.000	-	-	-	-	-	3.320	3.572
Proposed (τ = 0.25)	1.000	1.000	1.000	1.000	1.000	0.400	1.000	1.000	1.000	1.000	0.074	7.180	5.371
Proposed (τ = 0.50)	1.000	1.000	1.000	1.000	1.000	0.400	1.000	1.000	1.000	1.000	0.056	7.160	5.568
Proposed (τ = 0.75)	1.000	1.000	1.000	1.000	1.000	0.400	0.980	1.000	0.980	0.990	0.091	7.200	5.145
Proposed (τ = 0.9)	1.000	1.000	1.000	1.000	1.000	0.000	-	-	-	-	-	3.200	3.560
$χ^{2}(2)$
hirNet	1.000	1.000	1.000	1.000	1.000	0.400	1.000	1.000	1.000	1.000	0.679	11.260	34.494
RAMP	0.000	1.000	0.760	0.600	0.666	0.000	1.000	1.000	1.000	1.000	0.682	9.060	72.785
Proposed (τ = 0.1)	1.000	1.000	1.000	1.000	1.000	0.011	-	-	-	-	-	4.766	4.723
Proposed (τ = 0.25)	1.000	1.000	1.000	1.000	1.000	0.400	0.900	1.000	0.900	0.950	0.251	7.620	5.462
Proposed (τ = 0.50)	1.000	1.000	1.000	1.000	1.000	0.402	0.940	1.000	0.940	0.970	0.163	7.400	5.814
Proposed (τ = 0.75)	1.000	1.000	1.000	1.000	1.000	0.400	0.920	1.000	0.920	0.960	0.150	7.340	6.005
Proposed (τ = 0.9)	1.000	1.000	1.000	1.000	1.000	0.000	-	-	-	-	-	3.566	4.820
$σ(x_{i}) e_{i}$
hirNet	1.000	1.000	1.000	1.000	1.000	0.402	1.000	1.000	1.000	1.000	0.673	11.140	30.643
RAMP	0.840	1.000	0.920	0.900	0.940	0.241	1.000	1.000	0.860	0.930	0.845	15.860	7.298
Proposed (τ = 0.1)	1.000	1.000	1.000	1.000	1.000	0.000	-	-	-	-	-	3.440	1.055
Proposed (τ = 0.25)	1.000	1.000	1.000	1.000	1.000	0.400	1.000	1.000	1.000	1.000	0.010	7.020	3.950
Proposed (τ = 0.50)	1.000	1.000	1.000	1.000	1.000	0.400	1.000	1.000	1.000	1.000	0.029	7.060	3.948
Proposed (τ = 0.75)	1.000	1.000	1.000	1.000	1.000	0.400	1.000	1.000	1.000	1.000	0.029	7.060	3.948
Proposed (τ = 0.9)	1.000	1.000	1.000	1.000	1.000	0.000	-	-	-	-	-	3.166	1.052
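The selection columns of Table 4 are not defined in this section. The Python sketch below computes them for a single fitted model, assuming the usual definitions: TPR is the share of true terms that are selected, FDR is the share of selected terms that are not true terms, and Size is the number of selected main and interaction terms. It further assumes (also not stated here) that the table entries average these per-fit values over simulation replications. The names `tpr_fdr`, `normalize`, and `selection_summary` are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def tpr_fdr(selected: Set, truth: Set) -> Tuple[float, float]:
    """True positive rate and false discovery rate of one selected set."""
    tp = len(selected & truth)
    tpr = tp / len(truth) if truth else 1.0
    # Assumed convention: the FDR of an empty selection is 0.
    fdr = (len(selected) - tp) / len(selected) if selected else 0.0
    return tpr, fdr


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Store each interaction as a sorted pair so (x4, x1) matches (x1, x4)."""
    return {tuple(sorted(p)) for p in pairs}


def selection_summary(main_sel, inter_sel, main_true, inter_true) -> Dict[str, float]:
    """mTPR/mFDR for main effects, iTPR/iFDR for interactions, and model Size."""
    m_tpr, m_fdr = tpr_fdr(set(main_sel), set(main_true))
    inter_sel_n, inter_true_n = normalize(inter_sel), normalize(inter_true)
    i_tpr, i_fdr = tpr_fdr(inter_sel_n, inter_true_n)
    return {"mTPR": m_tpr, "mFDR": m_fdr, "iTPR": i_tpr, "iFDR": i_fdr,
            "Size": len(set(main_sel)) + len(inter_sel_n)}


# Weak-heredity design of Table 4: true main effects x1, x2, x3;
# true interactions x1*x4 and x1*x5.
MAIN_TRUE = {"x1", "x2", "x3"}
INTER_TRUE = {("x1", "x4"), ("x1", "x5")}

# Hypothetical single fit that also keeps x4 and x5 as main effects.
summary = selection_summary({"x1", "x2", "x3", "x4", "x5"},
                            {("x4", "x1"), ("x1", "x5")},
                            MAIN_TRUE, INTER_TRUE)
print(summary)
```

Under these assumed definitions, a fit that keeps x4 and x5 as main effects next to both true interactions gives mFDR = 2/5 = 0.4 and Size = 7. That is one configuration consistent with the proposed-method rows at τ = 0.25 to 0.75 in Table 4; the section itself does not list which terms were selected.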

Table 5. The covariates selected by RAMP, hirNet, and the proposed method.

Method	Covariates
RAMP	APQ11, HNR, RPDE, DFA, PPE, HNR*RPDE, HNR*DFA, DFA*PPE
hirNet	PPQ5, APQ11, NHR, HNR, RPDE, DFA, PPE, APQ11*PPE, APQ11*DFA, HNR*RPDE, RPDE*PPE, DFA*PPE
Proposed (τ = 0.1)	HNR, RPDE, DFA, HNR*DFA
Proposed (τ = 0.25)	HNR, DFA, PPE, DFA*PPE
Proposed (τ = 0.5)	APQ11, HNR, RPDE, DFA, HNR*RPDE
Proposed (τ = 0.75)	HNR, DFA, PPE, HNR*DFA, HNR*PPE
Proposed (τ = 0.9)	dB, HNR

The symbol * denotes an interaction between two main effects; for example, HNR*DFA is the interaction of HNR and DFA.
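The selections in Table 5 can be checked against the heredity constraints mentioned in the abstract. The Python sketch below assumes the usual definitions: strong heredity admits A*B only when both A and B are selected as main effects, and weak heredity admits it when at least one of them is. The function names are hypothetical; the covariate lists are copied from Table 5.

```python
from typing import Dict, List, Set, Tuple


def split_terms(terms: List[str]) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """Separate main effects from interactions written as 'A*B'."""
    mains = {t for t in terms if "*" not in t}
    inters = [tuple(t.split("*", 1)) for t in terms if "*" in t]
    return mains, inters


def heredity(terms: List[str]) -> str:
    """Classify a selected model as 'strong', 'weak', or 'none'."""
    mains, inters = split_terms(terms)
    # all() over an empty list is True, so a model without interactions counts as strong.
    if all(a in mains and b in mains for a, b in inters):
        return "strong"
    if all(a in mains or b in mains for a, b in inters):
        return "weak"
    return "none"


# Selected terms copied from Table 5.
TABLE5: Dict[str, List[str]] = {
    "RAMP": ["APQ11", "HNR", "RPDE", "DFA", "PPE", "HNR*RPDE", "HNR*DFA", "DFA*PPE"],
    "hirNet": ["PPQ5", "APQ11", "NHR", "HNR", "RPDE", "DFA", "PPE",
               "APQ11*PPE", "APQ11*DFA", "HNR*RPDE", "RPDE*PPE", "DFA*PPE"],
    "Proposed (tau=0.1)": ["HNR", "RPDE", "DFA", "HNR*DFA"],
    "Proposed (tau=0.25)": ["HNR", "DFA", "PPE", "DFA*PPE"],
    "Proposed (tau=0.5)": ["APQ11", "HNR", "RPDE", "DFA", "HNR*RPDE"],
    "Proposed (tau=0.75)": ["HNR", "DFA", "PPE", "HNR*DFA", "HNR*PPE"],
    "Proposed (tau=0.9)": ["dB", "HNR"],
}

for method, terms in TABLE5.items():
    print(f"{method}: {len(terms)} terms, {heredity(terms)} heredity")
```

Every model listed in Table 5 passes the strong-heredity check; the τ = 0.9 model passes vacuously because it contains no interaction terms.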

Table 6. Average model size and RMSE over the 100 data sets.

Method	τ	Size	RMSE
RAMP	-	9.5	0.878
hirNet	-	18.39	0.887
Proposed	0.1	3	0.891
Proposed	0.25	4.7	0.873
Proposed	0.5	4.9	1.00
Proposed	0.75	5.0	0.858
Proposed	0.9	2.3	0.971
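Table 6 reports each RMSE as an average over 100 data sets. The Python sketch below shows that averaging, assuming the reported value is the mean of the per-data-set RMSEs; this section does not say how the data sets are formed or whether RMSE is computed on held-out observations. It also records the reported RAMP and proposed-method averages and picks the smallest. The names `rmse` and `average_rmse` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# One data set = (observed responses, predicted responses). How the 100 data
# sets are generated or split is not described here and is left to the caller.
DataSet = Tuple[Sequence[float], Sequence[float]]


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    if not y or len(y) != len(y_hat):
        raise ValueError("y and y_hat must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def average_rmse(datasets: Sequence[DataSet]) -> float:
    """Mean of the per-data-set RMSEs (not the RMSE of pooled residuals)."""
    return sum(rmse(y, y_hat) for y, y_hat in datasets) / len(datasets)


# RAMP and proposed-method average RMSEs as reported in Table 6.
TABLE6_RMSE: Dict[str, float] = {
    "RAMP": 0.878,
    "Proposed (tau=0.1)": 0.891,
    "Proposed (tau=0.25)": 0.873,
    "Proposed (tau=0.5)": 1.00,
    "Proposed (tau=0.75)": 0.858,
    "Proposed (tau=0.9)": 0.971,
}

best = min(TABLE6_RMSE, key=lambda m: TABLE6_RMSE[m])
print(best, TABLE6_RMSE[best])
```

Averaging per-data-set RMSEs weights every data set equally regardless of its size; pooling all residuals first would give a different number when the data sets differ in length.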

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Bai, Y.; Jiang, J.; Tian, M. Variable Selection for Additive Quantile Regression with Nonlinear Interaction Structures. Mathematics 2025, 13, 1522. https://doi.org/10.3390/math13091522
