1. Introduction
Quantile regression [
1] has become a widely used tool in both empirical studies and the theoretical analysis of conditional quantile functions. Additive models offer greater flexibility by expressing linear predictors as the sum of nonparametric functions of each covariate, which generally results in lower variance compared to fully nonparametric models. However, many practical problems require the consideration of interactions between covariates, an aspect that has been extensively explored in linear and generalized linear models [
2,
3], where incorporating interactions has been shown to improve prediction accuracy. Despite this, there is limited research on additive quantile regression models that explicitly account for interaction structures, particularly in the context of variable selection.
While nonparametric additive quantile regression models have been extensively studied in the literature [
4,
5,
6,
7,
8], with recent advancements in the analysis of longitudinal data [
9] and dynamic component modeling [
10]. Specifically, Ref. [
9] explored a partially linear additive model for longitudinal data within the quantile regression framework, utilizing quadratic inference functions to account for within-subject correlations. In parallel, Ref. [
10] proposed a quantile additive model with dynamic component functions, introducing a penalization-based approach to identify non-dynamic components. Despite these advances, the integration of interaction effects remains relatively underexplored in the context of nonparametric additive quantile regression models. Building upon the existing literature, this paper considers an additive quantile regression model with a nonlinear interaction structure.
When
p, the number of main effect terms and interaction effect terms, is large, many additive methods are not feasible since their implementation requires storing and manipulating the entire
design matrix; thus, variable selection becomes necessary. Refs. [
11,
12] proposed component selection and smoothing operator (COSSO) and an adaptive version of the COSSO algorithm (ACOSSO) to fit the additive model with interaction structures, respectively. But these methods violated the heredity condition, whereas [
13] took the heredity constraint into account and proposed a penalty
on the empirical
norm of the main effects and interaction effects. This method shrinks the interactions, depending on the main effects that are already present in the model [
9,
10].
Several regularization penalties, especially Lasso, have been widely used to shrink the coefficients of main effects and interaction effects to achieve a sparse model. However, existing methods treated interaction terms and main terms similarly, which may select an interaction term but not the corresponding main terms, thus making them difficult to interpret in practice. There is a natural hierarchy among the variables in the model with interaction structures. Refs. [
14,
15] proposed a two-stage Lasso method to select important main and interaction terms. However, this approach is inefficient, as the solution path in the second stage heavily depends on the selection results from the first stage. To address this issue, several regularization methods [
16,
17,
18] have been proposed that employ special reparametrizations of regression coefficients and penalty functions under the hierarchical principle. Ref. [
19] introduced strong and weak heredity constraints into models with interactions. They proposed a Lasso penalty with convex constraints that produces sparse coefficients while ensuring that the strong or weak heredity is satisfied. However, existing algorithms often suffer from slow convergence, even with a moderate number of predictors. Ref. [
20] proposed a group-regularized estimation method under both strong and weak heredity constraints and developed a computational algorithm that guarantees the convergence of the iterates. Ref. [
21] extended the linear interaction model to a quadratic model, accounting for all two-term interactions, and proposed the Regularization Path Algorithm under the Marginality Principle (RAMP), which efficiently computes the regularization solution path while maintaining strong or weak heredity constraints.
Recent advances in Bayesian hierarchical modeling for interaction selection have been driven by innovative prior constructions. Ref. [
22] developed mixture priors that link interaction inclusion probabilities to main effect strengths, automatically enforcing heredity constraints. Ref. [
23] introduced structured shrinkage priors using hierarchical Laplace distributions to facilitate group-level interaction selection. Ref. [
24] incorporated heredity principles into SSVS frameworks through carefully designed spike-and-slab priors. These methodological developments have enhanced the ability to capture complex dependency structures in interaction selection while preserving Bayesian advantages, with demonstrated applications across various domains, including spatiotemporal analysis.
However, all these works assumed that the effects of all covariates can be captured in a simple linear form, which does not always hold in practice. To handle this issue, several authors have proposed nonparametric and semiparametric interaction model. For example, Refs. [
25,
26] considered the variable selection and estimation procedure of single-index models with interactions for functional data and longitudinal data. Some recent work on semiparametric single-index models with interactions included [
27,
28], among others.
Driven by both practical needs and theoretical considerations, this paper makes the following contributions: (1) We propose a method for variable selection in quantile regression models that incorporates nonlinear interaction structures; (2) We modify the RAMP algorithm to extend its applicability to additive models with nonlinear interactions, which enhances its capability to adeptly handle complex nonlinear interactions while also preserving the hierarchical relationship between main effects and interaction effects throughout the variable selection process. This work addresses the challenges posed by complex nonlinear interactions while ensuring that the inherent hierarchical structure is maintained, thereby providing a robust framework for more accurate and interpretable models.
The rest of this paper is organized as follows. In
Section 2, we present the additive quantile regression with nonlinear interaction structures and discuss the properties of the oracle estimator. In
Section 3, we present a sparsity smooth penalized method fitted by regularization algorithm under marginality principle and derive its oracle property. In
Section 4, simulation studies are provided to demonstrate the outperformance of the proposed method. We illustrate an application of the proposed method on a real dataset in
Section 5, and a conclusion is presented in
Section 6.
2. Additive Quantile Regression with Nonlinear Interaction Structures
Suppose that
is an independent and identically distributed (iid) sample, where
is a
p-dimensional vector of covariates. The
th (
) conditional quantile of
given
is defined as
, where
is the conditional distribution function of
given
. We consider the following additive nonlinear interaction model for the conditional quantile function:
where the unknown real-valued two-variate functions
represent the pairwise interaction terms between
and
. We assume that
for all
a and
b and that all
and
for all
i and
j. Let
; then,
satisfies
and we may also write
, where
is the random error. For identification, we assume the
quantile of
and
for
to be zero (see [
29]).
Let
denote the preselected orthonormal basis with respect to the Lebesgue measure on the unit interval. We center all the candidate functions, so the basis functions are centered as well. We approximate each function
by a linear combination of the basis functions, i.e.,
where
s are centered orthonormal bases. Let
; the main terms can be expressed as
, where
denotes the
-dimensional vector of basis coefficients for the
j-th main terms. We consider the tensor product of the interaction terms
as a surface that can be approximated by
To simplify the expression, it is computationally efficient to re-express the surface in vector notation as
, where
, and
, where
is the matrix with elements
. Here, ⊗ denotes the Kronecker product. We can now express (
1) as
2.1. Oracle Estimator
For high-dimensional inference, it is often assumed that
p is large, but only a small fraction of the main effects and interaction terms are present in the true model. Let
and
be the index set of nonzero main effects and interaction effects, and
and
be the cardinality of
and
, respectively. That is, we assume that the first
q of
are nonzero and the remaining components are zero. Hence, we can write
, where
. Further, we assume that the first
s of pairwise interaction
corresponding to
are nonzero and the remaining components are zero. Thus, we write
, where
. Here,
denotes the component of
with
or
. Let
be an
n by
design matrix with
, where
and
. Likewise, we write
, where
is the subvector consisting of the first
elements of
with the active covariates and
is the subvector consisting of the first
elements of
with the active interaction covariates. We first investigate the estimator, which is obtained when the index sets
and
are known in advance, which we refer to as the oracle estimator. Now, we consider the quantile regression with the oracle information. Let
The oracle estimators for
and
are
and
, respectively. Accordingly, the oracle estimators for nonparametric function
and
are
and
, respectively.
2.2. Asymptotic Properties
We next present the asymptotic properties of the oracle estimators. Similar to [
13], we assume that all true main effects belong to the Sobolev space of order two:
, and impose the same requirement on each true interaction function
and
for each
, with
C being some constant.
To establish the asymptotic properties of the estimators, the following regularity conditions are needed:
- (A1)
The distribution of is absolutely continuous with density , which is bounded away from 0;
- (A2)
The conditional distribution of the error , given , has a density , which satisfies the following two conditions:
- (1)
;
- (2)
There exists positive constants and such that ;
- (A3)
Basis function satisfies the following:
- (1)
;
- (2)
and are uniformly bounded from 0 to ∞;
- (3)
There is a vector such that , where and ;
- (A4)
for some .
It should be noticed that (A1) and (A2) are typical assumptions in nonparametric quantile regression (see [
30]). Condition (3) in (A3) requires that the true component function
and
can be uniformly well-approximated by the basis
and
. Finally, Condition (A4) is set to control the model size.
Theorem 1. Let . Assume that Conditions (A1)–(A4) hold. Then, the following are true:
- (a)
;
- (b)
where .
Theorem 1 summarizes the rate of convergence for the oracle estimator, which is the same as the optimal convergence rate in the additive model [
30] due to the merit of the additive structure. The proof of Theorem 1 can be found in
Appendix A.1.
3. Penalized Estimation for Additive Quantile Regression with Nonlinear Interaction Structures
3.1. Penalized Estimator
Model (2) contains a large number of main effects and interaction effects , which will significantly increase the computational burden, especially in cases where p (the number of features) is large. It is necessary to use penalty methods to ensure computational efficiency and prevent overfitting. Applying sparsity penalties (such as Lasso or grouped Lasso) to each can help to automatically select important variables and reduce the complexity of the model.
However, when using a large number of basis functions to fit complex relationships, a simple sparse penalty may oversimplify the model, ignore potential patterns in the data, and introduce unstable fluctuations. Therefore, in addition to sparse penalties, it is also necessary to introduce smoothness penalties (such as second-order derivative penalties) to avoid drastic fluctuations in the function and ensure smooth and stable fitting results. This method helps to control model complexity, reduce overfitting, and improve the model’s generalization ability.
Therefore, we minimize the following penalized objective function for
:
where
is a sparsity smooth penalty function, and the first and second terms in the penalty function penalize the main and interaction effects, respectively. We employ
as a sparsity penalty, which encourages sparsity at the function level, and
as a roughness penalty, which controls the model’s complexity by ensuring that the estimated function remains smooth while fitting the data, thus preventing high-frequency fluctuations caused by an excessive number of basis functions. The two tuning parameters
control the amount of penalization.
It can be written that
, where
is a band diagonal matrix of known coefficients and the
th element can be expressed as
—see [
31] for details. According to [
32], the penalty
can be represented as
where
and
, with
and
being the second derivatives of
with respect to
and
, respectively. Hence, the penalty function can be rewritten as
where
and
, with
denoting the
matrix with the
-th entry given by
, and
representing the
matrix with the
-th entry given by
. Similar to [
33], we decompose
and
for some quadratic
matrix
and
matrix
. Then, Model (
4) can be represented as
However, the above penalty treats the main and interaction effects similarly. That is, an entry of an interaction into the model generally adds more predictors than an entry of a main effect, which usually demands high computational cost and is hard to interpret for more interactions. To deal with this difficulty, we propose to add a set of heredity restrictions to produce sparse interaction models that an interaction only be included in a model if one or both variables are marginally important.
Naturally,
Our target is to estimate
S and
T from the data consistently or sign consistently, like Th1 in [
21]
There are two types of heredity restrictions, which iare called strong and weak hierarchy:
To ensure the strong and weak hierarchy conditions for
and
, we structure the following penalties to enforce the strong hierarchy:
and weak hierarcy:
These structures ensure hierarchy through the introduction of an indicative penalty. Specifically, the indicator function activates the penalty only when , thus enforcing the strong condition. This ensures that both corresponding main effects must be present if the interaction term is non-zero. Additionally, the term ensures that the penalty is applied to the smaller of the two norms, thereby enforcing the weaker condition. This means that at least one of the corresponding main effects must be present for an interaction term to be included.
3.2. Algorithm
The regularization path algorithm under the marginality principle (RAMP, [
21]) is a method for variable selection in high-dimensional quadratic regression models, designed to maintain the hierarchical structure between main effects and interaction effects during the selection process. However, the RAMP algorithm primarily focuses on linear interaction terms and considers interactions of variables with themselves, making it unsuitable for directly handling nonlinear interaction terms.
To address this limitation, we modified the RAMP algorithm, extending its application to additive models that include nonlinear interaction models. Through this enhancement, we can not only handle complex nonlinear interactions but also continue to preserve the hierarchical structure between main effects and interaction effects during variable selection. Specifically, the detailed steps of the modified algorithm are as follows.
Let and be the index set for main effects and interaction effects, respectively. For an index set , define , and .
We fix a sequence of values
between
and
. Following [
21], we set
and
with some small
. At step
, we denote the current active main effects set as
, and the interaction effects are set as
. Define
as the parent set of
, which contains the main effects that have at least one interaction effect in
. Set
. Then, the algorithm is as detailed below:
Step 1. Generate a decreasing sequence , and for .
We use the warm start strategy, where the solution for is used as a starting value for s;
Step 2. Given
, add the possible interactions among main effects in
to the current model. Then, we minimize the penalty loss function
where the penalty is imposed on the candidate interaction effects and
, which contains the main effects not enforced by the strong heredity constraint. The penalty encourages smoothness and sparsity in the estimated functional components;
Step 3. Record , and according to the above solution. Add the corresponding effects from into ;
Step 4. Calculate the quantile estimation based on the current model
Step 5. Repeat steps 2–4 S times and determine the active sets and according to , which minimizes GIC.
Ref. [
34] proposed the following GIC criterion:
where
and
are the penalized estimators obtained by step 4,
is the cardinality of the index set of nonzero coefficients in main effects and interaction effects,
represents the degrees of freedom of the model, and
is a sequence of positive constants diverging to infinity as
n increases. We take
in our simulation studies and real data analysis.
Following the same strategy, under the weak heredity condition, we use the set
instead of
and solve the following optimization problem:
with respect to
and
. That is, an interaction effect can enter the model for selection if at least one of its parents are selected in a previous step.
3.3. Asymptotic Theory
Let be the set of local minima of . The following result shows that, with probability approaching one, the oracle estimator belongs to the set .
Theorem 2. Assume that conditions (A1)–(A4) are satisfied and consider the penalized function under strong heredity. Let be the oracle estimator. If and as , then Remark 1. The penalty term decays to zero as , allowing the oracle estimator to dominate. The rate condition guarantees that the approximation error from basis expansion is controlled by the penalty strength.
Define
and
as the empirical risk and predictive risk of quantile loss, respectively. And
is a functional class. Let
denote the predictive oracle, i.e., the minimizer of the predictive risk over
, and
represent the minimization of Equation (6) over
. Following [
35], we say that an estimator
is persistent (risk consistent) relative to a class of functions
if
. Then, we have the following result.
Theorem 3. Under conditions (A1)–(A4), if , then for some constants , The above theorem establishes the convergence rate in terms of the excess risk of the estimator, which shows the prediction accuracy and consistency of the proposed estimator. The proof of Theorem 3 can be found in
Appendix A.3.
4. Simulation Studies
We conduct extensive Monte Carlo simulations to assess the performance of our proposed method in comparison with two established approaches, hirNet [
19] and RAMP [
21], across five specific quantile levels:
. In each simulation scenario, we generate 500 independent training datasets with two sample sizes,
and
, and
main effects, resulting in a total of
potential terms (15 main effects and 105 pairwise interactions). All methods are implemented using pseudo-spline basis functions to ensure consistent comparison. For regularization parameters, we adopt the relationship
based on [
13], which provides superior empirical performance.
The covariates
are generated independently from a uniform distribution
. We specify five nonlinear main effect functions that are standardized before model fitting:
Hence, each
is standardized and the interaction functions are generated by multiplying together the standardized main effects,
The response variables are generated through quantile-specific models that incorporate either strong or weak heredity constraints. Under strong heredity, interactions are only included when all parent main effects are active, while weak heredity requires just one parent main effect.
Example 1. Strong heredity. Building on the quantile-specific sparsity framework established by [36], we incorporate strong heredity constraints into our modeling approach. This integration ensures two key aspects: (1) the set of active predictors varies depending on the quantile, and (2) all selected variables preserve strict hierarchical relationships. The resulting nonlinear data-generating process is as follows: (Dense):
Example 2. Weak heredity. In contrast to strong heredity, we also incorporate weak heredity constraints into our modeling approach by extending the quantile-specific sparsity framework. The resulting nonlinear data-generating process is as follows:
(Dense):
To assess robustness across different error conditions, we consider four error distributions:
- 1.
Gaussian: The error terms follow a normal distribution with mean 0 and variance 1, i.e., ;
- 2.
Heavy-tailed: The error terms follow a Student’s t-distribution with 3 degrees of freedom, i.e., ;
- 3.
Skewed: The error terms follow a chi-squared distribution with 2 degrees of freedom, i.e., ;
- 4.
Heteroscedasticity [
37]: The error terms
are heteroscedastic and modeled as follows:
;
;
;
.
Recall that and are the active sets of main and interaction effects. For each example, we run Monte Carlo simulations and denote the estimated subsets as and , . We evaluate the performance on variable selection based on the following criteria:
- (1)
True positive rate and false positive rate of main effects (mTPR, mFDR);
- (2)
True positive rate and false positive rate of interaction effects (iTPR, iFDR);
- (3)
Main effects coverage percentage (): ;
- (4)
Interaction effects coverage percentage (): ;
- (5)
Coverage rate of a single variable in the main effects and interaction effects;
- (6)
Model size (size): ;
- (7)
Root mean squared error (RMSE): , where is obtained by the estimated coefficients and ; R is the number of iterations.
The results under the strong heredity for
and
are shown in
Table 1 and
Table 2, respectively. In the context of small sample data, the performance of the proposed method is a little worse than the other two methods in terms of true positive rates and coverage percentages of a single variable in the main and interaction effects. This is because the high complexity and degrees of freedom in nonparametric additive interaction models make them more prone to overfitting, leading to an inaccurate reflection of the underlying data structure and, consequently, a lower true positive rate (TPR) in variable selection. In contrast, methods like hirNet and RAMP, by adopting regularization and relatively simple linear structures, can more effectively prevent overfitting. Although this may come at the cost of some loss in model interpretability.
In large sample scenarios, our proposed method not only demonstrates higher true positive rates and broader coverage of main effects and interaction effects but also significantly reduces false positive rates and generates a more parsimonious model. These results indicate that the method can effectively handle nonlinear additive interaction models, accurately capturing complex relationships in the data while maintaining a low false discovery rate and higher model simplicity.
The results from weak heredity are shown in
Table 3 and
Table 4. In small sample cases, the hirNet method is effective at capturing both main and interaction effects but suffers from a high rate of false selections, leading to increased model complexity. On the other hand, the RAMP method selects fewer variables, reducing false selections but also under-selecting important variables, which negatively affects predictive accuracy. In contrast, the proposed method in this paper offers a better balance, with higher precision and stability in selecting both main and interaction effects, while significantly reducing false selections. This method results in a moderate model size and the lowest prediction error, demonstrating an optimal trade-off between model complexity and predictive performance.
In large sample cases, the proposed method excels in identifying both main and interaction effects, with the main effect true positive rate (mTPR) and interaction effect true positive rate (iTPR) approaching 1.000, indicating a nearly perfect identification of true positives. This method also maintains low mFDR and iFDR, minimizing false positives. In contrast, although the hirNet methods perform better in terms of mTPR and iTPR, they tend to include more noisy variables in the model, resulting in higher RMSE values and lower predictive accuracy.
From the perspective of quantile-specific sparsity, the proposed method demonstrates varying levels of sparsity across different quantiles (e.g.,
). At
and
quantiles, fewer variables are selected, indicating higher sparsity, while, at
, more variables and interaction effects are included, showing that the model adapts its selection based on the distribution of the response variable. Simulation results further show that, when the
, DGP consists only of main effects and our method achieves near-perfect variable selection: the TPR for main effects approaches 1 (all true effects retained) and the FDR is close to 0 (almost no false positives), leading to model sizes nearly identical to the true DGP and minimal prediction errors, as reflected in the low RMSE. These results are consistent with [
38]’s work, which provides theoretical support for this outcome. Moreover, the RMSE patterns suggest that the method maintains strong predictive performance even when more variables are selected, thereby confirming the effectiveness of the quantile-specific sparsity approach.
5. Applications
Parkinson’s disease (PD) is a common degenerative disease of the nervous system that leads to shaking, stiffness, and difficulty with walking, balance, and coordination. PD symptom monitoring with the Unified Parkinson’s Disease Rating Scale (UPDRS) is very important but costly and logistically inconvenient due to the requirement of patients’ presence in clinics and time-consuming physical examinations by trained medical staff. A new treatment technology based on the rapid and remote replication of UPDRS assessments with more than a dozen biomedical voice measurement indicators has been proposed recently. The dataset consists of 5875 UPDRS from patients with early-stage PD and 16 corresponding biomedical voice measurement indicators. Reference [
39] analyzed a PD dataset and mapped the clinically relevant properties from speech signals of PD patients to UPDRS using linear regression and nonlinear regression models, with the aim to verify the feasibility of frequent, remote, and accurate UPDRS tracking and effectiveness in telemonitoring frameworks that enable large-scale clinical trials into novel PD treatments. However, existing analyses of the UPDRS scores from the 16 voice measures with a model including only the main effects may fail to reflect the relationship between the UPDRS and 16 voice measures. For example, fundamental frequency and amplitude can work together on voice and thus may have an interactive effect on UPDRS. In this situation, it is necessary to consider the model with pairwise interactions. Furthermore, the effect on UPDRS from these indicators may be complicated and needs to be investigated further, including their nonlinear dependency, even nonparametric relation, so that both main effect and interactive effect may exist. By noting the typical skewness of the distribution of the UPDRS and the asymmetry of its
-quantile-level and (
)-quantile-level score
) in
Figure 1 (left), the distribution of the UPDRS tends to asymmetric and heavy-tailed, so that one should examine how any quantile, including the median (rather than the mean) of the conditional distribution, is affected by changes in the 16 voice measures and then make a reliable decision about the new treatment technology. Therefore, the proposed additive quantile regression with nonlinear interaction structure is applied for the analysis of the data.
The 16 biomedical voice measures in the PD data include several measurements of variation in fundamental frequency (Jitter, Jitter (Abs), Jitter:RAP, Jitter:PPQ5, and Jitter:DDP), several measurements of variation in amplitude (Shimmer, Shimmer (dB), Shimmer:APQ3, Shimmer:APQ5, Shimmer:APQ11, and Shimmer:DDA), two measurements of the ratio of noise to tonal components in the voice (NHR and HNR), a nonlinear dynamical complexity measurement (RPDE), signal fractal scaling exponent (DFA), and a nonlinear measure of fundamental frequency variation (PPE). Our interest is to identify the biomedical voice measurements and their interactions that may effectively affect the UPDRS (Unified Parkinson Disease Rating Scale) in Parkinson’s disease patients.
Figure 1 (right) shows the correlations among 16 biomedical voice measurements. As we can see from
Figure 1 (right), except for variable DFA, there are strong correlations among the covariates, which makes variable selection more challenging. Therefore, the proposed additive quantile regression model with nonlinear interaction structures may provide a complete picture on how those factors and their interactions affect the distribution of UPDRS.
We first randomly generated 100 partitions of the data into training and test sets. For each training set, we select 5475 observations as training set and the remaining 400 observations as the test set. Normalization is conducted for the dataset. Also, quantile regression is used to fit the model selected by the proposed method. For comparison, we also use least square to fit the model selected by hirNet and RAMP on the training set. Finally, we evaluate the performance of these models using the fixed active sets on the corresponding test sets.
Table 5 summarizes the covariates selected by different methods. The proposed quantile-specific method reveals distinct covariate patterns across different UPDRS severity levels. At lower quantiles (
and
), the model highlights DFA and its interactions (e.g., HNRDFA, DFAPPE), suggesting that these acoustic features may be particularly relevant in the early stages of the disease. The middle quantile (
) incorporates additional speech markers (APQ11, RPDE), which align with conventional methods, while higher quantiles (
and
) progressively simplify to core features (HNR, dB), indicating reduced covariate dependence in advanced stages. This dynamic selection demonstrates how quantile-specific modeling can capture stage-dependent biological mechanisms while maintaining model parsimony, selecting just 3–5 main effects and 1–2 interactions per quantile, compared to hitNet’s fixed 12-variable approach. The shifting importance of HNR and DFA interactions across quantiles particularly underscores their complex, nonlinear relationships with symptom progression.
Figure 2 and
Figure 3 show the estimation results of main effects and interaction effects at
. The solid lines represent the estimated effects, while the dashed lines indicate the
confidence intervals obtained through a simulation-based approach. Specifically, for the chosen model, we conducted 100 repeated simulations on the dataset to derive point-wise confidence intervals for each covariate’s effect.
From
Figure 2, it can be observed that, as APQ11 increases, the UPDRS score initially rises and then declines, peaking at a specific value. This inverted U-shaped relationship suggests that moderate amplitude variations are linked to more severe symptoms. For HNR, the UPDRS score decreases as the noise ratio increases, indicating that higher noise levels correlate with disease worsening. The confidence intervals around these trends confirm their statistical significance across the simulated datasets. RPDE shows a U-shaped curve, meaning that intermediate complexity levels are associated with more severe symptoms, while extreme values (high or low) indicate milder symptoms. The confidence intervals further support this nonlinear relationship, demonstrating its robustness to the data. DFA has a negative correlation with the UPDRS score: as fundamental frequency variability increases, the UPDRS score decreases, suggesting that greater variability is linked to symptom relief. The narrow confidence intervals around this trend highlight its consistency across the simulations.
Figure 3 illustrates the interaction effect between HNR and RPDE on the UPDRS score. This shows that, as HNR increases, the impact on the UPDRS score becomes more negative, especially when RPDE is at lower values, indicating a worsening in symptoms with higher noise to harmonic ratio. Conversely, for higher values of RPDE, the interaction effect stabilizes, suggesting less influence on the UPDRS score. The plot reveals a valley-like structure around the center where both HNR and RPDE are near zero, signifying minimal interaction effect. Notably, the most significant interactions occur at extreme values of either HNR or RPDE, highlighting the complex, nonlinear relationship between these variables and Parkinson’s disease symptom severity. The estimation results of the main effects and interaction effects at
are presented in
Appendix B.
From the results in
Appendix B, it can be seen that, although the main effects selected at
and
are the same, their impacts on UPDRS scores differ due to nonlinear relationships or complex interactions between variables. This highlights the advantage of quantile regression, which not only captures the average trend of the dependent variable but also reveals how these effects vary across different quantiles. This method provides a more comprehensive understanding of the complex relationships between variables and helps to analyze the specific effects of variables on UPDRS under different conditions.
From the perspective of quantile-specific sparsity, the proposed method demonstrates varying levels of covariate selection across different quantiles. At lower quantiles ( and ), the model tends to select fewer covariates, focusing primarily on HNR, DFA, and PPE, which suggests a higher degree of sparsity. As the quantile increases to , the model selects more covariates, including APQ11 and RPDE, indicating a reduction in sparsity. At higher quantiles ( and ), the model again shows increased sparsity by selecting only a few key covariates such as HNR, DFA, and PPE. This pattern reflects the adaptability of the proposed method in capturing the specific characteristics of the data distribution at different quantiles, thereby achieving optimal sparsity tailored to each quantile level.
Table 6 compares the average RMSE and model sizes across 100 datasets for three methods: the proposed quantile-based approach (evaluated at
), RAMP, and hirNet. The results show that our method outperforms the others in predictive accuracy, particularly at
, where it achieves the lowest RMSE of 0.858, outperforming RAMP (0.878) and hirNet (0.887). Additionally, the proposed method uses much smaller models, with only 2.3–5.0 variables, compared to hirNet’s larger model with 18.39 variables. This reduction in model size highlights the method’s efficiency without sacrificing predictive power. The method also performs consistently well across different quantiles, with a particularly strong result at
(RMSE = 0.873), though the RMSE at
is slightly higher (1.00), showing some variation in performance across quantiles. Furthermore, models that include interaction terms perform better than those with only main effects, emphasizing the importance of interaction effects in the model. In conclusion, the proposed method strikes an optimal balance between prediction accuracy, model simplicity, and computational efficiency, making it ideal for applications that require both interpretability and strong predictive performance.