Article

Statistical Inference for High-Dimensional Heteroscedastic Partially Single-Index Models

College of Science, Hunan Institute of Engineering, Fuxing Road, Xiangtan 411104, China
*
Author to whom correspondence should be addressed.
Entropy 2025, 27(9), 964; https://doi.org/10.3390/e27090964
Submission received: 6 August 2025 / Revised: 2 September 2025 / Accepted: 11 September 2025 / Published: 16 September 2025
(This article belongs to the Special Issue Statistical Inference: Theory and Methods)

Abstract

In this study, we propose a novel penalized empirical likelihood approach that simultaneously performs parameter estimation and variable selection in heteroscedastic partially linear single-index models with a diverging number of parameters. It is rigorously proved that the proposed method possesses the oracle property: (i) with probability tending to 1, the zero components are consistently estimated as zero; (ii) the estimators for nonzero coefficients achieve asymptotic efficiency. Furthermore, the penalized empirical log-likelihood ratio statistic is shown to asymptotically follow a standard chi-squared distribution under the null hypothesis. This methodology can be naturally applied to pure partially linear models and single-index models in high-dimensional settings. Simulation studies and real-world data analysis are conducted to examine the properties of the presented approach.

1. Introduction

Consider the following partially linear single-index model (PLSIM):
$$Y_i = \theta^\top X_i + g(Z_i^\top\gamma) + \varepsilon_i, \qquad E(\varepsilon_i \mid X_i, Z_i) = 0, \quad i = 1, \ldots, n, \tag{1}$$
where $X_i \in \mathbb{R}^p$ and $Z_i \in \mathbb{R}^r$ are covariates, $g(\cdot)$ denotes an unknown function, $Y_i$ is the response variable, $\theta \in \mathbb{R}^p$ and $\gamma \in \mathbb{R}^r$ are parameter vectors, and $\varepsilon_i$ is an independent random error. Let $\mathrm{Var}(\varepsilon_i \mid X_i, Z_i) = v(X_i, Z_i) > 0$, where the function $v(X, Z)$ captures potential heteroscedasticity. For model (1), the scale of the index parameter $\gamma$ is generally not identifiable: if $\gamma$ is multiplied by a nonzero constant while the nonparametric function absorbs the reciprocal rescaling of its argument, the model's predictions remain unchanged. To ensure identifiability and avoid non-unique representations, it is standard practice to impose a constraint on the index parameter $\gamma$, for example, fixing one of its components to 1 or restricting it to have unit norm. We fix the first component of $\gamma$ to 1 and denote the remaining components of $Z$ by $Z_1$. Model (1) combines a partially linear model (PLM) and a single-index model (SIM). To our knowledge, Carroll et al. [1] introduced PLSIMs and developed a backfitting algorithm for estimation in their generalized form. Since their introduction, PLSIMs have attracted much research attention, and extensive work has focused on estimating $g(\cdot)$ and the unknown parameter vectors $\theta$ and $\gamma$. For PLSIMs, Yu and Ruppert [2] proposed a penalized spline estimation approach; Zhu and Xue [3] proposed an empirical likelihood (EL) method; Xia and Härdle [4] introduced a semiparametric estimation procedure; Liang et al. [5] introduced an estimation method based on profile least squares; etc.
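The scale non-identifiability noted above can be checked numerically. The following sketch (pure Python; the choice $g = \exp$ and all covariate values are hypothetical toys, not from our simulations) confirms that multiplying $\gamma$ by a constant $c$ while replacing $g(u)$ with $g(u/c)$ leaves the mean function unchanged:

```python
import math

# Illustration of the scale non-identifiability in model (1): the mean
# theta'X + g(gamma'Z) is unchanged if gamma is multiplied by c and g(u)
# is replaced by g(u / c).  The link g = exp and all values are toy choices.
def mean_plsim(theta, gamma, g, x, z):
    linear = sum(t * xv for t, xv in zip(theta, x))
    index = sum(gm * zv for gm, zv in zip(gamma, z))
    return linear + g(index)

theta = [1.0, -0.5]
gamma = [1.0, 2.0]                 # first component fixed to 1, as in the text
x, z = [0.3, 1.2], [0.7, -0.4]
c = 3.0

m_original = mean_plsim(theta, gamma, math.exp, x, z)
m_rescaled = mean_plsim(theta, [c * gm for gm in gamma],
                        lambda u: math.exp(u / c), x, z)
# m_original and m_rescaled coincide: the scale of gamma is not identified
```

Any nonzero $c$ gives the same fitted value, which is why a normalization such as fixing the first component of $\gamma$ to 1 is required.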
In recent years, the analysis of high-dimensional data has become a major frontier of statistical research, with applications spanning internet portals, hyperspectral imagery, finance, and high-throughput genomic data in computational biology; see, e.g., Ma and Zhu [6], Fang et al. [7], Hao and Yin [8], and Liu et al. [9]. Ma and Zhu [6] introduced efficient estimators for heteroscedastic PLSIMs that allow for many covariates; however, their methodology restricted the covariate dimensions to be fixed rather than allowing them to diverge as the sample size increases. Fang et al. [7] proposed EL estimators for high-dimensional PLSIMs, permitting the covariate dimension to diverge with the sample size.
In high-dimensional analyses, PLSIMs suffer from the inclusion of irrelevant covariates, which leads to inefficient parameter estimators and reduced prediction accuracy; it is therefore natural to prune non-informative variables and recover the sparse true model. This has motivated extensive research on variable selection methods, including AIC and BIC (Breiman [10]), the LASSO penalty (Tibshirani [11]), the SCAD penalty (Fan and Li [12]), and so on. For PLSIMs, several variable selection methods have been proposed. Xie and Huang [13] proposed a variable selection method for PLMs, which are special cases of PLSIMs, and proved that this method possesses the oracle property. Wang and Zhu [14] established nearly necessary and sufficient conditions for estimator consistency in SIMs, another special case of PLSIMs, under high-dimensional ("large p, small n") settings. Zhang et al. [15] developed a method for variable selection and parameter estimation in high-dimensional PLSIMs. Lai et al. [16] studied variable selection for heteroscedastic PLSIMs using the efficient score function. These methods, however, are largely confined to scenarios where the covariate dimensions remain fixed.
Within nonparametric frameworks, Owen [17] developed the EL method for statistical inference. This approach retains likelihood methodology while eliminating parametric distributional assumptions. Since this method was proposed, it has been successfully extended to various circumstances, including linear models [18], generalized linear models [19], heteroscedastic PLMs [20], SIMs [21], and network data [22], among others. Chen et al. [23] demonstrated that EL remains valid when the data dimension diverges. In high-dimensional data settings, Tang and Leng [24] studied a variable selection method by penalized empirical likelihood (PEL) for linear regression models, and Leng and Tang [25] investigated a PEL method for general estimating equations. To our knowledge, applications to heteroscedastic PLSIMs have been scarcely explored, especially for variable selection in high-dimensional settings.
Empirical likelihood is a data-driven, nonparametric methodology that retains the merits of parametric likelihood while offering robustness and flexibility in incorporating auxiliary information to obtain estimates and construct confidence sets. Motivated by the PEL method for high-dimensional estimating equations in Leng and Tang [25], we explore a variable selection approach using PEL for a PLSIM in a heteroscedastic high-dimensional setting, where the dimensions $p \to \infty$ and $r \to \infty$ as $n \to \infty$. For model (1), the PEL ratio is constructed from semiparametric efficient estimating equations, incorporating the semiparametric efficient score for the heteroscedastic PLSIM. We prove that PEL has the oracle property and excels at generating sparse models without requiring a prespecified parametric likelihood. Although existing variable selection techniques (e.g., Lai et al. [16]) also attain the oracle property, specifying a high-dimensional distribution remains theoretically challenging. Furthermore, the PEL ratio statistic satisfies Wilks' theorem, converging to a chi-squared distribution under some regularity conditions, which facilitates hypothesis testing and produces range-respecting confidence regions. As a robust alternative to parametric likelihood ratios in high-dimensional settings, PEL combines the adaptability and statistical efficiency inherent in nonparametric likelihood methods, complementing existing approaches.
The rest of this article is organized as follows. Section 2 outlines methods of variable selection, parameter estimation, and asymptotic properties for high-dimensional heteroscedastic PLSIMs using PEL. Section 3 extends the method to PLMs and SIMs as special examples. In Section 4, we exhibit simulation results, and an application of the proposed method is stated in Section 5. Lemmas and technical proofs are shown in Appendix A.

2. Penalized Empirical Likelihood for PLSIM

Denote the weight function as $w(X_i, Z_i) = [E(\varepsilon_i^2 \mid X_i, Z_i)]^{-1}$, $i = 1, \ldots, n$. Using the kernel function $K_h(u) = h^{-1}K(u/h)$ with bandwidth $h \to 0$, the nonparametric estimators are defined as follows:
$$\hat E\{\hat w(X,Z) \mid Z_i^\top\gamma\} = \frac{\sum_{j \neq i} K_{h_3}(Z_i^\top\gamma - Z_j^\top\gamma)\, \hat w(X_j, Z_j)}{\sum_{j \neq i} K_{h_3}(Z_i^\top\gamma - Z_j^\top\gamma)}, \qquad
\hat E\{\hat w(X,Z)Z_1 \mid Z_i^\top\gamma\} = \frac{\sum_{j \neq i} K_{h_3}(Z_i^\top\gamma - Z_j^\top\gamma)\, \hat w(X_j, Z_j)\, Z_{1,j}}{\sum_{j \neq i} K_{h_3}(Z_i^\top\gamma - Z_j^\top\gamma)},$$
$$\hat E\{\hat w(X,Z)X \mid Z_i^\top\gamma\} = \frac{\sum_{j \neq i} K_{h_3}(Z_i^\top\gamma - Z_j^\top\gamma)\, \hat w(X_j, Z_j)\, X_j}{\sum_{j \neq i} K_{h_3}(Z_i^\top\gamma - Z_j^\top\gamma)},$$
$$\hat w(X_i, Z_i) = \sum_{j \neq i} K_{h_2}(\eta_i - \eta_j) \Big/ \sum_{j \neq i} K_{h_2}(\eta_i - \eta_j)\, e_j^2,$$
$$\hat g(Z_i^\top\gamma) = \sum_{j \neq i} K_{h_1}(Z_i^\top\gamma - Z_j^\top\gamma)(Y_j - X_j^\top\theta) \Big/ \sum_{j \neq i} K_{h_1}(Z_i^\top\gamma - Z_j^\top\gamma),$$
and
$$\hat g'(Z_i^\top\gamma) = h_1^{-1}\Big\{\sum_{j \neq i} K'_{h_1}(Z_i^\top\gamma - Z_j^\top\gamma)(Y_j - X_j^\top\theta)\sum_{j \neq i} K_{h_1}(Z_i^\top\gamma - Z_j^\top\gamma) - \sum_{j \neq i} K_{h_1}(Z_i^\top\gamma - Z_j^\top\gamma)(Y_j - X_j^\top\theta)\sum_{j \neq i} K'_{h_1}(Z_i^\top\gamma - Z_j^\top\gamma)\Big\} \Big/ \Big\{\sum_{j \neq i} K_{h_1}(Z_i^\top\gamma - Z_j^\top\gamma)\Big\}^2.$$
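For intuition, the leave-one-out kernel smoother underlying $\hat g$ and the conditional-expectation estimators above can be sketched as follows (an illustrative Python implementation with the Epanechnikov kernel; the function names and toy values are ours, not part of the estimation procedure):

```python
def epanechnikov(t):
    # K(t) = 0.75 * (1 - t^2) on [-1, 1] and zero outside
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0

def nw_loo(u, y, h, i):
    """Leave-one-out kernel estimate of E(y | u = u[i]), the common building
    block of g-hat and the conditional-expectation estimators (sketch)."""
    num = den = 0.0
    for j in range(len(u)):
        if j == i:
            continue
        k = epanechnikov((u[i] - u[j]) / h) / h   # K_h(u_i - u_j)
        num += k * y[j]
        den += k
    return num / den if den > 0.0 else float("nan")

u = [0.0, 0.1, 0.2, 0.3, 0.4]      # toy index values Z_j' gamma
y = [2.0] * 5                      # constant response
g_hat = nw_loo(u, y, h=0.5, i=2)   # a kernel average of a constant is exact
```

The same routine, applied with $\hat w(X_j, Z_j)$ or $\hat w(X_j, Z_j)X_j$ in place of $y_j$, yields the estimated conditional expectations in the displays above.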
We propose the following estimating equations for the PLSIM:
$$\frac{1}{n}\sum_{i=1}^n \tilde\varepsilon_i\, \hat w(X_i, Z_i)\Big[X_i - \frac{\hat E\{\hat w(X,Z)X \mid Z_i^\top\gamma\}}{\hat E\{\hat w(X,Z) \mid Z_i^\top\gamma\}}\Big] = 0,$$
$$\frac{1}{n}\sum_{i=1}^n \tilde\varepsilon_i\, \hat w(X_i, Z_i)\, \hat g'(Z_i^\top\gamma)\Big[Z_{1,i} - \frac{\hat E\{\hat w(X,Z)Z_1 \mid Z_i^\top\gamma\}}{\hat E\{\hat w(X,Z) \mid Z_i^\top\gamma\}}\Big] = 0, \tag{2}$$
where $\hat w_i \equiv \hat w(X_i, Z_i)$ and $\tilde\varepsilon_i = Y_i - \theta^\top X_i - \hat g(Z_i^\top\gamma)$ denotes the residual for $i = 1, \ldots, n$.
Attempting to estimate $\mathrm{Var}(\varepsilon \mid X, Z)$ or $w(X, Z)$ by nonparametric regression of the residuals on the covariates $(X, Z)$ poses a significant challenge, as this is a high-dimensional problem that is highly susceptible to the curse of dimensionality. To simplify the estimation of $w(X, Z)$, we assume there exists a function $\eta_i = \eta(X_i, Z_i)$ satisfying $\mathrm{Var}(\varepsilon_i \mid X_i, Z_i) = \mathrm{Var}(\varepsilon_i \mid \eta_i)$, where $\eta_i$ is a known low-dimensional function of the covariates $(X_i, Z_i)$, $i = 1, \ldots, n$. For instance, $\eta$ could take the form $\theta^\top X$, implying that the error variance depends solely on a linear combination of $X$; alternatively, $\eta$ could be $\gamma^\top Z$, indicating dependence only on $Z$, or it could combine both or take other forms. A similar assumption appears in Ma and Zhu [6]. In practice, a reasonable approximation of $\eta$ can be obtained with standard procedures for modeling heteroscedasticity, based on residuals from a preliminary model fit. This assumption can also be relaxed to accommodate intermediate multivariate structures, such as additive models, thereby preserving univariate convergence rates while retaining considerable flexibility in variance modeling.
Define
$$S_{\mathrm{eff}} = w\,\varepsilon\left(X - \frac{E(wX \mid Z^\top\gamma)}{E(w \mid Z^\top\gamma)},\; \Big[Z_1 - \frac{E(wZ_1 \mid Z^\top\gamma)}{E(w \mid Z^\top\gamma)}\Big]\, g'(Z^\top\gamma)\right). \tag{3}$$
According to Ma and Zhu [6], $S_{\mathrm{eff}}$ is the semiparametric efficient score, and the estimator $(\hat\theta, \hat\gamma)$ based on (2) is doubly robust and efficient for fixed $p$ and $r$. Double robustness means that a consistent estimator of the target parameter is obtained as long as one of the two working models is correctly specified. For example, we may use an estimator $\hat g(\cdot)$ that is inconsistent for $g(\cdot)$; as long as the conditional expectations are consistently estimated, i.e., $\hat E(\cdot \mid Z^\top\gamma)$ converges to $E(\cdot \mid Z^\top\gamma)$, then (2) still yields consistent estimators for $\theta$ and $\gamma$. Symmetrically, if $\hat g(\cdot)$ is consistent for $g(\cdot)$, consistency of the estimators in (2) is maintained even if the conditional expectations are misspecified. However, the double robustness and efficiency of the estimator obtained by solving (2) are no longer valid when $p$ and $r$ tend to infinity as $n \to \infty$.
Our goal is to develop, in a high-dimensional sparse setting, new estimation and variable selection approaches for the heteroscedastic model (1) using the PEL method. To construct the PEL function, we need an auxiliary random vector based on $S_{\mathrm{eff}}$. Define
$$\xi_i(\theta, \gamma) = w_i\,\varepsilon_i\left(X_i - \frac{E(w_iX_i \mid Z_i^\top\gamma)}{E(w_i \mid Z_i^\top\gamma)},\; \Big[Z_{1,i} - \frac{E(w_iZ_{1,i} \mid Z_i^\top\gamma)}{E(w_i \mid Z_i^\top\gamma)}\Big]\, g'(Z_i^\top\gamma)\right)^\top.$$
We have $E\{\xi_i(\theta, \gamma)\} = 0$ for $i = 1, \ldots, n$. Let $q = (q_1, \ldots, q_n)^\top$ satisfy $\sum_{i=1}^n q_i = 1$, $q_i \ge 0$. For $(\theta, \gamma)$, the EL function is written as
$$L(\theta, \gamma) = \sup\Big\{\prod_{i=1}^n (nq_i) : \sum_{i=1}^n q_i = 1,\; q_i \ge 0,\; \sum_{i=1}^n q_i\,\xi_i(\theta, \gamma) = 0\Big\}.$$
Since $L(\theta, \gamma)$ involves unknown functions, it cannot be used directly for statistical inference on $(\theta, \gamma)$. A natural way around this is to substitute the unknown functions in $L(\theta, \gamma)$ with the corresponding estimators provided above. For $(\theta, \gamma)$, redefine the estimated EL function as
$$\tilde L(\theta, \gamma) = \sup\Big\{\prod_{i=1}^n (nq_i) : \sum_{i=1}^n q_i = 1,\; q_i \ge 0,\; \sum_{i=1}^n q_i\,\hat\xi_i(\theta, \gamma) = 0\Big\}, \tag{5}$$
where $\tilde\varepsilon_i = Y_i - \theta^\top X_i - \hat g(Z_i^\top\gamma)$ and
$$\hat\xi_i(\theta, \gamma) = \hat w_i\,\tilde\varepsilon_i\left(X_i - \frac{\hat E(\hat w_iX_i \mid Z_i^\top\gamma)}{\hat E(\hat w_i \mid Z_i^\top\gamma)},\; \Big[Z_{1,i} - \frac{\hat E(\hat w_iZ_{1,i} \mid Z_i^\top\gamma)}{\hat E(\hat w_i \mid Z_i^\top\gamma)}\Big]\, \hat g'(Z_i^\top\gamma)\right)^\top.$$
Define the PEL estimator $(\hat\theta, \hat\gamma)$ as the maximizer of
$$\log\{\tilde L(\theta, \gamma)\} - n\sum_{i=1}^p p_\tau(|\theta_i|) - n\sum_{i=1}^r p_\nu(|\gamma_i|), \tag{6}$$
where $p_\tau(t)$ and $p_\nu(t)$ are penalty functions with tuning parameters $\tau$ and $\nu$, respectively.
Many penalty functions have been studied, e.g., the $L_1$ penalty (Donoho and Johnstone [26]), the $L_2$ penalty (Hoerl and Kennard [27]), the LASSO penalty (Tibshirani [11]), and the SCAD penalty (Fan and Li [12]). It is well known that the SCAD penalty possesses the oracle property; therefore, in this article, we consider PEL for a heteroscedastic PLSIM using the SCAD penalty. Its first derivative satisfies
$$p'_\nu(t) = \nu\Big\{\frac{(a\nu - t)_+}{(a-1)\nu}\, I(t > \nu) + I(t \le \nu)\Big\},$$
where $I(\cdot)$ denotes the indicator function and $a > 2$ is a constant.
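For reference, the SCAD first derivative above can be coded directly (an illustrative sketch; the choice $a = 3.7$ follows the common recommendation of Fan and Li [12]):

```python
def scad_deriv(t, nu, a=3.7):
    """First derivative of the SCAD penalty for t >= 0 (sketch):
    p'_nu(t) = nu for t <= nu, and (a*nu - t)_+ / (a - 1) for t > nu."""
    if t <= nu:
        return nu
    return max(a * nu - t, 0.0) / (a - 1.0)

d_small = scad_deriv(0.1, 0.5)   # L1-like shrinkage for small coefficients
d_large = scad_deriv(3.0, 0.5)   # zero: large coefficients are not shrunk
```

The flat derivative near zero yields sparsity, while the vanishing derivative for large $|t|$ leaves strong signals essentially unpenalized, which is the mechanism behind the oracle property.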
Combining the Lagrange multiplier method and Equation (5), we have
$$q_i = \frac{1}{n}\cdot\frac{1}{1 + \lambda^\top\hat\xi_i(\theta, \gamma)}, \tag{7}$$
where $\lambda$ satisfies
$$\frac{1}{n}\sum_{i=1}^n \frac{\hat\xi_i(\theta, \gamma)}{1 + \lambda^\top\hat\xi_i(\theta, \gamma)} = 0. \tag{8}$$
By substituting Equation (7) into $\tilde L(\theta, \gamma)$, we can show that maximizing (6) corresponds to minimizing
$$\tilde\ell_p(\theta, \gamma) = 2\sum_{i=1}^n \log\{1 + \lambda^\top\hat\xi_i(\theta, \gamma)\} + n\sum_{i=1}^p p_\tau(|\theta_i|) + n\sum_{i=1}^r p_\nu(|\gamma_i|). \tag{9}$$
Therefore, $(\hat\theta, \hat\gamma)$ can equivalently be defined as the minimizer of (9).
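For a scalar auxiliary variable, the inner step in Equations (7) and (8) reduces to one-dimensional root finding in $\lambda$. A minimal Newton-iteration sketch (with toy values of $\hat\xi_i$, purely for illustration):

```python
def el_lambda(xi, iters=50):
    """Solve (1/n) * sum_i xi_i / (1 + lam * xi_i) = 0 (the scalar analogue
    of Eq. (8)) for the Lagrange multiplier lam via Newton's method."""
    lam, n = 0.0, len(xi)
    for _ in range(iters):
        f = sum(x / (1.0 + lam * x) for x in xi) / n
        df = -sum(x * x / (1.0 + lam * x) ** 2 for x in xi) / n
        lam -= f / df
    return lam

xi = [-1.0, 0.5, 0.8, -0.2]        # toy auxiliary values
lam = el_lambda(xi)
q = [1.0 / (len(xi) * (1.0 + lam * x)) for x in xi]   # the analogue of Eq. (7)
# at the solution, q is a probability vector satisfying the moment constraint
```

At the root of (8), the implied weights $q_i$ automatically sum to one and satisfy $\sum_i q_i\hat\xi_i = 0$, which is why no separate normalization step is needed.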
Let $A_1 = \{j : \theta_{0j} \neq 0\}$ and $A_2 = \{j : \gamma_{0j} \neq 0\}$, and denote the cardinalities of $A_1$ and $A_2$ by $d_1$ and $d_2$, where $\theta_0$ and $\gamma_0$ are the true values of $\theta$ and $\gamma$, respectively. Without loss of generality, write $\theta = (\theta_1^\top, \theta_2^\top)^\top$, where $\theta_1 \in \mathbb{R}^{d_1}$ and $\theta_2 \in \mathbb{R}^{p-d_1}$ collect the nonzero and zero components of $\theta$, respectively; $\gamma = (\gamma_1^\top, \gamma_2^\top)^\top$ is partitioned analogously, with $\gamma_1 \in \mathbb{R}^{d_2}$ and $\gamma_2 \in \mathbb{R}^{r-d_2}$. The true parameter values decompose as $\theta_0 = (\theta_{10}^\top, 0^\top)^\top$ and $\gamma_0 = (\gamma_{10}^\top, 0^\top)^\top$. For notational purposes, let $I_p = (H_1^\top, H_2^\top)^\top$ and $I_{r-1} = (H_3^\top, H_4^\top)^\top$, where $H_1 \in \mathbb{R}^{d_1 \times p}$, $H_2 \in \mathbb{R}^{(p-d_1) \times p}$, $H_3 \in \mathbb{R}^{d_2 \times (r-1)}$, and $H_4 \in \mathbb{R}^{(r-1-d_2) \times (r-1)}$.
To derive asymptotic properties of the proposed PEL estimator, the following conditions are necessary.
Condition 1. The kernel function $K(\cdot)$ is symmetric, with $K(\cdot)$ continuous on $[-1, 1]$.
Condition 2. The bandwidths $h_i$, $i = 1, 2, 3$, satisfy the following asymptotic assumptions: (1) $nh_1^8 \to 0$, $nh_1^4 \to \infty$, $h_2 = O(n^{-1/5})$, and $h_3 = O(n^{-1/5})$; (2) $\log^2(n)/(nh_i) \to 0$, $\log^4(n)/(nh_1h_i) \to 0$, and $h_1^4\log^2(n)/h_i \to 0$.
Condition 3. Let $\mathrm{Var}(X_i) = \Sigma_{x_i}$ and $\mathrm{Var}(Z_i) = \Sigma_{z_i}$, $i = 1, \ldots, n$. The eigenvalues of $\Sigma_{x_i}$ and $\Sigma_{z_i}$ satisfy $C_1 \le \Gamma_1(\Sigma_{x_i}) \le \cdots \le \Gamma_p(\Sigma_{x_i}) \le C_2$ and $C_1 \le \Gamma_1(\Sigma_{z_i}) \le \cdots \le \Gamma_r(\Sigma_{z_i}) \le C_2$ for some constants $0 < C_1 < C_2$, $i = 1, \ldots, n$. In addition, $E(\varepsilon^{4+\delta} \mid X, Z) < \infty$, where $\delta$ is a positive constant.
Condition 4. Let $v(\cdot)$ and $\eta = \eta(X, Z)$ satisfy $E(\varepsilon^2 \mid X, Z) = v(\eta)$, where $0 < C_1 < v(\cdot) < C_2 < \infty$, and $C_1$ and $C_2$ are positive constants. Moreover, $\mathrm{Var}(X_i \mid \eta(X_i, Z_i))$ is positive definite with a bounded spectrum.
Condition 5. There exists a function $v_1(X, Z)$ satisfying
$$\Big|\frac{\partial^2 E(X \mid Z^\top\gamma)}{\partial\gamma_i\partial\gamma_j}\Big|,\; \Big|\frac{\partial^2 E(Z \mid Z^\top\gamma)}{\partial\gamma_i\partial\gamma_j}\Big|,\; \Big|\frac{\partial^2 E(w \mid Z^\top\gamma)}{\partial\gamma_i\partial\gamma_j}\Big|,\; \Big|\frac{\partial^2 E(wZ \mid Z^\top\gamma)}{\partial\gamma_i\partial\gamma_j}\Big|,\; \Big|\frac{\partial^2 E(wX \mid Z^\top\gamma)}{\partial\gamma_i\partial\gamma_j}\Big| < v_1(X, Z), \quad E(v_1^2) < \infty, \quad i, j = 1, \ldots, p.$$
There also exists $v_2(X, Z)$ such that
$$\Big|\frac{\partial^3 \eta(X, Z)}{\partial\pi_i\partial\pi_j\partial\pi_k}\Big| < v_2(X, Z), \quad E(v_2^2) < \infty,$$
where $(X^\top, Z^\top)^\top = (\pi_1, \ldots, \pi_{p+r})^\top$ and $i, j, k = 1, \ldots, p+r$. There exists $v_3(X, Z)$ satisfying
$$\Big|\frac{\partial^4 g(Z^\top\gamma)}{\partial\gamma_i\partial\gamma_j\partial\gamma_k\partial\gamma_l}\Big|,\; \Big|\frac{\partial^4 v(\eta)}{\partial\eta_{i_1}\partial\eta_{j_1}\partial\eta_{k_1}\partial\eta_{l_1}}\Big| < v_3(X, Z), \quad E(v_3^2) < \infty,$$
where $\eta \in \mathbb{R}^{p_1}$, $i, j, k, l = 1, \ldots, p$, and $i_1, j_1, k_1, l_1 = 1, \ldots, p_1$.
Condition 6. Assume $\eta$ and $Z^\top\gamma$ possess densities, denoted by $f_\eta(\eta)$ and $f_{Z^\top\gamma}(Z^\top\gamma)$, respectively, which are bounded away from zero and infinity. There exists $v_4(X, Z)$ such that
$$\Big|\frac{\partial^2 f_{Z^\top\gamma}(Z^\top\gamma)}{\partial\gamma_i\partial\gamma_j}\Big|,\; \Big|\frac{\partial^2 f_\eta(\eta)}{\partial\eta_k\partial\eta_l}\Big| < v_4(X, Z), \quad E(v_4^2) < \infty, \quad i, j = 1, \ldots, p;\; k, l = 1, \ldots, p_1.$$
Condition 7. As $n \to \infty$, we assume $p \to \infty$ with $p/n^{1/5} \to 0$, and $r \to \infty$ with $r/n^{1/5} \to 0$.
Condition 8. All random elements X, ε , ε X , and ε Z have finite fourth moments.
Condition 9. Define
$$\xi_n(\theta, \gamma) = w\,\varepsilon\left(X - \frac{E(wX \mid Z^\top\gamma)}{E(w \mid Z^\top\gamma)},\; \Big[Z - \frac{E(wZ \mid Z^\top\gamma)}{E(w \mid Z^\top\gamma)}\Big]\, g'(Z^\top\gamma)\right)^\top.$$
As $n \to \infty$, we assume the following moments are uniformly bounded by a positive constant $C$: $E(\|\xi_n(\theta, \gamma)\|/\sqrt{p})^4 < C$, $E\|ZX^\top\|^4 < C$, and $E\|XX^\top\|^4 < C$. Furthermore, we assume $E\|XZ^\top\|^4 < \infty$.
Condition 10. As $n \to \infty$, $\tau(p/n)^{-1/2} \to \infty$, $\nu(r/n)^{-1/2} \to \infty$, $\min_{j \in A_1}|\theta_{0j}|/\tau \to \infty$, and $\min_{j \in A_2}|\gamma_{0j}|/\nu \to \infty$.
Condition 11. Assume $\max_{j \in A_1} p'_\tau(|\theta_{0j}|) = o\{(np)^{-1/2}\}$, $\max_{j \in A_2} p'_\nu(|\gamma_{0j}|) = o\{(nr)^{-1/2}\}$, $\max_{j \in A_1} p''_\tau(|\theta_{0j}|) = o\{p^{-1/2}\}$, and $\max_{j \in A_2} p''_\nu(|\gamma_{0j}|) = o\{r^{-1/2}\}$.
Conditions 1–6 support the existence of the estimators $(\hat\theta, \hat\gamma)$. They also ensure that the functions $w(X, Z)$, $g(Z^\top\gamma)$, and $g'(Z^\top\gamma)$ and the conditional expectations $E\{\hat w(X,Z)Z_1 \mid Z_i^\top\gamma\}$, $E\{\hat w(X,Z)X \mid Z_i^\top\gamma\}$, and $E\{\hat w(X,Z) \mid Z_i^\top\gamma\}$ can be estimated with adequate precision, and that the nonparametric estimation does not alter the asymptotic behavior of the empirical likelihood ratio; as a result, the estimated PEL ratio based on $\tilde L(\theta, \gamma)$ has the same asymptotic distribution as the one based on $L(\theta, \gamma)$. Conditions 1–6 were also used by Ma and Zhu [6] as sufficient conditions for the double-robustness property of the estimators. Condition 7 is a technical requirement; since determining the minimal admissible upper bound on the growth of $p$ and $r$ is quite challenging, this condition is intentionally stringent rather than minimally sufficient, and the resulting bounds in the stochastic analysis are conservative. Condition 8 guarantees that the asymptotic variance exists for the estimator of the increasing-dimensional parameters $(\theta^\top, \gamma^\top)^\top$. Condition 9 restricts the tail behavior of the estimating equation. Condition 10 requires that the weakest signal remain stronger than the penalty parameter, and Condition 11 limits the influence of the penalty on the nonzero components. Conditions 10 and 11 are satisfied by a range of penalty functions, including those discussed in Fan and Li [12].
The following theorem establishes the theoretical properties of the PEL estimator $(\hat\theta, \hat\gamma)$.
Theorem 1.
As $n \to \infty$, under Conditions 1–11, we have the following:
(1) $\hat\theta_2 = 0$ and $\hat\gamma_2 = 0$, with probability tending to 1;
(2) $\sqrt{n}\, B I_B^{-1/2}\{(\hat\theta_1^\top, \hat\gamma_1^\top)^\top - (\theta_{10}^\top, \gamma_{10}^\top)^\top\} \xrightarrow{L} N(0, G)$, where $B \in \mathbb{R}^{(q_1+q_2) \times (p+r-1)}$, $BB^\top \to G$, and $G$ is a $(q_1+q_2) \times (q_1+q_2)$ matrix with fixed $q_1$ and $q_2$; here
$$H_0 = \begin{pmatrix} H_1 & 0 \\ 0 & H_3 \end{pmatrix}, \quad H = \begin{pmatrix} H_2 & 0 \\ 0 & H_4 \end{pmatrix}, \quad B = \begin{pmatrix} B_1 & 0 \\ 0 & B_2 \end{pmatrix}, \quad V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}, \quad U = \begin{pmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{pmatrix},$$
$$V_{11} = E\Big[w\Big\{XX^\top - \frac{E(wX \mid Z^\top\gamma)E(wX^\top \mid Z^\top\gamma)}{E(w \mid Z^\top\gamma)}\Big\}\Big],$$
$$V_{12} = E\Big[g'(Z^\top\gamma)\, w\Big\{XZ_1^\top - \frac{E(wX \mid Z^\top\gamma)E(wZ_1^\top \mid Z^\top\gamma)}{E(w \mid Z^\top\gamma)}\Big\}\Big],$$
$$V_{21} = E\Big[g'(Z^\top\gamma)\, w\Big\{Z_1X^\top - \frac{E(wZ_1 \mid Z^\top\gamma)E(wX^\top \mid Z^\top\gamma)}{E(w \mid Z^\top\gamma)}\Big\}\Big],$$
$$V_{22} = E\Big[\{g'(Z^\top\gamma)\}^2\, w\Big\{Z_1Z_1^\top - \frac{E(wZ_1 \mid Z^\top\gamma)E(wZ_1^\top \mid Z^\top\gamma)}{E(w \mid Z^\top\gamma)}\Big\}\Big],$$
$$U_{11} = E\Big\{\frac{\partial\xi_i(\theta, \gamma)}{\partial\theta}^\top V^{-1}\frac{\partial\xi_i(\theta, \gamma)}{\partial\theta}\Big\}, \qquad U_{12} = E\Big\{\frac{\partial\xi_i(\theta, \gamma)}{\partial\theta}^\top V^{-1}\frac{\partial\xi_i(\theta, \gamma)}{\partial\gamma}\Big\},$$
$$U_{21} = E\Big\{\frac{\partial\xi_i(\theta, \gamma)}{\partial\gamma}^\top V^{-1}\frac{\partial\xi_i(\theta, \gamma)}{\partial\theta}\Big\}, \qquad U_{22} = E\Big\{\frac{\partial\xi_i(\theta, \gamma)}{\partial\gamma}^\top V^{-1}\frac{\partial\xi_i(\theta, \gamma)}{\partial\gamma}\Big\},$$
$$I_B = H_0U^{-1}VU^{-1}H_0^\top - H_0U^{-1}H^\top(HU^{-1}H^\top)^{-1}HU^{-1}VU^{-1}H^\top(HU^{-1}H^\top)^{-1}HU^{-1}H_0^\top,$$
and $\xrightarrow{L}$ denotes convergence in distribution.
In Theorem 1, $B$ projects the diverging-dimensional parameter vector $(\theta_1^\top, \gamma_1^\top)^\top$ onto a fixed $(q_1+q_2)$-dimensional subspace.
Remark 1.
Theorem 1 shows that the proposed estimator possesses the oracle property: with probability tending to 1, the zero components $\theta_{20}$ and $\gamma_{20}$ are estimated as exactly zero, and the PEL estimators of the nonzero components $\theta_{10}$ and $\gamma_{10}$ are asymptotically efficient.
Next, we describe the construction of confidence regions and hypothesis tests for $(\theta, \gamma)$ using the PEL method. Consider testing the following null and alternative hypotheses:
$$H_0 : L_n(\theta_0^\top, \gamma_0^\top)^\top = 0 \quad \text{vs.} \quad H_1 : L_n(\theta_0^\top, \gamma_0^\top)^\top \neq 0,$$
where $L_n \in \mathbb{R}^{(q_1+q_2) \times (p+r-1)}$ satisfies, for fixed $q_1$ and $q_2$,
$$L_n = \begin{pmatrix} L_{n1} & 0 \\ 0 & L_{n2} \end{pmatrix}, \qquad L_nL_n^\top = \begin{pmatrix} I_{q_1} & 0 \\ 0 & I_{q_2} \end{pmatrix},$$
with $L_{n1} \in \mathbb{R}^{q_1 \times p}$, $L_{n2} \in \mathbb{R}^{q_2 \times (r-1)}$, and $I_{q_1}$ and $I_{q_2}$ the $q_1$- and $q_2$-dimensional identity matrices, respectively. Hypotheses of this form cover tests of individual and multiple components of $(\theta_0^\top, \gamma_0^\top)^\top$, as well as tests of linear functions of $(\theta_0^\top, \gamma_0^\top)^\top$.
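For example (a hypothetical toy construction), an $L_n$ built from rows of identity matrices tests individual components and satisfies the orthonormality condition $L_nL_n^\top = I$ by construction:

```python
def selector_rows(indices, dim):
    """Rows of the dim-dimensional identity matrix: each row picks one
    coefficient (a toy way to build L_n1 and L_n2)."""
    return [[1.0 if j == i else 0.0 for j in range(dim)] for i in indices]

def gram(rows):
    # rows * rows^T as nested lists
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]

p, r = 5, 4                       # hypothetical dimensions
L_n1 = selector_rows([0], p)      # tests the first component of theta (q1 = 1)
L_n2 = selector_rows([1], r - 1)  # tests one component of gamma (q2 = 1)
# L_n1 L_n1^T = I_{q1} and L_n2 L_n2^T = I_{q2} hold by construction
```

More general linear hypotheses can be encoded by replacing the selector rows with any orthonormal rows spanning the contrasts of interest.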
Similar to the EL ratio for the PLSIM in [3], we can construct the PEL ratio statistic as
$$\tilde\ell(L_n) = -\Big\{\tilde\ell_p(\hat\theta, \hat\gamma) - \min_{(\theta, \gamma):\, L_n(\theta^\top, \gamma^\top)^\top = 0}\tilde\ell_p(\theta, \gamma)\Big\}.$$
The following theorem shows the properties of the PEL ratio statistic for model (1).
Theorem 2.
As $n \to \infty$, under the null hypothesis and Conditions 1–11, we have
$$\tilde\ell(L_n) \xrightarrow{L} \chi^2_{q_1+q_2}.$$
The standard PEL ratio, under some regularity conditions, converges in law to a chi-squared distribution. This is one of the most important properties of the PEL method, and similar conclusions can be found in Fang et al. [7] and Tang and Leng [24], among others. Theorem 2 shows that, under Conditions 1–11, the estimated PEL ratio based on $\tilde L(\theta, \gamma)$ converges to the same asymptotic distribution as the standard PEL ratio based on $L(\theta, \gamma)$. This result provides a convenient approach for testing hypotheses and constructing data-driven confidence regions without any shape constraints. Combined with the oracle property established in Theorem 1, these findings demonstrate the robustness and efficiency of the PEL method for PLSIMs.
Confidence regions for $(\theta, \gamma)$ can be constructed using Theorem 2; that is,
$$I_\alpha = \Big\{(\theta, \gamma) : -\Big\{\tilde\ell_p(\hat\theta, \hat\gamma) - \min_{(\theta, \gamma):\, L_n(\theta^\top, \gamma^\top)^\top = 0}\tilde\ell_p(\theta, \gamma)\Big\} \le \chi^2_{q_1+q_2,(1-\alpha)}\Big\},$$
where $\chi^2_{q_1+q_2,(1-\alpha)}$ is the $(1-\alpha)$ quantile of the chi-squared distribution with $q_1+q_2$ degrees of freedom. $I_\alpha$ provides an asymptotic confidence region with confidence level $1-\alpha$; that is, as $n \to \infty$, $P(L_n(\theta^\top, \gamma^\top)^\top \in I_\alpha) \to 1-\alpha$.
Remark 2.
In this article, we develop a PEL method for simultaneous variable selection and parameter estimation in high-dimensional sparse heteroscedastic PLSIMs, which requires $n > p$ and $n > r$. When this is violated in practice, one can first apply sure independence screening (SIS), proposed by Fan and Lv [28], to reduce the dimensionality to a moderate level below the sample size.

3. Penalized Empirical Likelihood for PLM and SIM

For two special cases of model (1), we develop PEL estimators for the parameters $\theta$ and $\gamma$. If $\gamma = 1$, model (1) reduces to a heteroscedastic PLM, which can be written as
$$Y_i = \theta^\top X_i + g(Z_i) + \varepsilon_i, \quad i = 1, \ldots, n. \tag{12}$$
Consider the PEL method for the high-dimensional PLM. Redefine the EL function for $\theta$ and $\hat\xi_{1i}(\theta)$ as
$$\tilde L(\theta) = \sup\Big\{\prod_{i=1}^n (nq_i) : \sum_{i=1}^n q_i = 1,\; q_i \ge 0,\; \sum_{i=1}^n q_i\,\hat\xi_{1i}(\theta) = 0\Big\}$$
and
$$\hat\xi_{1i}(\theta) = \hat w_i\{Y_i - X_i^\top\theta - \hat g(Z_i)\}\Big[X_i - \frac{\hat E(\hat w_iX_i \mid Z_i)}{\hat E(\hat w_i \mid Z_i)}\Big], \quad i = 1, \ldots, n.$$
The PEL function for model (12) can be written as
$$\tilde\ell_p(\theta) = 2\sum_{i=1}^n \log\{1 + \lambda^\top\hat\xi_{1i}(\theta)\} + n\sum_{i=1}^p p_\tau(|\theta_i|).$$
We state analogous results as follows.
Corollary 1.
As $n \to \infty$, under Conditions 1–11, we have the following:
(1) $\hat\theta_2 = 0$, with probability tending to 1;
(2) $\sqrt{n}\, B_1I_1^{-1/2}(\hat\theta_1 - \theta_{10}) \xrightarrow{L} N(0, G_1)$, where $B_1 \in \mathbb{R}^{q_1 \times p}$, $B_1B_1^\top \to G_1$, $G_1$ is a $q_1 \times q_1$ matrix with fixed $q_1$, and $\xrightarrow{L}$ stands for convergence in distribution.
Consider testing the following null and alternative hypotheses:
$$H_0 : L_{n1}\theta_0 = 0 \quad \text{vs.} \quad H_1 : L_{n1}\theta_0 \neq 0,$$
where $L_{n1} \in \mathbb{R}^{q_1 \times p}$ satisfies $L_{n1}L_{n1}^\top = I_{q_1}$ for fixed $q_1$, with $I_{q_1}$ the $q_1$-dimensional identity matrix. The PEL ratio statistic can be constructed as follows:
$$\tilde\ell(L_{n1}) = -\Big\{\tilde\ell_p(\hat\theta) - \min_{\theta:\, L_{n1}\theta = 0}\tilde\ell_p(\theta)\Big\}.$$
Corollary 2.
As $n \to \infty$, under the null hypothesis and Conditions 1–11, we have
$$\tilde\ell(L_{n1}) \xrightarrow{L} \chi^2_{q_1}.$$
Next, we consider the following SIM with a diverging number of parameters, which is another special case of model (1):
$$Y_i = g(Z_i^\top\gamma) + \varepsilon_i, \qquad E(\varepsilon_i \mid X_i, Z_i) = 0, \quad i = 1, \ldots, n.$$
Redefine $\hat\xi_{2i}(\gamma)$ as
$$\hat\xi_{2i}(\gamma) = \hat w_i\{Y_i - \hat g(Z_i^\top\gamma)\}\Big[Z_{1,i} - \frac{\hat E(\hat w_iZ_{1,i} \mid Z_i^\top\gamma)}{\hat E(\hat w_i \mid Z_i^\top\gamma)}\Big]\hat g'(Z_i^\top\gamma), \quad i = 1, \ldots, n,$$
and rewrite the PEL ratio (9) as
$$\tilde\ell_p(\gamma) = 2\sum_{i=1}^n \log\{1 + \lambda^\top\hat\xi_{2i}(\gamma)\} + n\sum_{i=1}^r p_\nu(|\gamma_i|).$$
Corollary 3.
As $n \to \infty$, under Conditions 1–11, we have the following:
(1) $\hat\gamma_2 = 0$, with probability tending to 1;
(2) $\sqrt{n}\, B_2I_2^{-1/2}(\hat\gamma_1 - \gamma_{10}) \xrightarrow{L} N(0, G_2)$, where $B_2 \in \mathbb{R}^{q_2 \times (r-1)}$, $B_2B_2^\top \to G_2$, and $G_2$ is a $q_2 \times q_2$ matrix with fixed $q_2$.
Consider testing the following null and alternative hypotheses:
$$H_0 : L_{n2}\gamma_0 = 0 \quad \text{vs.} \quad H_1 : L_{n2}\gamma_0 \neq 0,$$
where $L_{n2} \in \mathbb{R}^{q_2 \times r}$ satisfies $L_{n2}L_{n2}^\top = I_{q_2}$ for fixed $q_2$, with $I_{q_2}$ the $q_2$-dimensional identity matrix. The PEL ratio statistic for $\gamma$ can be written as follows:
$$\tilde\ell(L_{n2}) = -\Big\{\tilde\ell_p(\hat\gamma) - \min_{\gamma:\, L_{n2}\gamma = 0}\tilde\ell_p(\gamma)\Big\}.$$
Corollary 4.
As $n \to \infty$, under the null hypothesis and Conditions 1–11, we have
$$\tilde\ell(L_{n2}) \xrightarrow{L} \chi^2_{q_2}.$$

4. Simulations

First, we describe how to solve the optimization problem introduced by the PEL. The PEL estimator is obtained by minimizing (9) with a nested algorithm, in which Newton's method updates the multiplier $\lambda$ and the local quadratic approximation updates the parameters. The steps are as follows.
  • Step 1: Use the estimation procedure (a relatively simple but inefficient method) described in Section 2 of Ma and Zhu [6] to obtain an initial estimator $(\theta^{(0)}, \gamma^{(0)})$.
  • Step 2: Compute $\hat g(Z_i^\top\gamma)$, $\hat g'(Z_i^\top\gamma)$, $\hat w(X_i, Z_i)$, $\hat E\{\hat w(X,Z) \mid Z_i^\top\gamma\}$, $\hat E\{\hat w(X,Z)X \mid Z_i^\top\gamma\}$, and $\hat E\{\hat w(X,Z)Z_1 \mid Z_i^\top\gamma\}$ as described above for fixed values of $(\theta, \gamma)$.
  • Step 3: Obtain the auxiliary random vector ξ ^ i ( θ , γ ) .
  • Step 4: Use Newton’s method to minimize (9) with respect to λ for fixed values of ( θ , γ ) .
  • Step 5: Use the local quadratic approximation algorithm to minimize (9) with respect to ( θ , γ ) for fixed values of λ obtained from Step 4.
  • Step 6: Iterate Steps 4 and 5 until ( θ , γ ) converges.
Assume that $(\theta^{(0)}, \gamma^{(0)})$ is an initial value of $(\theta, \gamma)$, and let $\theta_j^{(k)}$ and $\gamma_l^{(k)}$ be the $k$-th step estimators of $\theta_j$ and $\gamma_l$, respectively. When $\theta_j^{(k)}$ (with $|\theta_j^{(k)}| < \varsigma$) or $\gamma_l^{(k)}$ (with $|\gamma_l^{(k)}| < \varsigma$) is very close to 0, we set $\hat\theta_j^{(k)} = 0$ or $\hat\gamma_l^{(k)} = 0$, where $\varsigma$ is a predefined small positive tolerance. If $\theta_j^{(k)} \neq 0$, $p_\tau(|\theta_j|)$ can be locally approximated by $p_\tau(|\theta_j^{(k)}|) + \frac{1}{2}\{p'_\tau(|\theta_j^{(k)}|)/|\theta_j^{(k)}|\}\{\theta_j^2 - (\theta_j^{(k)})^2\}$. Similarly, we use $p_\nu(|\gamma_l^{(k)}|) + \frac{1}{2}\{p'_\nu(|\gamma_l^{(k)}|)/|\gamma_l^{(k)}|\}\{\gamma_l^2 - (\gamma_l^{(k)})^2\}$ to approximate $p_\nu(|\gamma_l|)$ when $\gamma_l^{(k)} \neq 0$. These steps are repeated until $\|(\theta^{(k+1)}, \gamma^{(k+1)}) - (\theta^{(k)}, \gamma^{(k)})\| < \varsigma_1$, where $\varsigma_1$ is a very small positive number.
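The local quadratic approximation used here can be written generically (an illustrative sketch with a generic penalty $p$ and its derivative passed in; for the ridge penalty $t^2$ the approximation happens to be exact, which provides a simple sanity check):

```python
def lqa(p, dp, t, t_k):
    """Local quadratic approximation of p(|t|) around a nonzero iterate t_k:
    p(|t_k|) + 0.5 * (p'(|t_k|) / |t_k|) * (t^2 - t_k^2)."""
    return p(abs(t_k)) + 0.5 * (dp(abs(t_k)) / abs(t_k)) * (t * t - t_k * t_k)

# sanity check with the ridge penalty p(t) = t^2, for which the LQA is exact
approx = lqa(lambda t: t * t, lambda t: 2.0 * t, 0.8, 0.5)
exact = 0.8 ** 2
```

Because the approximation is quadratic in $t$, each Step 5 update reduces to a weighted ridge-type problem, which is what makes the iteration tractable for nonconvex penalties such as SCAD.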
Next, we present simulation studies to illustrate the properties of the PEL inference for a heteroscedastic PLSIM.
Example 1.
For the PLSIM, we generated $X_1$ from a Poisson distribution with parameter 2, $X_p$ from a binomial distribution with success probability 0.6, $X_j$ from the uniform distribution $U(0, 1)$ for $j = 2, \ldots, p-1$, and $Z_k$ from the standard normal distribution. Given $(X, Z)$, responses were generated from $Y \sim N(X^\top\theta + \exp(Z^\top\gamma), \mathrm{Var}(Y) = |Z^\top\gamma|)$. Let $\theta = (2, \ldots, 1, 0)^\top$ and $\gamma = (1, 1.5, 2, \ldots, 0)^\top$. We consider dimensions $p = 10, 20$ and $r = 10, 20$, and sample sizes $n = 50$, 100, and 200. The penalty parameters $\tau$ and $\nu$ were selected by cross-validation. To compare the influence of the kernel function, we used the cosine kernel $K(t) = \frac{\pi}{4}\cos(\frac{\pi t}{2})\, I(|t| \le 1)$ and the Epanechnikov kernel $K(t) = \frac{3}{4}(1 - t^2)_+$, respectively. In accordance with Condition 2, the bandwidth was set to $n^{-1/5}$, giving $h \approx 0.45$ at $n = 50$, $h \approx 0.4$ at $n = 100$, and $h \approx 0.35$ at $n = 200$. Furthermore, to examine the robustness of the bandwidth selection, a grid search algorithm was also employed. For each of these settings, we repeated the simulation 500 times, and the results are reported in Table 1 and Table 2.
From Table 1 and Table 2, we observe that (1) for fixed $p$ and $r$, as the sample size increases, the accuracy of variable selection improves and the standard deviation of the estimates decreases; and (2) the choice of kernel function has a relatively minor influence on the results; overall, the Epanechnikov kernel performs slightly better in estimation than the cosine kernel.
Example 2.
To assess the performance of the presented method under dependent covariates, we generated predictors by $(X^\top, Z^\top)^\top \sim N(0, \Sigma)$ with $\sigma_{ij} = 0.3^{|i-j|}$, and responses by $Y \sim N(X^\top\theta + \exp(Z^\top\gamma), \mathrm{Var}(Y) = |Z^\top\gamma|)$. Let $\theta = (1, \ldots, 1, 0)^\top$ and $\gamma = (1, 1, 2, 1, \ldots, 0)^\top$. We consider dimensions $p = 20, 30$ and $r = 20, 30$, and sample sizes $n = 200, 400$. We applied the Epanechnikov kernel $K(t) = \frac{3}{4}(1 - t^2)_+$ and selected the penalty parameters $\tau$ and $\nu$ by cross-validation. According to Condition 2, the bandwidth was set to $n^{-1/5}$, i.e., $h \approx 0.35$ for $n = 200$ and $h \approx 0.3$ for $n = 400$. Because Lai et al. [16] also studied parameter estimation and variable selection for a heteroscedastic PLSIM, we computed their estimator (PVS) in this simulation study for comparison. For each of these settings, we repeated the simulation 500 times, and the results are reported in Table 3 and Table 4.
From Table 3 and Table 4, it can be observed that (1) both estimators (PEL and PVS) yield estimates close to the true parameter values, with PEL exhibiting slightly smaller standard deviations than PVS; and (2) in terms of variable selection, PEL produces, on average, fewer false zeros than PVS. Furthermore, the PEL method is a nonparametric methodology that retains the advantages of parametric likelihood while possessing double robustness; in contrast, the PVS method is a semiparametric efficient method that is not doubly robust, so its performance deteriorates when the model is misspecified. Thus, the proposed PEL method demonstrates favorable performance and outperforms the PVS method.

5. Real Data Application

We demonstrate the proposed methodology by applying a PLSIM to the AIDS Clinical Trials Group Protocol 175 (ACTG175) dataset (Hammer et al. [29]; https://www.nejm.org/doi/full/10.1056/NEJM199610103351501#tab-contributors (accessed on 5 August 2025)), previously examined by Lai and Wang [30]. The CD4 glycoprotein is an essential T-cell receptor (TCR) coreceptor that facilitates interactions with antigen-presenting cells, which establishes the CD4 cell count as the primary immunological endpoint for comparing antiretroviral treatment effects over predefined observation periods in HIV clinical research. ACTG175 evaluated four distinct antiretroviral regimens: didanosine (ddI), zidovudine (ZDV) monotherapy, ZDV+ddI, and ZDV+zalcitabine, using a balanced randomization design to assign 2138 eligible participants across the therapeutic arms. The trial results demonstrate that structured antiretroviral interventions effectively reduce the risk of disease progression among clinically asymptomatic individuals with intermediate-stage HIV infection.
We constructed a PLSIM to analyze subject responses under zidovudine (ZDV) monotherapy. Our analysis uses a curated subset of the ACTG175 cohort comprising 320 patients with complete CD4 endpoint data, drawn from an initial pool of 521 subjects with baseline CD4 counts between 200 and 500 cells/mm3. The response variable Y (CD496) is the CD4 cell count at 96 ± 5 weeks post-treatment. The predictors are as follows:
Linear component: discrete covariates x1 (drugs: history of IV drug use; 0 = no, 1 = yes), x2 (str2: antiretroviral history; 0 = naive, 1 = experienced), x3 (gender; 0 = female, 1 = male), x4 (symptom: symptomatic indicator; 0 = asymptomatic, 1 = symptomatic), x5 (race; 0 = white, 1 = non-white), x6 (hemo: hemophilia; 0 = no, 1 = yes), x7 (homo: homosexual activity; 0 = no, 1 = yes), and x8 (karnof: Karnofsky score, on a scale of 0–100).
Single-index component: continuous covariates z1 (CD80: baseline CD8 count), z2 (CD820: CD8 count at 20 ± 5 weeks), z3 (CD420: CD4 count at 20 ± 5 weeks), z4 (CD40: baseline CD4 count), z5 (wtkg: weight in kg), and z6 (age: age in years at baseline).
After standardizing Y, we specify the following heteroscedastic PLSIM:
Y = θ X + g ( Z γ ) + ε ,
where X = (x1, …, x8)ᵀ and Z = (z1, …, z6)ᵀ. The test for heteroscedasticity confirms that the model is homoscedastic. We applied the Epanechnikov kernel function in this data analysis. According to Condition 2, the bandwidth was set to n^{−1/5}, which gives h ≈ 0.32. We required γ = (γ1, …, γ6)ᵀ to have unit length to ensure identifiability. We compared our results with the PVS method of [16]; the results are summarized in Table 5, with residual plots shown in Figure 1. The residual sums of squares estimated by PEL and PVS were 186 and 194, respectively. From these results, homosexual activity (homo) exhibits a significant positive linear association with Y. The estimated coefficient for antiretroviral history (str2) is negative, indicating a beneficial effect of this treatment for asymptomatic patients with HIV. CD820 shows a negative nonlinear relationship with Y, while age, CD40, and CD420 are positively associated with Y. These factors play important roles in antiretroviral regimens. Moreover, both methods yield similar results, but the approach in [16] produces slightly larger standard errors.
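The kernel, bandwidth rule, and identifiability normalization used in this analysis can be written compactly. The sketch below is illustrative only: the helper names are ours, and the numerical entries of γ are placeholders rather than the fitted coefficients from Table 5.

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel K(t) = (3/4)(1 - t^2)_+ used throughout the paper."""
    return 0.75 * np.maximum(1.0 - np.asarray(t) ** 2, 0.0)

def nw_smoother(u, y, h):
    """Nadaraya-Watson estimate of E(Y | U = u_i) at each observed index value."""
    K = epanechnikov((u[:, None] - u[None, :]) / h)
    return (K @ y) / K.sum(axis=1)

n = 320               # patients with complete CD4 endpoint data
h = n ** (-1 / 5)     # Condition 2 bandwidth, approximately 0.32

# Unit-length normalization of the index parameter for identifiability
# (these entries are hypothetical placeholders, not estimates).
gamma = np.array([1.0, -0.5, 0.3, 0.2, 0.1, -0.2])
gamma = gamma / np.linalg.norm(gamma)
```

With h fixed this way, the single-index function g can be estimated by smoothing the (standardized) response against the normalized index Zᵀγ.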

Author Contributions

J.F. proposed the original research problem and Z.T. carried out the associated numerical computations. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Provincial Natural Science Foundation of Hunan (Grant No. 2023JJ30187) and the Scientific Research Fund of the Hunan Provincial Education Department (Grant No. 24A0518).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Let D_n = {(θ, γ): ‖(θ, γ) − (θ₀, γ₀)‖ ≤ C a_n} for a positive constant C, where a_n = O_p{(p/n)^{1/2}}, and let ‖A‖ = {tr(AᵀA)}^{1/2} denote the Frobenius norm of a matrix A.
We present some lemmas before proving the theorems.
Lemma A1.
Let θ ˜ θ = O p ( a n ) and γ ˜ γ = O p ( a n ) . Under Conditions 1–9, we have
sup η R | w ^ ( η ) w ( η ) | = O p h 2 2 + log n ( p / n ) 1 / 2 h 2 1 / 2 ,
where
w ^ ( η ) = i = 1 n K h 2 ( η i η ) / i = 1 n K h 2 ( η i η ) e i 2 ,
e i = Y i θ ˜ X i g ˜ ( Z i γ ˜ ) = ( θ θ ˜ ) X i + g ˜ ( Z i γ ) g ˜ ( Z i γ ˜ ) + g ( Z i γ ) g ˜ ( Z i γ ) + ε i
and
g ˜ ( Z i γ ) = ( n 1 ) 1 j i K h ( Z j γ Z i γ ) Y j .
Proof of Lemma A1.
According to Lemma 3.1 and Lemma 3.3 in Zhu and Fang [31], we have
sup ( Z i γ ) R g ( Z i γ ) g ˜ ( Z i γ ) = O p h 2 2 + ( p / n ) 1 / 2 h 2 1 / 2 log n
and
sup η R i = 1 n K h 2 ( η i η ) ε i 2 i = 1 n K h 2 ( η i η ) 1 w ( η ) = O p h 2 2 + ( p / n ) 1 / 2 h 2 1 / 2 log n .
Combining (20) and (21), we can obtain that
sup η R i = 1 n K h 2 ( η i η ) g ( Z i γ ) g ˜ ( Z i γ ) 2 / i = 1 n K h 2 ( η i η ) = O p h 2 4 + p log 2 n ( n h 2 ) .
Because of θ ˜ θ = O p ( a n ) and E ( X i 2 | η i ) < , we have
sup η R i = 1 n K h 2 ( η i η ) X i ( θ ˜ θ ) 2 / i = 1 n K h 2 ( η i η ) = O p ( p / n ) .
Using Lemma 1 of Ma and Zhu [6], we obtain that
sup Z i R p sup { γ ˜ : γ ˜ γ a n } g ˜ ( Z i γ ) g ˜ ( Z i γ ˜ ) E g ˜ ( Z i γ ) g ˜ ( Z i γ ˜ ) = o p ( a n ) .
By (24) and γ ˜ γ = O p ( a n ) , together with Taylor’s expansion, we can show that
sup η R i = 1 n K h 2 ( η i η ) g ˜ ( Z i γ ) g ˜ ( Z i γ ˜ ) 2 / i = 1 n K h 2 ( η i η ) = O p h 2 4 + p log 2 n ( n h 2 ) .
Obviously, n 1 / 2 i = 1 n g ˜ ( X i ) ε i can be written as
n 1 / 2 i = 1 n g ˜ ( X i ) ε i = n 3 / 2 i j n K h ( X i X j ) ( ε i Y j ε j Y i ) .
Let r ( X ) = E ( Y | X ) . According to Serfling [32], n 3 / 2 i j n K h ( X i X j ) ( ε i Y j ε j Y i ) can be approximated by its projection, which means that
n 1 / 2 i = 1 n g ˜ ( X i ) ε i n 1 / 2 i = 1 n ε i E K h ( X i X j ) r ( X i ) | X i = o p ( n 1 p log ( n ) ) .
Therefore, we have
n 1 / 2 i = 1 n ε i g ˜ ( X i ) r ( X i ) f ( X i ) = o p ( n 1 p log ( n ) ) ,
where f ( X ) is the density function of X. By combining (20)–(26), we have
sup η R | w ^ ( η ) w ( η ) | = O p h 2 2 + log n ( p / n ) 1 / 2 h 2 1 / 2 .
Thus, Lemma A1 holds. □
Lemma A2.
Under Conditions 1–9, we have
(1) 
1 n i = 1 n ξ ^ i ( θ , γ ) = 1 n i = 1 n ξ i ( θ , γ ) + o p ( 1 ) ;
(2) 
1 n i = 1 n ξ ^ i ( θ , γ ) ξ ^ i ( θ , γ ) = 1 n i = 1 n ξ i ( θ , γ ) ξ i ( θ , γ ) + o p ( 1 ) .
Proof of Lemma A2.
We first expand the expression as follows:
w i ε i g ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) w ^ i ε ˜ i g ^ ( Z i γ ) Z i E ^ ( w ^ i Z i | Z i γ ) E ^ ( w ^ i | Z i γ ) = ε i ( w i w ^ i ) g ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) + w ^ i ε i g ( Z i γ ) g ^ ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) w ^ i g ( Z i γ ) g ^ ( Z i γ ) g ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) w ^ i g ^ ( Z i γ ) g ( Z i γ ) g ( Z i γ ) g ^ ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) + w ^ i ε i g ^ ( Z i γ ) E ^ ( w ^ i Z i | Z i γ ) E ^ ( w ^ i | Z i γ ) E ( w i Z i | Z i γ ) E ( w i | Z i γ ) w ^ i g ^ ( Z i γ ) g ^ ( Z i γ ) g ( Z i γ ) E ^ ( w ^ i Z i | Z i γ ) E ^ ( w ^ i | Z i γ ) E ( w i Z i | Z i γ ) E ( w i | Z i γ ) = 1 n i = 1 n A 1 i + i = 1 n A 2 i + i = 1 n A 3 i + i = 1 n A 4 i + i = 1 n A 5 i + i = 1 n A 6 i ,
and
w i ε i X i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) w ^ i ε ˜ i X i E ^ ( w ^ i X i | Z i γ ) E ^ ( w ^ i | Z i γ ) = ε i ( w i w ^ i ) X i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) + w ^ i g ^ ( Z i γ ) g ( Z i γ ) X i E ( w i X i | Z i γ ) E ( w i | Z i γ ) + w ^ i ε i E ^ ( w ^ i X i | Z i γ ) E ^ ( w ^ i | Z i γ ) E ( w i Z i | Z i γ ) E ( w i | Z i γ ) + w ^ i E ^ ( w ^ i X i | Z i γ ) E ^ ( w ^ i | Z i γ ) E ( w i Z i | Z i γ ) E ( w i | Z i γ ) g ( Z i γ ) g ^ ( Z i γ ) = 1 n i = 1 n A 7 i + i = 1 n A 8 i + i = 1 n A 9 i + i = 1 n A 10 i .
By Lemma A1, we can obtain
i = 1 n A 1 i i = 1 n ( w i w ^ i ) ε i g ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) = o p ( n ) ,
and
i = 1 n A 7 i = i = 1 n ( w i w ^ i ) ε i X i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) = o p ( n ) .
Write ε i = w i g ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) . It implies that E ( ε i | Z i γ ) = 0 . According to Lemma A1 and (26), we have
i = 1 n w ^ i w i g ^ ( Z i γ ) g ( Z i γ ) g ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) sup 1 i n w ^ i w i i = 1 n g ^ ( Z i γ ) g ( Z i γ ) g ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) = o p ( n ) ,
and
i = 1 n w i g ( Z i γ ) g ^ ( Z i γ ) g ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) = o p ( n ) .
Thus, i = 1 n A 2 i = o p ( n ) . Similarly, we let ε i = w i Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) . We can obtain that E ε i | Z i γ = 0 . According to Lemma A1 and (26) again, we have
i = 1 n A 8 i   i = 1 n w ^ i w i g ^ ( Z i γ ) g ( Z i γ ) X i E ( w ^ i X i | Z i γ ) E ( w ^ i | Z i γ ) + i = 1 n g ^ ( Z i γ ) g ( Z i γ ) w i X i E ( w ^ i X i | Z i γ ) E ( w ^ i | Z i γ ) = o p ( n ) .
Similarly to Lemma 3 in Ma and Zhu [6], we can show that,
sup X n 1 i = 1 n K h ( X i X ) Y i r ( X ) f ( X ) = O p h 2 + log n ( p / n ) 1 / 2 h 1 / 2 .
According to Lemma A1, together with (29), we can obtain
g ^ ( Z i γ ) g ( Z i γ ) = o p ( 1 ) ,
E ^ w ^ ( X , Z ) Z | γ Z i E w ( X , Z ) Z | γ Z i = o p ( 1 ) ,
E ^ w ^ ( X , Z ) | γ Z i E w ( X , Z ) | γ Z i = o p ( 1 ) ,
and
E ^ w ^ ( X , Z ) X | γ Z i E w ( X , Z ) X | γ Z i = o p ( 1 ) .
Thus, ∑_{i=1}^n A_{2i} = o_p(n). By arguments similar to those used to prove ∑_{i=1}^n A_{2i} = o_p(n), together with (30)–(33), we have ∑_{i=1}^n A_{4i} = o_p(n), ∑_{i=1}^n A_{6i} = o_p(n), and ∑_{i=1}^n A_{10i} = o_p(n). Applying (26), together with (31), it is easy to show that ∑_{i=1}^n A_{3i} = o_p(n). In addition, by combining (28)–(33) and using Lemma A1, we can show that ∑_{i=1}^n A_{5i} = o_p(n) and ∑_{i=1}^n A_{9i} = o_p(n). Consequently,
1 n i = 1 n ξ ^ i ( θ , γ ) i = 1 n ξ i ( θ , γ ) = 1 n i = 1 n A 1 i + A 2 i + A 3 i + A 4 i + A 5 i + A 6 i A 7 i + A 8 i + A 9 i + A 10 i .
Therefore, we have
1 n i = 1 n ξ ^ i ( θ , γ ) = 1 n i = 1 n ξ i ( θ , γ ) + o p ( 1 ) .
Next, we will prove the second part of Lemma A2. By Conditions 8 and 9, ϵ > 0 ,
P { max 1 i n ξ i ( θ , γ )   n 1 / 4 p ϵ } i = 1 n P { ξ i ( θ , γ )   n 1 / 4 p ϵ } 1 n p 2 ϵ 4 i = 1 n E ξ i ( θ , γ ) 4 = 1 ϵ k E ξ 1 ( θ , γ ) / p 4 .
By the Cauchy–Schwarz inequality, we obtain that ξ 1 ( θ , γ ) / p 4 1 / ( p + r ) l = 1 p + r | ξ 1 l ( θ , γ ) | 4 , where ξ 1 l ( θ , γ ) is the l-th component of ξ 1 ( θ , γ ) . This implies that
max 1 i n ξ i ( θ , γ ) = o p ( n 1 / 4 p ) .
1 n i = 1 n ξ ^ i ( θ , γ ) ξ ^ i ( θ , γ ) = 1 n i = 1 n ξ i ( θ , γ ) ξ i ( θ , γ ) + 1 n i = 1 n ξ i ( θ , γ ) A 1 i + + A 6 i A 7 i + + A 10 i + 1 n i = 1 n A 1 i + + A 6 i A 7 i + + A 10 i ξ i ( θ , γ ) + 1 n i = 1 n A 1 i + + A 6 i A 7 i + + A 10 i A 1 i + + A 6 i A 7 i + + A 10 i .
According to (34) and Condition 7, we can obtain 1 / n ξ i ( θ , γ ) = o p ( 1 ) . Using the proof of the first part above again, we have
i = 1 n A k i ξ i ( θ , γ ) = o p ( n ) , k = 1 , , 10 .
Similarly,
i = 1 n A k i A l i = o p ( n ) , k , l = 1 , , 10 .
By combining the above equations, we have
1 n i = 1 n ξ ^ i ( θ , γ ) ξ ^ i ( θ , γ ) = 1 n i = 1 n ξ i ( θ , γ ) ξ i ( θ , γ ) + o p ( 1 ) .
Therefore, the second part of Lemma A2 holds. □
Lemma A3.
Under Conditions 1–9, ‖S_n − V‖ = O_p(p n^{−1/2}), where S_n = (1/n) ∑_{i=1}^n ξ_i(θ, γ) ξ_i(θ, γ)ᵀ.
Proof of Lemma A3.
Using Lemma 4 in Chen et al. [23], we obtain tr{(S_n − V)²} = O_p(p²/n). Therefore, ‖S_n − V‖ = {tr[(S_n − V)(S_n − V)ᵀ]}^{1/2} = O_p(p n^{−1/2}). □
Lemma A4.
Under Conditions 1–11, max 1 i n ξ ^ i ( θ , γ ) = o p ( n 1 / 4 p ) and max 1 i n | λ ξ ^ i ( θ , γ ) | = o p ( 1 ) for all λ = O p ( a n ) .
Proof of Lemma A4.
Based on the proof of the first part in Lemma A2, it is easy to show that
ξ ^ i ( θ , γ ) = ξ i ( θ , γ ) + o p ( 1 ) .
Therefore, by combining the above equation and (34), we have
max_{1≤i≤n} ‖ξ̂_i(θ, γ)‖ = o_p(n^{1/4} p),
and for all λ = O p ( a n ) ,
max_{1≤i≤n} |λᵀ ξ̂_i(θ, γ)| = o_p(1). □
Lemma A5.
Under Conditions 1–11, λ ( θ 0 , γ 0 ) = O p ( a n ) and λ ( θ ^ , γ ^ ) = O p ( a n ) .
Proof of Lemma A5.
Let λ(θ, γ) = ρα, where ρ = ‖λ(θ, γ)‖, α ∈ R^{p+r}, and ‖α‖ = 1. According to (8), λ(θ, γ) ∈ R^{p+r} satisfies
0 = 1 n i = 1 n ξ ^ i ( θ , γ ) 1 + λ ( θ , γ ) ξ ^ i ( θ , γ ) = : ψ ( λ ) .
Using 1 / ( 1 + λ ( θ , γ ) ξ ^ i ( θ , γ ) ) = 1 λ ( θ , γ ) ξ ^ i ( θ , γ ) / ( 1 + λ ( θ , γ ) ξ ^ i ( θ , γ ) ) , we can obtain that
n 1 | α i = 1 n ξ ^ i ( θ , γ ) | ρ 1 + ρ max 1 i n ξ ^ i ( θ , γ ) α S ^ n ( θ , γ ) α ,
where S ^ n ( θ , γ ) = 1 n i = 1 n ξ ^ i ( θ , γ ) ξ ^ i ( θ , γ ) . Note that
0 < 1 + λ ( θ , γ ) ξ ^ i ( θ , γ ) 1 + ρ max 1 i n ξ ^ i ( θ , γ ) ,
and we have
ρ α S ^ n ( θ , γ ) α α ξ ^ i ( θ , γ ) max 1 i n ξ ^ i ( θ , γ ) n 1 | α i = 1 n ξ ^ i ( θ , γ ) | .
Using Lemma A4, we can show
n 1 | α i = 1 n ξ ^ i ( θ 0 , γ 0 ) | n 1 i = 1 n ξ ^ i ( θ 0 , γ 0 ) = O p ( a n ) .
Therefore,
max 1 i n n 1 ξ ^ i ( θ 0 , γ 0 ) | α i = 1 n ξ ^ i ( θ 0 , γ 0 ) | = o p ( 1 ) .
Combining (35) and (36), we have
| ρ α S ^ n ( θ 0 , γ 0 ) α + o p ( 1 ) | = O p ( a n ) .
Using Lemma A1, we can show that P ( α S ^ n ( θ 0 , γ 0 ) α 1 2 C ) 1 as n . Therefore, ρ = O p ( a n ) , which means that
λ ( θ 0 , γ 0 ) = ρ = O p ( a n ) ,
and the proof of λ ( θ ^ , γ ^ ) = O p ( a n ) follows by Owen [33]. □
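Computationally, the multiplier λ(θ, γ) characterized in Lemma A5 is the root of the estimating equation in (8) and is typically obtained by a damped Newton iteration, following the standard empirical likelihood algorithm in Owen [33]. The sketch below is our own illustrative implementation (the function name and tolerances are ours) and assumes a well-conditioned problem:

```python
import numpy as np

def el_lambda(xi, n_iter=50, tol=1e-10):
    """Solve (1/n) sum_i xi_i / (1 + lambda' xi_i) = 0 for lambda by Newton's
    method, keeping every weight 1 + lambda' xi_i strictly positive."""
    n, d = xi.shape
    lam = np.zeros(d)
    for _ in range(n_iter):
        w = 1.0 + xi @ lam                     # weights; must stay positive
        g = (xi / w[:, None]).mean(axis=0)     # estimating equation value
        if np.linalg.norm(g) < tol:
            break
        # Jacobian of the estimating equation with respect to lambda
        H = -(xi / w[:, None]).T @ (xi / w[:, None]) / n
        step = np.linalg.solve(H, g)
        # Damp the Newton step so that all weights remain positive
        t = 1.0
        while np.any(1.0 + xi @ (lam - t * step) <= 0.0):
            t *= 0.5
        lam = lam - t * step
    return lam

# Example: estimating-function values with a small nonzero mean
rng = np.random.default_rng(0)
xi = rng.standard_normal((200, 3)) + 0.05
lam = el_lambda(xi)
```

The damping step enforces the positivity 1 + λᵀξ̂ᵢ > 0 that is used repeatedly in the proofs above.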
Lemma A6.
Under Conditions 1–11, ℓ̃_p(θ, γ) attains its minimum in D_n with probability approaching 1.
Proof of Lemma A6.
The proof of Lemma A6 is analogous to that of Lemma 2 in Tang and Leng [24], and is therefore omitted for brevity. □
Proof of Theorem 1.
According to Lemma A6, ℓ̃_p(θ, γ) attains its minimum in D_n. Let (θ, γ) ∈ D_n. Combining Lemma A2 and a Taylor expansion, we can show that
1 n ˜ p ( θ , γ ) θ j = 1 n i = 1 n λ ξ ^ i ( θ , γ ) / θ j 1 + λ ξ ^ i ( θ , γ ) + p ν ( | θ j | ) sign ( θ j ) = 1 n i = 1 n λ ξ i ( θ , γ ) / θ j 1 + λ ξ i ( θ , γ ) + o p ( 1 ) + p ν ( | θ j | ) sign ( θ j ) = 1 n i = 1 n λ ξ i ( θ 0 , γ 0 ) θ j + 2 ξ i ( θ 0 , γ 0 ) θ j θ ( θ θ 0 ) + o p ( 1 ) + p ν ( | θ j | ) sign ( θ j ) = A 1 + A 2 + p ν ( | θ j | ) sign ( θ j ) + o p ( 1 ) ,
and
1 n ˜ p ( θ , γ ) γ j = 1 n i = 1 n λ ξ ^ i ( θ , γ ) / γ j 1 + λ ξ ^ i ( θ , γ ) + p ν ( | γ j | ) sign ( γ j ) = 1 n i = 1 n λ ξ i ( θ , γ ) / γ j 1 + λ ξ i ( θ , γ ) + o p ( 1 ) + p ν ( | γ j | ) sign ( γ j ) = 1 n i = 1 n λ ξ i ( θ 0 , γ 0 ) γ j + 2 ξ i ( θ 0 , γ 0 ) γ j γ ( γ γ 0 ) + o p ( 1 ) + p ν ( | γ j | ) sign ( γ j ) = A 3 + A 4 + p ν ( | γ j | ) sign ( γ j ) + o p ( 1 ) .
It follows from Conditions 5 and 6 and Lemma A5 that
max j A 1 ( | A 1 | ) = max j A 1 1 n i = 1 n λ E ξ i ( θ 0 , γ 0 ) θ j + ξ i ( θ 0 , γ 0 ) θ j E ξ i ( θ 0 , γ 0 ) θ j max j A 1 λ E ξ ( θ 0 , γ 0 ) θ j + 1 n λ i = 1 n ξ i ( θ 0 , γ 0 ) θ j E ξ ( θ 0 , γ 0 ) θ j = o p ( 1 ) ,
and
max j A 1 ( | A 2 | ) 1 n λ i = 1 n 2 ξ i ( θ 0 , γ 0 ) θ j θ E 2 ξ ( θ 0 , γ 0 ) θ j θ ( θ θ 0 ) + max j A 1 λ E 2 ξ ( θ 0 , γ 0 ) θ j θ ( θ θ 0 ) = o p ( 1 ) .
Similarly, we can obtain
max j A 2 ( | A 3 | ) = max j A 2 1 n i = 1 n λ E ξ i ( θ 0 , γ 0 ) γ j + ξ i ( θ 0 , γ 0 ) γ j E ξ i ( θ 0 , γ 0 ) γ j = o p ( 1 ) ,
and
max j A 2 ( | A 4 | ) 1 n λ i = 1 n 2 ξ i ( θ 0 , γ 0 ) γ j γ E 2 ξ ( θ 0 , γ 0 ) γ j γ ( γ γ 0 ) + max j A 2 λ E 2 ξ ( θ 0 , γ 0 ) γ j γ ( γ γ 0 ) = o p ( 1 ) .
Based on Condition 10, we can show that P τ ( | θ j | ) sign ( θ j ) { j A 1 } = τ and P τ ( | θ j | ) sign ( θ j ) { j A 1 } = τ sign ( θ j ) { j A 1 } . Thus, with probability approaching 1, l ˜ p ( θ , γ ) / θ j is dominated by the sign of θ j , j A 1 , as n . It means that lim n P ( θ ^ 2 = 0 ) = 1 . Similarly, using Condition 10 again, we have P ν ( | γ j | ) sign ( γ j ) { j A 2 } = ν , P ν ( | γ j | ) sign ( γ j ) { j A 2 } = ν sign ( γ j ) { j A 2 } , and lim n P ( γ ^ 2 = 0 ) = 1 . Therefore, Theorem 1 (1) is proved.
We now prove Theorem 1 (2). Minimising (9) is equivalent to minimising the following function:
˜ p ( θ , γ , λ , μ , ϑ ) = 1 n i = 1 n log { 1 + λ ξ ^ i ( θ , γ ) } + i = 1 p p τ ( | θ i | ) + i = 1 r p ν ( | γ i | ) + μ H 2 θ + ϑ H 4 γ ,
where μ and ϑ are also Lagrange multipliers. Define
Q 1 n ( θ , γ , λ , μ , ϑ ) = 1 n i = 1 n ξ ^ i ( θ , γ ) 1 + λ ξ ^ i ( θ , γ ) , Q 2 n ( θ , γ , λ , μ , ϑ ) = 1 n i = 1 n { ξ ^ i ( θ , γ ) / θ } λ 1 + λ ξ ^ i ( θ , γ ) + b 1 ( θ ) + H 2 μ , Q 3 n ( θ , γ , λ , μ , ϑ ) = 1 n i = 1 n { ξ ^ i ( θ , γ ) / γ } λ 1 + λ ξ ^ i ( θ , γ ) + b 2 ( γ ) + H 4 ϑ , Q 4 n ( θ , γ , λ , μ , ϑ ) = H 2 θ , Q 5 n ( θ , γ , λ , μ , ϑ ) = H 4 γ ,
where
b 1 ( θ ) = { P τ ( | θ 1 | ) sign ( θ 1 ) , , P τ ( | θ p 1 | ) sign ( θ p 1 ) } ,
and
b 2 ( γ ) = { P ν ( | γ 1 | ) sign ( γ 1 ) , , P ν ( | γ p 2 | ) sign ( γ p 2 ) } .
The minimizer ( θ ^ , γ ^ , λ ^ , μ ^ , ϑ ^ ) satisfies Q_{jn}(θ̂, γ̂, λ̂, μ̂, ϑ̂) = 0 for j = 1, …, 5. Since λ = O_p(a_n), Q_{2n}(θ, γ, λ, μ, ϑ) = 0, and Q_{3n}(θ, γ, λ, μ, ϑ) = 0, we can obtain μ = O_p(a_n) and ϑ = O_p(a_n). Thus, by expanding Q_{jn}(θ, γ, λ, μ, ϑ) at (θ₀, γ₀, 0, 0, 0), we have
Q 1 n ( θ 0 , γ 0 , 0 , 0 , 0 ) 0 0 0 0 = S ^ n M ^ 1 M ^ 2 0 0 M ^ 1 T b 1 ( θ ) 0 H 2 0 M ^ 2 0 b 2 ( γ ) 0 H 4 0 H 2 0 0 0 0 0 H 4 0 0 λ ^ 0 θ ^ θ 0 γ ^ γ 0 μ ^ 0 ϑ ^ 0 ,
where M̂₁ = n^{−1} ∑_{i=1}^n ∂ξ̂_i(θ, γ)/∂θ and M̂₂ = n^{−1} ∑_{i=1}^n ∂ξ̂_i(θ, γ)/∂γ. Let M₁ = n^{−1} ∑_{i=1}^n ∂ξ_i(θ, γ)/∂θ and M₂ = n^{−1} ∑_{i=1}^n ∂ξ_i(θ, γ)/∂γ; it is then easy to show that
Q 1 n ( θ 0 , γ 0 , 0 , 0 , 0 ) 0 0 0 0 = S n M 1 M 2 0 0 M 1 T 0 0 H 2 0 M 2 0 0 0 H 4 0 H 2 0 0 0 0 0 H 4 0 0 λ ^ 0 θ ^ θ 0 γ ^ γ 0 μ ^ 0 ϑ ^ 0 + R n ,
where R n = k = 1 8 R n ( k ) , R n ( 1 ) = ( R 1 n ( 1 ) , R 2 n ( 1 ) , R 3 n ( 1 ) , 0 , 0 ) , R 1 n ( 1 ) R p + r 1 , R 2 n ( 1 ) R p , R 3 n ( 1 ) R r 1 , the k-th component of R j n ( 1 ) is given by
R j n , k ( 1 ) = 1 2 ( ϕ ^ ϕ ) 2 Q j n , k ( ϕ * ) ϕ ϕ ( ϕ ^ ϕ ) ,
ϕ = ( λ , θ , γ ) , ϕ * = ( λ * , θ * , γ * ) such that λ * λ ^ , θ * θ 0 θ ^ θ 0 , and γ * γ 0 γ ^ γ 0 . In addition, we have R n ( 2 ) = { 0 , b 1 ( θ 0 ) , 0 , 0 , 0 } , R n ( 3 ) = { 0 , 0 , b 2 ( γ 0 ) , 0 , 0 } , R n ( 4 ) = { 0 , { b 1 ( θ * ) ( θ * θ 0 ) } , 0 , 0 , 0 } , R n ( 5 ) = { 0 , 0 , { b 2 ( γ * ) ( γ * γ 0 ) } , 0 , 0 } , R n ( 6 ) = { { ( S ^ n ( θ 0 , γ 0 ) S n ) λ ^ } + { M ^ 1 ( θ 0 , γ 0 ) ( θ ^ θ 0 ) } + { M ^ 2 ( θ 0 , γ 0 ) ( γ ^ γ 0 ) } , 0 , 0 , 0 , 0 } , R n ( 7 ) = { 0 , { ( M ^ 1 ( θ 0 , γ 0 ) M 1 ) λ ^ } , 0 , 0 , 0 } , and R n ( 8 ) = { 0 , 0 , { ( M ^ 2 ( θ 0 , γ 0 ) M 2 ) λ ^ } , 0 , 0 } . By Conditions 5–7 and Lemma A5, we can show that
R l n ( 1 ) 2 n 2 ϕ ^ ϕ 4 i , j , k = 1 2 ( p + r 1 ) 2 Q i ( X , Z ) ϕ j ϕ k = O p ( p + r 1 ) 3 a n 4 , l = 1 , 2 , 3 .
Combining this with Condition 7, it follows that R_n^{(1)} = o_p(1/n). According to Conditions 10 and 11, it is easy to show that R_n^{(2)} = o_p(1/n), R_n^{(3)} = o_p(1/n), R_n^{(4)} = o_p(1/n), and R_n^{(5)} = o_p(1/n). Similarly to Leng and Tang [25], we can also prove that R_n^{(6)} = o_p(1/n), R_n^{(7)} = o_p(1/n), and R_n^{(8)} = o_p(1/n). Therefore, we have R_n = o_p(1/n). Let
ψ = ( θ , γ , μ , ϑ ) , K 11 = S n , K 12 = M 1 , M 2 , 0 , 0 , K 21 = K 12
and
K 22 = 0 0 H 2 0 0 0 0 H 4 H 2 0 0 0 0 H 4 0 0 , K = K 11 K 12 K 21 K 22 .
By inverting (37), it can be shown that
( λ ^ 0 ) , ( ψ ^ ψ ) = K 1 Q 1 n ( θ 0 , γ 0 , 0 , 0 , 0 ) , 0 + o p ( n 1 / 2 ) .
Applying block matrix inversion, we have
K 1 = K 11 1 + K 11 1 K 12 F 1 K 21 K 11 1 K 11 1 K 12 F 1 F 1 K 21 K 11 1 F 1 ,
where
F = K 22 K 21 K 11 1 K 12 .
Thus,
ψ ^ ψ = F 1 K 21 K 11 1 Q 1 n ( θ , γ , λ , μ , ϑ ) + o p ( n 1 / 2 ) ,
and
F 1 = U 1 Ω U 1 H ( H U 1 H ) 1 ( H U 1 H ) 1 H U 1 ( H U 1 H ) 1 ,
where Ω = U 1 H ( H U 1 H ) 1 H U 1 and
U = M 1 S n 1 M 1 M 1 S n 1 M 2 M 2 S n 1 M 1 M 2 S n 1 M 2 .
This implies that
θ ^ θ 0 γ ^ γ 0 = { U 1 Ω } { 1 n i = 1 n ξ i ( θ 0 , γ 0 ) + o p ( n 1 2 ) } .
Furthermore, we can obtain
θ ^ 1 θ 10 γ ^ 1 γ 10 = H 0 { U 1 Ω } { 1 n i = 1 n ξ i ( θ 0 , γ 0 ) + o p ( n 1 2 ) } .
Using Lemma A3, we have Var { n 1 / 2 ( θ ^ 1 , γ ^ 1 ) } = I B = H 0 U 1 V U 1 H 0 H 0 U 1 H ( H U 1 H ) 1 H 2 U 1 V A 1 H 2 ( H U 1 H ) 1 H U 1 H 0 . Define Y n i = 1 n T n i , where T n i = B I B 1 2 ( H 0 U 1 H 0 U 1 H ( H U 1 H ) 1 H U 1 ) ξ i ( θ 0 , γ 0 ) . According to the central limit theorem, it follows that
1 n B I B 1 2 { H 0 U 1 H 0 U 1 H ( H U 1 H ) 1 H U 1 } i = 1 n ξ i ( θ 0 , γ 0 ) N ( 0 , G )
in distribution. □
Proof of Theorem 2.
Define φ ^ i = λ ^ ξ ^ i ( θ ^ , γ ^ ) and φ i = λ ^ ξ i ( θ ^ , γ ^ ) for i = 1 , , n . Combining Lemma A2, Lemma A4, and Taylor’s expansion, we can show that
l ˜ p ( θ ^ , γ ^ ) = i = 1 n φ ^ i φ ^ i 2 2 + φ ^ i 3 3 ( 1 + ζ i ) 4 + o p ( 1 ) = i = 1 n φ i φ i 2 2 + φ i 3 3 ( 1 + ζ i ) 4 + o p ( 1 ) + o p ( 1 ) ,
where | ζ i | | φ i | . According to (38), the asymptotic expansion for λ ^ can be shown as
λ ^ = Σ 1 + Σ 1 ( M 1 , M 2 ) F 1 ( M 1 , M 2 ) Σ 1 ξ ¯ i + o p ( 1 ) ,
where F 1 = U 1 Ω , Ω = U 1 H ( H U 1 H ) 1 H U 1 , and ξ ¯ i = 1 n i = 1 n ξ i ( θ 0 , γ 0 ) . From (39), we can gain the expansion of l ˜ p ( θ ^ , γ ^ ) as follows
l ˜ p ( θ ^ , γ ^ ) = n ξ ¯ i Σ 1 ( M 1 , M 2 ) U 1 H ( H U 1 H ) 1 H U 1 ( M 1 , M 2 ) Σ 1 ξ ¯ i + o p ( 1 ) .
There exist values of H ˜ 2 and H ˜ 4 that satisfy H ˜ 2 θ = 0 , H ˜ 4 θ = 0 , H ˜ 2 H ˜ 2 = I p q 1 , and H ˜ 4 H ˜ 4 = I p q 2 . Let H ˜ = H ˜ 2 0 0 H ˜ 4 . Similarly, we can show that
l ˜ p ( θ ^ , γ ^ ) L n ( θ 0 , L n ( γ 0 ) = 0 ) = n ξ ¯ i Σ 1 ( M 1 , M 2 ) Ω ˜ ( M 1 , M 2 ) Σ 1 ξ ¯ i + o p ( 1 ) ,
where Ω ˜ = U 1 H ˜ ( H ˜ U 1 H ˜ ) 1 H ˜ U 1 .
By combining the above two equations, it follows that
l ˜ p ( L n ) = n ξ ¯ i Σ 1 / 2 ( P 1 P 2 ) Σ 1 / 2 ξ ¯ i + o p ( 1 ) ,
where P 1 = Σ 1 / 2 ( M 1 , M 2 ) Ω ˜ ( M 1 , M 2 ) Σ 1 / 2 ,   P 2 = Σ 1 / 2 ( M 1 , M 2 ) Ω ( M 1 , M 2 ) Σ 1 / 2 . Because P 1 P 2 is an idempotent matrix with rank q 1 + q 2 , P 1 P 2 can be denoted as P n P n , where P n R ( q 1 + q 2 ) × ( p + r 1 ) and P n P n = I q 1 + q 2 . By the central limit theorem, we have n P n Σ 1 / 2 ξ ¯ i L N ( 0 , I q 1 + q 2 ) . Therefore, l ˜ p ( L n ) L χ q 1 + q 2 2 and Theorem 2 follows. □
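The last step of the proof uses the classical fact that a quadratic form zᵀPz in a standard normal vector z, with P idempotent of rank q₁ + q₂, follows a χ² distribution with q₁ + q₂ degrees of freedom. A quick numerical illustration of this fact (using an arbitrary rank-q projection; all names here are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
d, q, n_rep = 6, 2, 4000

# Build a rank-q idempotent projection P = Q Q' with orthonormal columns Q,
# mirroring the decomposition P1 - P2 = P_n' P_n in the proof of Theorem 2.
Q, _ = np.linalg.qr(rng.standard_normal((d, q)))
P = Q @ Q.T

# Quadratic forms z' P z for z ~ N(0, I_d): these follow chi-square(q).
z = rng.standard_normal((n_rep, d))
stats = np.einsum('ij,jk,ik->i', z, P, z)
mean_stat = stats.mean()   # close to q, the mean of a chi-square(q) variable
```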
The SIM and PLM are two special cases of the PLSIM. According to the same arguments in the proofs of Theorems 1 and 2, Corollaries 1–4 can be proved in a similar manner, and are hence omitted.

References

  1. Carroll, R.; Fan, J.; Gijbels, I.; Wand, M.P. Generalized partially linear single-index models. J. Am. Stat. Assoc. 1997, 92, 477–489. [Google Scholar] [CrossRef]
  2. Yu, Y.; Ruppert, D. Penalized spline estimation for partially linear single-index models. J. Am. Stat. Assoc. 2002, 97, 1042–1054. [Google Scholar] [CrossRef]
  3. Zhu, L.X.; Xue, L.G. Empirical likelihood confidence regions in a partially linear single-index model. J. R. Stat. Soc. Ser. B 2006, 68, 549–570. [Google Scholar] [CrossRef]
  4. Xia, Y.; Härdle, W. Semi-parametric estimation of partially linear single-index models. J. Multivar. Anal. 2006, 97, 1162–1184. [Google Scholar] [CrossRef]
  5. Liang, H.; Xia, L.; Li, R.; Tsai, C.L. Estimation and testing for partially linear single-index models. Ann. Stat. 2010, 38, 3811–3836. [Google Scholar] [CrossRef] [PubMed]
  6. Ma, Y.; Zhu, L.P. Doubly robust and efficient estimators for heteroscedastic partially linear single-index models allowing high dimensional covariates. J. R. Stat. Soc. Ser. B 2013, 75, 305–322. [Google Scholar] [CrossRef]
  7. Fang, J.L.; Liu, W.R.; Lu, X.W. Empirical likelihood for heteroscedastic partially linear single-index models with growing dimensional data. Metrika 2018, 81, 255–281. [Google Scholar] [CrossRef]
  8. Hao, C.; Yin, X. A Normality Test for High-dimensional Data Based on the Nearest Neighbor Approach. J. Am. Stat. Assoc. 2023, 118, 719–731. [Google Scholar]
  9. Liu, B.; Zhang, Q.; Xue, L.; Song, P.; Kang, J. Robust High-Dimensional Regression with Coefficient Thresholding and its Application to Imaging Data Analysis. J. Am. Stat. Assoc. 2024, 119, 715–729. [Google Scholar] [CrossRef]
  10. Breiman, L. Heuristics of instability and stabilization in model selection. Ann. Stat. 1996, 24, 2350–2383. [Google Scholar] [CrossRef]
  11. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288. [Google Scholar] [CrossRef]
  12. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
  13. Xie, H.; Huang, J. SCAD-penalized regression in high-dimensional partially linear models. Ann. Stat. 2009, 37, 673–696. [Google Scholar] [CrossRef]
  14. Wang, T.; Zhu, L.X. Consistent Model Selection and Estimation in a General Single-Index Model with “Large p and Small n”; Technical Report; Department of Mathematics, Hong Kong Baptist University: Hong Kong, China, 2011. [Google Scholar]
  15. Zhang, J.; Wang, T.; Zhu, L.X.; Liang, H. A dimension reduction based approach for estimation and variable selection in partially linear single-index models with high-dimensional covariates. Electron. J. Stat. 2012, 6, 2235–2273. [Google Scholar] [CrossRef]
  16. Lai, P.; Wang, Q.H.; Zhou, X.H. Variable selection and semiparametric efficient estimation for the heteroscedastic partially linear single-index model. Comput. Stat. Data Anal. 2014, 70, 241–256. [Google Scholar] [CrossRef]
  17. Owen, A. Empirical likelihood ratio confidence intervals for a single function. Biometrika 1988, 75, 237–249. [Google Scholar] [CrossRef]
  18. Owen, A. Empirical likelihood for linear models. Ann. Stat. 1991, 19, 1725–1747. [Google Scholar] [CrossRef]
19. Kolaczyk, E.D. Empirical likelihood for generalized linear models. Stat. Sinica 1994, 4, 199–218. [Google Scholar]
20. Lu, X.W. Empirical likelihood for heteroscedastic partially linear models. J. Multivar. Anal. 2009, 100, 387–395. [Google Scholar] [CrossRef]
21. Xue, L.; Zhu, L. Empirical likelihood for single-index models. J. Multivar. Anal. 2006, 97, 1295–1312. [Google Scholar] [CrossRef]
22. Matsushita, Y.; Otsu, T. Empirical likelihood for network data. J. Am. Stat. Assoc. 2023, 119, 2117–2128. [Google Scholar] [CrossRef]
  23. Chen, S.; Peng, L.; Qin, Y. Effects of data dimension on empirical likelihood. Biometrika 2009, 96, 712–722. [Google Scholar] [CrossRef]
  24. Tang, C.; Leng, C. Penalized high-dimensional empirical likelihood. Biometrika 2010, 97, 905–920. [Google Scholar] [CrossRef]
  25. Leng, C.; Tang, C. Penalized empirical likelihood and growing dimensional general estimating equations. Biometrika 2012, 99, 706–716. [Google Scholar] [CrossRef]
  26. Donoho, D.; Johnstone, I. Ideal spatial adaptation by wavelet shrinkage. Biometrika 1994, 81, 425–455. [Google Scholar] [CrossRef]
  27. Hoerl, A.; Kennard, R. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
  28. Fan, J.Q.; Lv, J.C. Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B 2008, 70, 894–911. [Google Scholar] [CrossRef]
  29. Hammer, S.; Katzenstein, D.; Hughes, M.; Gundaker, H.; Schooley, R.; Haubrich, R.; Henry, W.; Lederman, M.; Phair, J.; Niu, M.; et al. A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. N. Engl. J. Med. 1996, 335, 1081–1089. [Google Scholar] [CrossRef]
  30. Lai, P.; Wang, Q. Semiparametric efficient estimation for partially linear single-index models with responses missing at random. J. Multivar. Anal. 2014, 128, 33–50. [Google Scholar] [CrossRef]
  31. Zhu, L.X.; Fang, K.T. Asymptotics for kernel estimate of sliced inverse regression. Ann. Stat. 1996, 24, 1053–1068. [Google Scholar] [CrossRef]
  32. Serfling, R.J. Approximation of Stochastic Processes; John Wiley: New York, NY, USA, 1980. [Google Scholar]
  33. Owen, A. Empirical Likelihood; Chapman and Hall-CRC: New York, NY, USA, 2001. [Google Scholar]
Figure 1. Residual plots for the PLSIM. (a) Residual plot of PEL. (b) Residual plot of PVS.
Table 1. Variable selection results for the PEL method.

(p, r)    n     Kernel Function    Mean Count of Zero Coefficients
                                   Correct                          Incorrect
(10,10)   50    Epanechnikov       (4.47 [55.9%], 3.82 [54.6%])     (2.51, 3.24)
(10,10)   50    Cosine             (4.35 [54.4%], 3.69 [52.7%])     (2.38, 3.47)
(10,10)   100   Epanechnikov       (6.93 [86.6%], 5.63 [80.4%])     (0.52, 0.76)
(10,10)   100   Cosine             (6.86 [85.8%], 5.58 [79.7%])     (0.55, 0.74)
(10,10)   200   Epanechnikov       (7.32 [91.5%], 6.27 [89.5%])     (0.34, 0.51)
(10,10)   200   Cosine             (7.29 [91.1%], 6.24 [89.1%])     (0.37, 0.58)
(20,20)   50    Epanechnikov       (9.71 [53.9%], 8.95 [52.6%])     (4.83, 5.95)
(20,20)   50    Cosine             (9.58 [53.2%], 8.79 [51.7%])     (4.92, 6.14)
(20,20)   100   Epanechnikov       (12.69 [70.5%], 11.81 [69.4%])   (3.58, 5.32)
(20,20)   100   Cosine             (12.66 [70.3%], 11.67 [68.6%])   (3.71, 5.45)
(20,20)   200   Epanechnikov       (15.81 [87.8%], 14.71 [86.5%])   (0.42, 0.87)
(20,20)   200   Cosine             (15.28 [84.9%], 14.56 [85.6%])   (0.45, 0.93)
Table 2. Estimation means and standard deviations of the PEL estimators (values in parentheses are the corresponding standard deviations).

(p, r)    n     Kernel Function    θ1             θ(p−1)         γ2             γ3
(10,10)   50    Epanechnikov       4.35 (2.616)   2.74 (1.738)   3.91 (2.237)   −4.96 (3.521)
(10,10)   50    Cosine             4.27 (2.853)   2.59 (1.802)   4.14 (2.394)   −4.58 (3.475)
(10,10)   100   Epanechnikov       1.94 (0.128)   0.96 (0.106)   1.56 (0.132)   −1.93 (0.103)
(10,10)   100   Cosine             2.06 (0.131)   0.94 (0.114)   1.43 (0.146)   −2.09 (0.115)
(10,10)   200   Epanechnikov       1.95 (0.087)   1.03 (0.082)   1.56 (0.105)   −2.04 (0.103)
(10,10)   200   Cosine             2.02 (0.094)   1.06 (0.091)   1.43 (0.113)   −2.07 (0.107)
(20,20)   50    Epanechnikov       4.92 (3.587)   3.17 (3.264)   5.89 (4.316)   −6.83 (4.763)
(20,20)   50    Cosine             4.53 (3.951)   2.96 (3.728)   6.34 (4.512)   −6.71 (4.625)
(20,20)   100   Epanechnikov       2.97 (1.438)   1.53 (0.896)   2.16 (1.048)   −2.79 (0.951)
(20,20)   100   Cosine             3.18 (1.502)   1.61 (0.925)   2.12 (1.073)   −2.85 (1.027)
(20,20)   200   Epanechnikov       2.09 (0.135)   1.05 (0.107)   1.54 (0.139)   −1.96 (0.114)
(20,20)   200   Cosine             2.12 (0.139)   1.07 (0.119)   1.43 (0.147)   −2.08 (0.121)
Table 3. Variable selection results for the PEL and PVS methods.

(p, r)    n     Method   Mean Count of Zero Coefficients
                         Correct                          Incorrect
(20,20)   200   PEL      (15.21 [84.5%], 13.45 [84.1%])   (0.37, 1.03)
(20,20)   200   PVS      (14.75 [81.9%], 13.17 [82.3%])   (0.45, 1.26)
(20,20)   400   PEL      (15.89 [88.3%], 14.02 [87.6%])   (0.35, 0.85)
(20,20)   400   PVS      (15.46 [85.8%], 13.74 [85.9%])   (0.41, 0.92)
(20,30)   200   PEL      (15.24 [84.6%], 22.56 [86.7%])   (0.39, 1.27)
(20,30)   200   PVS      (14.98 [83.2%], 22.12 [85.1%])   (0.38, 1.35)
(20,30)   400   PEL      (15.79 [87.7%], 23.31 [89.7%])   (0.34, 1.14)
(20,30)   400   PVS      (15.13 [84.1%], 22.85 [87.9%])   (0.43, 1.28)
(30,20)   200   PEL      (24.38 [87.1%], 13.51 [84.4%])   (0.38, 1.24)
(30,20)   200   PVS      (23.85 [85.2%], 12.97 [81.1%])   (0.44, 1.31)
(30,20)   400   PEL      (25.14 [89.7%], 14.13 [88.3%])   (0.33, 1.17)
(30,20)   400   PVS      (24.73 [88.3%], 13.54 [84.6%])   (0.39, 1.26)
(30,30)   200   PEL      (24.25 [86.6%], 23.36 [89.8%])   (0.40, 1.31)
(30,30)   200   PVS      (23.72 [84.7%], 22.68 [87.2%])   (0.47, 1.38)
(30,30)   400   PEL      (25.27 [90.0%], 23.15 [89.1%])   (0.38, 1.19)
(30,30)   400   PVS      (24.54 [87.6%], 22.82 [87.7%])   (0.45, 1.25)
Table 4. Estimation means and standard deviations of the PEL and PVS estimators (values in parentheses are the corresponding standard deviations).

(p, r)    n     Method   θ1             θ(p−1)          γ2             γ3             γ4
(20,20)   200   PEL      1.05 (0.132)   −1.04 (0.101)   1.06 (0.143)   2.07 (0.127)   0.95 (0.106)
(20,20)   200   PVS      1.12 (0.145)   −1.08 (0.134)   0.93 (0.175)   2.11 (0.142)   1.08 (0.131)
(20,20)   400   PEL      0.98 (0.102)   −1.03 (0.091)   1.04 (0.126)   1.98 (0.108)   1.06 (0.113)
(20,20)   400   PVS      1.07 (0.134)   −1.06 (0.114)   0.96 (0.135)   2.08 (0.125)   1.05 (0.117)
(20,30)   200   PEL      0.94 (0.137)   −1.05 (0.109)   1.10 (0.128)   2.06 (0.121)   1.07 (0.149)
(20,30)   200   PVS      1.86 (0.153)   −1.12 (0.142)   0.92 (0.136)   1.96 (0.147)   1.13 (0.151)
(20,30)   400   PEL      0.96 (0.125)   −1.04 (0.098)   1.08 (0.124)   2.03 (0.119)   1.09 (0.128)
(20,30)   400   PVS      1.91 (0.138)   −1.09 (0.117)   0.92 (0.225)   1.95 (0.124)   1.92 (0.137)
(30,20)   200   PEL      1.09 (0.141)   −1.13 (0.135)   1.07 (0.127)   1.94 (0.130)   1.08 (0.143)
(30,20)   200   PVS      0.89 (0.159)   −1.08 (0.162)   1.09 (0.143)   1.91 (0.129)   1.13 (0.156)
(30,20)   400   PEL      1.06 (0.098)   −0.96 (0.107)   1.03 (0.126)   2.05 (0.115)   1.04 (0.112)
(30,20)   400   PVS      1.86 (0.116)   −1.58 (0.154)   0.94 (0.139)   1.97 (0.129)   1.07 (0.123)
(30,30)   200   PEL      1.14 (0.129)   −1.13 (0.138)   0.91 (0.142)   2.09 (0.138)   1.12 (0.128)
(30,30)   200   PVS      0.79 (0.148)   −1.18 (0.145)   0.88 (0.157)   1.90 (0.141)   0.93 (0.135)
(30,30)   400   PEL      1.09 (0.114)   −1.07 (0.126)   0.93 (0.119)   2.10 (0.128)   1.09 (0.119)
(30,30)   400   PVS      1.15 (0.123)   −1.15 (0.131)   1.05 (0.125)   1.89 (0.133)   0.92 (0.121)
Table 5. Estimates and confidence intervals for the PEL and PVS methods (the values in parentheses are the corresponding confidence intervals).
Variable   PEL                           PVS
homo       0.193 ([0.065, 0.317])        0.131 ([0.001, 0.262])
str2       −0.322 ([−0.517, −0.016])     −0.045 ([−0.324, 0.234])
age        0.528 ([0.314, 0.735])        0.153 ([−0.118, 0.424])
cd820      −0.691 ([−0.893, −0.478])     −0.208 ([−0.413, 0.003])
cd40       0.255 ([0.009, 0.452])        0.446 ([0.134, 0.758])
cd420      0.423 ([0.126, 0.674])        0.857 ([0.536, 1.178])

Fang, J.; Tian, Z. Statistical Inference for High-Dimensional Heteroscedastic Partially Single-Index Models. Entropy 2025, 27, 964. https://doi.org/10.3390/e27090964
