Article

Statistical Inference for High-Dimensional Heteroscedastic Partially Single-Index Models

College of Science, Hunan Institute of Engineering, Fuxing Road, Xiangtan 411104, China
*
Author to whom correspondence should be addressed.
Entropy 2025, 27(9), 964; https://doi.org/10.3390/e27090964
Submission received: 6 August 2025 / Revised: 2 September 2025 / Accepted: 11 September 2025 / Published: 16 September 2025
(This article belongs to the Special Issue Statistical Inference: Theory and Methods)

Abstract

In this study, we propose a novel penalized empirical likelihood approach that simultaneously performs parameter estimation and variable selection in heteroscedastic partially linear single-index models with a diverging number of parameters. It is rigorously proved that the proposed method possesses the oracle property: (i) with probability tending to 1, the zero components are consistently estimated as zero; (ii) the estimators for nonzero coefficients achieve asymptotic efficiency. Furthermore, the penalized empirical log-likelihood ratio statistic is shown to asymptotically follow a standard chi-squared distribution under the null hypothesis. This methodology can be naturally applied to pure partially linear models and single-index models in high-dimensional settings. Simulation studies and real-world data analysis are conducted to examine the properties of the presented approach.

1. Introduction

Consider the following partially linear single-index model (PLSIM):
$$Y_i = \theta^\top X_i + g(Z_i^\top\gamma) + \varepsilon_i, \qquad E(\varepsilon_i \mid X_i, Z_i) = 0, \quad i = 1, \ldots, n, \tag{1}$$
where $X_i \in \mathbb{R}^p$ and $Z_i \in \mathbb{R}^r$ are covariates, $g(\cdot)$ denotes an unknown function, $Y_i$ is the response variable, $\theta \in \mathbb{R}^p$ and $\gamma \in \mathbb{R}^r$ are parameter vectors, and $\varepsilon_i$ is an independent random error. Let $\mathrm{Var}(\varepsilon_i \mid X_i, Z_i) = v(X_i, Z_i) > 0$, where the function $v(X, Z)$ captures potential heteroscedasticity. For model (1), the scale of the index parameter $\gamma$ is generally not identifiable: if $\gamma$ is multiplied by a nonzero constant while the nonparametric function absorbs the reciprocal rescaling of its argument, the model's predictions remain unchanged. To ensure identifiability and avoid non-unique representations, it is standard practice to impose a constraint on the index parameter $\gamma$, for example, fixing one of its components to 1 or restricting it to have unit norm. We fix the first component of $\gamma$ to 1 and denote the remaining components of $Z$ by $Z_1$. Model (1) combines a partially linear model (PLM) and a single-index model (SIM). To our knowledge, Carroll et al. [1] introduced PLSIMs and developed a backfitting algorithm for estimation in their generalized form. Since their introduction, PLSIMs have attracted much research attention, and extensive work has focused on estimating $g(\cdot)$ and the unknown parameter vectors $\theta$ and $\gamma$. For PLSIMs, Yu and Ruppert [2] proposed a penalized spline estimation approach; Zhu and Xue [3] proposed an empirical likelihood (EL) method; Xia and Härdle [4] introduced a semiparametric estimation procedure; Liang et al. [5] introduced an estimation method based on profile least squares; etc.
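The scale non-identifiability noted above can be checked numerically. The following sketch (pure Python; the choice $g = \exp$ and all covariate values are hypothetical toys, not from our simulations) confirms that multiplying $\gamma$ by a constant $c$ while replacing $g(u)$ with $g(u/c)$ leaves the mean function unchanged:

```python
import math

# Illustration of the scale non-identifiability in model (1): the mean
# theta'X + g(gamma'Z) is unchanged if gamma is multiplied by c and g(u)
# is replaced by g(u / c).  The link g = exp and all values are toy choices.
def mean_plsim(theta, gamma, g, x, z):
    linear = sum(t * xv for t, xv in zip(theta, x))
    index = sum(gm * zv for gm, zv in zip(gamma, z))
    return linear + g(index)

theta = [1.0, -0.5]
gamma = [1.0, 2.0]                 # first component fixed to 1, as in the text
x, z = [0.3, 1.2], [0.7, -0.4]
c = 3.0

m_original = mean_plsim(theta, gamma, math.exp, x, z)
m_rescaled = mean_plsim(theta, [c * gm for gm in gamma],
                        lambda u: math.exp(u / c), x, z)
# m_original and m_rescaled coincide: the scale of gamma is not identified
```

Any nonzero $c$ gives the same fitted value, which is why a normalization such as fixing the first component of $\gamma$ to 1 is required.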
In recent years, the analysis of high-dimensional data has become a major frontier of statistical research, with applications spanning internet portals, hyperspectral imagery, finance, and high-throughput genomic data in computational biology; see, e.g., Ma and Zhu [6], Fang et al. [7], Hao and Yin [8], and Liu et al. [9]. Ma and Zhu [6] introduced efficient estimators for heteroscedastic PLSIMs that allow for many covariates; however, their methodology restricted the covariate dimensions to be fixed rather than allowing them to diverge as the sample size increases. Fang et al. [7] proposed EL estimators for high-dimensional PLSIMs, permitting the covariate dimension to diverge with the sample size.
In high-dimensional analyses, PLSIMs suffer from the inclusion of irrelevant covariates, which leads to inefficient parameter estimators and reduced prediction accuracy; it is therefore natural to prune non-informative variables and recover the sparse true model. This has motivated extensive research on variable selection methods, including AIC and BIC (Breiman [10]), the LASSO penalty (Tibshirani [11]), the SCAD penalty (Fan and Li [12]), and so on. For PLSIMs, several variable selection methods have been proposed. Xie and Huang [13] proposed a variable selection method for PLMs, which are special cases of PLSIMs, and proved that this method possesses the oracle property. Wang and Zhu [14] established nearly necessary and sufficient conditions for estimator consistency in SIMs, another special case of PLSIMs, under high-dimensional ("large p, small n") settings. Zhang et al. [15] developed a method for variable selection and parameter estimation in high-dimensional PLSIMs. Lai et al. [16] studied variable selection for heteroscedastic PLSIMs using the efficient score function. These methods, however, are largely confined to scenarios where the covariate dimensions remain fixed.
Within nonparametric frameworks, Owen [17] developed the EL method for statistical inference. This approach retains likelihood methodology while eliminating parametric distributional assumptions. Since this method was proposed, it has been successfully extended to various circumstances, including linear models [18], generalized linear models [19], heteroscedastic PLMs [20], SIMs [21], and network data [22], among others. Chen et al. [23] demonstrated that EL remains valid when the data dimension diverges. In high-dimensional data settings, Tang and Leng [24] studied a variable selection method by penalized empirical likelihood (PEL) for linear regression models, and Leng and Tang [25] investigated a PEL method for general estimating equations. To our knowledge, applications to heteroscedastic PLSIMs have been scarcely explored, especially for variable selection in high-dimensional settings.
Empirical likelihood is a data-driven, nonparametric methodology that retains the merits of parametric likelihood while offering robustness and flexibility in incorporating auxiliary information to obtain estimates and construct confidence sets. Motivated by the PEL method for high-dimensional estimating equations in Leng and Tang [25], we explore a variable selection approach using PEL for a PLSIM in a heteroscedastic high-dimensional setting, where the dimensions $p \to \infty$ and $r \to \infty$ as $n \to \infty$. For model (1), the PEL ratio is constructed from semiparametric efficient estimating equations, incorporating the semiparametric efficient score for the heteroscedastic PLSIM. We prove that PEL has the oracle property and excels at generating sparse models without requiring a prespecified parametric likelihood. Although existing variable selection techniques (e.g., Lai et al. [16]) also attain the oracle property, specifying a high-dimensional distribution remains theoretically challenging. Furthermore, the PEL ratio statistic satisfies Wilks' theorem, converging to a chi-squared distribution under some regularity conditions, which facilitates hypothesis testing and produces range-respecting confidence regions. As a robust alternative to parametric likelihood ratios in high-dimensional settings, PEL combines the adaptability and statistical efficiency inherent in nonparametric likelihood methods, complementing existing approaches.
The rest of this article is organized as follows. Section 2 outlines methods of variable selection, parameter estimation, and asymptotic properties for high-dimensional heteroscedastic PLSIMs using PEL. Section 3 extends the method to PLMs and SIMs as special examples. In Section 4, we exhibit simulation results, and an application of the proposed method is stated in Section 5. Lemmas and technical proofs are shown in Appendix A.

2. Penalized Empirical Likelihood for PLSIM

Denote the weight function as $w(X_i, Z_i) = [E(\varepsilon_i^2 \mid X_i, Z_i)]^{-1}$, $i = 1, \ldots, n$. Using the kernel function $K_h(u) = h^{-1}K(u/h)$ with bandwidth $h \to 0$, the nonparametric estimators are defined as follows:
$$\hat E\{\hat w(X,Z) \mid Z_i^\top\gamma\} = \frac{\sum_{j \neq i} K_{h_3}(Z_i^\top\gamma - Z_j^\top\gamma)\, \hat w(X_j, Z_j)}{\sum_{j \neq i} K_{h_3}(Z_i^\top\gamma - Z_j^\top\gamma)}, \qquad
\hat E\{\hat w(X,Z)Z_1 \mid Z_i^\top\gamma\} = \frac{\sum_{j \neq i} K_{h_3}(Z_i^\top\gamma - Z_j^\top\gamma)\, \hat w(X_j, Z_j)\, Z_{1,j}}{\sum_{j \neq i} K_{h_3}(Z_i^\top\gamma - Z_j^\top\gamma)},$$
$$\hat E\{\hat w(X,Z)X \mid Z_i^\top\gamma\} = \frac{\sum_{j \neq i} K_{h_3}(Z_i^\top\gamma - Z_j^\top\gamma)\, \hat w(X_j, Z_j)\, X_j}{\sum_{j \neq i} K_{h_3}(Z_i^\top\gamma - Z_j^\top\gamma)},$$
$$\hat w(X_i, Z_i) = \sum_{j \neq i} K_{h_2}(\eta_i - \eta_j) \Big/ \sum_{j \neq i} K_{h_2}(\eta_i - \eta_j)\, e_j^2,$$
$$\hat g(Z_i^\top\gamma) = \sum_{j \neq i} K_{h_1}(Z_i^\top\gamma - Z_j^\top\gamma)(Y_j - X_j^\top\theta) \Big/ \sum_{j \neq i} K_{h_1}(Z_i^\top\gamma - Z_j^\top\gamma),$$
and
$$\hat g'(Z_i^\top\gamma) = h_1^{-1}\Big\{\sum_{j \neq i} K'_{h_1}(Z_i^\top\gamma - Z_j^\top\gamma)(Y_j - X_j^\top\theta)\sum_{j \neq i} K_{h_1}(Z_i^\top\gamma - Z_j^\top\gamma) - \sum_{j \neq i} K_{h_1}(Z_i^\top\gamma - Z_j^\top\gamma)(Y_j - X_j^\top\theta)\sum_{j \neq i} K'_{h_1}(Z_i^\top\gamma - Z_j^\top\gamma)\Big\} \Big/ \Big\{\sum_{j \neq i} K_{h_1}(Z_i^\top\gamma - Z_j^\top\gamma)\Big\}^2.$$
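For intuition, the leave-one-out kernel smoother underlying $\hat g$ and the conditional-expectation estimators above can be sketched as follows (an illustrative Python implementation with the Epanechnikov kernel; the function names and toy values are ours, not part of the estimation procedure):

```python
def epanechnikov(t):
    # K(t) = 0.75 * (1 - t^2) on [-1, 1] and zero outside
    return 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0

def nw_loo(u, y, h, i):
    """Leave-one-out kernel estimate of E(y | u = u[i]), the common building
    block of g-hat and the conditional-expectation estimators (sketch)."""
    num = den = 0.0
    for j in range(len(u)):
        if j == i:
            continue
        k = epanechnikov((u[i] - u[j]) / h) / h   # K_h(u_i - u_j)
        num += k * y[j]
        den += k
    return num / den if den > 0.0 else float("nan")

u = [0.0, 0.1, 0.2, 0.3, 0.4]      # toy index values Z_j' gamma
y = [2.0] * 5                      # constant response
g_hat = nw_loo(u, y, h=0.5, i=2)   # a kernel average of a constant is exact
```

The same routine, applied with $\hat w(X_j, Z_j)$ or $\hat w(X_j, Z_j)X_j$ in place of $y_j$, yields the estimated conditional expectations in the displays above.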
We propose the following estimating equations for the PLSIM:
$$\frac{1}{n}\sum_{i=1}^n \tilde\varepsilon_i\, \hat w(X_i, Z_i)\Big[X_i - \frac{\hat E\{\hat w(X,Z)X \mid Z_i^\top\gamma\}}{\hat E\{\hat w(X,Z) \mid Z_i^\top\gamma\}}\Big] = 0,$$
$$\frac{1}{n}\sum_{i=1}^n \tilde\varepsilon_i\, \hat w(X_i, Z_i)\, \hat g'(Z_i^\top\gamma)\Big[Z_{1,i} - \frac{\hat E\{\hat w(X,Z)Z_1 \mid Z_i^\top\gamma\}}{\hat E\{\hat w(X,Z) \mid Z_i^\top\gamma\}}\Big] = 0, \tag{2}$$
where $\hat w_i \equiv \hat w(X_i, Z_i)$ and $\tilde\varepsilon_i = Y_i - \theta^\top X_i - \hat g(Z_i^\top\gamma)$ denotes the residual for $i = 1, \ldots, n$.
Attempting to estimate $\mathrm{Var}(\varepsilon \mid X, Z)$ or $w(X, Z)$ by nonparametric regression of the residuals on the covariates $(X, Z)$ poses a significant challenge, as this is a high-dimensional problem that is highly susceptible to the curse of dimensionality. To simplify the estimation of $w(X, Z)$, we assume there exists a function $\eta_i = \eta(X_i, Z_i)$ satisfying $\mathrm{Var}(\varepsilon_i \mid X_i, Z_i) = \mathrm{Var}(\varepsilon_i \mid \eta_i)$, where $\eta_i$ is a known low-dimensional function of the covariates $(X_i, Z_i)$, $i = 1, \ldots, n$. For instance, $\eta$ could take the form $\theta^\top X$, implying that the error variance depends solely on a linear combination of $X$; alternatively, $\eta$ could be $\gamma^\top Z$, indicating dependence only on $Z$, or it could combine both or take other forms. A similar assumption appears in Ma and Zhu [6]. In practice, a reasonable approximation of $\eta$ can be obtained with standard procedures for modeling heteroscedasticity, based on residuals from a preliminary model fit. This assumption can also be relaxed to accommodate intermediate multivariate structures, such as additive models, thereby preserving univariate convergence rates while retaining considerable flexibility in variance modeling.
Define
$$S_{\mathrm{eff}} = w\,\varepsilon\left(X - \frac{E(wX \mid Z^\top\gamma)}{E(w \mid Z^\top\gamma)},\; \Big[Z_1 - \frac{E(wZ_1 \mid Z^\top\gamma)}{E(w \mid Z^\top\gamma)}\Big]\, g'(Z^\top\gamma)\right). \tag{3}$$
According to Ma and Zhu [6], $S_{\mathrm{eff}}$ is the semiparametric efficient score, and the estimator $(\hat\theta, \hat\gamma)$ based on (2) is doubly robust and efficient for fixed $p$ and $r$. Double robustness means that a consistent estimator of the target parameter is obtained as long as one of the two working models is correctly specified. For example, we may use an estimator $\hat g(\cdot)$ that is inconsistent for $g(\cdot)$; as long as the conditional expectations are consistently estimated, i.e., $\hat E(\cdot \mid Z^\top\gamma)$ converges to $E(\cdot \mid Z^\top\gamma)$, then (2) still yields consistent estimators for $\theta$ and $\gamma$. Symmetrically, if $\hat g(\cdot)$ is consistent for $g(\cdot)$, consistency of the estimators in (2) is maintained even if the conditional expectations are misspecified. However, the double robustness and efficiency of the estimator obtained by solving (2) are no longer valid when $p$ and $r$ tend to infinity as $n \to \infty$.
Our goal is to develop, in a high-dimensional sparse setting, new estimation and variable selection approaches for the heteroscedastic model (1) using the PEL method. To construct the PEL function, we need an auxiliary random vector based on $S_{\mathrm{eff}}$. Define
$$\xi_i(\theta, \gamma) = w_i\,\varepsilon_i\left(X_i - \frac{E(w_iX_i \mid Z_i^\top\gamma)}{E(w_i \mid Z_i^\top\gamma)},\; \Big[Z_{1,i} - \frac{E(w_iZ_{1,i} \mid Z_i^\top\gamma)}{E(w_i \mid Z_i^\top\gamma)}\Big]\, g'(Z_i^\top\gamma)\right)^\top.$$
We have $E\{\xi_i(\theta, \gamma)\} = 0$ for $i = 1, \ldots, n$. Let $q = (q_1, \ldots, q_n)^\top$ satisfy $\sum_{i=1}^n q_i = 1$, $q_i \ge 0$. For $(\theta, \gamma)$, the EL function is written as
$$L(\theta, \gamma) = \sup\Big\{\prod_{i=1}^n (nq_i) : \sum_{i=1}^n q_i = 1,\; q_i \ge 0,\; \sum_{i=1}^n q_i\,\xi_i(\theta, \gamma) = 0\Big\}.$$
Since $L(\theta, \gamma)$ involves unknown functions, it cannot be used directly for statistical inference on $(\theta, \gamma)$. A natural way around this is to substitute the unknown functions in $L(\theta, \gamma)$ with the corresponding estimators provided above. For $(\theta, \gamma)$, redefine the estimated EL function as
$$\tilde L(\theta, \gamma) = \sup\Big\{\prod_{i=1}^n (nq_i) : \sum_{i=1}^n q_i = 1,\; q_i \ge 0,\; \sum_{i=1}^n q_i\,\hat\xi_i(\theta, \gamma) = 0\Big\}, \tag{5}$$
where $\tilde\varepsilon_i = Y_i - \theta^\top X_i - \hat g(Z_i^\top\gamma)$ and
$$\hat\xi_i(\theta, \gamma) = \hat w_i\,\tilde\varepsilon_i\left(X_i - \frac{\hat E(\hat w_iX_i \mid Z_i^\top\gamma)}{\hat E(\hat w_i \mid Z_i^\top\gamma)},\; \Big[Z_{1,i} - \frac{\hat E(\hat w_iZ_{1,i} \mid Z_i^\top\gamma)}{\hat E(\hat w_i \mid Z_i^\top\gamma)}\Big]\, \hat g'(Z_i^\top\gamma)\right)^\top.$$
Define the PEL estimator $(\hat\theta, \hat\gamma)$ as the maximizer of
$$\log\{\tilde L(\theta, \gamma)\} - n\sum_{i=1}^p p_\tau(|\theta_i|) - n\sum_{i=1}^r p_\nu(|\gamma_i|), \tag{6}$$
where $p_\tau(t)$ and $p_\nu(t)$ are penalty functions with tuning parameters $\tau$ and $\nu$, respectively.
Many penalty functions have been studied, e.g., the $L_1$ penalty (Donoho and Johnstone [26]), the $L_2$ penalty (Hoerl and Kennard [27]), the LASSO penalty (Tibshirani [11]), and the SCAD penalty (Fan and Li [12]). It is well known that the SCAD penalty possesses the oracle property; therefore, in this article, we consider PEL for a heteroscedastic PLSIM using the SCAD penalty. Its first derivative satisfies
$$p'_\nu(t) = \nu\Big\{\frac{(a\nu - t)_+}{(a-1)\nu}\, I(t > \nu) + I(t \le \nu)\Big\},$$
where $I(\cdot)$ denotes the indicator function and $a > 2$ is a constant.
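For reference, the SCAD first derivative above can be coded directly (an illustrative sketch; the choice $a = 3.7$ follows the common recommendation of Fan and Li [12]):

```python
def scad_deriv(t, nu, a=3.7):
    """First derivative of the SCAD penalty for t >= 0 (sketch):
    p'_nu(t) = nu for t <= nu, and (a*nu - t)_+ / (a - 1) for t > nu."""
    if t <= nu:
        return nu
    return max(a * nu - t, 0.0) / (a - 1.0)

d_small = scad_deriv(0.1, 0.5)   # L1-like shrinkage for small coefficients
d_large = scad_deriv(3.0, 0.5)   # zero: large coefficients are not shrunk
```

The flat derivative near zero yields sparsity, while the vanishing derivative for large $|t|$ leaves strong signals essentially unpenalized, which is the mechanism behind the oracle property.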
Combining the Lagrange multiplier method and Equation (5), we have
$$q_i = \frac{1}{n}\cdot\frac{1}{1 + \lambda^\top\hat\xi_i(\theta, \gamma)}, \tag{7}$$
where $\lambda$ satisfies
$$\frac{1}{n}\sum_{i=1}^n \frac{\hat\xi_i(\theta, \gamma)}{1 + \lambda^\top\hat\xi_i(\theta, \gamma)} = 0. \tag{8}$$
By substituting Equation (7) into $\tilde L(\theta, \gamma)$, we can show that maximizing (6) corresponds to minimizing
$$\tilde\ell_p(\theta, \gamma) = 2\sum_{i=1}^n \log\{1 + \lambda^\top\hat\xi_i(\theta, \gamma)\} + n\sum_{i=1}^p p_\tau(|\theta_i|) + n\sum_{i=1}^r p_\nu(|\gamma_i|). \tag{9}$$
Therefore, $(\hat\theta, \hat\gamma)$ can equivalently be defined as the minimizer of (9).
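For a scalar auxiliary variable, the inner step in Equations (7) and (8) reduces to one-dimensional root finding in $\lambda$. A minimal Newton-iteration sketch (with toy values of $\hat\xi_i$, purely for illustration):

```python
def el_lambda(xi, iters=50):
    """Solve (1/n) * sum_i xi_i / (1 + lam * xi_i) = 0 (the scalar analogue
    of Eq. (8)) for the Lagrange multiplier lam via Newton's method."""
    lam, n = 0.0, len(xi)
    for _ in range(iters):
        f = sum(x / (1.0 + lam * x) for x in xi) / n
        df = -sum(x * x / (1.0 + lam * x) ** 2 for x in xi) / n
        lam -= f / df
    return lam

xi = [-1.0, 0.5, 0.8, -0.2]        # toy auxiliary values
lam = el_lambda(xi)
q = [1.0 / (len(xi) * (1.0 + lam * x)) for x in xi]   # the analogue of Eq. (7)
# at the solution, q is a probability vector satisfying the moment constraint
```

At the root of (8), the implied weights $q_i$ automatically sum to one and satisfy $\sum_i q_i\hat\xi_i = 0$, which is why no separate normalization step is needed.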
Let $A_1 = \{j : \theta_{0j} \neq 0\}$ and $A_2 = \{j : \gamma_{0j} \neq 0\}$, and denote the cardinalities of $A_1$ and $A_2$ by $d_1$ and $d_2$, where $\theta_0$ and $\gamma_0$ are the true values of $\theta$ and $\gamma$, respectively. Without loss of generality, write $\theta = (\theta_1^\top, \theta_2^\top)^\top$, where $\theta_1 \in \mathbb{R}^{d_1}$ and $\theta_2 \in \mathbb{R}^{p-d_1}$ collect the nonzero and zero components of $\theta$, respectively; $\gamma = (\gamma_1^\top, \gamma_2^\top)^\top$ is partitioned analogously, with $\gamma_1 \in \mathbb{R}^{d_2}$ and $\gamma_2 \in \mathbb{R}^{r-d_2}$. The true parameter values decompose as $\theta_0 = (\theta_{10}^\top, 0^\top)^\top$ and $\gamma_0 = (\gamma_{10}^\top, 0^\top)^\top$. For notational purposes, let $I_p = (H_1^\top, H_2^\top)^\top$ and $I_{r-1} = (H_3^\top, H_4^\top)^\top$, where $H_1 \in \mathbb{R}^{d_1 \times p}$, $H_2 \in \mathbb{R}^{(p-d_1) \times p}$, $H_3 \in \mathbb{R}^{d_2 \times (r-1)}$, and $H_4 \in \mathbb{R}^{(r-1-d_2) \times (r-1)}$.
To derive asymptotic properties of the proposed PEL estimator, the following conditions are necessary.
Condition 1. The kernel function $K(\cdot)$ is symmetric, with $K(\cdot)$ continuous on $[-1, 1]$.
Condition 2. The bandwidths $h_i$, $i = 1, 2, 3$, satisfy the following asymptotic assumptions: (1) $nh_1^8 \to 0$, $nh_1^4 \to \infty$, $h_2 = O(n^{-1/5})$, and $h_3 = O(n^{-1/5})$; (2) $\log^2(n)/(nh_i) \to 0$, $\log^4(n)/(nh_1h_i) \to 0$, and $h_1^4\log^2(n)/h_i \to 0$.
Condition 3. Let $\mathrm{Var}(X_i) = \Sigma_{x_i}$ and $\mathrm{Var}(Z_i) = \Sigma_{z_i}$, $i = 1, \ldots, n$. The eigenvalues of $\Sigma_{x_i}$ and $\Sigma_{z_i}$ satisfy $C_1 \le \Gamma_1(\Sigma_{x_i}) \le \cdots \le \Gamma_p(\Sigma_{x_i}) \le C_2$ and $C_1 \le \Gamma_1(\Sigma_{z_i}) \le \cdots \le \Gamma_r(\Sigma_{z_i}) \le C_2$ for some constants $0 < C_1 < C_2$, $i = 1, \ldots, n$. In addition, $E(\varepsilon^{4+\delta} \mid X, Z) < \infty$, where $\delta$ is a positive constant.
Condition 4. Let $v(\cdot)$ and $\eta = \eta(X, Z)$ satisfy $E(\varepsilon^2 \mid X, Z) = v(\eta)$, where $0 < C_1 < v(\cdot) < C_2 < \infty$, and $C_1$ and $C_2$ are positive constants. Moreover, $\mathrm{Var}(X_i \mid \eta(X_i, Z_i))$ is positive definite with a bounded spectrum.
Condition 5. There exists a function $v_1(X, Z)$ satisfying
$$\Big|\frac{\partial^2 E(X \mid Z^\top\gamma)}{\partial\gamma_i\partial\gamma_j}\Big|,\; \Big|\frac{\partial^2 E(Z \mid Z^\top\gamma)}{\partial\gamma_i\partial\gamma_j}\Big|,\; \Big|\frac{\partial^2 E(w \mid Z^\top\gamma)}{\partial\gamma_i\partial\gamma_j}\Big|,\; \Big|\frac{\partial^2 E(wZ \mid Z^\top\gamma)}{\partial\gamma_i\partial\gamma_j}\Big|,\; \Big|\frac{\partial^2 E(wX \mid Z^\top\gamma)}{\partial\gamma_i\partial\gamma_j}\Big| < v_1(X, Z), \quad E(v_1^2) < \infty, \quad i, j = 1, \ldots, p.$$
There also exists $v_2(X, Z)$ such that
$$\Big|\frac{\partial^3 \eta(X, Z)}{\partial\pi_i\partial\pi_j\partial\pi_k}\Big| < v_2(X, Z), \quad E(v_2^2) < \infty,$$
where $(X^\top, Z^\top)^\top = (\pi_1, \ldots, \pi_{p+r})^\top$ and $i, j, k = 1, \ldots, p+r$. There exists $v_3(X, Z)$ satisfying
$$\Big|\frac{\partial^4 g(Z^\top\gamma)}{\partial\gamma_i\partial\gamma_j\partial\gamma_k\partial\gamma_l}\Big|,\; \Big|\frac{\partial^4 v(\eta)}{\partial\eta_{i_1}\partial\eta_{j_1}\partial\eta_{k_1}\partial\eta_{l_1}}\Big| < v_3(X, Z), \quad E(v_3^2) < \infty,$$
where $\eta \in \mathbb{R}^{p_1}$, $i, j, k, l = 1, \ldots, p$, and $i_1, j_1, k_1, l_1 = 1, \ldots, p_1$.
Condition 6. Assume $\eta$ and $Z^\top\gamma$ possess densities, denoted by $f_\eta(\eta)$ and $f_{Z^\top\gamma}(Z^\top\gamma)$, respectively, which are bounded away from zero and infinity. There exists $v_4(X, Z)$ such that
$$\Big|\frac{\partial^2 f_{Z^\top\gamma}(Z^\top\gamma)}{\partial\gamma_i\partial\gamma_j}\Big|,\; \Big|\frac{\partial^2 f_\eta(\eta)}{\partial\eta_k\partial\eta_l}\Big| < v_4(X, Z), \quad E(v_4^2) < \infty, \quad i, j = 1, \ldots, p;\; k, l = 1, \ldots, p_1.$$
Condition 7. As $n \to \infty$, we assume $p \to \infty$ with $p/n^{1/5} \to 0$, and $r \to \infty$ with $r/n^{1/5} \to 0$.
Condition 8. All random elements X, ε , ε X , and ε Z have finite fourth moments.
Condition 9. Define
$$\xi_n(\theta, \gamma) = w\,\varepsilon\left(X - \frac{E(wX \mid Z^\top\gamma)}{E(w \mid Z^\top\gamma)},\; \Big[Z - \frac{E(wZ \mid Z^\top\gamma)}{E(w \mid Z^\top\gamma)}\Big]\, g'(Z^\top\gamma)\right)^\top.$$
As $n \to \infty$, we assume the following moments are uniformly bounded by a positive constant $C$: $E(\|\xi_n(\theta, \gamma)\|/\sqrt{p})^4 < C$, $E\|ZX^\top\|^4 < C$, and $E\|XX^\top\|^4 < C$. Furthermore, we assume $E\|XZ^\top\|^4 < \infty$.
Condition 10. As $n \to \infty$, $\tau(p/n)^{-1/2} \to \infty$, $\nu(r/n)^{-1/2} \to \infty$, $\min_{j \in A_1}|\theta_{0j}|/\tau \to \infty$, and $\min_{j \in A_2}|\gamma_{0j}|/\nu \to \infty$.
Condition 11. Assume $\max_{j \in A_1} p'_\tau(|\theta_{0j}|) = o\{(np)^{-1/2}\}$, $\max_{j \in A_2} p'_\nu(|\gamma_{0j}|) = o\{(nr)^{-1/2}\}$, $\max_{j \in A_1} p''_\tau(|\theta_{0j}|) = o\{p^{-1/2}\}$, and $\max_{j \in A_2} p''_\nu(|\gamma_{0j}|) = o\{r^{-1/2}\}$.
Conditions 1–6 support the existence of the estimators $(\hat\theta, \hat\gamma)$. They also ensure that the functions $w(X, Z)$, $g(Z^\top\gamma)$, and $g'(Z^\top\gamma)$ and the conditional expectations $E\{\hat w(X,Z)Z_1 \mid Z_i^\top\gamma\}$, $E\{\hat w(X,Z)X \mid Z_i^\top\gamma\}$, and $E\{\hat w(X,Z) \mid Z_i^\top\gamma\}$ can be estimated with adequate precision, and that the nonparametric estimation does not alter the asymptotic behavior of the empirical likelihood ratio; as a result, the estimated PEL ratio based on $\tilde L(\theta, \gamma)$ has the same asymptotic distribution as the one based on $L(\theta, \gamma)$. Conditions 1–6 were also used by Ma and Zhu [6] as sufficient conditions for the double-robustness property of the estimators. Condition 7 is a technical requirement; since determining the minimal admissible upper bound on the growth of $p$ and $r$ is quite challenging, this condition is intentionally stringent rather than minimally sufficient, and the resulting bounds in the stochastic analysis are conservative. Condition 8 guarantees that the asymptotic variance exists for the estimator of the increasing-dimensional parameters $(\theta^\top, \gamma^\top)^\top$. Condition 9 restricts the tail behavior of the estimating equation. Condition 10 requires that the weakest signal remain stronger than the penalty parameter, and Condition 11 limits the influence of the penalty on the nonzero components. Conditions 10 and 11 are satisfied by a range of penalty functions, including those discussed in Fan and Li [12].
The following theorem establishes the theoretical properties of the PEL estimator $(\hat\theta, \hat\gamma)$.
Theorem 1.
As $n \to \infty$, under Conditions 1–11, we have the following:
(1) $\hat\theta_2 = 0$ and $\hat\gamma_2 = 0$, with probability tending to 1;
(2) $\sqrt{n}\, B I_B^{-1/2}\{(\hat\theta_1^\top, \hat\gamma_1^\top)^\top - (\theta_{10}^\top, \gamma_{10}^\top)^\top\} \xrightarrow{L} N(0, G)$, where $B \in \mathbb{R}^{(q_1+q_2) \times (p+r-1)}$, $BB^\top \to G$, and $G$ is a $(q_1+q_2) \times (q_1+q_2)$ matrix with fixed $q_1$ and $q_2$; here
$$H_0 = \begin{pmatrix} H_1 & 0 \\ 0 & H_3 \end{pmatrix}, \quad H = \begin{pmatrix} H_2 & 0 \\ 0 & H_4 \end{pmatrix}, \quad B = \begin{pmatrix} B_1 & 0 \\ 0 & B_2 \end{pmatrix}, \quad V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}, \quad U = \begin{pmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{pmatrix},$$
$$V_{11} = E\Big[w\Big\{XX^\top - \frac{E(wX \mid Z^\top\gamma)E(wX^\top \mid Z^\top\gamma)}{E(w \mid Z^\top\gamma)}\Big\}\Big],$$
$$V_{12} = E\Big[g'(Z^\top\gamma)\, w\Big\{XZ_1^\top - \frac{E(wX \mid Z^\top\gamma)E(wZ_1^\top \mid Z^\top\gamma)}{E(w \mid Z^\top\gamma)}\Big\}\Big],$$
$$V_{21} = E\Big[g'(Z^\top\gamma)\, w\Big\{Z_1X^\top - \frac{E(wZ_1 \mid Z^\top\gamma)E(wX^\top \mid Z^\top\gamma)}{E(w \mid Z^\top\gamma)}\Big\}\Big],$$
$$V_{22} = E\Big[\{g'(Z^\top\gamma)\}^2\, w\Big\{Z_1Z_1^\top - \frac{E(wZ_1 \mid Z^\top\gamma)E(wZ_1^\top \mid Z^\top\gamma)}{E(w \mid Z^\top\gamma)}\Big\}\Big],$$
$$U_{11} = E\Big\{\frac{\partial\xi_i(\theta, \gamma)}{\partial\theta}^\top V^{-1}\frac{\partial\xi_i(\theta, \gamma)}{\partial\theta}\Big\}, \qquad U_{12} = E\Big\{\frac{\partial\xi_i(\theta, \gamma)}{\partial\theta}^\top V^{-1}\frac{\partial\xi_i(\theta, \gamma)}{\partial\gamma}\Big\},$$
$$U_{21} = E\Big\{\frac{\partial\xi_i(\theta, \gamma)}{\partial\gamma}^\top V^{-1}\frac{\partial\xi_i(\theta, \gamma)}{\partial\theta}\Big\}, \qquad U_{22} = E\Big\{\frac{\partial\xi_i(\theta, \gamma)}{\partial\gamma}^\top V^{-1}\frac{\partial\xi_i(\theta, \gamma)}{\partial\gamma}\Big\},$$
$$I_B = H_0U^{-1}VU^{-1}H_0^\top - H_0U^{-1}H^\top(HU^{-1}H^\top)^{-1}HU^{-1}VU^{-1}H^\top(HU^{-1}H^\top)^{-1}HU^{-1}H_0^\top,$$
and $\xrightarrow{L}$ denotes convergence in distribution.
In Theorem 1, $B$ projects the diverging-dimensional parameter vector $(\theta_1^\top, \gamma_1^\top)^\top$ onto a fixed $(q_1+q_2)$-dimensional subspace.
Remark 1.
Theorem 1 shows that the proposed estimator possesses the oracle property: with probability tending to 1, the zero components $\theta_{20}$ and $\gamma_{20}$ are estimated as exactly zero, and the PEL estimators of the nonzero components $\theta_{10}$ and $\gamma_{10}$ are asymptotically efficient.
Next, we describe the construction of confidence regions and hypothesis tests for $(\theta, \gamma)$ using the PEL method. Consider testing the following null and alternative hypotheses:
$$H_0 : L_n(\theta_0^\top, \gamma_0^\top)^\top = 0 \quad \text{vs.} \quad H_1 : L_n(\theta_0^\top, \gamma_0^\top)^\top \neq 0,$$
where $L_n \in \mathbb{R}^{(q_1+q_2) \times (p+r-1)}$ satisfies, for fixed $q_1$ and $q_2$,
$$L_n = \begin{pmatrix} L_{n1} & 0 \\ 0 & L_{n2} \end{pmatrix}, \qquad L_nL_n^\top = \begin{pmatrix} I_{q_1} & 0 \\ 0 & I_{q_2} \end{pmatrix},$$
with $L_{n1} \in \mathbb{R}^{q_1 \times p}$, $L_{n2} \in \mathbb{R}^{q_2 \times (r-1)}$, and $I_{q_1}$ and $I_{q_2}$ the $q_1$- and $q_2$-dimensional identity matrices, respectively. Hypotheses of this form cover tests of individual and multiple components of $(\theta_0^\top, \gamma_0^\top)^\top$, as well as tests of linear functions of $(\theta_0^\top, \gamma_0^\top)^\top$.
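For example (a hypothetical toy construction), an $L_n$ built from rows of identity matrices tests individual components and satisfies the orthonormality condition $L_nL_n^\top = I$ by construction:

```python
def selector_rows(indices, dim):
    """Rows of the dim-dimensional identity matrix: each row picks one
    coefficient (a toy way to build L_n1 and L_n2)."""
    return [[1.0 if j == i else 0.0 for j in range(dim)] for i in indices]

def gram(rows):
    # rows * rows^T as nested lists
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]

p, r = 5, 4                       # hypothetical dimensions
L_n1 = selector_rows([0], p)      # tests the first component of theta (q1 = 1)
L_n2 = selector_rows([1], r - 1)  # tests one component of gamma (q2 = 1)
# L_n1 L_n1^T = I_{q1} and L_n2 L_n2^T = I_{q2} hold by construction
```

More general linear hypotheses can be encoded by replacing the selector rows with any orthonormal rows spanning the contrasts of interest.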
Similar to the EL ratio for the PLSIM in [3], we can construct the PEL ratio statistic as
$$\tilde\ell(L_n) = -\Big\{\tilde\ell_p(\hat\theta, \hat\gamma) - \min_{(\theta, \gamma):\, L_n(\theta^\top, \gamma^\top)^\top = 0}\tilde\ell_p(\theta, \gamma)\Big\}.$$
The following theorem shows the properties of the PEL ratio statistic for model (1).
Theorem 2.
As $n \to \infty$, under the null hypothesis and Conditions 1–11, we have
$$\tilde\ell(L_n) \xrightarrow{L} \chi^2_{q_1+q_2}.$$
The standard PEL ratio, under some regularity conditions, converges in law to a chi-squared distribution. This is one of the most important properties of the PEL method, and similar conclusions can be found in Fang et al. [7] and Tang and Leng [24], among others. Theorem 2 shows that, under Conditions 1–11, the estimated PEL ratio based on $\tilde L(\theta, \gamma)$ converges to the same asymptotic distribution as the standard PEL ratio based on $L(\theta, \gamma)$. This result provides a convenient approach for testing hypotheses and constructing data-driven confidence regions without any shape constraints. Combined with the oracle property established in Theorem 1, these findings demonstrate the robustness and efficiency of the PEL method for PLSIMs.
Confidence regions for $(\theta, \gamma)$ can be constructed using Theorem 2; that is,
$$I_\alpha = \Big\{(\theta, \gamma) : -\Big\{\tilde\ell_p(\hat\theta, \hat\gamma) - \min_{(\theta, \gamma):\, L_n(\theta^\top, \gamma^\top)^\top = 0}\tilde\ell_p(\theta, \gamma)\Big\} \le \chi^2_{q_1+q_2,(1-\alpha)}\Big\},$$
where $\chi^2_{q_1+q_2,(1-\alpha)}$ is the $(1-\alpha)$ quantile of the chi-squared distribution with $q_1+q_2$ degrees of freedom. $I_\alpha$ provides an asymptotic confidence region with confidence level $1-\alpha$; that is, as $n \to \infty$, $P(L_n(\theta^\top, \gamma^\top)^\top \in I_\alpha) \to 1-\alpha$.
Remark 2.
In this article, we develop a PEL method for simultaneous variable selection and parameter estimation in high-dimensional sparse heteroscedastic PLSIMs, which requires $n > p$ and $n > r$. When this is violated in practice, one can first apply sure independence screening (SIS), proposed by Fan and Lv [28], to reduce the dimensionality to a moderate level below the sample size.

3. Penalized Empirical Likelihood for PLM and SIM

For two special cases of model (1), we develop PEL estimators for the parameters $\theta$ and $\gamma$. If $\gamma = 1$, model (1) reduces to a heteroscedastic PLM, which can be written as
$$Y_i = \theta^\top X_i + g(Z_i) + \varepsilon_i, \quad i = 1, \ldots, n. \tag{12}$$
Consider the PEL method for the high-dimensional PLM. Redefine the EL function for $\theta$ and $\hat\xi_{1i}(\theta)$ as
$$\tilde L(\theta) = \sup\Big\{\prod_{i=1}^n (nq_i) : \sum_{i=1}^n q_i = 1,\; q_i \ge 0,\; \sum_{i=1}^n q_i\,\hat\xi_{1i}(\theta) = 0\Big\}$$
and
$$\hat\xi_{1i}(\theta) = \hat w_i\{Y_i - X_i^\top\theta - \hat g(Z_i)\}\Big[X_i - \frac{\hat E(\hat w_iX_i \mid Z_i)}{\hat E(\hat w_i \mid Z_i)}\Big], \quad i = 1, \ldots, n.$$
The PEL function for model (12) can be written as
$$\tilde\ell_p(\theta) = 2\sum_{i=1}^n \log\{1 + \lambda^\top\hat\xi_{1i}(\theta)\} + n\sum_{i=1}^p p_\tau(|\theta_i|).$$
We state analogous results as follows.
Corollary 1.
As $n \to \infty$, under Conditions 1–11, we have the following:
(1) $\hat\theta_2 = 0$, with probability tending to 1;
(2) $\sqrt{n}\, B_1I_1^{-1/2}(\hat\theta_1 - \theta_{10}) \xrightarrow{L} N(0, G_1)$, where $B_1 \in \mathbb{R}^{q_1 \times p}$, $B_1B_1^\top \to G_1$, $G_1$ is a $q_1 \times q_1$ matrix with fixed $q_1$, and $\xrightarrow{L}$ stands for convergence in distribution.
Consider testing the following null and alternative hypotheses:
$$H_0 : L_{n1}\theta_0 = 0 \quad \text{vs.} \quad H_1 : L_{n1}\theta_0 \neq 0,$$
where $L_{n1} \in \mathbb{R}^{q_1 \times p}$ satisfies $L_{n1}L_{n1}^\top = I_{q_1}$ for fixed $q_1$, with $I_{q_1}$ the $q_1$-dimensional identity matrix. The PEL ratio statistic can be constructed as follows:
$$\tilde\ell(L_{n1}) = -\Big\{\tilde\ell_p(\hat\theta) - \min_{\theta:\, L_{n1}\theta = 0}\tilde\ell_p(\theta)\Big\}.$$
Corollary 2.
As $n \to \infty$, under the null hypothesis and Conditions 1–11, we have
$$\tilde\ell(L_{n1}) \xrightarrow{L} \chi^2_{q_1}.$$
Next, we consider the following SIM with a diverging number of parameters, which is another special case of model (1):
$$Y_i = g(Z_i^\top\gamma) + \varepsilon_i, \qquad E(\varepsilon_i \mid X_i, Z_i) = 0, \quad i = 1, \ldots, n.$$
Redefine $\hat\xi_{2i}(\gamma)$ as
$$\hat\xi_{2i}(\gamma) = \hat w_i\{Y_i - \hat g(Z_i^\top\gamma)\}\Big[Z_{1,i} - \frac{\hat E(\hat w_iZ_{1,i} \mid Z_i^\top\gamma)}{\hat E(\hat w_i \mid Z_i^\top\gamma)}\Big]\hat g'(Z_i^\top\gamma), \quad i = 1, \ldots, n,$$
and rewrite the PEL ratio (9) as
$$\tilde\ell_p(\gamma) = 2\sum_{i=1}^n \log\{1 + \lambda^\top\hat\xi_{2i}(\gamma)\} + n\sum_{i=1}^r p_\nu(|\gamma_i|).$$
Corollary 3.
As $n \to \infty$, under Conditions 1–11, we have the following:
(1) $\hat\gamma_2 = 0$, with probability tending to 1;
(2) $\sqrt{n}\, B_2I_2^{-1/2}(\hat\gamma_1 - \gamma_{10}) \xrightarrow{L} N(0, G_2)$, where $B_2 \in \mathbb{R}^{q_2 \times (r-1)}$, $B_2B_2^\top \to G_2$, and $G_2$ is a $q_2 \times q_2$ matrix with fixed $q_2$.
Consider testing the following null and alternative hypotheses:
$$H_0 : L_{n2}\gamma_0 = 0 \quad \text{vs.} \quad H_1 : L_{n2}\gamma_0 \neq 0,$$
where $L_{n2} \in \mathbb{R}^{q_2 \times r}$ satisfies $L_{n2}L_{n2}^\top = I_{q_2}$ for fixed $q_2$, with $I_{q_2}$ the $q_2$-dimensional identity matrix. The PEL ratio statistic for $\gamma$ can be written as follows:
$$\tilde\ell(L_{n2}) = -\Big\{\tilde\ell_p(\hat\gamma) - \min_{\gamma:\, L_{n2}\gamma = 0}\tilde\ell_p(\gamma)\Big\}.$$
Corollary 4.
As $n \to \infty$, under the null hypothesis and Conditions 1–11, we have
$$\tilde\ell(L_{n2}) \xrightarrow{L} \chi^2_{q_2}.$$

4. Simulations

First, we describe how to solve the optimization problem introduced by the PEL. The PEL estimator is obtained by minimizing (9) with a nested algorithm, in which Newton's method updates the multiplier $\lambda$ and the local quadratic approximation updates the parameters. The steps are as follows.
  • Step 1: Use the estimation procedure (a relatively simple but inefficient method) described in Section 2 of Ma and Zhu [6] to obtain an initial estimator $(\theta^{(0)}, \gamma^{(0)})$.
  • Step 2: Compute $\hat g(Z_i^\top\gamma)$, $\hat g'(Z_i^\top\gamma)$, $\hat w(X_i, Z_i)$, $\hat E\{\hat w(X,Z) \mid Z_i^\top\gamma\}$, $\hat E\{\hat w(X,Z)X \mid Z_i^\top\gamma\}$, and $\hat E\{\hat w(X,Z)Z_1 \mid Z_i^\top\gamma\}$ as described above for fixed values of $(\theta, \gamma)$.
  • Step 3: Obtain the auxiliary random vector ξ ^ i ( θ , γ ) .
  • Step 4: Use Newton’s method to minimize (9) with respect to λ for fixed values of ( θ , γ ) .
  • Step 5: Use the local quadratic approximation algorithm to minimize (9) with respect to ( θ , γ ) for fixed values of λ obtained from Step 4.
  • Step 6: Iterate Steps 4 and 5 until ( θ , γ ) converges.
Assume that $(\theta^{(0)}, \gamma^{(0)})$ is an initial value of $(\theta, \gamma)$, and let $\theta_j^{(k)}$ and $\gamma_l^{(k)}$ be the $k$-th step estimators of $\theta_j$ and $\gamma_l$, respectively. When $\theta_j^{(k)}$ (with $|\theta_j^{(k)}| < \varsigma$) or $\gamma_l^{(k)}$ (with $|\gamma_l^{(k)}| < \varsigma$) is very close to 0, we set $\hat\theta_j^{(k)} = 0$ or $\hat\gamma_l^{(k)} = 0$, where $\varsigma$ is a predefined small positive tolerance. If $\theta_j^{(k)} \neq 0$, $p_\tau(|\theta_j|)$ can be locally approximated by $p_\tau(|\theta_j^{(k)}|) + \frac{1}{2}\{p'_\tau(|\theta_j^{(k)}|)/|\theta_j^{(k)}|\}\{\theta_j^2 - (\theta_j^{(k)})^2\}$. Similarly, we use $p_\nu(|\gamma_l^{(k)}|) + \frac{1}{2}\{p'_\nu(|\gamma_l^{(k)}|)/|\gamma_l^{(k)}|\}\{\gamma_l^2 - (\gamma_l^{(k)})^2\}$ to approximate $p_\nu(|\gamma_l|)$ when $\gamma_l^{(k)} \neq 0$. These steps are repeated until $\|(\theta^{(k+1)}, \gamma^{(k+1)}) - (\theta^{(k)}, \gamma^{(k)})\| < \varsigma_1$, where $\varsigma_1$ is a very small positive number.
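The local quadratic approximation used here can be written generically (an illustrative sketch with a generic penalty $p$ and its derivative passed in; for the ridge penalty $t^2$ the approximation happens to be exact, which provides a simple sanity check):

```python
def lqa(p, dp, t, t_k):
    """Local quadratic approximation of p(|t|) around a nonzero iterate t_k:
    p(|t_k|) + 0.5 * (p'(|t_k|) / |t_k|) * (t^2 - t_k^2)."""
    return p(abs(t_k)) + 0.5 * (dp(abs(t_k)) / abs(t_k)) * (t * t - t_k * t_k)

# sanity check with the ridge penalty p(t) = t^2, for which the LQA is exact
approx = lqa(lambda t: t * t, lambda t: 2.0 * t, 0.8, 0.5)
exact = 0.8 ** 2
```

Because the approximation is quadratic in $t$, each Step 5 update reduces to a weighted ridge-type problem, which is what makes the iteration tractable for nonconvex penalties such as SCAD.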
Next, we present simulation studies to illustrate the properties of the PEL inference for a heteroscedastic PLSIM.
Example 1.
For the PLSIM, we generated $X_1$ from a Poisson distribution with parameter 2, $X_p$ from a binomial distribution with success probability 0.6, $X_j$ from the uniform distribution $U(0, 1)$ for $j = 2, \ldots, p-1$, and $Z_k$ from the standard normal distribution. Given $(X, Z)$, responses were generated from $Y \sim N(X^\top\theta + \exp(Z^\top\gamma), \mathrm{Var}(Y) = |Z^\top\gamma|)$. Let $\theta = (2, \ldots, 1, 0)^\top$ and $\gamma = (1, 1.5, 2, \ldots, 0)^\top$. We consider dimensions $p = 10, 20$ and $r = 10, 20$, and sample sizes $n = 50$, 100, and 200. The penalty parameters $\tau$ and $\nu$ were selected by cross-validation. To compare the influence of the kernel function, we used the cosine kernel $K(t) = \frac{\pi}{4}\cos(\frac{\pi t}{2})\, I(|t| \le 1)$ and the Epanechnikov kernel $K(t) = \frac{3}{4}(1 - t^2)_+$, respectively. In accordance with Condition 2, the bandwidth was set to $n^{-1/5}$, giving $h \approx 0.45$ at $n = 50$, $h \approx 0.4$ at $n = 100$, and $h \approx 0.35$ at $n = 200$. Furthermore, to examine the robustness of the bandwidth selection, a grid search algorithm was also employed. For each of these settings, we repeated the simulation 500 times, and the results are reported in Table 1 and Table 2.
From Table 1 and Table 2, we observe that (1) for fixed $p$ and $r$, as the sample size increases, the accuracy of variable selection improves and the standard deviation of the estimates decreases; and (2) the choice of kernel function has a relatively minor influence on the results; overall, the Epanechnikov kernel performs slightly better in estimation than the cosine kernel.
Example 2.
To assess the performance of the presented method under dependent covariates, we generated predictors by $(X^\top, Z^\top)^\top \sim N(0, \Sigma)$ with $\sigma_{ij} = 0.3^{|i-j|}$, and responses by $Y \sim N(X^\top\theta + \exp(Z^\top\gamma), \mathrm{Var}(Y) = |Z^\top\gamma|)$. Let $\theta = (1, \ldots, 1, 0)^\top$ and $\gamma = (1, 1, 2, 1, \ldots, 0)^\top$. We consider dimensions $p = 20, 30$ and $r = 20, 30$, and sample sizes $n = 200, 400$. We applied the Epanechnikov kernel $K(t) = \frac{3}{4}(1 - t^2)_+$ and selected the penalty parameters $\tau$ and $\nu$ by cross-validation. According to Condition 2, the bandwidth was set to $n^{-1/5}$, i.e., $h \approx 0.35$ for $n = 200$ and $h \approx 0.3$ for $n = 400$. Because Lai et al. [16] also studied parameter estimation and variable selection for a heteroscedastic PLSIM, we computed their estimator (PVS) in this simulation study for comparison. For each of these settings, we repeated the simulation 500 times, and the results are reported in Table 3 and Table 4.
From Table 3 and Table 4, it can be observed that (1) both estimators (PEL and PVS) yield estimates close to the true parameter values, with PEL exhibiting slightly smaller standard deviations than PVS; and (2) in terms of variable selection, PEL produces, on average, fewer false zeros than PVS. Furthermore, the PEL method is a nonparametric methodology that retains the advantages of parametric likelihood while possessing double robustness; in contrast, the PVS method is a semiparametric efficient method that is not doubly robust, so its performance deteriorates when the model is misspecified. Thus, the proposed PEL method demonstrates favorable performance and outperforms the PVS method.

5. Real Data Application

We demonstrate the proposed methodology by applying a PLSIM to the AIDS Clinical Trials Group Protocol 175 (ACTG175) dataset (Hammer et al. [29]; https://www.nejm.org/doi/full/10.1056/NEJM199610103351501#tab-contributors (accessed on 5 August 2025)), previously examined by Lai and Wang [30]. The CD4 glycoprotein is an essential T-cell receptor (TCR) coreceptor that facilitates interactions with antigen-presenting cells, which establishes the CD4 cell count as the primary immunological endpoint for comparing antiretroviral treatment effects over predefined observation periods in HIV clinical research. ACTG175 evaluated four distinct antiretroviral regimens: didanosine (ddI), zidovudine (ZDV) monotherapy, ZDV+ddI, and ZDV+zalcitabine, using a balanced randomization design to assign 2138 eligible participants across the therapeutic arms. The trial results demonstrate that structured antiretroviral interventions effectively reduce the risk of disease progression among clinically asymptomatic individuals with intermediate-stage HIV infection.
We constructed a PLSIM to analyze subject responses under zidovudine (ZDV) monotherapy. Our analysis uses a curated subset of the ACTG175 cohort comprising 320 patients with complete CD4 endpoint data, drawn from an initial pool of 521 subjects with baseline CD4 counts between 200 and 500 cells/mm3. The response variable Y (CD496) is the CD4 cell count at 96 ± 5 weeks post-treatment. The predictors are as follows:
Linear component: discrete covariates x1 (drugs: history of IV drug use; 0 = no, 1 = yes), x2 (str2: antiretroviral history; 0 = naive, 1 = experienced), x3 (gender; 0 = female, 1 = male), x4 (symptom: symptomatic indicator; 0 = asymptomatic, 1 = symptomatic), x5 (race; 0 = white, 1 = non-white), x6 (hemo: hemophilia; 0 = no, 1 = yes), x7 (homo: homosexual activity; 0 = no, 1 = yes), and x8 (karnof: Karnofsky score, on a scale of 0–100).
Single-index component: continuous covariates z1 (CD80: baseline CD8 count), z2 (CD820: CD8 count at 20 ± 5 weeks), z3 (CD420: CD4 count at 20 ± 5 weeks), z4 (CD40: baseline CD4 count), z5 (wtkg: weight in kg), and z6 (age: age in years at baseline).
After standardizing Y, we specify the following heteroscedastic PLSIM:
Y = θ X + g ( Z γ ) + ε ,
where X = (x1, …, x8)ᵀ and Z = (z1, …, z6)ᵀ. The test for heteroscedasticity confirms that the model is homoscedastic. We applied the Epanechnikov kernel function in this data analysis. According to Condition 2, the bandwidth was set to n^{−1/5}, which gives h ≈ 0.32. We required γ = (γ1, …, γ6)ᵀ to have unit length to ensure identifiability. We compared our results with the PVS method of [16]; the results are summarized in Table 5, with residual plots shown in Figure 1. The residual sums of squares estimated by PEL and PVS were 186 and 194, respectively. From these results, homosexual activity (homo) exhibits a significant positive linear association with Y. The estimated coefficient for antiretroviral history (str2) is negative, indicating a beneficial effect of this treatment for asymptomatic patients with HIV. CD820 shows a negative nonlinear relationship with Y, while age, CD40, and CD420 are positively associated with Y. These factors play important roles in antiretroviral regimens. Moreover, both methods yield similar results, but the approach in [16] produces slightly larger standard errors.
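The kernel, bandwidth rule, and identifiability normalization used in this analysis can be written compactly. The sketch below is illustrative only: the helper names are ours, and the numerical entries of γ are placeholders rather than the fitted coefficients from Table 5.

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel K(t) = (3/4)(1 - t^2)_+ used throughout the paper."""
    return 0.75 * np.maximum(1.0 - np.asarray(t) ** 2, 0.0)

def nw_smoother(u, y, h):
    """Nadaraya-Watson estimate of E(Y | U = u_i) at each observed index value."""
    K = epanechnikov((u[:, None] - u[None, :]) / h)
    return (K @ y) / K.sum(axis=1)

n = 320               # patients with complete CD4 endpoint data
h = n ** (-1 / 5)     # Condition 2 bandwidth, approximately 0.32

# Unit-length normalization of the index parameter for identifiability
# (these entries are hypothetical placeholders, not estimates).
gamma = np.array([1.0, -0.5, 0.3, 0.2, 0.1, -0.2])
gamma = gamma / np.linalg.norm(gamma)
```

With h fixed this way, the single-index function g can be estimated by smoothing the (standardized) response against the normalized index Zᵀγ.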

Author Contributions

J.F. proposed the original research problem and Z.T. carried out the associated numerical computations. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Provincial Natural Science Foundation of Hunan (Grant No. 2023JJ30187) and the Scientific Research Fund of the Hunan Provincial Education Department (Grant No. 24A0518).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Let D_n = {(θ, γ): ‖(θ, γ) − (θ₀, γ₀)‖ ≤ C a_n} for a positive constant C, where a_n = O_p{(p/n)^{1/2}}, and let ‖A‖ = {tr(AᵀA)}^{1/2} denote the Frobenius norm of a matrix A.
We present some lemmas before proving the theorems.
Lemma A1.
Let θ ˜ θ = O p ( a n ) and γ ˜ γ = O p ( a n ) . Under Conditions 1–9, we have
sup η R | w ^ ( η ) w ( η ) | = O p h 2 2 + log n ( p / n ) 1 / 2 h 2 1 / 2 ,
where
w ^ ( η ) = i = 1 n K h 2 ( η i η ) / i = 1 n K h 2 ( η i η ) e i 2 ,
e i = Y i θ ˜ X i g ˜ ( Z i γ ˜ ) = ( θ θ ˜ ) X i + g ˜ ( Z i γ ) g ˜ ( Z i γ ˜ ) + g ( Z i γ ) g ˜ ( Z i γ ) + ε i
and
g ˜ ( Z i γ ) = ( n 1 ) 1 j i K h ( Z j γ Z i γ ) Y j .
Proof of Lemma A1.
According to Lemma 3.1 and Lemma 3.3 in Zhu and Fang [31], we have
sup ( Z i γ ) R g ( Z i γ ) g ˜ ( Z i γ ) = O p h 2 2 + ( p / n ) 1 / 2 h 2 1 / 2 log n
and
sup η R i = 1 n K h 2 ( η i η ) ε i 2 i = 1 n K h 2 ( η i η ) 1 w ( η ) = O p h 2 2 + ( p / n ) 1 / 2 h 2 1 / 2 log n .
Combining (20) and (21), we can obtain that
sup η R i = 1 n K h 2 ( η i η ) g ( Z i γ ) g ˜ ( Z i γ ) 2 / i = 1 n K h 2 ( η i η ) = O p h 2 4 + p log 2 n ( n h 2 ) .
Because of θ ˜ θ = O p ( a n ) and E ( X i 2 | η i ) < , we have
sup η R i = 1 n K h 2 ( η i η ) X i ( θ ˜ θ ) 2 / i = 1 n K h 2 ( η i η ) = O p ( p / n ) .
Using Lemma 1 of Ma and Zhu [6], we obtain that
sup Z i R p sup { γ ˜ : γ ˜ γ a n } g ˜ ( Z i γ ) g ˜ ( Z i γ ˜ ) E g ˜ ( Z i γ ) g ˜ ( Z i γ ˜ ) = o p ( a n ) .
By (24) and γ ˜ γ = O p ( a n ) , together with Taylor’s expansion, we can show that
sup η R i = 1 n K h 2 ( η i η ) g ˜ ( Z i γ ) g ˜ ( Z i γ ˜ ) 2 / i = 1 n K h 2 ( η i η ) = O p h 2 4 + p log 2 n ( n h 2 ) .
Obviously, n 1 / 2 i = 1 n g ˜ ( X i ) ε i can be written as
n 1 / 2 i = 1 n g ˜ ( X i ) ε i = n 3 / 2 i j n K h ( X i X j ) ( ε i Y j ε j Y i ) .
Let r ( X ) = E ( Y | X ) . According to Serfling [32], n 3 / 2 i j n K h ( X i X j ) ( ε i Y j ε j Y i ) can be approximated by its projection, which means that
n 1 / 2 i = 1 n g ˜ ( X i ) ε i n 1 / 2 i = 1 n ε i E K h ( X i X j ) r ( X i ) | X i = o p ( n 1 p log ( n ) ) .
Therefore, we have
n 1 / 2 i = 1 n ε i g ˜ ( X i ) r ( X i ) f ( X i ) = o p ( n 1 p log ( n ) ) ,
where f ( X ) is the density function of X. By combining (20)–(26), we have
sup η R | w ^ ( η ) w ( η ) | = O p h 2 2 + log n ( p / n ) 1 / 2 h 2 1 / 2 .
Thus, Lemma A1 holds. □
Lemma A2.
Under Conditions 1–9, we have
(1) 
1 n i = 1 n ξ ^ i ( θ , γ ) = 1 n i = 1 n ξ i ( θ , γ ) + o p ( 1 ) ;
(2) 
1 n i = 1 n ξ ^ i ( θ , γ ) ξ ^ i ( θ , γ ) = 1 n i = 1 n ξ i ( θ , γ ) ξ i ( θ , γ ) + o p ( 1 ) .
Proof of Lemma A2.
We first expand the expression as follows:
w i ε i g ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) w ^ i ε ˜ i g ^ ( Z i γ ) Z i E ^ ( w ^ i Z i | Z i γ ) E ^ ( w ^ i | Z i γ ) = ε i ( w i w ^ i ) g ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) + w ^ i ε i g ( Z i γ ) g ^ ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) w ^ i g ( Z i γ ) g ^ ( Z i γ ) g ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) w ^ i g ^ ( Z i γ ) g ( Z i γ ) g ( Z i γ ) g ^ ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) + w ^ i ε i g ^ ( Z i γ ) E ^ ( w ^ i Z i | Z i γ ) E ^ ( w ^ i | Z i γ ) E ( w i Z i | Z i γ ) E ( w i | Z i γ ) w ^ i g ^ ( Z i γ ) g ^ ( Z i γ ) g ( Z i γ ) E ^ ( w ^ i Z i | Z i γ ) E ^ ( w ^ i | Z i γ ) E ( w i Z i | Z i γ ) E ( w i | Z i γ ) = 1 n i = 1 n A 1 i + i = 1 n A 2 i + i = 1 n A 3 i + i = 1 n A 4 i + i = 1 n A 5 i + i = 1 n A 6 i ,
and
w i ε i X i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) w ^ i ε ˜ i X i E ^ ( w ^ i X i | Z i γ ) E ^ ( w ^ i | Z i γ ) = ε i ( w i w ^ i ) X i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) + w ^ i g ^ ( Z i γ ) g ( Z i γ ) X i E ( w i X i | Z i γ ) E ( w i | Z i γ ) + w ^ i ε i E ^ ( w ^ i X i | Z i γ ) E ^ ( w ^ i | Z i γ ) E ( w i Z i | Z i γ ) E ( w i | Z i γ ) + w ^ i E ^ ( w ^ i X i | Z i γ ) E ^ ( w ^ i | Z i γ ) E ( w i Z i | Z i γ ) E ( w i | Z i γ ) g ( Z i γ ) g ^ ( Z i γ ) = 1 n i = 1 n A 7 i + i = 1 n A 8 i + i = 1 n A 9 i + i = 1 n A 10 i .
By Lemma A1, we can obtain
i = 1 n A 1 i i = 1 n ( w i w ^ i ) ε i g ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) = o p ( n ) ,
and
i = 1 n A 7 i = i = 1 n ( w i w ^ i ) ε i X i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) = o p ( n ) .
Write ε i = w i g ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) . It implies that E ( ε i | Z i γ ) = 0 . According to Lemma A1 and (26), we have
i = 1 n w ^ i w i g ^ ( Z i γ ) g ( Z i γ ) g ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) sup 1 i n w ^ i w i i = 1 n g ^ ( Z i γ ) g ( Z i γ ) g ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) = o p ( n ) ,
and
i = 1 n w i g ( Z i γ ) g ^ ( Z i γ ) g ( Z i γ ) Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) = o p ( n ) .
Thus, i = 1 n A 2 i = o p ( n ) . Similarly, we let ε i = w i Z i E ( w i Z i | Z i γ ) E ( w i | Z i γ ) . We can obtain that E ε i | Z i γ = 0 . According to Lemma A1 and (26) again, we have
i = 1 n A 8 i   i = 1 n w ^ i w i g ^ ( Z i γ ) g ( Z i γ ) X i E ( w ^ i X i | Z i γ ) E ( w ^ i | Z i γ ) + i = 1 n g ^ ( Z i γ ) g ( Z i γ ) w i X i E ( w ^ i X i | Z i γ ) E ( w ^ i | Z i γ ) = o p ( n ) .
Similarly to Lemma 3 in Ma and Zhu [6], we can show that,
sup X n 1 i = 1 n K h ( X i X ) Y i r ( X ) f ( X ) = O p h 2 + log n ( p / n ) 1 / 2 h 1 / 2 .
According to Lemma A1, together with (29), we can obtain
g ^ ( Z i γ ) g ( Z i γ ) = o p ( 1 ) ,
E ^ w ^ ( X , Z ) Z | γ Z i E w ( X , Z ) Z | γ Z i = o p ( 1 ) ,
E ^ w ^ ( X , Z ) | γ Z i E w ( X , Z ) | γ Z i = o p ( 1 ) ,
and
E ^ w ^ ( X , Z ) X | γ Z i E w ( X , Z ) X | γ Z i = o p ( 1 ) .
Thus, ∑_{i=1}^n A_{2i} = o_p(n). By arguments similar to those used to prove ∑_{i=1}^n A_{2i} = o_p(n), together with (30)–(33), we have ∑_{i=1}^n A_{4i} = o_p(n), ∑_{i=1}^n A_{6i} = o_p(n), and ∑_{i=1}^n A_{10i} = o_p(n). Applying (26), together with (31), it is easy to show that ∑_{i=1}^n A_{3i} = o_p(n). In addition, by combining (28)–(33) and using Lemma A1, we can show that ∑_{i=1}^n A_{5i} = o_p(n) and ∑_{i=1}^n A_{9i} = o_p(n). Consequently,
1 n i = 1 n ξ ^ i ( θ , γ ) i = 1 n ξ i ( θ , γ ) = 1 n i = 1 n A 1 i + A 2 i + A 3 i + A 4 i + A 5 i + A 6 i A 7 i + A 8 i + A 9 i + A 10 i .
Therefore, we have
1 n i = 1 n ξ ^ i ( θ , γ ) = 1 n i = 1 n ξ i ( θ , γ ) + o p ( 1 ) .
Next, we will prove the second part of Lemma A2. By Conditions 8 and 9, ϵ > 0 ,
P { max 1 i n ξ i ( θ , γ )   n 1 / 4 p ϵ } i = 1 n P { ξ i ( θ , γ )   n 1 / 4 p ϵ } 1 n p 2 ϵ 4 i = 1 n E ξ i ( θ , γ ) 4 = 1 ϵ k E ξ 1 ( θ , γ ) / p 4 .
By the Cauchy–Schwarz inequality, we obtain that ξ 1 ( θ , γ ) / p 4 1 / ( p + r ) l = 1 p + r | ξ 1 l ( θ , γ ) | 4 , where ξ 1 l ( θ , γ ) is the l-th component of ξ 1 ( θ , γ ) . This implies that
max 1 i n ξ i ( θ , γ ) = o p ( n 1 / 4 p ) .
1 n i = 1 n ξ ^ i ( θ , γ ) ξ ^ i ( θ , γ ) = 1 n i = 1 n ξ i ( θ , γ ) ξ i ( θ , γ ) + 1 n i = 1 n ξ i ( θ , γ ) A 1 i + + A 6 i A 7 i + + A 10 i + 1 n i = 1 n A 1 i + + A 6 i A 7 i + + A 10 i ξ i ( θ , γ ) + 1 n i = 1 n A 1 i + + A 6 i A 7 i + + A 10 i A 1 i + + A 6 i A 7 i + + A 10 i .
According to (34) and Condition 7, we can obtain 1 / n ξ i ( θ , γ ) = o p ( 1 ) . Using the proof of the first part above again, we have
i = 1 n A k i ξ i ( θ , γ ) = o p ( n ) , k = 1 , , 10 .
Similarly,
i = 1 n A k i A l i = o p ( n ) , k , l = 1 , , 10 .
By combining the above equations, we have
1 n i = 1 n ξ ^ i ( θ , γ ) ξ ^ i ( θ , γ ) = 1 n i = 1 n ξ i ( θ , γ ) ξ i ( θ , γ ) + o p ( 1 ) .
Therefore, the second part of Lemma A2 holds. □
Lemma A3.
Under Conditions 1–9, ‖S_n − V‖ = O_p(p n^{−1/2}), where S_n = (1/n) ∑_{i=1}^n ξ_i(θ, γ) ξ_i(θ, γ)ᵀ.
Proof of Lemma A3.
Using Lemma 4 in Chen et al. [23], we obtain tr{(S_n − V)²} = O_p(p²/n). Therefore, ‖S_n − V‖ = {tr[(S_n − V)(S_n − V)ᵀ]}^{1/2} = O_p(p n^{−1/2}). □
Lemma A4.
Under Conditions 1–11, max 1 i n ξ ^ i ( θ , γ ) = o p ( n 1 / 4 p ) and max 1 i n | λ ξ ^ i ( θ , γ ) | = o p ( 1 ) for all λ = O p ( a n ) .
Proof of Lemma A4.
Based on the proof of the first part in Lemma A2, it is easy to show that
ξ ^ i ( θ , γ ) = ξ i ( θ , γ ) + o p ( 1 ) .
Therefore, by combining the above equation and (34), we have
max_{1≤i≤n} ‖ξ̂_i(θ, γ)‖ = o_p(n^{1/4} p),
and for all λ = O p ( a n ) ,
max_{1≤i≤n} |λᵀ ξ̂_i(θ, γ)| = o_p(1). □
Lemma A5.
Under Conditions 1–11, λ ( θ 0 , γ 0 ) = O p ( a n ) and λ ( θ ^ , γ ^ ) = O p ( a n ) .
Proof of Lemma A5.
Let λ(θ, γ) = ρα, where ρ = ‖λ(θ, γ)‖, α ∈ R^{p+r}, and ‖α‖ = 1. According to (8), λ(θ, γ) ∈ R^{p+r} satisfies
0 = 1 n i = 1 n ξ ^ i ( θ , γ ) 1 + λ ( θ , γ ) ξ ^ i ( θ , γ ) = : ψ ( λ ) .
Using 1 / ( 1 + λ ( θ , γ ) ξ ^ i ( θ , γ ) ) = 1 λ ( θ , γ ) ξ ^ i ( θ , γ ) / ( 1 + λ ( θ , γ ) ξ ^ i ( θ , γ ) ) , we can obtain that
n 1 | α i = 1 n ξ ^ i ( θ , γ ) | ρ 1 + ρ max 1 i n ξ ^ i ( θ , γ ) α S ^ n ( θ , γ ) α ,
where S ^ n ( θ , γ ) = 1 n i = 1 n ξ ^ i ( θ , γ ) ξ ^ i ( θ , γ ) . Note that
0 < 1 + λ ( θ , γ ) ξ ^ i ( θ , γ ) 1 + ρ max 1 i n ξ ^ i ( θ , γ ) ,
and we have
ρ α S ^ n ( θ , γ ) α α ξ ^ i ( θ , γ ) max 1 i n ξ ^ i ( θ , γ ) n 1 | α i = 1 n ξ ^ i ( θ , γ ) | .
Using Lemma A4, we can show
n 1 | α i = 1 n ξ ^ i ( θ 0 , γ 0 ) | n 1 i = 1 n ξ ^ i ( θ 0 , γ 0 ) = O p ( a n ) .
Therefore,
max 1 i n n 1 ξ ^ i ( θ 0 , γ 0 ) | α i = 1 n ξ ^ i ( θ 0 , γ 0 ) | = o p ( 1 ) .
Combining (35) and (36), we have
| ρ α S ^ n ( θ 0 , γ 0 ) α + o p ( 1 ) | = O p ( a n ) .
Using Lemma A1, we can show that P ( α S ^ n ( θ 0 , γ 0 ) α 1 2 C ) 1 as n . Therefore, ρ = O p ( a n ) , which means that
λ ( θ 0 , γ 0 ) = ρ = O p ( a n ) ,
and the proof of λ ( θ ^ , γ ^ ) = O p ( a n ) follows by Owen [33]. □
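Computationally, the multiplier λ(θ, γ) characterized in Lemma A5 is the root of the estimating equation in (8) and is typically obtained by a damped Newton iteration, following the standard empirical likelihood algorithm in Owen [33]. The sketch below is our own illustrative implementation (the function name and tolerances are ours) and assumes a well-conditioned problem:

```python
import numpy as np

def el_lambda(xi, n_iter=50, tol=1e-10):
    """Solve (1/n) sum_i xi_i / (1 + lambda' xi_i) = 0 for lambda by Newton's
    method, keeping every weight 1 + lambda' xi_i strictly positive."""
    n, d = xi.shape
    lam = np.zeros(d)
    for _ in range(n_iter):
        w = 1.0 + xi @ lam                     # weights; must stay positive
        g = (xi / w[:, None]).mean(axis=0)     # estimating equation value
        if np.linalg.norm(g) < tol:
            break
        # Jacobian of the estimating equation with respect to lambda
        H = -(xi / w[:, None]).T @ (xi / w[:, None]) / n
        step = np.linalg.solve(H, g)
        # Damp the Newton step so that all weights remain positive
        t = 1.0
        while np.any(1.0 + xi @ (lam - t * step) <= 0.0):
            t *= 0.5
        lam = lam - t * step
    return lam

# Example: estimating-function values with a small nonzero mean
rng = np.random.default_rng(0)
xi = rng.standard_normal((200, 3)) + 0.05
lam = el_lambda(xi)
```

The damping step enforces the positivity 1 + λᵀξ̂ᵢ > 0 that is used repeatedly in the proofs above.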
Lemma A6.
Under Conditions 1–11, ℓ̃_p(θ, γ) attains its minimum in D_n with probability approaching 1.
Proof of Lemma A6.
The proof of Lemma A6 is analogous to that of Lemma 2 in Tang and Leng [24], and is therefore omitted for brevity. □
Proof of Theorem 1.
According to Lemma A6, ℓ̃_p(θ, γ) attains its minimum in D_n. Let (θ, γ) ∈ D_n. Combining Lemma A2 and a Taylor expansion, we can show that
1 n ˜ p ( θ , γ ) θ j = 1 n i = 1 n λ ξ ^ i ( θ , γ ) / θ j 1 + λ ξ ^ i ( θ , γ ) + p ν ( | θ j | ) sign ( θ j ) = 1 n i = 1 n λ ξ i ( θ , γ ) / θ j 1 + λ ξ i ( θ , γ ) + o p ( 1 ) + p ν ( | θ j | ) sign ( θ j ) = 1 n i = 1 n λ ξ i ( θ 0 , γ 0 ) θ j + 2 ξ i ( θ 0 , γ 0 ) θ j θ ( θ θ 0 ) + o p ( 1 ) + p ν ( | θ j | ) sign ( θ j ) = A 1 + A 2 + p ν ( | θ j | ) sign ( θ j ) + o p ( 1 ) ,
and
1 n ˜ p ( θ , γ ) γ j = 1 n i = 1 n λ ξ ^ i ( θ , γ ) / γ j 1 + λ ξ ^ i ( θ , γ ) + p ν ( | γ j | ) sign ( γ j ) = 1 n i = 1 n λ ξ i ( θ , γ ) / γ j 1 + λ ξ i ( θ , γ ) + o p ( 1 ) + p ν ( | γ j | ) sign ( γ j ) = 1 n i = 1 n λ ξ i ( θ 0 , γ 0 ) γ j + 2 ξ i ( θ 0 , γ 0 ) γ j γ ( γ γ 0 ) + o p ( 1 ) + p ν ( | γ j | ) sign ( γ j ) = A 3 + A 4 + p ν ( | γ j | ) sign ( γ j ) + o p ( 1 ) .
It follows from Conditions 5 and 6 and Lemma A5 that
max j A 1 ( | A 1 | ) = max j A 1 1 n i = 1 n λ E ξ i ( θ 0 , γ 0 ) θ j + ξ i ( θ 0 , γ 0 ) θ j E ξ i ( θ 0 , γ 0 ) θ j max j A 1 λ E ξ ( θ 0 , γ 0 ) θ j + 1 n λ i = 1 n ξ i ( θ 0 , γ 0 ) θ j E ξ ( θ 0 , γ 0 ) θ j = o p ( 1 ) ,
and
max j A 1 ( | A 2 | ) 1 n λ i = 1 n 2 ξ i ( θ 0 , γ 0 ) θ j θ E 2 ξ ( θ 0 , γ 0 ) θ j θ ( θ θ 0 ) + max j A 1 λ E 2 ξ ( θ 0 , γ 0 ) θ j θ ( θ θ 0 ) = o p ( 1 ) .
Similarly, we can obtain
max j A 2 ( | A 3 | ) = max j A 2 1 n i = 1 n λ E ξ i ( θ 0 , γ 0 ) γ j + ξ i ( θ 0 , γ 0 ) γ j E ξ i ( θ 0 , γ 0 ) γ j = o p ( 1 ) ,
and
max j A 2 ( | A 4 | ) 1 n λ i = 1 n 2 ξ i ( θ 0 , γ 0 ) γ j γ E 2 ξ ( θ 0 , γ 0 ) γ j γ ( γ γ 0 ) + max j A 2 λ E 2 ξ ( θ 0 , γ 0 ) γ j γ ( γ γ 0 ) = o p ( 1 ) .
Based on Condition 10, we can show that P τ ( | θ j | ) sign ( θ j ) { j A 1 } = τ and P τ ( | θ j | ) sign ( θ j ) { j A 1 } = τ sign ( θ j ) { j A 1 } . Thus, with probability approaching 1, l ˜ p ( θ , γ ) / θ j is dominated by the sign of θ j , j A 1 , as n . It means that lim n P ( θ ^ 2 = 0 ) = 1 . Similarly, using Condition 10 again, we have P ν ( | γ j | ) sign ( γ j ) { j A 2 } = ν , P ν ( | γ j | ) sign ( γ j ) { j A 2 } = ν sign ( γ j ) { j A 2 } , and lim n P ( γ ^ 2 = 0 ) = 1 . Therefore, Theorem 1 (1) is proved.
We now prove Theorem 1 (2). Minimising (9) is equivalent to minimising the following function:
˜ p ( θ , γ , λ , μ , ϑ ) = 1 n i = 1 n log { 1 + λ ξ ^ i ( θ , γ ) } + i = 1 p p τ ( | θ i | ) + i = 1 r p ν ( | γ i | ) + μ H 2 θ + ϑ H 4 γ ,
where μ and ϑ are also Lagrange multipliers. Define
Q 1 n ( θ , γ , λ , μ , ϑ ) = 1 n i = 1 n ξ ^ i ( θ , γ ) 1 + λ ξ ^ i ( θ , γ ) , Q 2 n ( θ , γ , λ , μ , ϑ ) = 1 n i = 1 n { ξ ^ i ( θ , γ ) / θ } λ 1 + λ ξ ^ i ( θ , γ ) + b 1 ( θ ) + H 2 μ , Q 3 n ( θ , γ , λ , μ , ϑ ) = 1 n i = 1 n { ξ ^ i ( θ , γ ) / γ } λ 1 + λ ξ ^ i ( θ , γ ) + b 2 ( γ ) + H 4 ϑ , Q 4 n ( θ , γ , λ , μ , ϑ ) = H 2 θ , Q 5 n ( θ , γ , λ , μ , ϑ ) = H 4 γ ,
where
b 1 ( θ ) = { P τ ( | θ 1 | ) sign ( θ 1 ) , , P τ ( | θ p 1 | ) sign ( θ p 1 ) } ,
and
b 2 ( γ ) = { P ν ( | γ 1 | ) sign ( γ 1 ) , , P ν ( | γ p 2 | ) sign ( γ p 2 ) } .
The minimizer ( θ ^ , γ ^ , λ ^ , μ ^ , ϑ ^ ) satisfies Q_{jn}(θ̂, γ̂, λ̂, μ̂, ϑ̂) = 0 for j = 1, …, 5. Since λ = O_p(a_n), Q_{2n}(θ, γ, λ, μ, ϑ) = 0, and Q_{3n}(θ, γ, λ, μ, ϑ) = 0, we can obtain μ = O_p(a_n) and ϑ = O_p(a_n). Thus, by expanding Q_{jn}(θ, γ, λ, μ, ϑ) at (θ₀, γ₀, 0, 0, 0), we have
Q 1 n ( θ 0 , γ 0 , 0 , 0 , 0 ) 0 0 0 0 = S ^ n M ^ 1 M ^ 2 0 0 M ^ 1 T b 1 ( θ ) 0 H 2 0 M ^ 2 0 b 2 ( γ ) 0 H 4 0 H 2 0 0 0 0 0 H 4 0 0 λ ^ 0 θ ^ θ 0 γ ^ γ 0 μ ^ 0 ϑ ^ 0 ,
where M̂₁ = n^{−1} ∑_{i=1}^n ∂ξ̂_i(θ, γ)/∂θ and M̂₂ = n^{−1} ∑_{i=1}^n ∂ξ̂_i(θ, γ)/∂γ. Let M₁ = n^{−1} ∑_{i=1}^n ∂ξ_i(θ, γ)/∂θ and M₂ = n^{−1} ∑_{i=1}^n ∂ξ_i(θ, γ)/∂γ; it is then easy to show that
Q 1 n ( θ 0 , γ 0 , 0 , 0 , 0 ) 0 0 0 0 = S n M 1 M 2 0 0 M 1 T 0 0 H 2 0 M 2 0 0 0 H 4 0 H 2 0 0 0 0 0 H 4 0 0 λ ^ 0 θ ^ θ 0 γ ^ γ 0 μ ^ 0 ϑ ^ 0 + R n ,
where R n = k = 1 8 R n ( k ) , R n ( 1 ) = ( R 1 n ( 1 ) , R 2 n ( 1 ) , R 3 n ( 1 ) , 0 , 0 ) , R 1 n ( 1 ) R p + r 1 , R 2 n ( 1 ) R p , R 3 n ( 1 ) R r 1 , the k-th component of R j n ( 1 ) is given by
R j n , k ( 1 ) = 1 2 ( ϕ ^ ϕ ) 2 Q j n , k ( ϕ * ) ϕ ϕ ( ϕ ^ ϕ ) ,
ϕ = ( λ , θ , γ ) , ϕ * = ( λ * , θ * , γ * ) such that λ * λ ^ , θ * θ 0 θ ^ θ 0 , and γ * γ 0 γ ^ γ 0 . In addition, we have R n ( 2 ) = { 0 , b 1 ( θ 0 ) , 0 , 0 , 0 } , R n ( 3 ) = { 0 , 0 , b 2 ( γ 0 ) , 0 , 0 } , R n ( 4 ) = { 0 , { b 1 ( θ * ) ( θ * θ 0 ) } , 0 , 0 , 0 } , R n ( 5 ) = { 0 , 0 , { b 2 ( γ * ) ( γ * γ 0 ) } , 0 , 0 } , R n ( 6 ) = { { ( S ^ n ( θ 0 , γ 0 ) S n ) λ ^ } + { M ^ 1 ( θ 0 , γ 0 ) ( θ ^ θ 0 ) } + { M ^ 2 ( θ 0 , γ 0 ) ( γ ^ γ 0 ) } , 0 , 0 , 0 , 0 } , R n ( 7 ) = { 0 , { ( M ^ 1 ( θ 0 , γ 0 ) M 1 ) λ ^ } , 0 , 0 , 0 } , and R n ( 8 ) = { 0 , 0 , { ( M ^ 2 ( θ 0 , γ 0 ) M 2 ) λ ^ } , 0 , 0 } . By Conditions 5–7 and Lemma A5, we can show that
R l n ( 1 ) 2 n 2 ϕ ^ ϕ 4 i , j , k = 1 2 ( p + r 1 ) 2 Q i ( X , Z ) ϕ j ϕ k = O p ( p + r 1 ) 3 a n 4 , l = 1 , 2 , 3 .
Combining this with Condition 7, it follows that R_n^{(1)} = o_p(1/n). According to Conditions 10 and 11, it is easy to show that R_n^{(2)} = o_p(1/n), R_n^{(3)} = o_p(1/n), R_n^{(4)} = o_p(1/n), and R_n^{(5)} = o_p(1/n). Similarly to Leng and Tang [25], we can also prove that R_n^{(6)} = o_p(1/n), R_n^{(7)} = o_p(1/n), and R_n^{(8)} = o_p(1/n). Therefore, we have R_n = o_p(1/n). Let
ψ = ( θ , γ , μ , ϑ ) , K 11 = S n , K 12 = M 1 , M 2 , 0 , 0 , K 21 = K 12
and
K 22 = 0 0 H 2 0 0 0 0 H 4 H 2 0 0 0 0 H 4 0 0 , K = K 11 K 12 K 21 K 22 .
By inverting (37), it can be shown that
( λ ^ 0 ) , ( ψ ^ ψ ) = K 1 Q 1 n ( θ 0 , γ 0 , 0 , 0 , 0 ) , 0 + o p ( n 1 / 2 ) .
Applying block matrix inversion, we have
K 1 = K 11 1 + K 11 1 K 12 F 1 K 21 K 11 1 K 11 1 K 12 F 1 F 1 K 21 K 11 1 F 1 ,
where
F = K 22 K 21 K 11 1 K 12 .
Thus,
ψ ^ ψ = F 1 K 21 K 11 1 Q 1 n ( θ , γ , λ , μ , ϑ ) + o p ( n 1 / 2 ) ,
and
F 1 = U 1 Ω U 1 H ( H U 1 H ) 1 ( H U 1 H ) 1 H U 1 ( H U 1 H ) 1 ,
where Ω = U 1 H ( H U 1 H ) 1 H U 1 and
U = M 1 S n 1 M 1 M 1 S n 1 M 2 M 2 S n 1 M 1 M 2 S n 1 M 2 .
This implies that
θ ^ θ 0 γ ^ γ 0 = { U 1 Ω } { 1 n i = 1 n ξ i ( θ 0 , γ 0 ) + o p ( n 1 2 ) } .
Furthermore, we can obtain
θ ^ 1 θ 10 γ ^ 1 γ 10 = H 0 { U 1 Ω } { 1 n i = 1 n ξ i ( θ 0 , γ 0 ) + o p ( n 1 2 ) } .
Using Lemma A3, we have Var { n 1 / 2 ( θ ^ 1 , γ ^ 1 ) } = I B = H 0 U 1 V U 1 H 0 H 0 U 1 H ( H U 1 H ) 1 H 2 U 1 V A 1 H 2 ( H U 1 H ) 1 H U 1 H 0 . Define Y n i = 1 n T n i , where T n i = B I B 1 2 ( H 0 U 1 H 0 U 1 H ( H U 1 H ) 1 H U 1 ) ξ i ( θ 0 , γ 0 ) . According to the central limit theorem, it follows that
1 n B I B 1 2 { H 0 U 1 H 0 U 1 H ( H U 1 H ) 1 H U 1 } i = 1 n ξ i ( θ 0 , γ 0 ) N ( 0 , G )
in distribution. □
Proof of Theorem 2.
Define φ ^ i = λ ^ ξ ^ i ( θ ^ , γ ^ ) and φ i = λ ^ ξ i ( θ ^ , γ ^ ) for i = 1 , , n . Combining Lemma A2, Lemma A4, and Taylor’s expansion, we can show that
l ˜ p ( θ ^ , γ ^ ) = i = 1 n φ ^ i φ ^ i 2 2 + φ ^ i 3 3 ( 1 + ζ i ) 4 + o p ( 1 ) = i = 1 n φ i φ i 2 2 + φ i 3 3 ( 1 + ζ i ) 4 + o p ( 1 ) + o p ( 1 ) ,
where | ζ i | | φ i | . According to (38), the asymptotic expansion for λ ^ can be shown as
λ ^ = Σ 1 + Σ 1 ( M 1 , M 2 ) F 1 ( M 1 , M 2 ) Σ 1 ξ ¯ i + o p ( 1 ) ,
where F 1 = U 1 Ω , Ω = U 1 H ( H U 1 H ) 1 H U 1 , and ξ ¯ i = 1 n i = 1 n ξ i ( θ 0 , γ 0 ) . From (39), we can gain the expansion of l ˜ p ( θ ^ , γ ^ ) as follows
l ˜ p ( θ ^ , γ ^ ) = n ξ ¯ i Σ 1 ( M 1 , M 2 ) U 1 H ( H U 1 H ) 1 H U 1 ( M 1 , M 2 ) Σ 1 ξ ¯ i + o p ( 1 ) .
There exist values of H ˜ 2 and H ˜ 4 that satisfy H ˜ 2 θ = 0 , H ˜ 4 θ = 0 , H ˜ 2 H ˜ 2 = I p q 1 , and H ˜ 4 H ˜ 4 = I p q 2 . Let H ˜ = H ˜ 2 0 0 H ˜ 4 . Similarly, we can show that
l ˜ p ( θ ^ , γ ^ ) L n ( θ 0 , L n ( γ 0 ) = 0 ) = n ξ ¯ i Σ 1 ( M 1 , M 2 ) Ω ˜ ( M 1 , M 2 ) Σ 1 ξ ¯ i + o p ( 1 ) ,
where Ω ˜ = U 1 H ˜ ( H ˜ U 1 H ˜ ) 1 H ˜ U 1 .
By combining the above two equations, it follows that
l ˜ p ( L n ) = n ξ ¯ i Σ 1 / 2 ( P 1 P 2 ) Σ 1 / 2 ξ ¯ i + o p ( 1 ) ,
where P 1 = Σ 1 / 2 ( M 1 , M 2 ) Ω ˜ ( M 1 , M 2 ) Σ 1 / 2 ,   P 2 = Σ 1 / 2 ( M 1 , M 2 ) Ω ( M 1 , M 2 ) Σ 1 / 2 . Because P 1 P 2 is an idempotent matrix with rank q 1 + q 2 , P 1 P 2 can be denoted as P n P n , where P n R ( q 1 + q 2 ) × ( p + r 1 ) and P n P n = I q 1 + q 2 . By the central limit theorem, we have n P n Σ 1 / 2 ξ ¯ i L N ( 0 , I q 1 + q 2 ) . Therefore, l ˜ p ( L n ) L χ q 1 + q 2 2 and Theorem 2 follows. □
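The last step of the proof uses the classical fact that a quadratic form zᵀPz in a standard normal vector z, with P idempotent of rank q₁ + q₂, follows a χ² distribution with q₁ + q₂ degrees of freedom. A quick numerical illustration of this fact (using an arbitrary rank-q projection; all names here are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
d, q, n_rep = 6, 2, 4000

# Build a rank-q idempotent projection P = Q Q' with orthonormal columns Q,
# mirroring the decomposition P1 - P2 = P_n' P_n in the proof of Theorem 2.
Q, _ = np.linalg.qr(rng.standard_normal((d, q)))
P = Q @ Q.T

# Quadratic forms z' P z for z ~ N(0, I_d): these follow chi-square(q).
z = rng.standard_normal((n_rep, d))
stats = np.einsum('ij,jk,ik->i', z, P, z)
mean_stat = stats.mean()   # close to q, the mean of a chi-square(q) variable
```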
The SIM and PLM are two special cases of the PLSIM. According to the same arguments in the proofs of Theorems 1 and 2, Corollaries 1–4 can be proved in a similar manner, and are hence omitted.

References

  1. Carroll, R.; Fan, J.; Gijbels, I.; Wand, M.P. Generalized partially linear single-index models. J. Am. Stat. Assoc. 1997, 92, 477–489. [Google Scholar] [CrossRef]
  2. Yu, Y.; Ruppert, D. Penalized spline estimation for partially linear single-index models. J. Am. Stat. Assoc. 2002, 97, 1042–1054. [Google Scholar] [CrossRef]
  3. Zhu, L.X.; Xue, L.G. Empirical likelihood confidence regions in a partially linear single-index model. J. R. Stat. Soc. Ser. B 2006, 68, 549–570. [Google Scholar] [CrossRef]
  4. Xia, Y.; Härdle, W. Semi-parametric estimation of partially linear single-index models. J. Multivar. Anal. 2006, 97, 1162–1184. [Google Scholar] [CrossRef]
  5. Liang, H.; Xia, L.; Li, R.; Tsai, C.L. Estimation and testing for partially linear single-index models. Ann. Stat. 2010, 38, 3811–3836. [Google Scholar] [CrossRef] [PubMed]
  6. Ma, Y.; Zhu, L.P. Doubly robust and efficient estimators for heteroscedastic partially linear single-index models allowing high dimensional covariates. J. R. Stat. Soc. Ser. B 2013, 75, 305–322. [Google Scholar] [CrossRef]
  7. Fang, J.L.; Liu, W.R.; Lu, X.W. Empirical likelihood for heteroscedastic partially linear single-index models with growing dimensional data. Metrika 2018, 81, 255–281. [Google Scholar] [CrossRef]
  8. Hao, C.; Yin, X. A Normality Test for High-dimensional Data Based on the Nearest Neighbor Approach. J. Am. Stat. Assoc. 2023, 118, 719–731. [Google Scholar]
  9. Liu, B.; Zhang, Q.; Xue, L.; Song, P.; Kang, J. Robust High-Dimensional Regression with Coefficient Thresholding and its Application to Imaging Data Analysis. J. Am. Stat. Assoc. 2024, 119, 715–729. [Google Scholar] [CrossRef]
  10. Breiman, L. Heuristics of instability and stabilization in model selection. Ann. Stat. 1996, 24, 2350–2383. [Google Scholar] [CrossRef]
  11. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288. [Google Scholar] [CrossRef]
  12. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
  13. Xie, H.; Huang, J. SCAD-penalized regression in high-dimensional partially linear models. Ann. Stat. 2009, 37, 673–696. [Google Scholar] [CrossRef]
  14. Wang, T.; Zhu, L.X. Consistent Model Selection and Estimation in a General Single-Index Model with “Large p and Small n”; Technical Report; Department of Mathematics, Hong Kong Baptist University: Hong Kong, China, 2011. [Google Scholar]
  15. Zhang, J.; Wang, T.; Zhu, L.X.; Liang, H. A dimension reduction based approach for estimation and variable selection in partially linear single-index models with high-dimensional covariates. Electron. J. Stat. 2012, 6, 2235–2273. [Google Scholar] [CrossRef]
  16. Lai, P.; Wang, Q.H.; Zhou, X.H. Variable selection and semiparametric efficient estimation for the heteroscedastic partially linear single-index model. Comput. Stat. Data Anal. 2014, 70, 241–256. [Google Scholar] [CrossRef]
  17. Owen, A. Empirical likelihood ratio confidence intervals for a single function. Biometrika 1988, 75, 237–249. [Google Scholar] [CrossRef]
  18. Owen, A. Empirical likelihood for linear models. Ann. Stat. 1991, 19, 1725–1747. [Google Scholar] [CrossRef]
19. Kolaczyk, E.D. Empirical likelihood for generalized linear models. Stat. Sinica 1994, 4, 199–218. [Google Scholar]
20. Lu, X.W. Empirical likelihood for heteroscedastic partially linear models. J. Multivar. Anal. 2009, 100, 387–395. [Google Scholar] [CrossRef]
21. Xue, L.; Zhu, L. Empirical likelihood for single-index models. J. Multivar. Anal. 2006, 97, 1295–1312. [Google Scholar] [CrossRef]
22. Matsushita, Y.; Otsu, T. Empirical likelihood for network data. J. Am. Stat. Assoc. 2023, 119, 2117–2128. [Google Scholar] [CrossRef]
  23. Chen, S.; Peng, L.; Qin, Y. Effects of data dimension on empirical likelihood. Biometrika 2009, 96, 712–722. [Google Scholar] [CrossRef]
  24. Tang, C.; Leng, C. Penalized high-dimensional empirical likelihood. Biometrika 2010, 97, 905–920. [Google Scholar] [CrossRef]
  25. Leng, C.; Tang, C. Penalized empirical likelihood and growing dimensional general estimating equations. Biometrika 2012, 99, 706–716. [Google Scholar] [CrossRef]
  26. Donoho, D.; Johnstone, I. Ideal spatial adaptation by wavelet shrinkage. Biometrika 1994, 81, 425–455. [Google Scholar] [CrossRef]
  27. Hoerl, A.; Kennard, R. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67. [Google Scholar] [CrossRef]
  28. Fan, J.Q.; Lv, J.C. Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B 2008, 70, 894–911. [Google Scholar] [CrossRef]
  29. Hammer, S.; Katzenstein, D.; Hughes, M.; Gundaker, H.; Schooley, R.; Haubrich, R.; Henry, W.; Lederman, M.; Phair, J.; Niu, M.; et al. A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. N. Engl. J. Med. 1996, 335, 1081–1089. [Google Scholar] [CrossRef]
  30. Lai, P.; Wang, Q. Semiparametric efficient estimation for partially linear single-index models with responses missing at random. J. Multivar. Anal. 2014, 128, 33–50. [Google Scholar] [CrossRef]
  31. Zhu, L.X.; Fang, K.T. Asymptotics for kernel estimate of sliced inverse regression. Ann. Stat. 1996, 24, 1053–1068. [Google Scholar] [CrossRef]
  32. Serfling, R.J. Approximation of Stochastic Processes; John Wiley: New York, NY, USA, 1980. [Google Scholar]
  33. Owen, A. Empirical Likelihood; Chapman and Hall-CRC: New York, NY, USA, 2001. [Google Scholar]
Figure 1. Residual plots for the PLSIM. (a) Residual plot of PEL. (b) Residual plot of PVS.
Table 1. Variable selection results for the PEL method.

(p, r)    n     Kernel Function    Mean Count of Zero Coefficients
                                   Correct                          Incorrect
(10,10)   50    Epanechnikov       (4.47 [55.9%], 3.82 [54.6%])     (2.51, 3.24)
(10,10)   50    Cosine             (4.35 [54.4%], 3.69 [52.7%])     (2.38, 3.47)
(10,10)   100   Epanechnikov       (6.93 [86.6%], 5.63 [80.4%])     (0.52, 0.76)
(10,10)   100   Cosine             (6.86 [85.8%], 5.58 [79.7%])     (0.55, 0.74)
(10,10)   200   Epanechnikov       (7.32 [91.5%], 6.27 [89.5%])     (0.34, 0.51)
(10,10)   200   Cosine             (7.29 [91.1%], 6.24 [89.1%])     (0.37, 0.58)
(20,20)   50    Epanechnikov       (9.71 [53.9%], 8.95 [52.6%])     (4.83, 5.95)
(20,20)   50    Cosine             (9.58 [53.2%], 8.79 [51.7%])     (4.92, 6.14)
(20,20)   100   Epanechnikov       (12.69 [70.5%], 11.81 [69.4%])   (3.58, 5.32)
(20,20)   100   Cosine             (12.66 [70.3%], 11.67 [68.6%])   (3.71, 5.45)
(20,20)   200   Epanechnikov       (15.81 [87.8%], 14.71 [86.5%])   (0.42, 0.87)
(20,20)   200   Cosine             (15.28 [84.9%], 14.56 [85.6%])   (0.45, 0.93)
Table 2. Estimation means and standard deviations of the PEL estimators (values in parentheses are the corresponding standard deviations).

(p, r)    n     Kernel Function    θ1             θ(p−1)         γ2             γ3
(10,10)   50    Epanechnikov       4.35 (2.616)   2.74 (1.738)   3.91 (2.237)   −4.96 (3.521)
(10,10)   50    Cosine             4.27 (2.853)   2.59 (1.802)   4.14 (2.394)   −4.58 (3.475)
(10,10)   100   Epanechnikov       1.94 (0.128)   0.96 (0.106)   1.56 (0.132)   −1.93 (0.103)
(10,10)   100   Cosine             2.06 (0.131)   0.94 (0.114)   1.43 (0.146)   −2.09 (0.115)
(10,10)   200   Epanechnikov       1.95 (0.087)   1.03 (0.082)   1.56 (0.105)   −2.04 (0.103)
(10,10)   200   Cosine             2.02 (0.094)   1.06 (0.091)   1.43 (0.113)   −2.07 (0.107)
(20,20)   50    Epanechnikov       4.92 (3.587)   3.17 (3.264)   5.89 (4.316)   −6.83 (4.763)
(20,20)   50    Cosine             4.53 (3.951)   2.96 (3.728)   6.34 (4.512)   −6.71 (4.625)
(20,20)   100   Epanechnikov       2.97 (1.438)   1.53 (0.896)   2.16 (1.048)   −2.79 (0.951)
(20,20)   100   Cosine             3.18 (1.502)   1.61 (0.925)   2.12 (1.073)   −2.85 (1.027)
(20,20)   200   Epanechnikov       2.09 (0.135)   1.05 (0.107)   1.54 (0.139)   −1.96 (0.114)
(20,20)   200   Cosine             2.12 (0.139)   1.07 (0.119)   1.43 (0.147)   −2.08 (0.121)
Table 3. Variable selection results for the PEL and PVS methods.

(p, r)    n     Method   Mean Count of Zero Coefficients
                         Correct                          Incorrect
(20,20)   200   PEL      (15.21 [84.5%], 13.45 [84.1%])   (0.37, 1.03)
(20,20)   200   PVS      (14.75 [81.9%], 13.17 [82.3%])   (0.45, 1.26)
(20,20)   400   PEL      (15.89 [88.3%], 14.02 [87.6%])   (0.35, 0.85)
(20,20)   400   PVS      (15.46 [85.8%], 13.74 [85.9%])   (0.41, 0.92)
(20,30)   200   PEL      (15.24 [84.6%], 22.56 [86.7%])   (0.39, 1.27)
(20,30)   200   PVS      (14.98 [83.2%], 22.12 [85.1%])   (0.38, 1.35)
(20,30)   400   PEL      (15.79 [87.7%], 23.31 [89.7%])   (0.34, 1.14)
(20,30)   400   PVS      (15.13 [84.1%], 22.85 [87.9%])   (0.43, 1.28)
(30,20)   200   PEL      (24.38 [87.1%], 13.51 [84.4%])   (0.38, 1.24)
(30,20)   200   PVS      (23.85 [85.2%], 12.97 [81.1%])   (0.44, 1.31)
(30,20)   400   PEL      (25.14 [89.7%], 14.13 [88.3%])   (0.33, 1.17)
(30,20)   400   PVS      (24.73 [88.3%], 13.54 [84.6%])   (0.39, 1.26)
(30,30)   200   PEL      (24.25 [86.6%], 23.36 [89.8%])   (0.40, 1.31)
(30,30)   200   PVS      (23.72 [84.7%], 22.68 [87.2%])   (0.47, 1.38)
(30,30)   400   PEL      (25.27 [90.0%], 23.15 [89.1%])   (0.38, 1.19)
(30,30)   400   PVS      (24.54 [87.6%], 22.82 [87.7%])   (0.45, 1.25)
Table 4. Estimation means and standard deviations of the PEL and PVS estimators (values in parentheses are the corresponding standard deviations).

(p, r)    n     Method   θ1             θ(p−1)          γ2             γ3             γ4
(20,20)   200   PEL      1.05 (0.132)   −1.04 (0.101)   1.06 (0.143)   2.07 (0.127)   0.95 (0.106)
(20,20)   200   PVS      1.12 (0.145)   −1.08 (0.134)   0.93 (0.175)   2.11 (0.142)   1.08 (0.131)
(20,20)   400   PEL      0.98 (0.102)   −1.03 (0.091)   1.04 (0.126)   1.98 (0.108)   1.06 (0.113)
(20,20)   400   PVS      1.07 (0.134)   −1.06 (0.114)   0.96 (0.135)   2.08 (0.125)   1.05 (0.117)
(20,30)   200   PEL      0.94 (0.137)   −1.05 (0.109)   1.10 (0.128)   2.06 (0.121)   1.07 (0.149)
(20,30)   200   PVS      1.86 (0.153)   −1.12 (0.142)   0.92 (0.136)   1.96 (0.147)   1.13 (0.151)
(20,30)   400   PEL      0.96 (0.125)   −1.04 (0.098)   1.08 (0.124)   2.03 (0.119)   1.09 (0.128)
(20,30)   400   PVS      1.91 (0.138)   −1.09 (0.117)   0.92 (0.225)   1.95 (0.124)   1.92 (0.137)
(30,20)   200   PEL      1.09 (0.141)   −1.13 (0.135)   1.07 (0.127)   1.94 (0.130)   1.08 (0.143)
(30,20)   200   PVS      0.89 (0.159)   −1.08 (0.162)   1.09 (0.143)   1.91 (0.129)   1.13 (0.156)
(30,20)   400   PEL      1.06 (0.098)   −0.96 (0.107)   1.03 (0.126)   2.05 (0.115)   1.04 (0.112)
(30,20)   400   PVS      1.86 (0.116)   −1.58 (0.154)   0.94 (0.139)   1.97 (0.129)   1.07 (0.123)
(30,30)   200   PEL      1.14 (0.129)   −1.13 (0.138)   0.91 (0.142)   2.09 (0.138)   1.12 (0.128)
(30,30)   200   PVS      0.79 (0.148)   −1.18 (0.145)   0.88 (0.157)   1.90 (0.141)   0.93 (0.135)
(30,30)   400   PEL      1.09 (0.114)   −1.07 (0.126)   0.93 (0.119)   2.10 (0.128)   1.09 (0.119)
(30,30)   400   PVS      1.15 (0.123)   −1.15 (0.131)   1.05 (0.125)   1.89 (0.133)   0.92 (0.121)
Table 5. Estimates and confidence intervals for the PEL and PVS methods (the values in parentheses are the corresponding confidence intervals).
Variable   PEL                           PVS
homo       0.193 ([0.065, 0.317])        0.131 ([0.001, 0.262])
str2       −0.322 ([−0.517, −0.016])     −0.045 ([−0.324, 0.234])
age        0.528 ([0.314, 0.735])        0.153 ([−0.118, 0.424])
cd820      −0.691 ([−0.893, −0.478])     −0.208 ([−0.413, 0.003])
cd40       0.255 ([0.009, 0.452])        0.446 ([0.134, 0.758])
cd420      0.423 ([0.126, 0.674])        0.857 ([0.536, 1.178])

Fang, J.; Tian, Z. Statistical Inference for High-Dimensional Heteroscedastic Partially Single-Index Models. Entropy 2025, 27, 964. https://doi.org/10.3390/e27090964
