Article

Asymptotic Normality in Linear Regression with Approximately Sparse Structure

by Saulius Jokubaitis † and Remigijus Leipus *,†
Faculty of Mathematics and Informatics, Institute of Applied Mathematics, Vilnius University, Naugarduko 24, LT-03225 Vilnius, Lithuania
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2022, 10(10), 1657; https://doi.org/10.3390/math10101657
Submission received: 1 March 2022 / Revised: 28 March 2022 / Accepted: 7 May 2022 / Published: 12 May 2022
(This article belongs to the Section Probability and Statistics)

Abstract: In this paper, we study asymptotic normality in high-dimensional linear regression. We focus on the case where the covariance matrix of the regression variables has a KMS structure, in asymptotic settings where the number of predictors, $p$, is proportional to the number of observations, $n$. The main result of the paper is the derivation of the exact asymptotic distribution of the suitably centered and normalized squared norm of the product between the predictor matrix, $X$, and the outcome variable, $Y$, i.e., of the statistic $\|X^\top Y\|_2^2$, under rather unrestrictive assumptions on the model parameters $\beta_j$. We employ the variance-gamma distribution in order to derive the results, which, along with the asymptotic analysis, allows us to easily obtain the exact distribution of the statistic. Additionally, we consider a specific case of approximate sparsity of the model parameter vector $\beta$ and perform a Monte Carlo simulation study. The simulation results suggest that the statistic approaches the limiting distribution fairly quickly even under high variable multi-correlation and a relatively small number of observations, suggesting possible applications to the construction of statistical testing procedures for real-world data and related problems.

1. Introduction

Consider the linear regression model
$$Y = X\beta + \varepsilon, \tag{1}$$
where $Y := (y_1, \dots, y_n)^\top \in \mathbb{R}^{n\times 1}$ are $n$ observations of the outcome and $X = (X_1, \dots, X_n)^\top \in \mathbb{R}^{n\times p}$ contains the $p$-dimensional predictors, with $X_1, \dots, X_n$ being i.i.d. $p\times 1$ random vectors $X_i = (X_{1,i}, \dots, X_{p,i})^\top$, normally distributed with zero mean and covariance matrix $\Sigma$, denoted $X_i =_d \mathcal{N}_p(0, \Sigma)$. We assume that the covariance matrix $\Sigma$ has the form
$$\Sigma = \big( \varrho^{|i-j|} \big)_{i,j=1}^p = \begin{pmatrix} 1 & \varrho & \cdots & \varrho^{p-1} \\ \varrho & 1 & \cdots & \varrho^{p-2} \\ \vdots & \vdots & \ddots & \vdots \\ \varrho^{p-1} & \varrho^{p-2} & \cdots & 1 \end{pmatrix} \tag{2}$$
if $0 < |\varrho| < 1$, and $\Sigma = I_p$ if $\varrho = 0$ (here and below, $I_p$ denotes the $p\times p$ identity matrix). This matrix is often called the Kac–Murdock–Szegő (KMS) matrix, originally introduced in [1]. Being the autocorrelation matrix of a causal AR(1) process, the KMS matrix is positive definite; it is considered here due to its wide array of applications in the literature and its well-known spectral properties (see, e.g., [2] for a thorough literature review). When carefully chosen, such a structure can approximate a wide array of possible covariance structures well (see, e.g., [3] for a more general approach with various Toeplitz covariance structures). Furthermore, $\varepsilon := (\varepsilon_1, \dots, \varepsilon_n)^\top \in \mathbb{R}^{n\times 1}$, $\varepsilon =_d \mathcal{N}_n(0, \sigma_\varepsilon^2 I_n)$, is a vector of unobserved i.i.d. errors with $\mathrm{E}\,\varepsilon_i = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma_\varepsilon^2 > 0$, and $\beta := (\beta_1, \dots, \beta_p)^\top \in \mathbb{R}^{p\times 1}$ is an unknown $p$-dimensional parameter. In practice, the assumption that $\mathrm{E}\, X_i = 0$ can be untenable, and it may be appropriate to add an intercept to the linear model (1); however, for simplicity, throughout this paper we assume that the intercept is known and the variables are centered. Similar settings arise when dealing with certain geospatial data, longitudinal studies, microarray data, and research on approximate message passing algorithms (see, e.g., [4,5,6,7,8,9]).
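To fix ideas, the covariance structure in (2) and the sampling scheme of model (1) can be sketched numerically. The following is a minimal illustration; the function names and parameter values are ours, not part of the paper.

```python
import numpy as np

def kms_matrix(p, rho):
    """KMS (Kac-Murdock-Szego) covariance matrix (2): Sigma[i, j] = rho**|i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate_model(n, p, rho, beta, sigma_eps=1.0, rng=None):
    """Draw (X, Y) from model (1): rows of X are N_p(0, Sigma), Y = X beta + eps."""
    rng = np.random.default_rng(rng)
    sigma = kms_matrix(p, rho)
    X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    eps = rng.normal(0.0, sigma_eps, size=n)
    return X, X @ beta + eps

# Sigma has unit diagonal and, for |rho| < 1, is positive definite
S = kms_matrix(5, 0.5)
print(np.allclose(np.diag(S), 1.0), np.all(np.linalg.eigvalsh(S) > 0))
```

A draw `X, Y = simulate_model(50, 5, 0.5, np.ones(5))` then gives one realization of the design and outcome used throughout the paper.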
This paper is concerned with the derivation of the exact asymptotic distribution of the suitably centered and normalized squared norm $\|X^\top Y\|_2^2$ under the assumption of the KMS-type covariance structure in (2), where $p$ and $n$ are assumed to be large. Throughout the paper, we assume that $p, n \to \infty$ and $p/n \to c \in (0, \infty)$. We are particularly interested in cases where $p > n$. Statistics of this form arise in various applications in the context of high-dimensional linear regression, and under normality assumptions general results can be derived using random matrix theory through Wishart distributions (see, e.g., [7,10,11,12]). Dealing with such statistics typically requires strong restrictions on the model parameters $\beta$; in this paper, however, we only require that $\|\beta\|_2^2 < \infty$. Moreover, our results could be extended by using $\beta$-generating functions (e.g., parameters of FARIMA models). In comparison with related papers, ref. [12] assumes exact sparsity, while [7,10] require approximate sparsity.
We approach the problem following an observation by [13] that the distribution of a product of Gaussian random variables admits a variance-gamma representation, which yields a set of attractive properties. We contribute to the literature on the variance-gamma distribution by extending the results of [14,15,16]. We demonstrate that, along with the derivation of the asymptotic distribution of $\|X^\top Y\|_2^2$, this approach allows us to define the exact distribution of the statistic for any fixed values of $p, n$, expressed through a combination of gamma and normal random variables. In the related literature, we were not able to find results for the exact distribution and asymptotic analysis of the statistic $\|X^\top Y\|_2^2$ based on the variance-gamma distribution. Furthermore, we deem such a result much easier to work with than the characteristic or density function of $\|X^\top Y\|_2^2$ handled directly. Therefore, in addition to the $\ell_2$-norm statistic, we argue that the obtained results can easily be extended towards alternative forms of the statistic, e.g., using a different norm, which would reduce the problem to manipulating the variance-gamma distribution, thus suggesting possible further research directions and useful extensions.
Additionally, we examine a specific case of the parameter $\beta$ by considering $\beta_j = j^{-1}$, $j \geq 1$. Similar structures of the vector $\beta$ are often found in the literature when approximate sparsity of the coefficients in the linear regression model (1) is assumed. See, e.g., [17,18] for a broader view of sparsity requirements and their implications for specific high-dimensional algorithms; refs. [19,20] for model selection problems in autoregressive time series models; refs. [21,22,23,24,25,26,27,28] for applications to inference in high-dimensional models and high-dimensional instrumental variable (IV) regression models; or [29,30,31,32,33] for recent applications of high-dimensional and sparse methods to financial and economic data. Performing Monte Carlo simulations, we find that the empirical distributions of the corresponding statistic approach the limiting distribution reasonably quickly even for large values of $\varrho$ and $c$. These results suggest that the assumption of a sparse structure can be incorporated into applications and statistical tests, and thus could be further extended following the literature on testing for sparsity or on the construction of signal-to-noise ratio estimators (see, e.g., [7,10,11,12]).
In this paper, $=_d$, $\to_d$ and $\to_P$ denote equality of distributions, convergence in distribution and convergence in probability, respectively. The notation $C$ represents a generic positive constant, which may assume different values at various locations, and $\mathbb{1}_A$ denotes the indicator function of a set $A$.
The structure of the paper is as follows. In Section 2, we present the main results of the paper. In Section 3, we present useful properties of variance-gamma distribution, which are used in Section 4 in order to prove some auxiliary results. In Section 5, we present the proof of the main result. Finally, in Section 6, we provide an example of the main result under imposed approximate sparsity assumption for the parameter β of the model (1). Technical results are presented in Appendix A, while, for brevity, some straightforward yet tedious proofs are presented in the Supplementary Material.

2. Main Results

In this section, we formulate the main results on the normality of the statistic $\|X^\top Y\|_2^2$. Introduce the notations:
$$\kappa_{1,p} := \sum_{k=1}^p \sum_{l=1}^p \beta_k \beta_l \varrho^{|k-l|}, \tag{3}$$
$$\kappa_{2,p} := \sum_{k=1}^p \Big( \sum_{l=1}^p \beta_l \varrho^{|k-l|} \Big)^2, \tag{4}$$
$$\kappa_{3,p} := \sum_{k,l,j,j'=1}^p \beta_j \beta_{j'} \varrho^{|k-j|} \varrho^{|l-j'|} \varrho^{|k-l|}. \tag{5}$$
It can be observed that, under $\sum_{j=1}^\infty \beta_j^2 < \infty$, there exist the limits
$$\kappa_i = \lim_{p\to\infty} \kappa_{i,p}, \quad i = 1, 2, 3.$$
Clearly, $\kappa_{2,p} \geq 0$. Since $(\varrho^{|i-j|})_{i,j=1}^p$ is positive semi-definite, also $\kappa_{i,p} \geq 0$ for $i = 1, 3$. Indeed, $\sum_{k,l=1}^p \varrho^{|k-l|} a_k a_l \geq 0$, so it suffices to take $a_k = \beta_k$ for $i = 1$ and $a_k = \sum_{j=1}^p \beta_j \varrho^{|k-j|}$ for $i = 3$.
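For a concrete $\beta$ and $\varrho$, the finite-$p$ constants (3)–(5) are quadratic forms in the KMS matrix: writing $g$ for the vector with entries $g_k = \sum_l \beta_l \varrho^{|k-l|}$, one has $\kappa_{1,p} = \beta^\top \Sigma \beta$, $\kappa_{2,p} = \|g\|_2^2$ and $\kappa_{3,p} = g^\top \Sigma g$. A small sketch (the function name is ours):

```python
import numpy as np

def kappas(beta, rho):
    """Finite-p constants (3)-(5): kappa_{1,p}, kappa_{2,p}, kappa_{3,p}."""
    p = len(beta)
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])  # KMS matrix (2)
    g = Sigma @ beta                  # g_k = sum_l beta_l rho^{|k-l|}
    return beta @ g, g @ g, g @ Sigma @ g

# kappa_{i,p} for hyperbolically decaying coefficients beta_j = 1/j
beta = 1.0 / np.arange(1, 101)
print(kappas(beta, 0.5))              # all three are nonnegative
```

For $\beta = e_1$ (a single nonzero coefficient), these reduce to $\kappa_1 = 1$, $\kappa_2 = 1/(1-\varrho^2)$ and $\kappa_3 = (1+\varrho^2)/(1-\varrho^2)^2$, which gives a quick check of the implementation.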
Our first main result is the following theorem.
Theorem 1.
Assume the model in (1) with the covariance structure in (2). Let $n \to \infty$ and let $p = p_n$ satisfy
$$p \to \infty, \quad \frac{p}{n} \to c \in (0, \infty). \tag{6}$$
Let also the $\beta_j$ satisfy
$$\sum_{j=1}^\infty \beta_j^2 < \infty. \tag{7}$$
Then,
$$\frac{\|X^\top Y\|_2^2 - n^2\big( \kappa_{2,p} + \frac{p}{n}(\kappa_{1,p} + \sigma_\varepsilon^2) \big)}{n^{3/2}} \to_d \mathcal{N}(0, s^2), \tag{8}$$
where the variance $s^2$ has the structure
$$s^2 = 4\kappa_2^2 + 4\big( \kappa_1 + \sigma_\varepsilon^2 \big)\big( 2\kappa_2 c + \kappa_3 \big) + 2c\big( \kappa_1 + \sigma_\varepsilon^2 \big)^2 \Big( c + \frac{1+\varrho^2}{1-\varrho^2} \Big). \tag{9}$$
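The limiting variance (9) is an explicit function of $\kappa_1$, $\kappa_2$, $\kappa_3$, $\sigma_\varepsilon^2$, $c$ and $\varrho$, so it can be coded directly. The sketch below (function name ours) also cross-checks (9) against the $\varrho = 0$ special case of Corollary 1, where all $\kappa_i$ reduce to $\beta(1)$:

```python
def s_squared(kappa1, kappa2, kappa3, sigma_eps2, c, rho):
    """Limiting variance s^2 in (9) of Theorem 1."""
    a = kappa1 + sigma_eps2
    return (4.0 * kappa2**2
            + 4.0 * a * (2.0 * kappa2 * c + kappa3)
            + 2.0 * c * a**2 * (c + (1.0 + rho**2) / (1.0 - rho**2)))

# Cross-check: for rho = 0 all kappa_i equal beta(1), and (9) must reduce to (12)
B, se2, c = 1.7, 0.9, 2.0
lhs = s_squared(B, B, B, se2, c, rho=0.0)
rhs = 2*B**2*(4 + 5*c + c**2) + 4*B*se2*(1 + 3*c + c**2) + 2*se2**2*(c + c**2)
print(abs(lhs - rhs) < 1e-9)  # True
```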
Our second main result deals with the case where the centering sequence in (8) is modified to include the limiting values of κ i , p , i = 1 , 2 .
Theorem 2.
Let the assumptions of Theorem 1 hold. In addition, assume that $\sum_{j=p+1}^\infty \beta_j^2 = o(p^{-1/2})$ and $\sup_{j\geq 1} |\beta_j| j^\alpha < \infty$ with $\alpha > 1/2$. Then,
$$\frac{\|X^\top Y\|_2^2 - n^2\big( \kappa_2 + c(\kappa_1 + \sigma_\varepsilon^2) \big)}{n^{3/2}} \to_d \mathcal{N}(0, s^2). \tag{10}$$
The proofs of these theorems are given in Section 5.
Remark 1.
For alternative expressions of κ 1 , κ 2 and κ 3 , see Lemma 5 below.
Define
$$\beta(x) := \sum_{j=1}^\infty \beta_j^2 x^j, \quad |x| \leq 1.$$
The following corollary deals with the case $\varrho = 0$, i.e., $\Sigma = I_p$. The result follows from Theorem 2, noting that in this case $\kappa_i = \beta(1)$, $i = 1, 2, 3$.
Corollary 1.
Assume the model (1) with covariance structure $\Sigma = I_p$. Let assumptions (6) and (7) be satisfied. In addition, assume that $\sum_{j=p+1}^\infty \beta_j^2 = o(p^{-1/2})$ and $\sup_{j\geq 1} |\beta_j| j^\alpha < \infty$ with $\alpha > 1/2$. Then,
$$\frac{\|X^\top Y\|_2^2 - n^2\big( \beta(1)(1+c) + c\sigma_\varepsilon^2 \big)}{n^{3/2}} \to_d \mathcal{N}(0, s^2), \tag{11}$$
where
$$s^2 = 2\beta(1)^2\big( 4 + 5c + c^2 \big) + 4\beta(1)\sigma_\varepsilon^2\big( 1 + 3c + c^2 \big) + 2\sigma_\varepsilon^4\big( c + c^2 \big). \tag{12}$$

3. Properties of the Variance-Gamma Distribution

In this section, we provide some properties of the variance-gamma distribution, which will be used in the following proofs.
Recall that the variance-gamma distribution with parameters $r > 0$, $\theta \in \mathbb{R}$, $\sigma > 0$ and $\mu \in \mathbb{R}$ has the density
$$f_{\mathrm{VG}}(x) = \frac{1}{\sigma\sqrt{\pi}\,\Gamma(r/2)}\, \mathrm{e}^{\theta(x-\mu)/\sigma^2} \left( \frac{|x-\mu|}{2\sqrt{\theta^2+\sigma^2}} \right)^{(r-1)/2} K_{(r-1)/2}\!\left( \frac{\sqrt{\theta^2+\sigma^2}}{\sigma^2}\, |x-\mu| \right), \tag{13}$$
where $x \in \mathbb{R}$ and $K_\nu(x)$ is the modified Bessel function of the second kind. For a random variable $Q$ with density (13), we write $Q =_d \mathrm{VG}(r, \theta, \sigma, \mu)$. Let $\Gamma(a, b)$, $a > 0$, $b > 0$, denote the gamma distribution with density
$$f_{\Gamma}(x) = \frac{b^a}{\Gamma(a)}\, x^{a-1} \mathrm{e}^{-bx}, \quad x > 0.$$
It holds that
$$Q =_d \mu + \theta W_r + \sigma\sqrt{W_r}\, U, \tag{14}$$
where $W_r =_d \Gamma(r/2, 1/2)$, $U =_d \mathcal{N}(0, 1)$, and $W_r$ and $U$ are independent. The characteristic function of $Q =_d \mathrm{VG}(r, \theta, \sigma, \mu)$ has the form (see, e.g., [34,35])
$$\varphi_Q(t) = \frac{\mathrm{e}^{\mathrm{i}\mu t}}{\big( 1 + \sigma^2 t^2 - 2\mathrm{i}\theta t \big)^{r/2}}, \quad t \in \mathbb{R}. \tag{15}$$
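Representation (14) immediately yields a two-line sampler for the $\mathrm{VG}(r, \theta, \sigma, \mu)$ law. The sketch below uses our own naming and the moments $\mathrm{E}\,W_r = r$, $\mathrm{Var}\,W_r = 2r$ implied by $W_r =_d \Gamma(r/2, 1/2)$:

```python
import numpy as np

def sample_vg(r, theta, sigma, mu, size, rng=None):
    """Sample from VG(r, theta, sigma, mu) via representation (14):
    Q = mu + theta * W + sigma * sqrt(W) * U,
    with W ~ Gamma(shape=r/2, rate=1/2) and U ~ N(0, 1) independent."""
    rng = np.random.default_rng(rng)
    w = rng.gamma(shape=r / 2.0, scale=2.0, size=size)  # rate 1/2 <=> scale 2
    u = rng.standard_normal(size)
    return mu + theta * w + sigma * np.sqrt(w) * u

# Moments implied by (14): E Q = mu + theta * r, Var Q = (sigma^2 + 2 theta^2) r
q = sample_vg(r=4, theta=0.3, sigma=1.0, mu=1.0, size=200_000, rng=0)
print(q.mean(), q.var())  # close to 2.2 and 4.72
```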
We note the following properties of the variance-gamma distribution.
(i) If $Q_1 =_d \mathrm{VG}(r_1, \theta, \sigma, \mu_1)$ and $Q_2 =_d \mathrm{VG}(r_2, \theta, \sigma, \mu_2)$ are independent random variables, then
$$Q_1 + Q_2 =_d \mathrm{VG}(r_1 + r_2, \theta, \sigma, \mu_1 + \mu_2).$$
(ii) If $Q =_d \mathrm{VG}(r, \theta, \sigma, \mu)$, then for any $a > 0$,
$$aQ =_d \mathrm{VG}(r, a\theta, a\sigma, a\mu).$$
The following proposition is crucial for our purposes.
Proposition 1.
(i) If $(\xi_1, \xi_2) =_d \mathcal{N}_2(0, \Sigma)$, where $\Sigma = \begin{pmatrix} \sigma_1^2 & \varrho\sigma_1\sigma_2 \\ \varrho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$, then
$$\xi_1\xi_2 =_d \mathrm{VG}\big( 1, \varrho\sigma_1\sigma_2, \sqrt{1-\varrho^2}\,\sigma_1\sigma_2, 0 \big).$$
(ii) If $(\xi_{1j}, \xi_{2j})$, $j = 1, \dots, n$, are i.i.d. random vectors with common distribution $\mathcal{N}_2(0, \Sigma)$, then
$$\sum_{j=1}^n \xi_{1j}\xi_{2j} =_d \mathrm{VG}\big( n, \varrho\sigma_1\sigma_2, \sqrt{1-\varrho^2}\,\sigma_1\sigma_2, 0 \big)$$
and
$$\sum_{j=1}^n \xi_{1j}\xi_{2j} =_d \sigma_1\sigma_2\Big( \varrho W_n + \sqrt{1-\varrho^2}\,\sqrt{W_n}\, U \Big),$$
where $W_n =_d \Gamma(n/2, 1/2)$ and $U =_d \mathcal{N}(0, 1)$ are independent random variables.
(iii) Assume that $(\xi_{1j}^{(1)}, \dots, \xi_{1j}^{(p)}, \xi_{2j})$, $j = 1, \dots, n$, are i.i.d. copies of $(\xi_1^{(1)}, \dots, \xi_1^{(p)}, \xi_2) =_d \mathcal{N}_{p+1}(0, \Sigma^{(p)})$, and let $\varrho^{(kl)} := \mathrm{Corr}(\xi_1^{(k)}, \xi_1^{(l)})$, $\varrho^{(k)} := \mathrm{Corr}(\xi_1^{(k)}, \xi_2)$, $(\sigma_1^{(k)})^2 := \mathrm{Var}(\xi_1^{(k)})$, $\sigma_2^2 := \mathrm{Var}(\xi_2)$, $k, l = 1, \dots, p$. Then
$$\begin{pmatrix} \sum_{j=1}^n \xi_{1j}^{(1)}\xi_{2j} \\ \vdots \\ \sum_{j=1}^n \xi_{1j}^{(p)}\xi_{2j} \end{pmatrix} =_d \begin{pmatrix} \sigma_1^{(1)}\sigma_2\big( \varrho^{(1)} W_n + \sqrt{1-(\varrho^{(1)})^2}\,\sqrt{W_n}\, U_1 \big) \\ \vdots \\ \sigma_1^{(p)}\sigma_2\big( \varrho^{(p)} W_n + \sqrt{1-(\varrho^{(p)})^2}\,\sqrt{W_n}\, U_p \big) \end{pmatrix},$$
where $(U_1, \dots, U_p) =_d \mathcal{N}_p(0, \Sigma_U)$, $\Sigma_U = (\sigma_U(k,l))$, with
$$\sigma_U(k,l) = \mathrm{E}\, U_k U_l = \frac{\varrho^{(kl)} - \varrho^{(k)}\varrho^{(l)}}{\sqrt{1-(\varrho^{(k)})^2}\sqrt{1-(\varrho^{(l)})^2}}, \quad k, l = 1, \dots, p. \tag{16}$$
Proof. 
The statements in (i), (ii) are well known, see e.g., [16]. The proof of part (iii) follows from Lemma 1. □
Lemma 1.
Assume that $(\xi_1^{(1)}, \dots, \xi_1^{(p)}, \xi_2)$ has distribution $\mathcal{N}_{p+1}(0, \Sigma^{(p)})$, and let $\varrho^{(kl)} := \mathrm{Corr}(\xi_1^{(k)}, \xi_1^{(l)})$, $\varrho^{(k)} := \mathrm{Corr}(\xi_1^{(k)}, \xi_2)$, $(\sigma_1^{(k)})^2 := \mathrm{Var}(\xi_1^{(k)})$, $\sigma_2^2 := \mathrm{Var}(\xi_2)$, $k, l = 1, \dots, p$. Then
$$\begin{pmatrix} \xi_1^{(1)}\xi_2 \\ \vdots \\ \xi_1^{(p)}\xi_2 \end{pmatrix} =_d \begin{pmatrix} \sigma_1^{(1)}\sigma_2\big( \varrho^{(1)} W_1 + \sqrt{1-(\varrho^{(1)})^2}\,\sqrt{W_1}\, U_1 \big) \\ \vdots \\ \sigma_1^{(p)}\sigma_2\big( \varrho^{(p)} W_1 + \sqrt{1-(\varrho^{(p)})^2}\,\sqrt{W_1}\, U_p \big) \end{pmatrix},$$
where $W_1 =_d \Gamma(1/2, 1/2)$ and $(U_1, \dots, U_p)$ is a zero-mean normal vector, independent of $W_1$, with the covariances in (16).
Proof.
It suffices to prove that, for any $(t_1, \dots, t_p) \in \mathbb{R}^p$,
$$\sum_{k=1}^p t_k \xi_1^{(k)}\xi_2 =_d \sigma_2 \sum_{k=1}^p t_k \sigma_1^{(k)}\Big( \varrho^{(k)} W_1 + \sqrt{1-(\varrho^{(k)})^2}\,\sqrt{W_1}\, U_k \Big). \tag{17}$$
Since
$$\sum_{k=1}^p t_k \xi_1^{(k)} =_d \mathcal{N}\Big( 0, \sum_{k,l=1}^p t_k t_l \varrho^{(kl)}\sigma_1^{(k)}\sigma_1^{(l)} \Big), \quad \xi_2 =_d \mathcal{N}(0, \sigma_2^2),$$
by Proposition 1(i) we obtain that
$$\sum_{k=1}^p t_k \xi_1^{(k)}\xi_2 =_d \mathrm{VG}\bigg( 1,\; \sigma_2\sum_{k=1}^p t_k\varrho^{(k)}\sigma_1^{(k)},\; \sigma_2\Big( \sum_{k,l=1}^p t_k t_l \sigma_1^{(k)}\sigma_1^{(l)}\big( \varrho^{(kl)} - \varrho^{(k)}\varrho^{(l)} \big) \Big)^{1/2},\; 0 \bigg). \tag{18}$$
For the right-hand side of (17), write
$$\sigma_2\sum_{k=1}^p t_k\sigma_1^{(k)}\Big( \varrho^{(k)} W_1 + \sqrt{1-(\varrho^{(k)})^2}\,\sqrt{W_1}\, U_k \Big) = \sigma_2\sum_{k=1}^p t_k\sigma_1^{(k)}\varrho^{(k)} W_1 + \sigma_2\Big( \sum_{k=1}^p t_k\sigma_1^{(k)}\sqrt{1-(\varrho^{(k)})^2}\, U_k \Big)\sqrt{W_1}.$$
Here, by (16),
$$\sigma_2\sum_{k=1}^p t_k\sigma_1^{(k)}\sqrt{1-(\varrho^{(k)})^2}\, U_k =_d \sigma_2\Big( \sum_{k,l=1}^p t_k t_l \sigma_1^{(k)}\sigma_1^{(l)}\sqrt{1-(\varrho^{(k)})^2}\sqrt{1-(\varrho^{(l)})^2}\, \mathrm{E}(U_k U_l) \Big)^{1/2} U_1 = \sigma_2\Big( \sum_{k,l=1}^p t_k t_l \sigma_1^{(k)}\sigma_1^{(l)}\big( \varrho^{(kl)} - \varrho^{(k)}\varrho^{(l)} \big) \Big)^{1/2} U_1.$$
Note that $U_1 =_d \mathcal{N}(0, 1)$. Hence,
$$\sigma_2\sum_{k=1}^p t_k\sigma_1^{(k)}\Big( \varrho^{(k)} W_1 + \sqrt{1-(\varrho^{(k)})^2}\,\sqrt{W_1}\, U_k \Big) =_d \sigma_2\sum_{k=1}^p t_k\sigma_1^{(k)}\varrho^{(k)} W_1 + \sigma_2\Big( \sum_{k,l=1}^p t_k t_l \sigma_1^{(k)}\sigma_1^{(l)}\big( \varrho^{(kl)} - \varrho^{(k)}\varrho^{(l)} \big) \Big)^{1/2}\sqrt{W_1}\, U_1,$$
which, by representation (14), has the same VG distribution as that in (18). This proves (17). □

4. Some Auxiliary Lemmas

In this section, we establish some auxiliary results that will be used in the proofs of Theorems 1 and 2. Here and throughout the paper, we drop the upper indices when working with triangular schemes of random variables, e.g., $(V_1, \dots, V_p) \equiv (V_1^{(p)}, \dots, V_p^{(p)})$, whenever it is clear from the context.
Lemma 2.
Let $V = (V_1, \dots, V_p)^\top =_d \mathcal{N}_p(0, \Sigma_V^{(p)})$, where $\Sigma_V^{(p)}$ is a positive definite covariance matrix with $\mathrm{tr}((\Sigma_V^{(p)})^2) = o(p^2)$ as $p \to \infty$. Then
$$\frac{1}{p}\sum_{k=1}^p \big( V_k^2 - \mathrm{E}\, V_k^2 \big) \to_P 0 \quad \text{as } p \to \infty. \tag{19}$$
If, in addition, $p^{-1}\mathrm{tr}(\Sigma_V^{(p)}) \to 1$, then
$$\frac{1}{p}\sum_{k=1}^p V_k^2 \to_P 1 \quad \text{as } p \to \infty. \tag{20}$$
Proof.
By the Spectral Theorem, we have
$$V^\top V = \sum_{k=1}^p V_k^2 =_d \sum_{j=1}^p \lambda_j^{(p)}\tilde{Z}_j^2, \tag{21}$$
where the $\tilde{Z}_j$ are i.i.d. standard normal variables and $\lambda_1^{(p)}, \dots, \lambda_p^{(p)}$ are the eigenvalues of $\Sigma_V^{(p)}$. Observe from (21) that
$$\mathrm{E}\, V^\top V = \sum_{j=1}^p \lambda_j^{(p)} = \mathrm{tr}\big( \Sigma_V^{(p)} \big), \tag{22}$$
$$\mathrm{Var}\big( V^\top V \big) = \mathrm{Var}\Big( \sum_{j=1}^p \lambda_j^{(p)}\tilde{Z}_j^2 \Big) = 2\sum_{j=1}^p \big( \lambda_j^{(p)} \big)^2 = 2\,\mathrm{tr}\big( (\Sigma_V^{(p)})^2 \big). \tag{23}$$
Thus, by (22) and (23), for any $\epsilon > 0$,
$$\mathrm{P}\Big( \Big| \frac{1}{p}\big( V^\top V - \mathrm{E}\, V^\top V \big) \Big| > \epsilon \Big) \leq \frac{\mathrm{Var}(V^\top V)}{p^2\epsilon^2} \to 0, \quad p \to \infty, \tag{24}$$
and the relation in (19) follows from the assumption $\mathrm{tr}((\Sigma_V^{(p)})^2) = o(p^2)$. Finally, if $p^{-1}\mathrm{tr}(\Sigma_V^{(p)}) \to 1$, then, by (22), the result (19) leads to (20). □
Remark 2.
The assumption on the matrix $\Sigma_V = \Sigma_V^{(p)}$ in Lemma 2, requiring that $\mathrm{tr}(\Sigma_V^2) = o(p^2)$, is not overly restrictive: assume, for example, that $\Sigma_V = (\sigma(i,j))$ is any KMS-type covariance matrix, as in (2). Then, it can be seen that
$$\mathrm{tr}\big( \Sigma_V^2 \big) = \sum_{i,j=1}^p \big( \sigma(i,j) \big)^2 = \sum_{i,j=1}^p \varrho^{2|i-j|} = \sum_{|m|<p} \big( p - |m| \big)\varrho^{2|m|} \leq p\sum_{|m|<p} \varrho^{2|m|} = O(p). \tag{25}$$
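The $O(p)$ bound in (25) is easy to confirm numerically: $\mathrm{tr}(\Sigma_V^2)/p$ stays below the full geometric series $\sum_m \varrho^{2|m|} = (1+\varrho^2)/(1-\varrho^2)$ and approaches it as $p$ grows (variable names and values below are ours):

```python
import numpy as np

rho = 0.6
limit = (1 + rho**2) / (1 - rho**2)   # full series sum_m rho^(2|m|) = 2.125
for p in (50, 100, 200, 400):
    idx = np.arange(p)
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])  # KMS matrix as in (2)
    tr2 = np.trace(Sigma @ Sigma)     # equals sum_{i,j} rho^(2|i-j|)
    print(p, round(tr2 / p, 4))       # increases toward 2.125, stays below it
```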
Lemma 3.
Assume that $\tilde{Z}_1, \tilde{Z}_2, \dots$ are i.i.d. $\mathcal{N}(0, 1)$ random variables. For any $p \in \mathbb{N}$, define
$$\zeta_j^{(p)} := \nu_j^{(p)}\big( \tilde{Z}_j^2 - 1 \big) + \gamma_j^{(p)}\sqrt{p}\,\tilde{Z}_j, \quad j = 1, \dots, p, \tag{26}$$
where $\nu_j^{(p)}$, $j = 1, \dots, p$, are positive scalars and $\gamma_j^{(p)}$, $j = 1, \dots, p$, are real scalars such that
$$\sum_{j=1}^p \big( \nu_j^{(p)} \big)^3 = o\bigg( \Big( \sum_{j=1}^p \mathrm{Var}\,\zeta_j^{(p)} \Big)^{3/2} \bigg),$$
$$p\sum_{j=1}^p \big( \gamma_j^{(p)} \big)^2 \nu_j^{(p)} = o\bigg( \Big( \sum_{j=1}^p \mathrm{Var}\,\zeta_j^{(p)} \Big)^{3/2} \bigg),$$
with $\mathrm{Var}(\zeta_j^{(p)}) = 2(\nu_j^{(p)})^2 + p(\gamma_j^{(p)})^2$. Then, as $p \to \infty$,
$$\frac{\sum_{j=1}^p \zeta_j^{(p)}}{\sqrt{\sum_{j=1}^p \mathrm{Var}(\zeta_j^{(p)})}} \to_d \mathcal{N}(0, 1). \tag{27}$$
Proof.
The proof uses the method of cumulants and is structured as follows:
(i) We establish the moment-generating function of $\zeta_j^{(p)}$, $M_{\zeta_j^{(p)}}(t) := \mathrm{E}\,\mathrm{e}^{t\zeta_j^{(p)}}$, and $\log(M_{\zeta_j^{(p)}}(t))$;
(ii) We find $G(t; p)$, which corresponds to the cumulant-generating function of the sum $\sum_{j=1}^p \zeta_j^{(p)}$;
(iii) We find $K(t; p) := G\big( t\big( \sum_{j=1}^p (2(\nu_j^{(p)})^2 + p(\gamma_j^{(p)})^2) \big)^{-1/2}; p \big)$, which corresponds to the cumulant-generating function of the left-hand side of (27);
(iv) Finally, in order to prove (27), we show that the cumulants $\varkappa_d^{(p)}$ generated by $K(t; p)$ satisfy $\varkappa_1^{(p)} = 0$, $\varkappa_2^{(p)} = 1$ and $\varkappa_d^{(p)} \to 0$, $d = 3, 4, \dots$, as $p \to \infty$.
Step 1. First, rewrite
$$\zeta_j^{(p)} = \nu_j^{(p)}\Big( \tilde{Z}_j + \frac{\gamma_j^{(p)}\sqrt{p}}{2\nu_j^{(p)}} \Big)^2 - \nu_j^{(p)} - \frac{(\gamma_j^{(p)})^2 p}{4\nu_j^{(p)}}. \tag{28}$$
Here, $\psi_j^{(p)} := \big( \tilde{Z}_j + \gamma_j^{(p)}\sqrt{p}/(2\nu_j^{(p)}) \big)^2$ has a noncentral chi-squared distribution with the following moment-generating function:
$$M_{\psi_j^{(p)}}(t) := \mathrm{E}\,\mathrm{e}^{t\psi_j^{(p)}} = \big( 1 - 2t \big)^{-1/2}\exp\bigg( \Big( \frac{\gamma_j^{(p)}}{2\nu_j^{(p)}} \Big)^2 \frac{tp}{1-2t} \bigg), \quad |t| < \frac{1}{2}. \tag{29}$$
Therefore, by (28) and (29),
$$M_{\zeta_j^{(p)}}(t) = M_{\psi_j^{(p)}}\big( \nu_j^{(p)} t \big)\exp\bigg( -t\Big( \nu_j^{(p)} + \frac{(\gamma_j^{(p)})^2 p}{4\nu_j^{(p)}} \Big) \bigg) = \big( 1 - 2\nu_j^{(p)} t \big)^{-1/2}\exp\bigg( \frac{(\gamma_j^{(p)})^2 pt}{4\nu_j^{(p)}\big( 1 - 2\nu_j^{(p)} t \big)} - t\Big( \nu_j^{(p)} + \frac{(\gamma_j^{(p)})^2 p}{4\nu_j^{(p)}} \Big) \bigg)$$
for $|t| < (2\nu_j^{(p)})^{-1}$, and
$$\log M_{\zeta_j^{(p)}}(t) = \frac{1}{2}\Big( (\gamma_j^{(p)})^2 p + 2(\nu_j^{(p)})^2 \Big)t^2 + \frac{(\gamma_j^{(p)})^2 p}{2}\sum_{k=3}^\infty t^k\, 2^{k-2}\big( \nu_j^{(p)} \big)^{k-2} + \frac{1}{2}\sum_{k=3}^\infty \frac{2^k\big( \nu_j^{(p)} \big)^k t^k}{k}.$$
Step 2. Since $\zeta_1^{(p)}, \dots, \zeta_p^{(p)}$ are independent, we have that
$$G(t; p) = \sum_{j=1}^p \log M_{\zeta_j^{(p)}}(t) = \frac{t^2}{2}\sum_{j=1}^p \Big( (\gamma_j^{(p)})^2 p + 2(\nu_j^{(p)})^2 \Big) + \frac{p}{2}\sum_{k=3}^\infty 2^{k-2} t^k \sum_{j=1}^p (\gamma_j^{(p)})^2\big( \nu_j^{(p)} \big)^{k-2} + \frac{1}{2}\sum_{k=3}^\infty \frac{2^k}{k}\, t^k \sum_{j=1}^p \big( \nu_j^{(p)} \big)^k.$$
Step 3. It can be observed that
$$K(t; p) = G\bigg( \frac{t}{\big( \sum_{j=1}^p \big( 2(\nu_j^{(p)})^2 + p(\gamma_j^{(p)})^2 \big) \big)^{1/2}}; p \bigg) = \frac{t^2}{2} + \frac{1}{2}\sum_{k=3}^\infty 2^{k-2} t^k\, \frac{p\sum_{j=1}^p (\gamma_j^{(p)})^2(\nu_j^{(p)})^{k-2}}{\big( \sum_{j=1}^p \big( 2(\nu_j^{(p)})^2 + (\gamma_j^{(p)})^2 p \big) \big)^{k/2}} + \frac{1}{2}\sum_{k=3}^\infty \frac{2^k}{k}\, t^k\, \frac{\sum_{j=1}^p (\nu_j^{(p)})^k}{\big( \sum_{j=1}^p \big( 2(\nu_j^{(p)})^2 + (\gamma_j^{(p)})^2 p \big) \big)^{k/2}} = \sum_{k=1}^\infty \varkappa_k^{(p)}\frac{t^k}{k!},$$
where $\varkappa_1^{(p)} = 0$, $\varkappa_2^{(p)} = 1$ and, for $k \geq 3$,
$$\varkappa_k^{(p)} = \frac{k!\, 2^{k-3}\, p\sum_{j=1}^p (\gamma_j^{(p)})^2(\nu_j^{(p)})^{k-2} + (k-1)!\, 2^{k-1}\sum_{j=1}^p (\nu_j^{(p)})^k}{\big( \sum_{j=1}^p \big( 2(\nu_j^{(p)})^2 + (\gamma_j^{(p)})^2 p \big) \big)^{k/2}}. \tag{30}$$
Step 4. In order to prove that (27) holds, it remains to show that, as $p \to \infty$, $\varkappa_d^{(p)} \to 0$ for all $d \geq 3$. By (30), this is equivalent to showing that, for any fixed $k \geq 3$, as $p \to \infty$,
$$\frac{\sum_{j=1}^p (\nu_j^{(p)})^k}{\big( \sum_{j=1}^p \big( 2(\nu_j^{(p)})^2 + (\gamma_j^{(p)})^2 p \big) \big)^{k/2}} \to 0, \tag{31}$$
$$\frac{p\sum_{j=1}^p (\gamma_j^{(p)})^2(\nu_j^{(p)})^{k-2}}{\big( \sum_{j=1}^p \big( 2(\nu_j^{(p)})^2 + (\gamma_j^{(p)})^2 p \big) \big)^{k/2}} \to 0. \tag{32}$$
In order to prove (31), we use induction. The case $k = 3$ holds by assumption. Assuming that (31) holds for a fixed $k \geq 3$, we have
$$\frac{\sum_{j=1}^p (\nu_j^{(p)})^{k+1}}{\big( \sum_{j=1}^p \big( 2(\nu_j^{(p)})^2 + (\gamma_j^{(p)})^2 p \big) \big)^{(k+1)/2}} \leq \frac{\big( \sum_{j=1}^p (\nu_j^{(p)})^2 \big)^{1/2}\sum_{j=1}^p (\nu_j^{(p)})^k}{\big( \sum_{j=1}^p \big( 2(\nu_j^{(p)})^2 + (\gamma_j^{(p)})^2 p \big) \big)^{(k+1)/2}} \leq \frac{\big( \sum_{j=1}^p \big( 2(\nu_j^{(p)})^2 + (\gamma_j^{(p)})^2 p \big) \big)^{1/2}\sum_{j=1}^p (\nu_j^{(p)})^k}{\big( \sum_{j=1}^p \big( 2(\nu_j^{(p)})^2 + (\gamma_j^{(p)})^2 p \big) \big)^{(k+1)/2}} = \frac{\sum_{j=1}^p (\nu_j^{(p)})^k}{\big( \sum_{j=1}^p \big( 2(\nu_j^{(p)})^2 + (\gamma_j^{(p)})^2 p \big) \big)^{k/2}} \to 0,$$
concluding that (31) holds for all $k \geq 3$. The proof of (32) is analogous: the case $k = 3$ holds by assumption; thus, we repeat the same arguments as for (31) and conclude that (32) holds for all $k \geq 3$. This concludes the proof of the lemma. □

5. Proof of the Main Results

In this section, we give the proofs of Theorems 1 and 2. Throughout the proofs, we express the corresponding constants in terms of $\kappa_{i,p}$ and $\kappa_i$, $i = 1, 2, 3$, introduced in (3)–(5). Recall that $\kappa_{i,p} \geq 0$ and, by Remark 3, $\kappa_i < \infty$ for $i = 1, 2, 3$.
Proof of Theorem 1.
Write
$$\|X^\top Y\|_2^2 = H_1^2 + \dots + H_p^2 =: H, \tag{33}$$
where
$$H_k := \sum_{j=1}^n X_{k,j}\Big( \sum_{l=1}^p \beta_l X_{l,j} + \varepsilon_j \Big), \quad k = 1, \dots, p.$$
Denote $Z_j := \sum_{l=1}^p \beta_l X_{l,j} + \varepsilon_j$, $j = 1, \dots, n$. By the covariance structure (2) and $X_{k,j} =_d \mathcal{N}(0, 1)$, $\varepsilon_j =_d \mathcal{N}(0, \sigma_\varepsilon^2)$, we have $Z_j =_d \mathcal{N}(0, \sigma_Z^2)$, where $\sigma_Z^2 = \sum_{l,l'=1}^p \beta_l\beta_{l'}\varrho^{|l-l'|} + \sigma_\varepsilon^2$ and $\mathrm{Cov}(X_{k,j}, Z_j) = \sum_{l=1}^p \beta_l\varrho^{|k-l|}$.
Applying Proposition 1(iii) with $\xi_{1j}^{(k)} = X_{k,j}$, $\xi_{2j} = Z_j$, $\sigma_1^{(k)} = 1$, $\sigma_{2,p} = \sigma_Z$ and $\theta_k^{(p)} := \varrho^{(k)} = \sigma_Z^{-1}\sum_{l=1}^p \beta_l\varrho^{|k-l|}$, where $\varrho^{(kl)} = \varrho^{|k-l|}$, we obtain that
$$\|X^\top Y\|_2^2 =_d \sigma_{2,p}^2 \sum_{k=1}^p \Big( \theta_k^{(p)} W_n + \sqrt{1-(\theta_k^{(p)})^2}\,\sqrt{W_n}\, U_k \Big)^2,$$
where $W_n =_d \Gamma(n/2, 1/2)$ and $(U_1, \dots, U_p) =_d \mathcal{N}_p(0, \Sigma_U^{(p)})$ with $\Sigma_U^{(p)} = (\sigma_U(k,l))$ defined as (see (16)):
$$\sigma_U(k,l) = \frac{\varrho^{|k-l|} - \theta_k^{(p)}\theta_l^{(p)}}{\sqrt{1-(\theta_k^{(p)})^2}\sqrt{1-(\theta_l^{(p)})^2}}, \quad k, l = 1, \dots, p.$$
By expanding the square, we can write
$$\|X^\top Y\|_2^2 =_d \sigma_{2,p}^2\bigg( \big( W_n - \mathrm{E} W_n + \mathrm{E} W_n \big)^2\sum_{k=1}^p (\theta_k^{(p)})^2 + 2W_n^{3/2}\sum_{k=1}^p \theta_k^{(p)}\sqrt{1-(\theta_k^{(p)})^2}\, U_k + \big( W_n - \mathrm{E} W_n \big)\sum_{k=1}^p \big( 1-(\theta_k^{(p)})^2 \big) U_k^2 + \mathrm{E} W_n\sum_{k=1}^p \big( 1-(\theta_k^{(p)})^2 \big) U_k^2 \bigg).$$
By further rearranging the right-hand side, we have
$$\frac{\|X^\top Y\|_2^2}{n^{3/2}} =_d I_1 + I_2 + I_3 + I_4, \tag{34}$$
where
$$I_1 := \frac{\sigma_{2,p}^2}{n^{3/2}}\big( W_n - \mathrm{E} W_n \big)^2\sum_{k=1}^p (\theta_k^{(p)})^2, \tag{35}$$
$$I_2 := \frac{\sigma_{2,p}^2}{n^{3/2}}\big( W_n - \mathrm{E} W_n \big)\bigg( 2\,\mathrm{E} W_n\sum_{k=1}^p (\theta_k^{(p)})^2 + \sum_{k=1}^p \big( 1-(\theta_k^{(p)})^2 \big) U_k^2 \bigg), \tag{36}$$
$$I_3 := \frac{\sigma_{2,p}^2}{n^{3/2}}\, 2W_n^{3/2}\sum_{k=1}^p \theta_k^{(p)}\sqrt{1-(\theta_k^{(p)})^2}\, U_k + \frac{\sigma_{2,p}^2}{n^{3/2}}\,\mathrm{E} W_n\sum_{k=1}^p \big( 1-(\theta_k^{(p)})^2 \big)\big( U_k^2 - 1 \big), \tag{37}$$
$$I_4 := \frac{\sigma_{2,p}^2}{n^{3/2}}\bigg( p\,\mathrm{E} W_n + \Big( \big( \mathrm{E} W_n \big)^2 - \mathrm{E} W_n \Big)\sum_{k=1}^p (\theta_k^{(p)})^2 \bigg). \tag{38}$$
We will show that, as $p, n \to \infty$, $p/n \to c \in (0, \infty)$, the term $I_1 = o_P(1)$, while the terms $I_2$ and $I_3$ are asymptotically normal. More precisely, we will show that $I_2 \to_d \mathcal{N}(0, s_1^2)$ and $I_3 \to_d \mathcal{N}(0, s_2^2)$, where $s_1^2$ and $s_2^2$ are given by (44) and (62) below. Since $W_n$ and $(U_1, \dots, U_p)$ are mutually independent for each $n$, it follows that $I_2 + I_3 \to_d \mathcal{N}(0, s_1^2 + s_2^2)$. Finally, the term $I_4$ defines the mean of the statistic, i.e.,
$$\frac{\|X^\top Y\|_2^2}{n^{3/2}} - I_4 \to_d \mathcal{N}(0, s_1^2 + s_2^2). \tag{39}$$
Thus, we will conclude by establishing that $I_4 = \sqrt{n}\big( \kappa_{2,p} + \frac{p}{n}(\kappa_{1,p} + \sigma_\varepsilon^2) \big) + o(1)$, while $s_1^2 + s_2^2 = s^2$, as in the statement of the theorem.
First, consider $I_1$, defined in (35). We will show that $I_1 = o_P(1)$. Denote
$$c_2 := \lim_{p\to\infty}\sum_{k=1}^p (\theta_k^{(p)})^2 = \frac{\kappa_2}{\kappa_1 + \sigma_\varepsilon^2}, \quad \sigma_2^2 := \lim_{p\to\infty}\sigma_{2,p}^2 = \kappa_1 + \sigma_\varepsilon^2. \tag{40}$$
It is clear that $c_2 < \infty$ and $\sigma_2^2 < \infty$. Recall that, by the CLT,
$$\frac{W_n - \mathrm{E} W_n}{n^{1/2}} \to_d \mathcal{N}(0, 2). \tag{41}$$
Therefore,
$$I_1 = O(1)\, n^{-1/2}\Big( \frac{W_n - \mathrm{E} W_n}{n^{1/2}} \Big)^2 = o(1)\, O_P(1) = o_P(1). \tag{42}$$
Second, consider $I_2$, defined in (36). We will show that
$$I_2 \to_d \mathcal{N}(0, s_1^2) \tag{43}$$
with $s_1^2$ given by
$$s_1^2 = 2\sigma_2^4\big( 2c_2 + c \big)^2 = 8\kappa_2^2 + 8c\big( \kappa_1 + \sigma_\varepsilon^2 \big)\kappa_2 + 2c^2\big( \kappa_1 + \sigma_\varepsilon^2 \big)^2. \tag{44}$$
Rewrite
$$I_2 = \sigma_{2,p}^2\,\frac{W_n - \mathrm{E} W_n}{n^{1/2}}\bigg( \frac{2\,\mathrm{E} W_n}{n}\sum_{k=1}^p (\theta_k^{(p)})^2 + \frac{1}{n}\sum_{k=1}^p \big( 1-(\theta_k^{(p)})^2 \big) U_k^2 \bigg). \tag{45}$$
Applying (40) and (41) to the outer term of (45), we obtain
$$\sigma_{2,p}^2\,\frac{W_n - \mathrm{E} W_n}{n^{1/2}} \to_d \mathcal{N}\big( 0, 2\sigma_2^4 \big).$$
We will show that the inner term of (45) approaches $2c_2 + c$. Since $\mathrm{E} W_n = n$, by (40) and the assumption $p/n \to c$, it suffices to prove the convergence
$$\frac{1}{p}\sum_{k=1}^p \big( 1-(\theta_k^{(p)})^2 \big) U_k^2 \to_P 1. \tag{46}$$
Denote the matrix
$$A := \mathrm{diag}\Big( 1-(\theta_1^{(p)})^2, \dots, 1-(\theta_p^{(p)})^2 \Big). \tag{47}$$
To prove (46), we apply Lemma 2 with $V_j = \sqrt{1-(\theta_j^{(p)})^2}\, U_j$, $j = 1, \dots, p$, and $\Sigma_V^{(p)} = A^{1/2}\Sigma_U A^{1/2}$. The conditions of Lemma 2 will hold if $\mathrm{tr}((A^{1/2}\Sigma_U A^{1/2})^2) = O(p)$ and $p^{-1}\mathrm{tr}(A^{1/2}\Sigma_U A^{1/2}) \to 1$ as $p \to \infty$. Observe that
$$\mathrm{tr}\big( (A^{1/2}\Sigma_U A^{1/2})^2 \big) = \mathrm{tr}\big( (A\Sigma_U)^2 \big) = \sum_{k,k'=1}^p \big( 1-(\theta_k^{(p)})^2 \big)\big( 1-(\theta_{k'}^{(p)})^2 \big)\big( \sigma_U(k,k') \big)^2 = \sum_{k,k'=1}^p \Big( \varrho^{|k-k'|} - \theta_k^{(p)}\theta_{k'}^{(p)} \Big)^2 = \sum_{k,k'=1}^p \varrho^{2|k-k'|} - \frac{2\kappa_{3,p}}{\kappa_{1,p} + \sigma_\varepsilon^2} + \frac{\kappa_{2,p}^2}{\big( \kappa_{1,p} + \sigma_\varepsilon^2 \big)^2} = \sum_{k,k'=1}^p \varrho^{2|k-k'|} + o(p) \sim p\,\frac{1+\varrho^2}{1-\varrho^2}, \tag{48}$$
since $\kappa_i < \infty$, $i = 1, 2, 3$, and $\kappa_{1,p} \geq 0$. Here, we used (40) and the observation that
$$\sum_{k,k'=1}^p \varrho^{|k-k'|}\theta_k^{(p)}\theta_{k'}^{(p)} = \frac{\kappa_{3,p}}{\kappa_{1,p} + \sigma_\varepsilon^2} \to \frac{\kappa_3}{\kappa_1 + \sigma_\varepsilon^2}, \quad \text{as } p \to \infty. \tag{49}$$
Similarly, we have
$$\frac{1}{p}\mathrm{tr}\big( A^{1/2}\Sigma_U A^{1/2} \big) = \frac{1}{p}\sum_{k=1}^p \big( 1-(\theta_k^{(p)})^2 \big) = 1 - \frac{\kappa_{2,p}}{p\big( \kappa_{1,p} + \sigma_\varepsilon^2 \big)} \to 1,$$
since, by Lemma A4, $\kappa_{2,p} = o(p)$, while $\kappa_{1,p} \geq 0$, $\kappa_1 < \infty$. This concludes the proof of (46).
Next, consider $I_3$, defined by (37). We will show that
$$I_3 \to_d \mathcal{N}(0, s_2^2), \tag{50}$$
with $s_2^2$ defined in (62). Write
$$I_3 = \sigma_{2,p}^2\bigg( \frac{2W_n^{3/2}}{n^{3/2}}\, b^\top U + n^{-1/2}\big( U^\top A U - p \big) \bigg) + o_P(1),$$
where $U = (U_1, \dots, U_p)^\top$, $A$ is defined by (47), and
$$b = \Big( \theta_1^{(p)}\sqrt{1-(\theta_1^{(p)})^2}, \dots, \theta_p^{(p)}\sqrt{1-(\theta_p^{(p)})^2} \Big)^\top.$$
Observe that $n^{-3/2}W_n^{3/2} \to_P 1$ by the Law of Large Numbers. Thus, since $W_n$ and $U$ are independent for any $n$, and $p/n \to c$, it follows that
$$I_3 = \sigma_{2,p}^2\bigg( 2b^\top U + \sqrt{\frac{c}{p}}\big( U^\top A U - p \big) \bigg) + o_P(1). \tag{51}$$
First, we consider the inner term of (51) and show that, as $p \to \infty$,
$$2b^\top U + \sqrt{\frac{c}{p}}\big( U^\top A U - p \big) \to_d V_2, \tag{52}$$
where $V_2 =_d \mathcal{N}\big( 0, \sigma_2^{-4} s_2^2 \big)$. Then, (50) readily follows from (51).
Recall that $U =_d \mathcal{N}_p(0, \Sigma_U)$, $\Sigma_U > 0$. Further, let $\tilde{Z} =_d \mathcal{N}_p(0, I_p)$. Clearly, one has $U =_d \Sigma_U^{1/2}\tilde{Z}$, where $\Sigma_U^{1/2}$ denotes the symmetric square root of $\Sigma_U$. By the Spectral Theorem, we construct $V := P^\top\tilde{Z}$, where $V =_d \mathcal{N}_p(0, I_p)$ and $P$ is an orthogonal matrix that diagonalizes $\Sigma_U^{1/2} A\Sigma_U^{1/2}$, such that $P^\top\Sigma_U^{1/2} A\Sigma_U^{1/2} P = \Lambda$, with $\Lambda = \mathrm{diag}(\lambda_1^{(p)}, \dots, \lambda_p^{(p)})$ comprised of the eigenvalues of $\Sigma_U^{1/2} A\Sigma_U^{1/2}$. Then,
$$\sqrt{\frac{c}{p}}\big( U^\top A U - p \big) + 2b^\top U =_d \sqrt{\frac{c}{p}}\big( V^\top\Lambda V - p \big) + 2b^\top\Sigma_U^{1/2} P V = \sqrt{\frac{c}{p}}\sum_{j=1}^p \Big( \lambda_j^{(p)}\big( V_j^2 - 1 \big) + g_j^{(p)}\sqrt{p}\, V_j \Big) =: \sqrt{\frac{c}{p}}\sum_{j=1}^p \tilde{V}_j^{(p)}, \tag{53}$$
where $(g_1^{(p)}, \dots, g_p^{(p)}) = 2c^{-1/2}\, b^\top\Sigma_U^{1/2} P$, and
$$\tilde{V}_j^{(p)} := \lambda_j^{(p)}\big( V_j^2 - 1 \big) + g_j^{(p)}\sqrt{p}\, V_j, \quad j = 1, \dots, p. \tag{54}$$
Clearly, $\mathrm{E}\,\tilde{V}_j^{(p)} = 0$ and $\mathrm{E}\big( \tilde{V}_j^{(p)} \big)^2 = 2\big( \lambda_j^{(p)} \big)^2 + \big( g_j^{(p)} \big)^2 p$. Therefore, proving the result (52) is equivalent to showing that
$$\sqrt{\frac{c}{p}}\sum_{j=1}^p \tilde{V}_j^{(p)} \to_d \mathcal{N}\big( 0, \sigma_2^{-4} s_2^2 \big), \tag{55}$$
where
$$\sigma_2^{-4} s_2^2 = c\lim_{p\to\infty} p^{-1}\sum_{j=1}^p \mathrm{E}\big( \tilde{V}_j^{(p)} \big)^2 = 2c\lim_{p\to\infty} p^{-1}\sum_{j=1}^p \big( \lambda_j^{(p)} \big)^2 + c\lim_{p\to\infty}\sum_{j=1}^p \big( g_j^{(p)} \big)^2. \tag{56}$$
We prove (55) by applying Lemma 3 with $\nu_j^{(p)} = \lambda_j^{(p)}$, the eigenvalues of $\Sigma_U^{1/2} A\Sigma_U^{1/2}$, and $\gamma_j^{(p)} = g_j^{(p)}$. By the conditions of Lemma 3, we need to show that
$$\sum_{j=1}^p \big( \lambda_j^{(p)} \big)^3 + p\sum_{j=1}^p \big( g_j^{(p)} \big)^2\lambda_j^{(p)} = o\bigg( \Big( \sum_{j=1}^p \big( 2(\lambda_j^{(p)})^2 + (g_j^{(p)})^2 p \big) \Big)^{3/2} \bigg). \tag{57}$$
First, observe that $p^{-1}\sum_{j=1}^p \big( 2(\lambda_j^{(p)})^2 + (g_j^{(p)})^2 p \big) \to C \in (0, \infty)$. Indeed, we have that $\sum_{j=1}^p (g_j^{(p)})^2 \to C_g \in (0, \infty)$, since
$$\sum_{j=1}^p \big( g_j^{(p)} \big)^2 = 4c^{-1}\big( b^\top\Sigma_U^{1/2} P \big)\big( b^\top\Sigma_U^{1/2} P \big)^\top = 4c^{-1}\, b^\top\Sigma_U b = 4c^{-1}\sum_{j,j'=1}^p \theta_j^{(p)}\theta_{j'}^{(p)}\sqrt{1-(\theta_j^{(p)})^2}\sqrt{1-(\theta_{j'}^{(p)})^2}\,\sigma_U(j,j') = 4c^{-1}\sum_{j,j'=1}^p \theta_j^{(p)}\theta_{j'}^{(p)}\Big( \varrho^{|j-j'|} - \theta_j^{(p)}\theta_{j'}^{(p)} \Big) \to \frac{4\kappa_3}{c\big( \kappa_1 + \sigma_\varepsilon^2 \big)} - \frac{4\kappa_2^2}{c\big( \kappa_1 + \sigma_\varepsilon^2 \big)^2} =: C_g \tag{58}$$
by (40) and (49).
Next, by (48), we find that $p^{-1}\sum_{j=1}^p (\lambda_j^{(p)})^2 \to C_\lambda \in (0, \infty)$. Indeed, by (48), we have
$$\sum_{j=1}^p \big( \lambda_j^{(p)} \big)^2 = \mathrm{tr}\big( (\Sigma_U^{1/2} A\Sigma_U^{1/2})^2 \big) = \mathrm{tr}\big( (\Sigma_U A)^2 \big) = \sum_{j,j'=1}^p \varrho^{2|j-j'|} + o(p) \sim p\,\frac{1+\varrho^2}{1-\varrho^2}. \tag{59}$$
Thus, by (58) and (59), it follows that $p^{-1}\sum_{j=1}^p \big( 2(\lambda_j^{(p)})^2 + (g_j^{(p)})^2 p \big) \to C \in (0, \infty)$, and condition (57) reduces to
$$\sum_{j=1}^p \big( \lambda_j^{(p)} \big)^3 + p\sum_{j=1}^p \big( g_j^{(p)} \big)^2\lambda_j^{(p)} = o\big( p^{3/2} \big). \tag{60}$$
We show that (60) holds. For the first term of (60), we have
$$\sum_{j=1}^p \big( \lambda_j^{(p)} \big)^3 = \mathrm{tr}\big( (\Sigma_U^{1/2} A\Sigma_U^{1/2})^3 \big) = \mathrm{tr}\big( (\Sigma_U A)^3 \big) = \sum_{i,j,k=1}^p \Big( \varrho^{|i-j|} - \theta_i^{(p)}\theta_j^{(p)} \Big)\Big( \varrho^{|i-k|} - \theta_i^{(p)}\theta_k^{(p)} \Big)\Big( \varrho^{|k-j|} - \theta_k^{(p)}\theta_j^{(p)} \Big) = o\big( p^{3/2} \big), \tag{61}$$
where the last equality follows from Lemma A5. For the second term of (60), observe that, by Hölder's inequality and (61),
$$p\sum_{j=1}^p \big( g_j^{(p)} \big)^2\lambda_j^{(p)} \leq p\Big( \sum_{j=1}^p |g_j^{(p)}|^3 \Big)^{2/3}\Big( \sum_{j=1}^p \big( \lambda_j^{(p)} \big)^3 \Big)^{1/3} = p^{3/2}\, O(1)\Big( \frac{\sum_{j=1}^p (\lambda_j^{(p)})^3}{p^{3/2}} \Big)^{1/3} = o\big( p^{3/2} \big).$$
This establishes (60), ensuring that the conditions of Lemma 3 hold.
Now we can establish the expression for $s_2^2$. By (40), (56), (58) and (59),
$$s_2^2 = \sigma_2^4\lim_{p\to\infty}\sum_{j=1}^p \Big( 2p^{-1}c\big( \lambda_j^{(p)} \big)^2 + c\big( g_j^{(p)} \big)^2 \Big) = \sigma_2^4\lim_{p\to\infty}\frac{2c}{p}\Big( \sum_{k,k'=1}^p \varrho^{2|k-k'|} + o(p) \Big) + 4\sigma_2^4\,\frac{\kappa_3}{\kappa_1 + \sigma_\varepsilon^2} - 4\sigma_2^4\,\frac{\kappa_2^2}{\big( \kappa_1 + \sigma_\varepsilon^2 \big)^2} = 2c\,\frac{1+\varrho^2}{1-\varrho^2}\big( \kappa_1 + \sigma_\varepsilon^2 \big)^2 + 4\big( \kappa_1 + \sigma_\varepsilon^2 \big)\kappa_3 - 4\kappa_2^2. \tag{62}$$
By (44) and (62), recalling that $s^2 = s_1^2 + s_2^2$, we have that
$$s^2 = 4\kappa_2^2 + 4\big( \kappa_1 + \sigma_\varepsilon^2 \big)\big( 2\kappa_2 c + \kappa_3 \big) + 2c\big( \kappa_1 + \sigma_\varepsilon^2 \big)^2\Big( c + \frac{1+\varrho^2}{1-\varrho^2} \Big). \tag{63}$$
Finally, consider $I_4$, defined by (38). Since $\mathrm{E} W_n = n$, we have that
$$I_4 = \frac{\kappa_{1,p} + \sigma_\varepsilon^2}{n^{3/2}}\bigg( pn + \big( n^2 - n \big)\frac{\kappa_{2,p}}{\kappa_{1,p} + \sigma_\varepsilon^2} \bigg) = \sqrt{n}\Big( \kappa_{2,p} + \frac{p}{n}\big( \kappa_{1,p} + \sigma_\varepsilon^2 \big) \Big) + o(1). \tag{64}$$
By (34), having established the four parts (35)–(38), we have proved that (39) holds due to (42), (43), (50) and (62), with the terms (63) and (64) as in the statement of the theorem, thus concluding the proof. □
Before proceeding with the proof of Theorem 2, we establish the following lemma, which ensures an $o(p^{-1/2})$ convergence rate for $\kappa_{1,p}$ and $\kappa_{2,p}$, appearing in Theorem 1, under additional restrictions on the parameters $\beta_j$.
Lemma 4.
Assume that $\sum_{j=p+1}^\infty \beta_j^2 = o(p^{-1/2})$, $\sup_{j\geq 1}|\beta_j| j^\alpha < \infty$ with $\alpha > 1/2$, and $|\varrho| < 1$. Then,
(i) $\kappa_1 = \kappa_{1,p} + o(p^{-1/2})$;
(ii) $\kappa_2 = \kappa_{2,p} + o(p^{-1/2})$.
Proof. The proof is given in the Supplementary Material. □
Proof of Theorem 2.
Rewrite the left-hand side of (10) as follows:
$$\frac{\|X^\top Y\|_2^2 - n^2\big( \kappa_2 + c(\kappa_1 + \sigma_\varepsilon^2) \big)}{n^{3/2}} = \frac{\|X^\top Y\|_2^2 - n^2\big( \kappa_{2,p} + \frac{p}{n}(\kappa_{1,p} + \sigma_\varepsilon^2) \big)}{n^{3/2}} + \sqrt{n}\big( \kappa_{2,p} - \kappa_2 \big) + \sqrt{n}\, c\big( \kappa_{1,p} - \kappa_1 \big) + o(1).$$
It remains to apply Lemma 4 and Theorem 1 in order to conclude the proof of the theorem. □
We end this section by deriving two supporting results that allow us to obtain convenient alternative expressions for the terms $\kappa_1$, $\kappa_2$ and $\kappa_3$. To this end, we introduce the functions $\beta(\cdot)$ and $b(\cdot)$ in Definition 3 below, which, under the assumptions of Theorem 1 and a given structure of the $\beta_j$'s, require only the evaluation of the terms $\beta(1)$, $\beta(\varrho)$, $\beta(\varrho^2)$ and $b_1(\varrho)$, $b_2(\varrho)$. Then, due to Lemma 5 below, the expressions for $\kappa_1$, $\kappa_2$ and $\kappa_3$ easily follow.
Definition 3.
Assume that j = 1 β j 2 < and | ϱ | 1 . Define,
β ( ϱ ) : = j = 1 β j 2 ϱ j ,
b 1 ( ϱ ) : = j = 2 j = 1 j 1 β j β j ϱ j j ,
b 2 ( ϱ ) : = j = 2 j = 1 j 1 β j β j ϱ j + j ,
and define the following quantities which involve derivatives of (65)–(67):
β ( 1 ) ( ϱ ) : = ϱ d β ( ϱ ) d ϱ = j = 1 j β j 2 ϱ j ,
b 1 ( 1 ) ( ϱ ) : = ϱ d b 1 ( ϱ ) d ϱ = j = 2 j = 1 j 1 β j β j ϱ j j ( j j ) ,
b 2 ( 1 ) ( ϱ ) : = ϱ d b 2 ( ϱ ) d ϱ = j = 2 j = 1 j 1 β j β j ϱ j + j ( j + j ) ,
b ( 2 ) ( ϱ ) : = ϱ 2 d 2 b 1 ( ϱ ) d ϱ 2 + b 1 ( 1 ) ( ϱ ) = j = 2 j = 1 j 1 β j β j ϱ j j ( j j ) 2 .
Note that, by the rules of differentiation of power series, the functions (68)–(71) are well defined.
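The series in Definition 3 are straightforward to evaluate numerically by truncation, which is useful for checking the closed-form expressions derived below. A minimal sketch (all function names, truncation levels and the illustrative choice of $\beta_j$ are ours, not the paper's):

```python
import math

def beta_fn(rho, beta, n_terms=2000):
    # (65): beta(rho) = sum_{j>=1} beta_j^2 * rho^j, truncated
    return sum(beta(j) ** 2 * rho ** j for j in range(1, n_terms + 1))

def beta1_fn(rho, beta, n_terms=2000):
    # (68): beta^(1)(rho) = rho * d beta / d rho = sum_{j>=1} j * beta_j^2 * rho^j
    return sum(j * beta(j) ** 2 * rho ** j for j in range(1, n_terms + 1))

def b1_fn(rho, beta, n_terms=600):
    # (66): b1(rho) = sum_{j>=2} sum_{j'<j} beta_j * beta_j' * rho^(j - j')
    return sum(beta(j) * beta(k) * rho ** (j - k)
               for j in range(2, n_terms + 1) for k in range(1, j))

def b2_fn(rho, beta, n_terms=600):
    # (67): b2(rho) = sum_{j>=2} sum_{j'<j} beta_j * beta_j' * rho^(j + j')
    return sum(beta(j) * beta(k) * rho ** (j + k)
               for j in range(2, n_terms + 1) for k in range(1, j))
```

For instance, with the hyperbolic coefficients $\beta_j = j^{-1}$ of Section 6 and $\varrho = 1/2$, `beta_fn` returns approximately $\mathrm{Li}_2(1/2) \approx 0.5822$, in line with the closed forms derived there.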
Lemma 5.
Let the assumptions of Theorem 1 hold. Let $\kappa_1$, $\kappa_2$ and $\kappa_3$ be given by (3)–(5), respectively. Then, under the notation of Definition 3, the following identities hold:
(i)
$$\kappa_1 = \beta(1) + 2b_1(\varrho),$$
(ii)
$$\kappa_2 = \beta(1)\,\frac{1+\varrho^2}{1-\varrho^2} - \beta(\varrho^2)\,\frac{1}{1-\varrho^2} + 2\Big(b_1^{(1)}(\varrho) + b_1(\varrho)\,\frac{1+\varrho^2}{1-\varrho^2} - b_2(\varrho)\,\frac{1}{1-\varrho^2}\Big),$$
(iii)
$$\kappa_3 = \frac{1}{(1-\varrho^2)^2}\Big((1+4\varrho^2+\varrho^4)\big(\beta(1)+2b_1(\varrho)\big) - (1+3\varrho^2)\big(\beta(\varrho^2)+2b_2(\varrho)\big)\Big) + \frac{1}{1-\varrho^2}\Big(3b_1^{(1)}(\varrho)(1+\varrho^2) - 2b_2^{(1)}(\varrho) - 2\beta^{(1)}(\varrho^2)\Big) + b^{(2)}(\varrho).$$
Proof. 
See the proof in Appendix A.2. □
Remark 3.
From the assumptions of Definition 3 it follows that $\beta(1)$, $|\beta(\varrho)|$, $|b_1(\varrho)|$, $|b_2(\varrho)| < \infty$ for $|\varrho| < 1$. Thus, it follows from Lemma 5 that $\kappa_i < \infty$, $i = 1, 2, 3$.
Proof of Remark 3. 
The cases of $\beta(1)$ and $\beta(\varrho)$ follow straightforwardly from the assumptions. Consider $b_1(\varrho)$. Note that
$$|b_1(\varrho)| \le \sum_{l_1,l_2=1}^{\infty} |\beta_{l_1}||\beta_{l_2}|\,|\varrho|^{|l_1-l_2|} = \sum_{l_1,l_2=1}^{\infty} \big(|\beta_{l_1}|\,|\varrho|^{|l_1-l_2|/2}\big)\big(|\beta_{l_2}|\,|\varrho|^{|l_1-l_2|/2}\big) \le \frac12 \sum_{l_1,l_2=1}^{\infty} \big(\beta_{l_1}^2 |\varrho|^{|l_1-l_2|} + \beta_{l_2}^2 |\varrho|^{|l_1-l_2|}\big) = \sum_{l_1=1}^{\infty} \beta_{l_1}^2 \sum_{l_2=1}^{\infty} |\varrho|^{|l_1-l_2|} \le \beta(1)\,\frac{1+|\varrho|}{1-|\varrho|} < \infty$$
by (S9). In a similar manner, it can be seen that $|b_2(\varrho)| \le \beta(1)\,\frac{|\varrho|}{1-|\varrho|}$. □
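The bound just obtained is easy to probe numerically. A small sketch for the hyperbolic coefficients $\beta_j = j^{-1}$ used in Section 6 (function names and truncation level are ours):

```python
import math

def b1_abs(rho, n_terms=800):
    # |b1(rho)| for beta_j = 1/j, |rho| < 1, via the truncated double series (66)
    return abs(sum(rho ** (i - j) / (i * j)
                   for i in range(2, n_terms + 1) for j in range(1, i)))

def remark3_bound(rho):
    # beta(1) * (1 + |rho|) / (1 - |rho|), with beta(1) = pi^2 / 6 for beta_j = 1/j
    return (math.pi ** 2 / 6) * (1 + abs(rho)) / (1 - abs(rho))
```

The inequality holds with considerable slack, including for negative $\varrho$ and for $|\varrho|$ close to one.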

6. Approximate Sparsity: An Example

In this section, we study the case where the coefficients $\beta_j$ decay hyperbolically, i.e., $\beta_j = j^{-1}$, $j \ge 1$. This assumption is analogous to the assumption of approximate sparsity, as defined in [21]. The authors of the aforementioned paper note that, for approximately sparse models, the regression function can be well approximated by a linear combination of relatively few important regressors, which is one of the reasons for the popularity of variable-selection approaches such as the LASSO ([36]) and its modifications (see, e.g., [37,38,39]). At the same time, approximate sparsity allows all coefficients $\beta_j$ to be nonzero, which is a more plausible assumption in many real-world settings.
In order to derive the quantities in Theorem 2, we apply the results of Lemma 5. For this, we establish the expressions for the quantities in Definition 3.
Define the real dilogarithm function (see, e.g., [40]):
$$\mathrm{Li}_2(x) = -\int_0^x \frac{\log(1-u)}{u}\,\mathrm{d}u, \qquad x \le 1,\; x \in \mathbb{R}. \tag{72}$$
(Here and below, $\int_0^x = -\int_x^0$ if $x \le 0$.) For $|x| \le 1$, the real dilogarithm has the series representation
$$\mathrm{Li}_2(x) = \sum_{k=1}^{\infty}\frac{x^k}{k^2}. \tag{73}$$
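The series representation (73) gives a direct, if slowly converging, way to evaluate $\mathrm{Li}_2$ numerically; a minimal sketch (the function name is ours):

```python
import math

def dilog(x, n_terms=100000):
    # Li2(x) = sum_{k>=1} x^k / k^2, valid for |x| <= 1; see (73)
    return sum(x ** k / k ** 2 for k in range(1, n_terms + 1))
```

At $x = 1$ this recovers $\beta(1) = \pi^2/6$ for $\beta_j = j^{-1}$, and at $x = 1/2$ the classical value $\mathrm{Li}_2(1/2) = \pi^2/12 - \tfrac12\log^2 2$.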
Then,
$$\beta(1) = \sum_{j=1}^{\infty}\frac{1}{j^2} = \frac{\pi^2}{6}, \qquad \beta(\varrho) = \sum_{j=1}^{\infty}\frac{\varrho^j}{j^2} = \mathrm{Li}_2(\varrho).$$
Additionally, we have
$$\frac{\mathrm{d}}{\mathrm{d}\varrho}\,\mathrm{Li}_2(\varrho) = -\frac{\log(1-\varrho)}{\varrho}. \tag{74}$$
Thus, by (68) and (74), we establish
$$\beta^{(1)}(\varrho) = \varrho\,\frac{\mathrm{d}}{\mathrm{d}\varrho}\,\beta(\varrho) = \varrho\,\frac{\mathrm{d}}{\mathrm{d}\varrho}\,\mathrm{Li}_2(\varrho) = -\log(1-\varrho).$$
Next, note that
$$b_1(\varrho) = \sum_{i=2}^{\infty}\sum_{j=1}^{i-1}\frac{\varrho^{i-j}}{ij} = \sum_{i=2}^{\infty}\sum_{k=1}^{i-1}\frac{\varrho^{k}}{i(i-k)} = \sum_{k=1}^{\infty}\varrho^{k}\sum_{i=k+1}^{\infty}\frac{1}{i(i-k)} = \sum_{k=1}^{\infty}\frac{\varrho^{k}}{k}\sum_{l=1}^{k}\frac{1}{l} = \sum_{l=1}^{\infty}\frac{1}{l}\sum_{k=l}^{\infty}\frac{\varrho^{k}}{k} = \sum_{l=1}^{\infty}\frac{1}{l}\int_0^{\varrho}\frac{x^{l-1}}{1-x}\,\mathrm{d}x = -\int_0^{\varrho}\frac{\log(1-x)}{x(1-x)}\,\mathrm{d}x = \frac{\log^2(1-\varrho)}{2} + \mathrm{Li}_2(\varrho), \tag{75}$$
where we have used the identities
$$\sum_{i=k+1}^{\infty}\frac{1}{i(i-k)} = \frac{1}{k}\sum_{l=1}^{k}\frac{1}{l}, \quad k\ge1, \qquad \sum_{k=l}^{\infty}\frac{\varrho^{k}}{k} = \int_0^{\varrho}\frac{x^{l-1}}{1-x}\,\mathrm{d}x,$$
and (72). Then, by (69), (74) and (75),
$$b_1^{(1)}(\varrho) = \varrho\,\frac{\mathrm{d}}{\mathrm{d}\varrho}\,b_1(\varrho) = -\frac{\log(1-\varrho)}{1-\varrho},$$
whereas by (71),
$$b^{(2)}(\varrho) = \varrho^2\,\frac{\mathrm{d}^2 b_1(\varrho)}{\mathrm{d}\varrho^2} + b_1^{(1)}(\varrho) = \frac{\varrho - \varrho\log(1-\varrho)}{(1-\varrho)^2}.$$
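The closed forms just derived for $b_1$ and its derivative quantities can be cross-checked against the truncated double series with $\beta_j = j^{-1}$. A rough numerical sketch (function names and truncation levels are ours):

```python
import math

def dilog(x, n_terms=100000):
    # series (73) for the real dilogarithm
    return sum(x ** k / k ** 2 for k in range(1, n_terms + 1))

def b1_series(rho, n_terms=2000):
    # truncated double series (66) with beta_j = 1/j
    return sum(rho ** (i - j) / (i * j)
               for i in range(2, n_terms + 1) for j in range(1, i))

def b1_closed(rho):
    # b1(rho) = log^2(1 - rho)/2 + Li2(rho), as in (75)
    return math.log(1 - rho) ** 2 / 2 + dilog(rho)

def b1_deriv_closed(rho):
    # b1^(1)(rho) = -log(1 - rho) / (1 - rho)
    return -math.log(1 - rho) / (1 - rho)

def b_second_closed(rho):
    # b^(2)(rho) = (rho - rho * log(1 - rho)) / (1 - rho)^2
    return (rho - rho * math.log(1 - rho)) / (1 - rho) ** 2
```

A finite-difference approximation of $\varrho\,\mathrm{d}b_1/\mathrm{d}\varrho$ computed from `b1_closed` matches `b1_deriv_closed` to high accuracy.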
Furthermore, note that
$$b_2(\varrho) = \sum_{i=2}^{\infty}\sum_{j=1}^{i-1}\frac{\varrho^{i+j}}{ij} = \sum_{i=2}^{\infty}\frac{\varrho^{i}}{i}\sum_{j=1}^{i-1}\frac{\varrho^{j}}{j} = \sum_{i=2}^{\infty}\frac{\varrho^{i}}{i}\int_0^{\varrho}\sum_{j=1}^{i-1}x^{j-1}\,\mathrm{d}x = \sum_{i=1}^{\infty}\frac{\varrho^{i+1}}{i+1}\int_0^{\varrho}\frac{1-x^{i}}{1-x}\,\mathrm{d}x = \log^2(1-\varrho) + \int_0^{\varrho}\frac{\log(1-\varrho x)}{x(1-x)}\,\mathrm{d}x = \frac12\big(\log^2(1-\varrho) - \mathrm{Li}_2(\varrho^2)\big), \tag{76}$$
where the last equality follows from Lemma A1. Next, by (70), (74) and (76), we have
$$b_2^{(1)}(\varrho) = \log(1-\varrho^2) - \frac{\varrho\log(1-\varrho)}{1-\varrho}.$$
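Analogously to $b_1$, the closed form (76) for $b_2$ can be verified against its truncated double series, which converges quickly since the terms decay like $\varrho^{i+j}$; a sketch (function names are ours):

```python
import math

def dilog(x, n_terms=100000):
    # series (73) for the real dilogarithm
    return sum(x ** k / k ** 2 for k in range(1, n_terms + 1))

def b2_series(rho, n_terms=200):
    # truncated double series (67) with beta_j = 1/j
    return sum(rho ** (i + j) / (i * j)
               for i in range(2, n_terms + 1) for j in range(1, i))

def b2_closed(rho):
    # b2(rho) = (log^2(1 - rho) - Li2(rho^2)) / 2, as in (76)
    return (math.log(1 - rho) ** 2 - dilog(rho * rho)) / 2
```

The agreement holds for negative $\varrho$ as well, where the series is alternating.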
Thus, we can apply Lemma 5(i) and arrive at the following expression for κ 1 :
$$\kappa_1 = \frac{\pi^2}{6} + \log^2(1-\varrho) + 2\,\mathrm{Li}_2(\varrho). \tag{77}$$
Similarly, for $\kappa_2$, collecting and simplifying the terms by Lemmas 5(ii) and A1, we have
$$\kappa_2 = \frac{1+\varrho^2}{1-\varrho^2}\Big(\frac{\pi^2}{6} + 2\,\mathrm{Li}_2(\varrho)\Big) - \frac{2\log(1-\varrho)}{1-\varrho} + \log^2(1-\varrho)\,\frac{\varrho^2}{1-\varrho^2} = \frac{1}{1-\varrho^2}\Big((1+\varrho^2)\kappa_1 - \log^2(1-\varrho) - 2(1+\varrho)\log(1-\varrho)\Big). \tag{78}$$
Lastly, for κ 3 , by Lemma 5(iii), through simplification of terms, we get
$$\kappa_3 = \frac{1}{(1-\varrho^2)^2}\Big((1+4\varrho^2+\varrho^4)\Big(\frac{\pi^2}{6} + 2\,\mathrm{Li}_2(\varrho)\Big) + \log^2(1-\varrho)\,\varrho^2(1+\varrho^2) - (3-\varrho+4\varrho^2)(1+\varrho)\log(1-\varrho) + \varrho(1+\varrho)^2\Big) = \kappa_2\,\frac{1+3\varrho^2}{1-\varrho^2} + \frac{1}{(1-\varrho^2)^2}\Big((-1+\varrho+2\varrho^2)(1+\varrho)\log(1-\varrho) + \varrho(1+\varrho)^2 - 2\varrho^4\kappa_1\Big). \tag{79}$$
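As a sanity check on the simplifications, the two displayed forms of $\kappa_2$ in (78) and of $\kappa_3$ in (79) can be compared numerically; a sketch under the specification $\beta_j = j^{-1}$ (function names are ours):

```python
import math

def dilog(x, n_terms=100000):
    # series (73) for the real dilogarithm
    return sum(x ** k / k ** 2 for k in range(1, n_terms + 1))

def kappa1(r):
    # (77)
    return math.pi ** 2 / 6 + math.log(1 - r) ** 2 + 2 * dilog(r)

def kappa2_direct(r):
    # first form in (78)
    L = math.log(1 - r)
    return ((1 + r * r) / (1 - r * r) * (math.pi ** 2 / 6 + 2 * dilog(r))
            - 2 * L / (1 - r) + L * L * r * r / (1 - r * r))

def kappa2(r):
    # second form in (78)
    L = math.log(1 - r)
    return ((1 + r * r) * kappa1(r) - L * L - 2 * (1 + r) * L) / (1 - r * r)

def kappa3(r):
    # first form in (79)
    L = math.log(1 - r)
    num = ((1 + 4 * r ** 2 + r ** 4) * (math.pi ** 2 / 6 + 2 * dilog(r))
           + L * L * r * r * (1 + r * r)
           - (3 - r + 4 * r * r) * (1 + r) * L
           + r * (1 + r) ** 2)
    return num / (1 - r * r) ** 2

def kappa3_alt(r):
    # second form in (79)
    L = math.log(1 - r)
    extra = ((-1 + r + 2 * r * r) * (1 + r) * L
             + r * (1 + r) ** 2 - 2 * r ** 4 * kappa1(r))
    return kappa2(r) * (1 + 3 * r * r) / (1 - r * r) + extra / (1 - r * r) ** 2
```

Both pairs of forms agree to floating-point accuracy for any $|\varrho| < 1$.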
This allows us to apply Theorem 2 under the considered specification of the parameter $\beta$ and to conclude with the following corollary.
Corollary 2.
Assume model (1) with the covariance structure (2) and consider $\beta_j := j^{-1}$, $j = 1, \ldots, p$. Let $p = p_n$ satisfy
$$p \to \infty, \qquad \frac{p}{n} \to c \in (0, \infty).$$
Then
$$\frac{\|\mathbf{X}^{\top}Y\|_2^2 - n^2\big(\kappa_2 + c(\kappa_1 + \sigma_\varepsilon^2)\big)}{n^{3/2}} \xrightarrow{d} N(0, s^2), \tag{80}$$
where
s 2 = 4 κ 2 2 + 4 ( κ 1 + σ ε 2 ) 2 κ 2 c + κ 3 + 2 c ( κ 1 + σ ε 2 ) 2 c + 1 + ϱ 2 1 ϱ 2 ,
and κ 1 , κ 2 and κ 3 are defined by (77)–(79), respectively.
In order to illustrate the results of Corollary 2, we end this section with a Monte Carlo simulation study in which we generate 1000 independent replications of the statistic $\|\mathbf{X}^{\top}Y\|_2^2$. The data are generated following the assumptions of Corollary 2. We consider the following parameter values: $p = 100, 500, 1000, 1500, 2000, 3000$; $c = 1, 2, 5, 10$; $\sigma_\varepsilon^2 = 1, 2, 4, 10$. Due to the large number of resulting figures, we present only selected cases in Figures 1–9, which demonstrate certain disparities in greater detail. The figures show the empirical cumulative distribution function (CDF) and the empirical probability density function (PDF), together with the limiting CDF and PDF of $N(0, s^2)$ from (80), for different parameter combinations. In addition, we present the corresponding Q-Q plots in order to inspect the tails of the resulting distributions in greater detail.
We find that, for relatively small values of $\varrho$, the observed distribution of the statistic is fairly close to the limiting distribution even for small values of $p, n$ and larger $\sigma_\varepsilon^2, c$ (see, e.g., Figures 1–4). However, slower convergence becomes evident as $\varrho$ increases. Furthermore, for moderate values of $\varrho, c, \sigma_\varepsilon^2$, adequate convergence towards the limiting distribution is observed only for larger values of $p$ (see Figures 5 and 6). Similar behaviour is observed when the relation between the parameters $\varrho, c, \sigma_\varepsilon^2$ is appropriately controlled: e.g., in Figure 7, we see results comparable to those presented in Figure 6, where the effect of the increase in the parameter value $\varrho$ is countered by a smaller value of $\sigma_\varepsilon^2$. Alternatively, analogous effects can be achieved by reducing the value of $c$ instead.
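A single replication of this experiment can be sketched as follows, assuming, as in model (1) with the KMS structure (2), that each row of $\mathbf{X}$ is a stationary Gaussian AR(1) sequence with correlation $\varrho^{|i-j|}$. The function below is our own illustrative code, not the authors' implementation, and its `kappa1`, `kappa2` arguments stand for the closed forms (77) and (78):

```python
import math
import random

def centered_statistic(n, p, rho, sigma2_eps, kappa1, kappa2, seed=0):
    """One draw of (||X'Y||_2^2 - n^2(kappa2 + c(kappa1 + sigma2_eps))) / n^(3/2)."""
    rng = random.Random(seed)
    beta = [1.0 / j for j in range(1, p + 1)]      # approximately sparse beta_j = 1/j
    innov_sd = math.sqrt(1.0 - rho * rho)
    xty = [0.0] * p
    for _ in range(n):
        # stationary AR(1) row: Cov(x_i, x_j) = rho^{|i-j|} (KMS structure)
        row = [rng.gauss(0.0, 1.0)]
        for _ in range(p - 1):
            row.append(rho * row[-1] + innov_sd * rng.gauss(0.0, 1.0))
        y = sum(b * x for b, x in zip(beta, row)) + math.sqrt(sigma2_eps) * rng.gauss(0.0, 1.0)
        for i, x in enumerate(row):
            xty[i] += x * y
    stat = sum(v * v for v in xty)                  # ||X'Y||_2^2
    c = p / n
    return (stat - n ** 2 * (kappa2 + c * (kappa1 + sigma2_eps))) / n ** 1.5
```

Collecting 1000 such draws and comparing their empirical CDF with that of $N(0, s^2)$ reproduces plots of the kind shown in Figures 1–9.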
Figure 2. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic in (80), with the limiting distribution $N(0, s^2)$ of Corollary 2 (in black), for $\varrho = 0.3$, $c = 1$, $\sigma_\varepsilon^2 = 2$ and $p = 1500, 2000, 3000$.
Figure 3. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic in (80), with the limiting distribution $N(0, s^2)$ of Corollary 2 (in black), for $\varrho = 0.3$, $c = 1$, $\sigma_\varepsilon^2 = 10$ and $p = 100, 500, 1000$.
Figure 4. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic in (80), with the limiting distribution $N(0, s^2)$ of Corollary 2 (in black), for $\varrho = 0.3$, $c = 1$, $\sigma_\varepsilon^2 = 10$ and $p = 1500, 2000, 3000$.
Figure 5. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic in (80), with the limiting distribution $N(0, s^2)$ of Corollary 2 (in black), for $\varrho = 0.7$, $c = 5$, $\sigma_\varepsilon^2 = 4$ and $p = 100, 500, 1000$.
Figure 6. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic in (80), with the limiting distribution $N(0, s^2)$ of Corollary 2 (in black), for $\varrho = 0.7$, $c = 5$, $\sigma_\varepsilon^2 = 4$ and $p = 1500, 2000, 3000$.
Figure 7. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic in (80), with the limiting distribution $N(0, s^2)$ of Corollary 2 (in black), for $\varrho = 0.9$, $c = 5$, $\sigma_\varepsilon^2 = 1$ and $p = 1500, 2000, 3000$.
Finally, slow convergence is observed for large values of $\varrho, c, \sigma_\varepsilon^2$, as expected (see Figures 8 and 9). In such cases, the simulation results suggest that even larger values of $p, n$ would be needed for more accurate results.
Figure 8. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic in (80), with the limiting distribution $N(0, s^2)$ of Corollary 2 (in black), for $\varrho = 0.95$, $c = 10$, $\sigma_\varepsilon^2 = 4$ and $p = 100, 500, 1000$.
Figure 9. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic in (80), with the limiting distribution $N(0, s^2)$ of Corollary 2 (in black), for $\varrho = 0.95$, $c = 10$, $\sigma_\varepsilon^2 = 4$ and $p = 1500, 2000, 3000$.

7. Discussion

In this paper, we consider a specific KMS covariance structure due to its attractive properties and wide application possibilities for working with real-world datasets. Moreover, our results could be extended further by considering a wider family of Toeplitz covariance structures. For instance, under specific constraints, one could employ the approaches proposed in [3] in order to extend the application of our results towards more complex covariance structures of the data.
Furthermore, for future work, it would be interesting to expand and examine the results by removing the assumption of independence between the observations X i , i = 1 , , n .
Finally, in this paper we have established both the exact and the asymptotic distributions of the statistic $\|\mathbf{X}^{\top}Y\|_2^2$ (see (8), (10) and (34)). Both distributions could be used for estimating $\beta$, $\sigma_\varepsilon^2$ or related measures (e.g., by applying the method of moments or maximum likelihood estimation) in future research. Such a research direction could open up interesting avenues when compared with popular LASSO-type methods in high-dimensional linear regression. A similar approach is taken by [10], who construct maximum likelihood estimators for the signal strength $\|\beta\|_2^2$ in a high-dimensional regression context. Note that the results of [10] are obtained under certain strong restrictions, which are consistent with the related literature (see, e.g., [7,41,42]). In our case, we impose weaker assumptions; therefore, both our asymptotic and exact results could be used to extend the approaches in the aforementioned literature.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math10101657/s1, Proof of Lemmas 4, A2, A5, Proof of result (A9) of Lemma 5(ii), Proof of results (A13)–(A14) of Lemma 5(iii).

Author Contributions

Conceptualization, S.J., R.L.; methodology, S.J., R.L.; investigation, S.J., R.L.; writing—original draft preparation, S.J., R.L.; writing—review and editing, S.J., R.L.; visualization, S.J., R.L. All authors have read and agreed to the published version of the manuscript.

Funding

Supported by grant No. S-MIP-20-16 from the Research Council of Lithuania.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the anonymous Referees for their very constructive and detailed comments and suggestions on the first version of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Throughout the proofs we use the notation C to mark generic constants, the specific values of which can change from line to line.

Appendix A.1. Technical Lemmas

Lemma A1.
Assume that $|\varrho| < 1$. Then,
$$\int_0^{\varrho}\frac{\log(1-\varrho x)}{x(1-x)}\,\mathrm{d}x = -\frac12\big(\mathrm{Li}_2(\varrho^2) + \log^2(1-\varrho)\big), \tag{A2}$$
where $\mathrm{Li}_2$ denotes the real dilogarithm function. (Recall that, for $\varrho < 0$, by $\int_0^{\varrho}$ we denote $-\int_{\varrho}^0$.)
Proof. 
Write,
$$\int_0^{\varrho}\frac{\log(1-\varrho x)}{x(1-x)}\,\mathrm{d}x = \int_0^{\varrho}\frac{\log(1-\varrho x)}{x}\,\mathrm{d}x + \int_0^{\varrho}\frac{\log(1-\varrho x)}{1-x}\,\mathrm{d}x.$$
By (72), we have
$$\int_0^{\varrho}\frac{\log(1-\varrho x)}{x}\,\mathrm{d}x = -\mathrm{Li}_2(\varrho^2). \tag{A3}$$
It remains to show that
$$\int_0^{\varrho}\frac{\log(1-\varrho x)}{1-x}\,\mathrm{d}x = \frac12\big(\mathrm{Li}_2(\varrho^2) - \log^2(1-\varrho)\big).$$
Indeed, by the substitution $v = \varrho - \varrho x$, we have
$$\int_0^{\varrho}\frac{\log(1-\varrho x)}{1-x}\,\mathrm{d}x = \int_{\varrho-\varrho^2}^{\varrho}\frac{\log(1-\varrho+v)}{v}\,\mathrm{d}v = \int_{\varrho-\varrho^2}^{\varrho}\frac{\log\big(1+\frac{v}{1-\varrho}\big)}{v}\,\mathrm{d}v - \log^2(1-\varrho).$$
Further, by the substitution $w = -\frac{v}{1-\varrho}$, we have
$$\int_{\varrho-\varrho^2}^{\varrho}\frac{\log\big(1+\frac{v}{1-\varrho}\big)}{v}\,\mathrm{d}v = \int_{-\varrho}^{-\varrho/(1-\varrho)}\frac{\log(1-w)}{w}\,\mathrm{d}w = \mathrm{Li}_2(-\varrho) - \mathrm{Li}_2\Big(-\frac{\varrho}{1-\varrho}\Big) = \mathrm{Li}_2(-\varrho) + \mathrm{Li}_2(\varrho) + \frac12\log^2(1-\varrho) \tag{A4}$$
$$= \frac12\big(\mathrm{Li}_2(\varrho^2) + \log^2(1-\varrho)\big), \tag{A5}$$
where for (A4) and (A5) we apply the easily verifiable identities (see, e.g., [43]):
$$\mathrm{Li}_2\Big(\frac{x}{x-1}\Big) = -\mathrm{Li}_2(x) - \frac12\log^2(1-x), \quad x < 1, \qquad \mathrm{Li}_2(x) + \mathrm{Li}_2(-x) = \frac12\,\mathrm{Li}_2(x^2), \quad |x| < 1.$$
Thus, (A3) and (A5) imply (A2), which concludes the proof. □
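The two dilogarithm identities used above are easy to confirm numerically via the series (73); a quick sketch (function names are ours):

```python
import math

def dilog(x, n_terms=100000):
    # Li2 via its power series, |x| <= 1; see (73)
    return sum(x ** k / k ** 2 for k in range(1, n_terms + 1))

def landen_residual(x):
    # Li2(x/(x-1)) + Li2(x) + log^2(1-x)/2: zero when the first identity holds
    # (the series form requires |x/(x-1)| <= 1, i.e. x <= 1/2)
    return dilog(x / (x - 1)) + dilog(x) + math.log(1 - x) ** 2 / 2

def squaring_residual(x):
    # Li2(x) + Li2(-x) - Li2(x^2)/2: zero when the second identity holds
    return dilog(x) + dilog(-x) - dilog(x * x) / 2
```

Both residuals vanish to floating-point accuracy on their domains of validity.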
Lemma A2.
Assume that $\sum_{j=1}^{\infty}\beta_j^2 < \infty$ and $|\varrho| < 1$. Then, the following inequalities hold:
(i)
$$\Big|\sum_{l=p+1}^{\infty}\sum_{l'=l+1}^{\infty}\beta_l\beta_{l'}\varrho^{l'-l}\Big| \le C\sum_{l=p+1}^{\infty}\beta_l^2.$$
(ii)
$$\Big|\sum_{l=p+1}^{\infty}\sum_{l'=l+1}^{\infty}\beta_l\beta_{l'}\varrho^{l'-l}(l'-l)\Big| \le C\sum_{l=p+1}^{\infty}\beta_l^2.$$
(iii)
$$\Big|\sum_{l=1}^{p}\sum_{l'=p+1}^{\infty}\beta_l\beta_{l'}\varrho^{l'-l}\Big| \le C\sum_{l=p+1}^{\infty}\beta_l^2.$$
(iv)
$$\Big|\sum_{l=1}^{p}\sum_{l'=p+1}^{\infty}\beta_l\beta_{l'}\varrho^{l+l'}\Big| \le C\sum_{l=p+1}^{\infty}\beta_l^2.$$
Proof. See the Supplementary Materials. □
Lemma A3.
Assume that $\sup_{j\ge1}|\beta_j|\,j^{\alpha} < \infty$ for some $\alpha > 1/2$, and that $|\varrho| < 1$. Then,
$$\Big|\sum_{j=1}^{p}\beta_j\varrho^{p-j}\Big| = o(p^{-1/4}).$$
Proof. 
We have
$$\Big|\sum_{j=1}^{p}\beta_j\varrho^{p-j}\Big| \le \sum_{j=1}^{\lfloor\sqrt{p}\rfloor}|\beta_j|\,|\varrho|^{p-j} + \sum_{j=\lfloor\sqrt{p}\rfloor+1}^{p}|\beta_j|\,|\varrho|^{p-j} \le \sup_{j\ge1}|\beta_j|\sum_{j=1}^{\lfloor\sqrt{p}\rfloor}|\varrho|^{p-j} + p^{-\alpha/2}\sup_{j\ge1}\big(|\beta_j|\,j^{\alpha}\big)\sum_{j=\lfloor\sqrt{p}\rfloor+1}^{p}|\varrho|^{p-j} \le C\big(|\varrho|^{p-\sqrt{p}} + p^{-\alpha/2}\big).$$
Here we used the fact that $\sum_{j=\lfloor\sqrt{p}\rfloor+1}^{p}|\varrho|^{p-j} \le (1-|\varrho|)^{-1} < \infty$. Thus,
$$p^{1/4}\Big|\sum_{j=1}^{p}\beta_j\varrho^{p-j}\Big| \le C\big(p^{1/4}|\varrho|^{p-\sqrt{p}} + p^{\frac14-\frac{\alpha}{2}}\big) \to 0. \;\square$$
Remark A1.
The assumption $\sup_{j\ge1}|\beta_j|\,j^{\alpha} < \infty$, for $\alpha > 1/2$, implies that $\sum_{j=1}^{\infty}\beta_j^2 < \infty$:
$$\sum_{j=1}^{\infty}\beta_j^2 = \sum_{j=1}^{\infty}\beta_j^2 j^{2\alpha}\,j^{-2\alpha} \le \sup_{j\ge1}\big(\beta_j^2 j^{2\alpha}\big)\sum_{k=1}^{\infty}k^{-2\alpha} < \infty.$$
Lemma A4.
Assume that the assumptions of Theorem 1 hold. Then,
$$\kappa_{2,p} = o(p).$$
Proof. 
Observe that
$$\kappa_{2,p} = \sum_{k=1}^{p}\Big(\sum_{l=1}^{p}\beta_l\varrho^{|k-l|}\Big)^2 = \sum_{k=1}^{p}\sum_{l_1,l_2=1}^{p}\beta_{l_1}\beta_{l_2}\varrho^{|k-l_1|+|k-l_2|} \le \sum_{l_1,l_2=1}^{p}|\beta_{l_1}||\beta_{l_2}|\sum_{k=1}^{p}|\varrho|^{|k-l_1|+|k-l_2|} \le C\Big(\sum_{l=1}^{p}|\beta_l|\Big)^2 = o(p), \tag{A7}$$
where the last inequality in (A7) follows from (S9). Meanwhile, $\sum_{l=1}^{p}|\beta_l| = o(p^{1/2})$, since
$$\sum_{l=1}^{p}|\beta_l| = \sum_{l=1}^{\lfloor\sqrt{p}\rfloor}|\beta_l| + \sum_{l=\lfloor\sqrt{p}\rfloor+1}^{p}|\beta_l| \le p^{1/4}\Big(\sum_{l=1}^{\infty}\beta_l^2\Big)^{1/2} + p^{1/2}\Big(\sum_{l=\lfloor\sqrt{p}\rfloor+1}^{\infty}\beta_l^2\Big)^{1/2} = o(p^{1/2}). \;\square$$
Lemma A5.
Assume that $\sum_{j=1}^{\infty}\beta_j^2 < \infty$ and $|\varrho| < 1$. Define $\theta_k^{(p)} := \sum_{j=1}^{p}\beta_j\varrho^{|k-j|}$. Then,
$$\Big|\sum_{i,j,k=1}^{p}\big(\varrho^{|i-j|} + \theta_i^{(p)}\theta_j^{(p)}\big)\big(\varrho^{|i-k|} + \theta_i^{(p)}\theta_k^{(p)}\big)\big(\varrho^{|k-j|} + \theta_k^{(p)}\theta_j^{(p)}\big)\Big| = o(p^{3/2}).$$
Proof. See the Supplementary Materials. □

Appendix A.2. Proof of Lemma 5

Here and throughout the proof we employ the notation as in Definition 3.
(i) Note that, by (65) and (66), we have
$$\kappa_{1,p} = \sum_{k=1}^{p}\beta_k^2 + 2\sum_{k=2}^{p}\sum_{l=1}^{k-1}\beta_k\beta_l\varrho^{k-l} \to \beta(1) + 2b_1(\varrho) \quad \text{as } p\to\infty.$$
(ii) Write
$$\kappa_{2,p} = \sum_{l=1}^{p}\sum_{k=1}^{p}\beta_l^2\varrho^{2|k-l|} + 2\sum_{l>l'}\sum_{k=1}^{p}\beta_l\beta_{l'}\varrho^{|k-l|}\varrho^{|k-l'|}.$$
From here, it can be seen that
$$\kappa_{2,p} \to \beta(1)\,\frac{1+\varrho^2}{1-\varrho^2} - \beta(\varrho^2)\,\frac{1}{1-\varrho^2} + 2\Big(b_1^{(1)}(\varrho) + b_1(\varrho)\,\frac{1+\varrho^2}{1-\varrho^2} - b_2(\varrho)\,\frac{1}{1-\varrho^2}\Big). \tag{A9}$$
Technical details of the proof of (A9) are presented in Supplementary Materials, Section S4.
(iii) Consider
$$\kappa_{3,p} = \sum_{l=1}^{p}\beta_l^2 J_1(l) + 2\sum_{l<l'}\beta_l\beta_{l'}J_2(l,l'),$$
where
$$J_1(l) := \sum_{k,k'=1}^{p}\varrho^{|k-k'|}\varrho^{|k-l|}\varrho^{|k'-l|}, \qquad J_2(l,l') := \sum_{k,k'=1}^{p}\varrho^{|k-k'|}\varrho^{|k-l|}\varrho^{|k'-l'|}, \quad l < l'.$$
Then, as $p\to\infty$, using the notation of Definition 3, we have that
$$\sum_{l=1}^{p}\beta_l^2 J_1(l) \to \beta(1)\,\frac{1+4\varrho^2+\varrho^4}{(1-\varrho^2)^2} - \beta(\varrho^2)\,\frac{1+3\varrho^2}{(1-\varrho^2)^2} - \frac{2}{1-\varrho^2}\,\beta^{(1)}(\varrho^2), \tag{A13}$$
and
$$\sum_{l<l'}\beta_l\beta_{l'}J_2(l,l') \to \frac{1}{2(1-\varrho^2)^2}\Big(b^{(2)}(\varrho)(1-\varrho^2)^2 + 3b_1^{(1)}(\varrho)(1-\varrho^4) + 2b_1(\varrho)(1+4\varrho^2+\varrho^4) - 2b_2^{(1)}(\varrho)(1-\varrho^2) - 2b_2(\varrho)(1+3\varrho^2)\Big). \tag{A14}$$
Technical details of the proofs of (A13) and (A14) are presented in the Supplementary Materials, Section S5. This concludes the proof. □

References

  1. Kac, M.; Murdock, W.; Szego, G. On the eigen-values of certain Hermitian forms. J. Ration. Mech. Anal. 1953, 2, 767–800. [Google Scholar] [CrossRef]
  2. Fikioris, G. Spectral properties of Kac–Murdock–Szegö matrices with a complex parameter. Linear Algebra Appl. 2018, 553, 182–210. [Google Scholar] [CrossRef] [Green Version]
  3. Yang, Y.; Zhou, J.; Pan, J. Estimation and optimal structure selection of high-dimensional Toeplitz covariance matrix. J. Multivar. Anal. 2021, 184, 104739. [Google Scholar] [CrossRef]
  4. Liang, K.Y.; Zeger, S.L. Longitudinal data analysis using generalized linear models. Biometrika 1986, 73, 13–22. [Google Scholar] [CrossRef]
  5. Rangan, S. Generalized approximate message passing for estimation with random linear mixing. In Proceedings of the 2011 IEEE International Symposium on Information Theory Proceedings, St. Petersburg, Russia, 31 July–5 August 2011; pp. 2168–2172. [Google Scholar]
  6. Vila, J.P.; Schniter, P. Expectation-maximization Gaussian-mixture approximate message passing. IEEE Trans. Signal Process. 2013, 61, 4658–4672. [Google Scholar] [CrossRef] [Green Version]
  7. Dicker, L.H. Variance estimation in high-dimensional linear models. Biometrika 2014, 101, 269–284. [Google Scholar] [CrossRef]
  8. Diggle, P.J.; Giorgi, E. Model-Based Geostatistics for Global Public Health: Methods and Applications; Chapman and Hall/CRC: Boca Raton, FL, USA, 2019. [Google Scholar]
  9. Patil, A.R.; Kim, S. Combination of ensembles of regularized regression models with resampling-based lasso feature selection in high dimensional data. Mathematics 2020, 8, 110. [Google Scholar] [CrossRef] [Green Version]
  10. Dicker, L.H.; Erdogdu, M.A. Maximum likelihood for variance estimation in high-dimensional linear models. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, Cadiz, Spain, 9–11 May 2016. [Google Scholar]
  11. Carpentier, A.; Verzelen, N. Adaptive estimation of the sparsity in the Gaussian vector model. Ann. Stat. 2019, 47, 93–126. [Google Scholar] [CrossRef] [Green Version]
  12. Carpentier, A.; Verzelen, N. Optimal sparsity testing in linear regression model. Bernoulli 2021, 27, 727–750. [Google Scholar] [CrossRef]
  13. Gaunt, R.E. Rates of Convergence of Variance-Gamma Approximations via Stein’s Method. Ph.D. Thesis, The Queen’s College, University of Oxford, Oxford, UK, 2013. [Google Scholar]
  14. Gaunt, R.E. Variance-Gamma approximation via Stein’s method. Electron. J. Probab. 2014, 19, 1–33. [Google Scholar] [CrossRef]
  15. Gaunt, R.E. Products of normal, beta and gamma random variables: Stein operators and distributional theory. Braz. J. Probab. Stat. 2018, 32, 437–466. [Google Scholar] [CrossRef] [Green Version]
  16. Gaunt, R.E. A note on the distribution of the product of zero-mean correlated normal random variables. Stat. Neerl. 2019, 73, 176–179. [Google Scholar] [CrossRef] [Green Version]
  17. Ing, C.K. Model selection for high-dimensional linear regression with dependent observations. Ann. Stat. 2020, 48, 1959–1980. [Google Scholar] [CrossRef]
  18. Cha, J.; Chiang, H.D.; Sasaki, Y. Inference in high-dimensional regression models without the exact or Lp sparsity. arXiv 2021, arXiv:2108.09520. [Google Scholar]
  19. Shibata, R. Asymptotically Efficient Selection of the Order of the Model for Estimating Parameters of a Linear Process. Ann. Stat. 1980, 8, 147–164. [Google Scholar] [CrossRef]
  20. Ing, C.K. Accumulated Prediction Errors, Information Criteria and Optimal Forecasting for Autoregressive Time Series. Ann. Stat. 2007, 35, 1238–1277. [Google Scholar] [CrossRef] [Green Version]
  21. Belloni, A.; Chen, D.; Chernozhukov, V.; Hansen, C. Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain. Econometrica 2012, 80, 2369–2429. [Google Scholar] [CrossRef] [Green Version]
  22. Javanmard, A.; Montanari, A. Confidence intervals and hypothesis testing for high-dimensional regression. J. Mach. Learn. Res. 2014, 15, 2869–2909. [Google Scholar]
  23. Zhang, C.H.; Zhang, S.S. Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 2014, 76, 217–242. [Google Scholar] [CrossRef] [Green Version]
  24. Caner, M.; Kock, A.B. Asymptotically honest confidence regions for high dimensional parameters by the desparsified conservative Lasso. J. Econom. 2018, 203, 143–168. [Google Scholar] [CrossRef] [Green Version]
  25. Belloni, A.; Chernozhukov, V.; Chetverikov, D.; Hansen, C.; Kato, K. High-dimensional econometrics and regularized GMM. arXiv 2018, arXiv:1806.01888. [Google Scholar]
  26. Gold, D.; Lederer, J.; Tao, J. Inference for high-dimensional instrumental variables regression. J. Econom. 2020, 217, 79–111. [Google Scholar] [CrossRef] [Green Version]
  27. Ning, Y.; Peng, S.; Tao, J. Doubly Robust Semiparametric Difference-in-Differences Estimators with High-Dimensional Data. arXiv 2020, arXiv:2009.03151. [Google Scholar]
  28. Guo, Z.; Ćevid, D.; Bühlmann, P. Doubly Debiased Lasso: High-Dimensional Inference under Hidden Confounding. arXiv 2021, arXiv:2004.03758. [Google Scholar]
  29. Dai, Z.; Li, T.; Yang, M. Forecasting stock return volatility: The role of shrinkage approaches in a data-rich environment. J. Forecast. 2021, 1–17. [Google Scholar] [CrossRef]
  30. Dai, Z.; Zhu, H.; Zhang, X. Dynamic spillover effects and portfolio strategies between crude oil, gold and Chinese stock markets related to new energy vehicle. Energy Econ. 2022, 109, 105959. [Google Scholar] [CrossRef]
  31. Dai, Z.; Zhu, H. Time-varying spillover effects and investment strategies between WTI crude oil, natural gas and Chinese stock markets related to belt and road initiative. Energy Econ. 2022, 108, 105883. [Google Scholar] [CrossRef]
  32. Sánchez García, J.; Cruz Rambaud, S. Machine Learning Regularization Methods in High-Dimensional Monetary and Financial VARs. Mathematics 2022, 10, 877. [Google Scholar] [CrossRef]
  33. Yi, J.; Tang, N. Variational Bayesian inference in high-dimensional linear mixed models. Mathematics 2022, 10, 463. [Google Scholar] [CrossRef]
  34. Madan, D.B.; Carr, P.P.; Chang, E.C. The Variance Gamma Process and Option Pricing. Rev. Financ. 1998, 2, 79–105. [Google Scholar] [CrossRef] [Green Version]
  35. Kotz, S.; Kozubowski, T.; Podgórski, K. The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance; Birkhäuser: Boston, MA, USA, 2001. [Google Scholar]
  36. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  37. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. [Google Scholar] [CrossRef] [Green Version]
  38. Meinshausen, N. Relaxed Lasso. Comput. Stat. Data Anal. 2007, 52, 374–393. [Google Scholar] [CrossRef]
  39. Belloni, A.; Chernozhukov, V.; Wang, L. Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 2011, 98, 791–806. [Google Scholar] [CrossRef] [Green Version]
  40. Morris, R. The Dilogarithm Function of a Real Argument. Math. Comput. 1979, 33, 778–787. [Google Scholar] [CrossRef]
  41. Bayati, M.; Erdogdu, M.A.; Montanari, A. Estimating lasso risk and noise level. In Proceedings of the Advances in Neural Information Processing Systems: 27th Annual Conference on Neural Information, Processing Systems 2013, Lake Tahoe, NV, USA, 5–10 December 2013; Volume 26. [Google Scholar]
  42. Janson, L.; Barber, R.F.; Candes, E. EigenPrism: Inference for high dimensional signal-to-noise ratios. J. R. Stat. Soc. Ser. B Stat. Methodol. 2017, 79, 1037–1065. [Google Scholar] [CrossRef] [Green Version]
  43. Maximon, L.C. The dilogarithm function for complex argument. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 2003, 459, 2807–2819. [Google Scholar] [CrossRef]
Figure 1. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic in (80), with the limiting distribution $N(0, s^2)$ of Corollary 2 (in black), for $\varrho = 0.3$, $c = 1$, $\sigma_\varepsilon^2 = 2$ and $p = 100, 500, 1000$.