Asymptotic Normality in Linear Regression with Approximately Sparse Structure

Jokubaitis, Saulius; Leipus, Remigijus

doi:10.3390/math10101657


by Saulius Jokubaitis ^† and Remigijus Leipus ^*,†

Faculty of Mathematics and Informatics, Institute of Applied Mathematics, Vilnius University, Naugarduko 24, LT-03225 Vilnius, Lithuania

^* Author to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Mathematics 2022, 10(10), 1657; https://doi.org/10.3390/math10101657

Submission received: 1 March 2022 / Revised: 28 March 2022 / Accepted: 7 May 2022 / Published: 12 May 2022

(This article belongs to the Section D1: Probability and Statistics)


Abstract

In this paper, we study asymptotic normality in high-dimensional linear regression. We focus on the case where the covariance matrix of the regression variables has a KMS structure, in asymptotic settings where the number of predictors, p, is proportional to the number of observations, n. The main result of the paper is the derivation of the exact asymptotic distribution for the suitably centered and normalized squared norm of the product of the predictor matrix,

X

, and outcome variable, Y, i.e., the statistic

{∥ X^{'} Y ∥}_{2}^{2}

, under fairly mild assumptions on the model parameters

β_{j}

. We employ the variance-gamma distribution to derive the results, which, in addition to the asymptotic results, allows us to characterize the exact distribution of the statistic. Additionally, we consider a specific case of approximate sparsity of the model parameter vector

β

and perform a Monte Carlo simulation study. The simulation results suggest that the statistic approaches the limiting distribution fairly quickly even under strong correlation among the variables and a relatively small number of observations, which suggests possible applications to the construction of statistical testing procedures for real-world data and related problems.

Keywords:

linear regression; sparsity; asymptotic normality; variance-gamma distribution

MSC:

60F05; 62E20; 62J99

1. Introduction

Consider a linear regression model

\begin{matrix} Y = X β + ε, \end{matrix}

(1)

where

Y : = {(y_{1}, \dots, y_{n})}^{'} \in R^{n \times 1}

are n observations of outcome and

X = {(X_{1}, \dots, X_{n})}^{'} \in R^{n \times p}

are p-dimensional predictors with

X_{1}, \dots, X_{n}

being i.i.d.

p \times 1

random vectors

X_{i} = {(X_{1, i}, \dots, X_{p, i})}^{'}

, which are normally distributed with zero mean and the covariance matrix

Σ

, denoted

X_{i} \overset{d}{=} N_{p} (0, Σ)

. We assume that the covariance matrix

Σ

has the form

\begin{matrix} Σ & = & {(ϱ^{| i - j |})}_{i, j = 1}^{p} = [\begin{matrix} 1 & ϱ & \dots & ϱ^{p - 1} \\ ϱ & 1 & \dots & ϱ^{p - 2} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ ϱ^{p - 1} & ϱ^{p - 2} & \dots & 1 \end{matrix}], \end{matrix}

(2)

if

0 < | ϱ | < 1

and

Σ = I_{p}

if

ϱ = 0

(here and below

I_{p}

denotes the

p \times p

identity matrix). This matrix is often called the Kac–Murdock–Szegő (KMS) matrix, originally introduced in [1]. As the autocorrelation matrix of the corresponding causal AR(1) process, the KMS matrix is positive definite; we consider it because of its wide array of applications in the literature and its well-known spectral properties (see, e.g., [2] for a thorough literature review). When carefully chosen, such a structure can approximate a wide array of possible covariance structures well (see, e.g., [3] for a more general approach with various Toeplitz covariance structures). Furthermore,

ε : = {(ε_{1}, \dots, ε_{n})}^{'} \in R^{n \times 1} \overset{d}{=} N_{n} (0, σ_{ε}^{2} I_{n})

are unobserved i.i.d. errors with

E ε_{i} = 0

,

Var (ε_{i}) = σ_{ε}^{2} > 0

, and

β : = {(β_{1}, \dots, β_{p})}^{'} \in R^{p \times 1}

is an unknown p-dimensional parameter. In practice, the assumption that

E X_{i} = 0

can be untenable, and it may be appropriate to add an intercept to the linear model (1); however, for simplicity, throughout this paper we will assume that the intercept is known and the variables are centered. Similar settings are considered when dealing with certain geospatial data, longitudinal studies, microarray data, and research on approximate message passing algorithms (see, e.g., [4,5,6,7,8,9]).
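As a minimal sketch of the setting above, the following Python code builds the KMS matrix in (2) and draws one sample from model (1). The function names `kms_matrix` and `simulate_model`, the use of NumPy, and the Cholesky-based sampling are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def kms_matrix(p, rho):
    """Return the p x p KMS matrix (rho^{|i-j|}); the identity when rho = 0."""
    idx = np.arange(p)
    # 0.0 ** 0 evaluates to 1.0, so rho = 0 gives the identity matrix.
    return float(rho) ** np.abs(idx[:, None] - idx[None, :])


def simulate_model(n, p, rho, beta, sigma_eps=1.0, rng=None):
    """Draw (X, Y) from Y = X beta + eps with i.i.d. rows X_i ~ N_p(0, Sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    beta = np.asarray(beta, dtype=float)
    chol = np.linalg.cholesky(kms_matrix(p, rho))
    # Each row of X is chol @ z with z standard normal, hence has covariance Sigma.
    X = rng.standard_normal((n, p)) @ chol.T
    eps = sigma_eps * rng.standard_normal(n)
    return X, X @ beta + eps


def squared_norm_statistic(X, Y):
    """Compute the squared Euclidean norm of X'Y."""
    return float(np.sum((X.T @ Y) ** 2))
```

For large p, an AR(1) recursion could replace the Cholesky factorization; the dense version is kept here only because it mirrors (2) directly.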

This paper is concerned with the derivation of the exact asymptotic distribution for the suitably centered and normalized squared norm

{∥ X^{'} Y ∥}_{2}^{2}

under the assumption of the KMS type covariance structure in (2), where p and n are assumed to be large. Throughout the paper, we assume that

p, n \to \infty

and

p / n \to c \in (0, \infty)

. We are particularly interested in cases where

p > n

. Statistics of this form arise in various applications in the context of high-dimensional linear regression, and under normality assumptions general results can be derived using random matrix theory through Wishart distributions (see, e.g., [7,10,11,12]). Dealing with such statistics typically requires strong restrictions on the model parameters

β

; however, in this paper, we only require that

{∥ β ∥}_{2}^{2} < \infty

is satisfied. Moreover, our results could be extended by using

β

-generating functions (e.g., parameters of FARIMA models). In comparison to the related papers, ref. [12] assumes exact sparsity, while [7,10] require approximate sparsity.

We approach the problem following an observation by [13] that the product of two jointly Gaussian zero-mean random variables follows a variance-gamma distribution, which yields a set of attractive properties. We contribute to the literature on the variance-gamma distribution by extending the results of [14,15,16]. We demonstrate that, along with the derivation of the asymptotic distribution of

{∥ X^{'} Y ∥}_{2}^{2}

, this approach allows us to define the exact distribution of the statistic given any fixed values

p, n

, which can be expressed through a combination of gamma and normal random variables. In the related literature we were not able to find results for the exact distribution and asymptotic analysis of the statistic

{∥ X^{'} Y ∥}_{2}^{2}

based on the variance-gamma distribution. Furthermore, we believe that such a result is much easier to work with than a direct treatment of the characteristic or density functions of

{∥ X^{'} Y ∥}_{2}^{2}

. Therefore, in addition to the

ℓ_{2}

-norm statistic, we argue that the obtained results can be easily extended to alternative forms of the statistic, e.g., by using a different norm, which would reduce the problem to manipulating the variance-gamma distribution. This suggests possible directions for further research and useful extensions.

Additionally, we examine a specific case of parameter

β

by considering

β_{j} = j^{- 1}

,

j \geq 1

. Similar structures of the vector

β

are often found in the literature when approximate sparsity of the coefficients in the linear regression model (1) is assumed. See, e.g., [17,18] for a broader view of sparsity requirements and their implications for specific high-dimensional algorithms; refs. [19,20] for model selection problems in autoregressive time series models; refs. [21,22,23,24,25,26,27,28] for applications to inference in high-dimensional models and high-dimensional instrumental variable (IV) regression models; or [29,30,31,32,33] for recent applications of high-dimensional and sparse methods to financial and economic data. Performing Monte Carlo simulations, we find that the empirical distributions of the corresponding statistic approach the limiting distribution reasonably quickly even for large values of

ϱ

and c. These results suggest that the assumption of a sparse structure can be incorporated into applications and statistical tests, and thus could be further extended following the literature on testing for sparsity or the construction of signal-to-noise ratio estimators (see, e.g., [7,10,11,12]).

In this paper,

\overset{d}{=}

,

\overset{d}{\to}

and

\overset{P}{\to}

denote equality in distribution, convergence in distribution and convergence in probability, respectively. The notation C represents a generic positive constant which may assume different values at various locations, and

1_{A}

denotes the indicator function of a set A.

The structure of the paper is as follows. In Section 2, we present the main results of the paper. In Section 3, we present useful properties of the variance-gamma distribution, which are used in Section 4 in order to prove some auxiliary results. In Section 5, we present the proof of the main result. Finally, in Section 6, we provide an example of the main result under an approximate sparsity assumption for the parameter

β

of the model (1). Technical results are presented in Appendix A, while, for brevity, some straightforward yet tedious proofs are presented in the Supplementary Material.

2. Main Results

In this section, we formulate the main results on the asymptotic normality of the statistic

{∥ X^{'} Y ∥}_{2}^{2}

. We introduce the following notation:

\begin{matrix} κ_{1, p} & : = & \sum_{k = 1}^{p} \sum_{l = 1}^{p} β_{k} β_{l} ϱ^{| k - l |}, \end{matrix}

(3)

\begin{matrix} κ_{2, p} & : = & \sum_{k = 1}^{p} {(\sum_{l = 1}^{p} β_{l} ϱ^{| k - l |})}^{2}, \end{matrix}

(4)

\begin{matrix} κ_{3, p} & : = & \sum_{k, l, j, j^{'} = 1}^{p} β_{j} β_{j^{'}} ϱ^{| k - j |} ϱ^{| l - j^{'} |} ϱ^{| k - l |} . \end{matrix}

(5)

It can be observed that under

\sum_{j = 1}^{\infty} β_{j}^{2} < \infty

, there exist limits

\begin{matrix} κ_{i} & = & lim_{p \to \infty} κ_{i, p}, i = 1, 2, 3 . \end{matrix}

Additionally,

κ_{2, p} \geq 0

. Since

{(ϱ^{| i - j |})}_{i, j = 1}^{p}

is positive semi-definite,

κ_{i, p} \geq 0

,

i = 1, 3

. Indeed,

\sum_{k, l = 1}^{p} ϱ^{| k - l |} a_{k} a_{l} \geq 0

, thus it suffices to take

a_{k} = β_{k}

for

i = 1

and

a_{k} = \sum_{j = 1}^{p} β_{j} ϱ^{| k - j |}

for

i = 3

.
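The quantities in (3)–(5) can be written in matrix form: with the vector a having entries a_k = \sum_l β_l ϱ^{|k-l|}, we have κ_{1,p} = β'Σβ, κ_{2,p} = a'a and κ_{3,p} = a'Σa. The sketch below evaluates them numerically; the function name `kappas` and the NumPy implementation are illustrative assumptions.

```python
import numpy as np


def kappas(beta, rho):
    """Return (kappa_{1,p}, kappa_{2,p}, kappa_{3,p}) from (3)-(5) for a finite beta."""
    beta = np.asarray(beta, dtype=float)
    idx = np.arange(beta.size)
    sigma = float(rho) ** np.abs(idx[:, None] - idx[None, :])  # KMS matrix (2)
    a = sigma @ beta                # a_k = sum_l beta_l rho^{|k-l|}
    kappa1 = float(beta @ a)        # (3): quadratic form beta' Sigma beta
    kappa2 = float(a @ a)           # (4): sum of squares of a_k
    kappa3 = float(a @ sigma @ a)   # (5): quadratic form a' Sigma a
    return kappa1, kappa2, kappa3
```

Written this way, the non-negativity of all three quantities is immediate, since each is either a sum of squares or a quadratic form in the positive semi-definite matrix Σ.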

Our first main result is the following theorem.

Theorem 1.

Assume the model in (1) with covariance structure in (2). Let

n \to \infty

and let

p = p_{n}

satisfy

\begin{matrix} p \to \infty, & \frac{p}{n} \to c \in (0, \infty) . \end{matrix}

(6)

Let also the

β_{j}

satisfy

\begin{matrix} \sum_{j = 1}^{\infty} β_{j}^{2} & < & \infty . \end{matrix}

(7)

Then

\begin{matrix} \frac{{∥ X^{'} Y ∥}_{2}^{2} - n^{2} κ_{2, p} - p n (κ_{1, p} + σ_{ε}^{2})}{n^{3 / 2}} & \overset{d}{\to} & N (0, s^{2}), \end{matrix}

(8)

where variance

s^{2}

has the structure

\begin{matrix} s^{2} & = & 4 κ_{2}^{2} + 4 (κ_{1} + σ_{ε}^{2}) (2 κ_{2} c + κ_{3}) + 2 c {(κ_{1} + σ_{ε}^{2})}^{2} (c + \frac{1 + ϱ^{2}}{1 - ϱ^{2}}) . \end{matrix}

(9)
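A small Monte Carlo experiment can illustrate the statement of Theorem 1. The sketch below is only a rough check under stated assumptions: the finite-p values κ_{i,p} are plugged into (9) in place of the limits κ_i, the ratio p/n stands in for c, and the function names are hypothetical.

```python
import numpy as np


def limiting_variance(k1, k2, k3, c, rho, sigma2_eps):
    """Evaluate s^2 from (9)."""
    v = k1 + sigma2_eps
    return (4.0 * k2**2 + 4.0 * v * (2.0 * k2 * c + k3)
            + 2.0 * c * v**2 * (c + (1.0 + rho**2) / (1.0 - rho**2)))


def standardized_draws(n, p, rho, beta, sigma2_eps=1.0, reps=200, seed=0):
    """Simulate the left-hand side of (8), divided by s, `reps` times."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    idx = np.arange(p)
    sigma = float(rho) ** np.abs(idx[:, None] - idx[None, :])
    chol = np.linalg.cholesky(sigma)
    a = sigma @ beta
    # Assumption: finite-p kappas are used as proxies for their limits.
    k1, k2, k3 = beta @ a, a @ a, a @ sigma @ a
    center = n**2 * k2 + p * n * (k1 + sigma2_eps)
    s = np.sqrt(limiting_variance(k1, k2, k3, p / n, rho, sigma2_eps))
    out = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal((n, p)) @ chol.T
        Y = X @ beta + np.sqrt(sigma2_eps) * rng.standard_normal(n)
        out[r] = (np.sum((X.T @ Y) ** 2) - center) / (n**1.5 * s)
    return out
```

If the theorem applies, a histogram of `standardized_draws(...)` for moderately large n and p should look roughly standard normal; how quickly this happens depends on ϱ, c and β.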

Our second main result deals with the case where the centering sequence in (8) is modified to include the limiting values of

κ_{i, p}

,

i = 1, 2

.

Theorem 2.

Let the assumptions of Theorem 1 hold. In addition, assume that

\sum_{j = p + 1}^{\infty} β_{j}^{2} = o (p^{- 1 / 2})

and

{sup}_{j \geq 1} | β_{j} | j^{α} < \infty

with

α > 1 / 2

. Then,

\begin{matrix} \frac{{∥ X^{'} Y ∥}_{2}^{2} - n^{2} (κ_{2} + c (κ_{1} + σ_{ε}^{2}))}{n^{3 / 2}} & \overset{d}{\to} & N (0, s^{2}) . \end{matrix}

(10)

The proofs of these theorems are given in Section 5.

Remark 1.

For alternative expressions of

κ_{1}

,

κ_{2}

and

κ_{3}

, see Lemma 5 below.

Define

\begin{matrix} β (x) & : = & \sum_{j = 1}^{\infty} β_{j}^{2} x^{j}, | x | \leq 1 . \end{matrix}

The following corollary deals with the case when

ϱ = 0

, i.e.,

Σ = I_{p}

. The result follows from Theorem 2, noting that in this case

κ_{i} = β (1)

,

i = 1, 2, 3

.

Corollary 1.

Assume the model (1) with covariance structure

Σ = I_{p}

. Let assumptions (6) and (7) be satisfied. In addition, assume that

\sum_{j = p + 1}^{\infty} β_{j}^{2} = o (p^{- 1 / 2})

and

{sup}_{j \geq 1} | β_{j} | j^{α} < \infty

with

α > 1 / 2

. Then,

\begin{matrix} \frac{{∥ X^{'} Y ∥}_{2}^{2} - n^{2} (β (1) (1 + c) + c σ_{ε}^{2})}{n^{3 / 2}} & \overset{d}{\to} & N (0, s^{2}), \end{matrix}

(11)

where

\begin{matrix} s^{2} & = & 2 β {(1)}^{2} (4 + 5 c + c^{2}) + 4 β (1) σ_{ε}^{2} (1 + 3 c + c^{2}) + 2 σ_{ε}^{4} (c + c^{2}) . \end{matrix}

(12)
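The variance in (12) can be recovered from (9) by direct substitution. The short derivation below spells this out; the shorthand B and v is introduced here only for readability and does not appear in the paper. With ϱ = 0 we have κ_1 = κ_2 = κ_3 = β(1) and (1 + ϱ^2)/(1 - ϱ^2) = 1, so that:

```latex
% Shorthand (an assumption of this sketch): B := \beta(1), v := \sigma_\varepsilon^2.
\begin{aligned}
s^2 &= 4B^2 + 4(B+v)(2Bc + B) + 2c(B+v)^2(c+1) \\
    &= 4B^2 + (8c+4)B^2 + (8c+4)Bv + (2c^2+2c)(B^2 + 2Bv + v^2) \\
    &= 2B^2(4 + 5c + c^2) + 4Bv(1 + 3c + c^2) + 2v^2(c + c^2).
\end{aligned}
```

The centering term follows in the same way, since κ_2 + c(κ_1 + σ_ε^2) = β(1)(1 + c) + cσ_ε^2.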

3. Properties of the Variance-Gamma Distribution

In this section, we provide some properties of the variance-gamma distribution, which will be used in the following proofs.

Recall that the variance-gamma distribution with parameters

r > 0

,

θ \in R

,

σ > 0

and

μ \in R

has density

\begin{matrix} f^{VG} (x) = \frac{1}{σ \sqrt{π} Γ (r / 2)} e^{θ (x - μ) / σ^{2}} {(\frac{| x - μ |}{2 \sqrt{θ^{2} + σ^{2}}})}^{(r - 1) / 2} K_{(r - 1) / 2} (\frac{\sqrt{θ^{2} + σ^{2}}}{σ^{2}} | x - μ |), \end{matrix}

(13)

where

x \in R

,

K_{ν} (x)

is the modified Bessel function of the second kind. For a random variable Q with density (13), we write

Q \overset{d}{=} VG (r, θ, σ, μ)

. Let

Γ (a, b)

,

a > 0

,

b > 0

, denote the gamma distribution with density

\begin{matrix} f^{G} (x) & = & \frac{b^{a}}{Γ (a)} x^{a - 1} e^{- b x}, x > 0 . \end{matrix}

It holds that

\begin{matrix} Q & \overset{d}{=} & μ + θ W_{r} + σ \sqrt{W_{r}} U, \end{matrix}

(14)

where

W_{r} \overset{d}{=} Γ (r / 2, 1 / 2)

,

U \overset{d}{=} N (0, 1)

,

W_{r}

and U are independent. The characteristic function of

Q \overset{d}{=} VG (r, θ, σ, μ)

has the form (see, e.g., [34,35])

\begin{matrix} φ_{Q} (t) & = & \frac{e^{i μ t}}{{(1 + σ^{2} t^{2} - 2 i θ t)}^{r / 2}}, t \in R . \end{matrix}

(15)
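Representation (14) gives a direct way to sample from the variance-gamma distribution. The sketch below draws from VG(r, θ, σ, μ) this way; the function name `sample_vg` is hypothetical. Since W_r has mean r and variance 2r, representation (14) implies E Q = μ + rθ and Var(Q) = r(σ^2 + 2θ^2), which can serve as a sanity check on the samples.

```python
import numpy as np


def sample_vg(r, theta, sigma, mu, size, rng):
    """Draw `size` samples from VG(r, theta, sigma, mu) via representation (14)."""
    # Gamma(r/2, 1/2) in the rate parametrization equals shape r/2 with scale 2.
    w = rng.gamma(shape=r / 2.0, scale=2.0, size=size)
    u = rng.standard_normal(size)  # U ~ N(0, 1), independent of W_r
    return mu + theta * w + sigma * np.sqrt(w) * u
```

Comparing the sample mean and variance with μ + rθ and r(σ^2 + 2θ^2) for a few parameter choices is a quick way to confirm that the rate and scale conventions have not been mixed up.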

We note the following properties of the variance-gamma distribution.

(i): If $Q_{1} \overset{d}{=} VG (r_{1}, θ, σ, μ_{1})$ and $Q_{2} \overset{d}{=} VG (r_{2}, θ, σ, μ_{2})$ are independent random variables then

$\begin{matrix} Q_{1} + Q_{2} & \overset{d}{=} & VG (r_{1} + r_{2}, θ, σ, μ_{1} + μ_{2}) . \end{matrix}$
(ii): If $Q \overset{d}{=} VG (r, θ, σ, μ)$ , then for any $a > 0$

$\begin{matrix} a Q & \overset{d}{=} & VG (r, a θ, a σ, a μ) . \end{matrix}$

The following proposition is crucial for our purposes.

Proposition 1.

(i) If

{(ξ_{1}, ξ_{2})}^{'} \overset{d}{=} N_{2} (0, Σ)

, where

Σ = (\begin{matrix} σ_{1}^{2} & ϱ σ_{1} σ_{2} \\ ϱ σ_{1} σ_{2} & σ_{2}^{2} \end{matrix})

, then

\begin{matrix} ξ_{1} ξ_{2} & \overset{d}{=} & VG (1, ϱ σ_{1} σ_{2}, \sqrt{1 - ϱ^{2}} σ_{1} σ_{2}, 0) . \end{matrix}

(ii) If

{(ξ_{1 j}, ξ_{2 j})}^{'}

,

j = 1, \dots, n

, are i.i.d. random vectors with common distribution

N_{2} (0, Σ)

, then

\begin{matrix} \sum_{j = 1}^{n} ξ_{1 j} ξ_{2 j} & \overset{d}{=} & VG (n, ϱ σ_{1} σ_{2}, \sqrt{1 - ϱ^{2}} σ_{1} σ_{2}, 0) \end{matrix}

and

\begin{matrix} \sum_{j = 1}^{n} ξ_{1 j} ξ_{2 j} & \overset{d}{=} & σ_{1} σ_{2} (ϱ W_{n} + \sqrt{1 - ϱ^{2}} \sqrt{W_{n}} U), \end{matrix}

where

W_{n} \overset{d}{=} Γ (n / 2, 1 / 2)

and

U \overset{d}{=} N (0, 1)

are independent random variables.

(iii) Assume that

{(ξ_{1 j}^{(1)}, \dots, ξ_{1 j}^{(p)}, ξ_{2 j})}^{'}

,

j = 1, \dots, n

, are i.i.d. copies of

{(ξ_{1}^{(1)}, \dots, ξ_{1}^{(p)}, ξ_{2})}^{'} \overset{d}{=} N_{p + 1} (0, Σ^{(p)})

and let

ϱ^{(k l)} : = Corr (ξ_{1}^{(k)}, ξ_{1}^{(l)})

,

ϱ^{(k)} : = Corr (ξ_{1}^{(k)}, ξ_{2})

,

{(σ_{1}^{(k)})}^{2} : = Var (ξ_{1}^{(k)})

,

σ_{2}^{2} : = Var (ξ_{2})

,

k, l = 1, \dots, p

. Then

(\begin{matrix} \sum_{j = 1}^{n} ξ_{1 j}^{(1)} ξ_{2 j} \\ ⋮ \\ \sum_{j = 1}^{n} ξ_{1 j}^{(p)} ξ_{2 j} \end{matrix}) \overset{d}{=} (\begin{matrix} σ_{1}^{(1)} σ_{2} (ϱ^{(1)} W_{n} + \sqrt{1 - {(ϱ^{(1)})}^{2}} \sqrt{W_{n}} U_{1}) \\ ⋮ \\ σ_{1}^{(p)} σ_{2} (ϱ^{(p)} W_{n} + \sqrt{1 - {(ϱ^{(p)})}^{2}} \sqrt{W_{n}} U_{p}) \end{matrix}),

where

{(U_{1}, \dots, U_{p})}^{'} \overset{d}{=} N_{p} (0, Σ_{U})

,

Σ_{U} = (σ_{U}^{(k l)})

with

\begin{matrix} σ_{U}^{(k, l)} & = & E U_{k} U_{l} = \frac{ϱ^{(k l)} - ϱ^{(k)} ϱ^{(l)}}{\sqrt{1 - {(ϱ^{(k)})}^{2}} \sqrt{1 - {(ϱ^{(l)})}^{2}}}, k, l = 1, \dots, p . \end{matrix}

(16)

Proof.

The statements in (i) and (ii) are well known; see, e.g., [16]. The proof of part (iii) follows from Lemma 1. □
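Proposition 1(ii) can be illustrated numerically by comparing the sum of products of correlated Gaussian pairs with its gamma-normal representation. The sketch below is an informal check, not part of the proof; both function names are hypothetical. Either sampler should produce mean nϱσ_1σ_2 and variance nσ_1^2σ_2^2(1 + ϱ^2).

```python
import numpy as np


def product_sum_direct(n, rho, s1, s2, size, rng):
    """Sample sum_{j<=n} xi_{1j} xi_{2j} from correlated Gaussian pairs."""
    z1 = rng.standard_normal((size, n))
    z2 = rng.standard_normal((size, n))
    xi1 = s1 * z1
    xi2 = s2 * (rho * z1 + np.sqrt(1.0 - rho**2) * z2)  # Corr(xi1, xi2) = rho
    return np.sum(xi1 * xi2, axis=1)


def product_sum_vg(n, rho, s1, s2, size, rng):
    """Sample the same law via s1 s2 (rho W_n + sqrt(1 - rho^2) sqrt(W_n) U)."""
    w = rng.chisquare(n, size=size)   # chi-square(n) equals Gamma(n/2, 1/2)
    u = rng.standard_normal(size)
    return s1 * s2 * (rho * w + np.sqrt(1.0 - rho**2) * np.sqrt(w) * u)
```

The second sampler needs only two random draws per sample regardless of n, which is the practical appeal of the representation when n is large.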

Lemma 1.

Assume that

{(ξ_{1}^{(1)}, \dots, ξ_{1}^{(p)}, ξ_{2})}^{'}

has distribution

N_{p + 1} (0, Σ^{(p)})

and let

ϱ^{(k l)} : = Corr (ξ_{1}^{(k)}, ξ_{1}^{(l)})

,

ϱ^{(k)} : = Corr (ξ_{1}^{(k)}, ξ_{2})

,

{(σ_{1}^{(k)})}^{2} : = Var (ξ_{1}^{(k)})

,

σ_{2}^{2} : = Var (ξ_{2})

,

k, l = 1, \dots, p

. Then

(\begin{matrix} ξ_{1}^{(1)} ξ_{2} \\ ⋮ \\ ξ_{1}^{(p)} ξ_{2} \end{matrix}) \overset{d}{=} (\begin{matrix} σ_{1}^{(1)} σ_{2} (ϱ^{(1)} W_{1} + \sqrt{1 - {(ϱ^{(1)})}^{2}} \sqrt{W_{1}} U_{1}) \\ ⋮ \\ σ_{1}^{(p)} σ_{2} (ϱ^{(p)} W_{1} + \sqrt{1 - {(ϱ^{(p)})}^{2}} \sqrt{W_{1}} U_{p}) \end{matrix}),

where

W_{1} \overset{d}{=} Γ (1 / 2, 1 / 2)

,

{(U_{1}, \dots, U_{p})}^{'}

is a zero-mean normal vector, independent of

W_{1}

, with covariances given in (16).

Proof.

It suffices to prove that for any

(t_{1}, \dots, t_{p}) \in R^{p}

it holds

\begin{matrix} (\sum_{k = 1}^{p} t_{k} ξ_{1}^{(k)}) ξ_{2} & \overset{d}{=} & σ_{2} \sum_{k = 1}^{p} t_{k} σ_{1}^{(k)} (ϱ^{(k)} W_{1} + \sqrt{1 - {(ϱ^{(k)})}^{2}} \sqrt{W_{1}} U_{k}) . \end{matrix}

(17)

Since

\begin{matrix} \sum_{k = 1}^{p} t_{k} ξ_{1}^{(k)} & \overset{d}{=} & N (0, \sum_{k, l = 1}^{p} t_{k} t_{l} ϱ^{(k l)} σ_{1}^{(k)} σ_{1}^{(l)}), ξ_{2} \overset{d}{=} N (0, σ_{2}^{2}), \end{matrix}

by Proposition 1(i) we obtain that

\begin{matrix} (\sum_{k = 1}^{p} t_{k} ξ_{1}^{(k)}) ξ_{2} & \overset{d}{=} & VG (1, σ_{2} \sum_{k = 1}^{p} t_{k} ϱ^{(k)} σ_{1}^{(k)}, σ_{2} \sqrt{\sum_{k, l = 1}^{p} t_{k} t_{l} σ_{1}^{(k)} σ_{1}^{(l)} (ϱ^{(k l)} - ϱ^{(k)} ϱ^{(l)})}, 0) . \end{matrix}

(18)

For the right-hand side of (17) write

\begin{matrix} σ_{2} \sum_{k = 1}^{p} t_{k} σ_{1}^{(k)} (ϱ^{(k)} W_{1} + \sqrt{1 - {(ϱ^{(k)})}^{2}} \sqrt{W_{1}} U_{k}) \\ = & (σ_{2} \sum_{k = 1}^{p} t_{k} σ_{1}^{(k)} ϱ^{(k)}) W_{1} + (σ_{2} \sum_{k = 1}^{p} t_{k} σ_{1}^{(k)} \sqrt{1 - {(ϱ^{(k)})}^{2}} U_{k}) \sqrt{W_{1}} . \end{matrix}

Here, by (16),

\begin{matrix} σ_{2} \sum_{k = 1}^{p} t_{k} σ_{1}^{(k)} \sqrt{1 - {(ϱ^{(k)})}^{2}} U_{k} & \overset{d}{=} & σ_{2} {(\sum_{k, l = 1}^{p} t_{k} t_{l} σ_{1}^{(k)} σ_{1}^{(l)} \sqrt{1 - {(ϱ^{(k)})}^{2}} \sqrt{1 - {(ϱ^{(l)})}^{2}} E (U_{k} U_{l}))}^{1 / 2} U_{1} \\ = & σ_{2} {(\sum_{k, l = 1}^{p} t_{k} t_{l} σ_{1}^{(k)} σ_{1}^{(l)} (ϱ^{(k l)} - ϱ^{(k)} ϱ^{(l)}))}^{1 / 2} U_{1} . \end{matrix}

Note that

U_{1} \overset{d}{=} N (0, 1)

. Hence,

\begin{matrix} σ_{2} \sum_{k = 1}^{p} t_{k} σ_{1}^{(k)} (ϱ^{(k)} W_{1} + \sqrt{1 - {(ϱ^{(k)})}^{2}} \sqrt{W_{1}} U_{k}) \\ \overset{d}{=} & (σ_{2} \sum_{k = 1}^{p} t_{k} σ_{1}^{(k)} ϱ^{(k)}) W_{1} + σ_{2} {(\sum_{k, l = 1}^{p} t_{k} t_{l} σ_{1}^{(k)} σ_{1}^{(l)} (ϱ^{(k l)} - ϱ^{(k)} ϱ^{(l)}))}^{1 / 2} \sqrt{W_{1}} U_{1}, \end{matrix}

which, by representation (14), has the same VG distribution as that in (18). This proves (17). □
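The parameters in (18) can be read off from Proposition 1(i) as follows. The shorthand S, σ_S and ϱ_S is introduced here for illustration only, and the sketch assumes σ_S > 0 so that the correlation is well defined.

```latex
% Shorthand (an assumption of this sketch):
% S := \sum_{k} t_k \xi_1^{(k)}, \quad \sigma_S^2 := \mathrm{Var}(S), \quad \varrho_S := \mathrm{Corr}(S, \xi_2).
\begin{aligned}
\sigma_S^2 &= \sum_{k,l=1}^{p} t_k t_l \varrho^{(kl)} \sigma_1^{(k)} \sigma_1^{(l)}, \\
\varrho_S \sigma_S \sigma_2 &= \mathrm{Cov}(S, \xi_2) = \sigma_2 \sum_{k=1}^{p} t_k \varrho^{(k)} \sigma_1^{(k)}, \\
(1 - \varrho_S^2)\, \sigma_S^2 \sigma_2^2
  &= \sigma_2^2 \Big( \sigma_S^2 - \Big( \sum_{k=1}^{p} t_k \varrho^{(k)} \sigma_1^{(k)} \Big)^{2} \Big)
   = \sigma_2^2 \sum_{k,l=1}^{p} t_k t_l \sigma_1^{(k)} \sigma_1^{(l)} \big( \varrho^{(kl)} - \varrho^{(k)} \varrho^{(l)} \big).
\end{aligned}
```

The second line is the θ-parameter and the square root of the third line is the σ-parameter of the VG distribution in (18).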

4. Some Auxiliary Lemmas

In this section we establish some auxiliary results that will be used in the proofs of Theorems 1 and 2. Here and throughout the paper we remove the upper indices when working with triangular schemes of random variables, e.g.,

(V_{1}, \dots, V_{p}) \equiv (V_{1}^{(p)}, \dots, V_{p}^{(p)})

, whenever it is clear from the context.

Lemma 2.

Let

V = {(V_{1}, \dots, V_{p})}^{'} \overset{d}{=} N_{p} (0, Σ_{V}^{(p)})

, where

Σ_{V}^{(p)}

is a positive definite covariance matrix and

tr ({(Σ_{V}^{(p)})}^{2}) = o (p^{2})

,

p \to \infty

. Then

\begin{matrix} \frac{1}{p} \sum_{k = 1}^{p} (V_{k}^{2} - E V_{k}^{2}) & \overset{P}{\to} & 0 \quad \text{as} \quad p \to \infty . \end{matrix}

(19)

If, in addition,

p^{- 1} tr (Σ_{V}^{(p)}) \to 1

, then

\begin{matrix} \frac{1}{p} \sum_{k = 1}^{p} V_{k}^{2} & \overset{P}{\to} & 1 \quad \text{as} \quad p \to \infty . \end{matrix}

(20)

Proof.

Due to the Spectral Theorem, we have

\begin{matrix} V^{'} V & = & \sum_{k = 1}^{p} V_{k}^{2} \overset{d}{=} \sum_{j = 1}^{p} λ_{j}^{(p)} {\tilde{Z}}_{j}^{2}, \end{matrix}

(21)

where

{\tilde{Z}}_{j}

are i.i.d. standard normal variables and

λ_{1}^{(p)}, \dots, λ_{p}^{(p)}

are the eigenvalues of

Σ_{V}^{(p)}

. Observe from (21) that

\begin{matrix} E V^{'} V & = & \sum_{j = 1}^{p} λ_{j}^{(p)} = tr (Σ_{V}^{(p)}), \end{matrix}

(22)

\begin{matrix} Var (V^{'} V) & = & Var (\sum_{j = 1}^{p} λ_{j}^{(p)} {\tilde{Z}}_{j}^{2}) = 2 \sum_{j = 1}^{p} {(λ_{j}^{(p)})}^{2} = 2 tr ({(Σ_{V}^{(p)})}^{2}) . \end{matrix}

(23)

Thus, by (22) and (23), for any

ϵ > 0

\begin{matrix} P (| \frac{1}{p} (V^{'} V - E V^{'} V) | > ϵ) & \leq & \frac{Var (V^{'} V)}{p^{2} ϵ^{2}} \to 0, p \to \infty, \end{matrix}

and the relation in (19) follows due to assumption

tr ({(Σ_{V}^{(p)})}^{2}) = o (p^{2})

. Finally, if

p^{- 1} tr (Σ_{V}^{(p)}) \to 1

, by (22), the result (19) leads to (20). □

Remark 2.

The assumption on matrix

Σ_{V} = Σ_{V}^{(p)}

in Lemma 2, requiring that

tr (Σ_{V}^{2}) = o (p^{2})

, is not overly restrictive: assume, for example, that

Σ_{V} = (σ^{(i, j)})

is any KMS type covariance matrix, as in (2). Then, it can be seen that

\begin{matrix} tr (Σ_{V}^{2}) & = & \sum_{i, j = 1}^{p} {(σ^{(i, j)})}^{2} = \sum_{i, j = 1}^{p} ϱ^{2 | i - j |} \\ = & \sum_{| m | < p} (p - | m |) ϱ^{2 | m |} \leq p \sum_{| m | < p} ϱ^{2 | m |} = O (p) . \end{matrix}
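The trace computation in Remark 2 is easy to verify numerically. The sketch below compares tr(Σ_V^2) computed from the matrix with the lag-sum form \sum_{|m|<p} (p - |m|) ϱ^{2|m|}, and checks it against p(1 + ϱ^2)/(1 - ϱ^2), which is an O(p) upper bound because \sum_{m \in Z} ϱ^{2|m|} = (1 + ϱ^2)/(1 - ϱ^2). The function names are hypothetical.

```python
import numpy as np


def trace_sigma_squared(p, rho):
    """Compute tr(Sigma^2) directly for the p x p KMS matrix."""
    idx = np.arange(p)
    sigma = float(rho) ** np.abs(idx[:, None] - idx[None, :])
    return float(np.trace(sigma @ sigma))


def trace_lag_sum(p, rho):
    """Compute sum over |m| < p of (p - |m|) rho^{2|m|}."""
    m = np.abs(np.arange(-(p - 1), p))
    return float(np.sum((p - m) * float(rho) ** (2 * m)))


def trace_upper_bound(p, rho):
    """Return p (1 + rho^2) / (1 - rho^2), linear in p for fixed |rho| < 1."""
    return p * (1.0 + rho**2) / (1.0 - rho**2)
```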

Lemma 3.

Assume that

{\tilde{Z}}_{1}, {\tilde{Z}}_{2}, \dots

are i.i.d.

N (0, 1)

random variables. For any

p \in N

define

\begin{matrix} ζ_{j}^{(p)} & : = & ν_{j}^{(p)} ({\tilde{Z}}_{j}^{2} - 1) + γ_{j}^{(p)} \sqrt{p} {\tilde{Z}}_{j}, j = 1, \dots, p, \end{matrix}

(24)

where

ν_{j}^{(p)}, j = 1, \dots, p

, are positive scalars, and

γ_{j}^{(p)}

,

j = 1, \dots, p

, are real scalars, such that

\begin{matrix} \sum_{j = 1}^{p} {(ν_{j}^{(p)})}^{3} & = & o ({(\sum_{j = 1}^{p} Var (ζ_{j}^{(p)}))}^{3 / 2}), \end{matrix}

(25)

\begin{matrix} p \sum_{j = 1}^{p} {(γ_{j}^{(p)})}^{2} ν_{j}^{(p)} & = & o ({(\sum_{j = 1}^{p} Var (ζ_{j}^{(p)}))}^{3 / 2}) \end{matrix}

(26)

with

Var (ζ_{j}^{(p)}) = 2 {(ν_{j}^{(p)})}^{2} + p {(γ_{j}^{(p)})}^{2} .

Then, as

p \to \infty

,

\begin{matrix} \frac{\sum_{j = 1}^{p} ζ_{j}^{(p)}}{\sqrt{\sum_{j = 1}^{p} Var (ζ_{j}^{(p)})}} & \overset{d}{\to} & N (0, 1) . \end{matrix}

(27)

Proof.

The proof uses the method of cumulants and is structured as follows:

(i): We establish the moment-generating function of $ζ_{j}^{(p)}$ , $M_{ζ_{j}^{(p)}} (t) : = E e^{t ζ_{j}^{(p)}}$ , and $log (M_{ζ_{j}^{(p)}} (t))$ ;
(ii): We find $G (t; p)$ which corresponds to the cumulant generating function of the sum $\sum_{j = 1}^{p} ζ_{j}^{(p)}$ ;
(iii): We find $K (t; p) : = G (\frac{t}{\sqrt{\sum_{j = 1}^{p} (2 {(ν_{j}^{(p)})}^{2} + p {(γ_{j}^{(p)})}^{2})}}; p)$ , which corresponds to the cumulant generating function of the left hand side of (27);
(iv): Finally, in order to prove (27), we show that the cumulants $ϰ_{j}^{(p)}$ , generated by $K (t; p)$ , satisfy $ϰ_{1}^{(p)} = 0$ , $ϰ_{2}^{(p)} = 1$ and $ϰ_{d}^{(p)} \to 0, d = 3, 4, \dots$ , as $p \to \infty$ .

Step 1. First, rewrite

\begin{matrix} ζ_{j}^{(p)} & = & ν_{j}^{(p)} {({\tilde{Z}}_{j} + \frac{γ_{j}^{(p)} \sqrt{p}}{2 ν_{j}^{(p)}})}^{2} - ν_{j}^{(p)} - \frac{{(γ_{j}^{(p)})}^{2} p}{4 ν_{j}^{(p)}} . \end{matrix}

(28)

Here,

ψ_{j}^{(p)} : = {({\tilde{Z}}_{j} + \frac{γ_{j}^{(p)} \sqrt{p}}{2 ν_{j}^{(p)}})}^{2}

has a noncentral chi-squared distribution with the following moment-generating function:

\begin{matrix} M_{ψ_{j}^{(p)}} (t) & : = & E e^{t ψ_{j}^{(p)}} = {(1 - 2 t)}^{- 1 / 2} exp \{{(\frac{γ_{j}^{(p)}}{2 ν_{j}^{(p)}})}^{2} t p {(1 - 2 t)}^{- 1}\}, | t | < \frac{1}{2} . \end{matrix}

(29)

Therefore, by (28) and (29),

\begin{matrix} M_{ζ_{j}^{(p)}} (t) & = & M_{ψ_{j}^{(p)}} (ν_{j}^{(p)} t) exp \{- t ν_{j}^{(p)} - t p \frac{{(γ_{j}^{(p)})}^{2}}{4 ν_{j}^{(p)}}\} \\ = & {(1 - 2 ν_{j}^{(p)} t)}^{- \frac{1}{2}} exp \{\frac{{(γ_{j}^{(p)})}^{2}}{4 ν_{j}^{(p)}} t p {(1 - 2 ν_{j}^{(p)} t)}^{- 1} - t (ν_{j}^{(p)} + \frac{{(γ_{j}^{(p)})}^{2} p}{4 ν_{j}^{(p)}})\}, \end{matrix}

for

| t | < {(2 ν_{j}^{(p)})}^{- 1}

, and

\begin{matrix} log (M_{ζ_{j}^{(p)}} (t)) & = & {(\frac{γ_{j}^{(p)}}{2 ν_{j}^{(p)}})}^{2} p t ν_{j}^{(p)} {(1 - 2 ν_{j}^{(p)} t)}^{- 1} - \frac{1}{2} log (1 - 2 ν_{j}^{(p)} t) - t (ν_{j}^{(p)} + \frac{{(γ_{j}^{(p)})}^{2} p}{4 ν_{j}^{(p)}}) \\ = \frac{1}{2} ({(γ_{j}^{(p)})}^{2} p + 2 {(ν_{j}^{(p)})}^{2}) t^{2} + \frac{{(γ_{j}^{(p)})}^{2} p}{2} \sum_{k = 3}^{\infty} t^{k} 2^{k - 2} {(ν_{j}^{(p)})}^{k - 2} + \frac{1}{2} \sum_{k = 3}^{\infty} \frac{2^{k} {(ν_{j}^{(p)})}^{k} t^{k}}{k} . \end{matrix}
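The second equality in the last display uses the geometric and logarithmic series, valid for |2νt| < 1. The steps are sketched below, with the indices j and (p) suppressed for readability (a shorthand used only in this sketch):

```latex
% Shorthand (an assumption of this sketch): \nu := \nu_j^{(p)}, \gamma := \gamma_j^{(p)}.
\begin{aligned}
\frac{\gamma^2 p}{4\nu}\, t \big[ (1 - 2\nu t)^{-1} - 1 \big]
  &= \frac{\gamma^2 p}{4\nu} \sum_{k=2}^{\infty} (2\nu)^{k-1} t^{k}
   = \frac{\gamma^2 p}{2} \sum_{k=2}^{\infty} 2^{k-2} \nu^{k-2} t^{k}, \\
-\frac{1}{2} \log(1 - 2\nu t) - \nu t
  &= \frac{1}{2} \sum_{k=2}^{\infty} \frac{2^{k} \nu^{k} t^{k}}{k}.
\end{aligned}
```

The k = 2 terms of the two series add up to (γ^2 p / 2 + ν^2) t^2, which is the quadratic term of the expansion, and the terms with k ≥ 3 give the two remaining sums.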

Step 2. Since

ζ_{1}^{(p)}, \dots, ζ_{p}^{(p)}

are independent, we have that

\begin{matrix} G (t; p) & = & \sum_{j = 1}^{p} log M_{ζ_{j}^{(p)}} (t) = \frac{t^{2}}{2} \sum_{j = 1}^{p} ({(γ_{j}^{(p)})}^{2} p + 2 {(ν_{j}^{(p)})}^{2}) \\ + \frac{p}{2} \sum_{k = 3}^{\infty} 2^{k - 2} t^{k} \sum_{j = 1}^{p} {(γ_{j}^{(p)})}^{2} {(ν_{j}^{(p)})}^{k - 2} + \frac{1}{2} \sum_{k = 3}^{\infty} \frac{2^{k}}{k} t^{k} \sum_{j = 1}^{p} {(ν_{j}^{(p)})}^{k} . \end{matrix}

Step 3. It can be observed that

\begin{matrix} K (t; p) & = & G (\frac{t}{\sqrt{\sum_{j = 1}^{p} (2 {(ν_{j}^{(p)})}^{2} + p {(γ_{j}^{(p)})}^{2})}}; p) \\ = & \frac{t^{2}}{2} + \frac{1}{2} \sum_{k = 3}^{\infty} 2^{k - 2} t^{k} \frac{p \sum_{j = 1}^{p} {(γ_{j}^{(p)})}^{2} {(ν_{j}^{(p)})}^{k - 2}}{{(\sum_{j = 1}^{p} (2 {(ν_{j}^{(p)})}^{2} + {(γ_{j}^{(p)})}^{2} p))}^{k / 2}} \\ + \frac{1}{2} \sum_{k = 3}^{\infty} \frac{2^{k}}{k} t^{k} \frac{\sum_{j = 1}^{p} {(ν_{j}^{(p)})}^{k}}{{(\sum_{j = 1}^{p} (2 {(ν_{j}^{(p)})}^{2} + {(γ_{j}^{(p)})}^{2} p))}^{k / 2}} = \sum_{k = 1}^{\infty} ϰ_{k}^{(p)} \frac{t^{k}}{k!}, \end{matrix}

where

ϰ_{1}^{(p)} = 0

,

ϰ_{2}^{(p)} = 1

, and for

k \geq 3

,

\begin{matrix} ϰ_{k}^{(p)} & = & \frac{k! 2^{k - 3} p \sum_{j = 1}^{p} {(γ_{j}^{(p)})}^{2} {(ν_{j}^{(p)})}^{k - 2} + (k - 1)! 2^{k - 1} \sum_{j = 1}^{p} {(ν_{j}^{(p)})}^{k}}{{(\sum_{j = 1}^{p} (2 {(ν_{j}^{(p)})}^{2} + {(γ_{j}^{(p)})}^{2} p))}^{k / 2}} . \end{matrix}

(30)

Step 4. In order to prove that (27) holds, it remains to show that, as

p \to \infty

,

ϰ_{d}^{(p)} \to 0

for all

d \geq 3

. By (30), it is equivalent to showing that for any fixed

k \geq 3

, as

p \to \infty

,

\begin{matrix} \frac{\sum_{j = 1}^{p} {(ν_{j}^{(p)})}^{k}}{{(\sum_{j = 1}^{p} (2 {(ν_{j}^{(p)})}^{2} + {(γ_{j}^{(p)})}^{2} p))}^{k / 2}} & \to & 0, \end{matrix}

(31)

\begin{matrix} \frac{p \sum_{j = 1}^{p} {(γ_{j}^{(p)})}^{2} {(ν_{j}^{(p)})}^{k - 2}}{{(\sum_{j = 1}^{p} (2 {(ν_{j}^{(p)})}^{2} + {(γ_{j}^{(p)})}^{2} p))}^{k / 2}} & \to & 0 . \end{matrix}

(32)

In order to prove (31) we use induction. The case for

k = 3

holds by assumption. Assuming that (31) holds for fixed

k \geq 3

, we have

\begin{matrix} \frac{\sum_{j = 1}^{p} {(ν_{j}^{(p)})}^{k + 1}}{{(\sum_{j = 1}^{p} (2 {(ν_{j}^{(p)})}^{2} + {(γ_{j}^{(p)})}^{2} p))}^{(k + 1) / 2}} & \leq & \frac{{(\sum_{j^{'} = 1}^{p} {(ν_{j^{'}}^{(p)})}^{2})}^{1 / 2} \sum_{j = 1}^{p} {(ν_{j}^{(p)})}^{k}}{{(\sum_{j = 1}^{p} (2 {(ν_{j}^{(p)})}^{2} + {(γ_{j}^{(p)})}^{2} p))}^{(k + 1) / 2}} \\ \leq & \frac{{(\sum_{j^{'} = 1}^{p} (2 {(ν_{j^{'}}^{(p)})}^{2} + {(γ_{j^{'}}^{(p)})}^{2} p))}^{1 / 2} \sum_{j = 1}^{p} {(ν_{j}^{(p)})}^{k}}{{(\sum_{j = 1}^{p} (2 {(ν_{j}^{(p)})}^{2} + {(γ_{j}^{(p)})}^{2} p))}^{(k + 1) / 2}} \\ = & \frac{\sum_{j = 1}^{p} {(ν_{j}^{(p)})}^{k}}{{(\sum_{j = 1}^{p} (2 {(ν_{j}^{(p)})}^{2} + {(γ_{j}^{(p)})}^{2} p))}^{k / 2}} \to 0, \end{matrix}

concluding that (31) holds for all

k \geq 3

. The proof for (32) is analogous: the case for

k = 3

holds by assumption; thus, we repeat the same arguments as with (31) and conclude that (32) holds for all

k \geq 3

. This concludes the proof of the lemma. □
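The cumulant formula (30) can be evaluated directly, which gives a feel for how fast the higher-order cumulants vanish. The sketch below is illustrative only, and the function name `standardized_cumulant` is hypothetical. In the special case ν_j = 1, γ_j = 0 the sum in (27) is a centered chi-square variable with p degrees of freedom, whose third and fourth standardized cumulants are 8p/(2p)^{3/2} and 48p/(2p)^2, so the formula can be checked against known values.

```python
import numpy as np
from math import factorial


def standardized_cumulant(k, nu, gamma):
    """Evaluate the k-th cumulant (k >= 3) of the standardized sum, as in (30)."""
    nu = np.asarray(nu, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    p = nu.size
    # Total variance of the sum: sum_j (2 nu_j^2 + p gamma_j^2).
    total_var = np.sum(2.0 * nu**2 + p * gamma**2)
    numerator = (factorial(k) * 2.0 ** (k - 3) * p * np.sum(gamma**2 * nu ** (k - 2))
                 + factorial(k - 1) * 2.0 ** (k - 1) * np.sum(nu**k))
    return float(numerator / total_var ** (k / 2.0))
```

For bounded ν_j and γ_j of order p^{-1/2}, the value returned for k = 3 decays like p^{-1/2}, consistent with conditions (25) and (26).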

5. Proof of the Main Results

In this section we give the proofs of Theorems 1 and 2. Throughout the proofs, we express corresponding constants in terms of

κ_{i, p}

and

κ_{i}

,

i = 1, 2, 3

, introduced in (3)–(5). Recall that

κ_{i, p} \geq 0

, and, by Remark 3,

κ_{i} < \infty

, for

i = 1, 2, 3

.

Proof of Theorem 1.

Write

\begin{matrix} {∥ X^{'} Y ∥}_{2}^{2} & = & H_{1}^{2} + \dots + H_{p}^{2} = : H, \end{matrix}

where

\begin{matrix} H_{k} & : = & \sum_{j = 1}^{n} X_{k, j} (\sum_{l = 1}^{p} β_{l} X_{l, j} + ε_{j}), k = 1, \dots, p . \end{matrix}

Denote

Z_{j} : = \sum_{l = 1}^{p} β_{l} X_{l, j} + ε_{j}

,

j = 1, \dots, n

. By covariance structure (2) and

X_{k, j} \overset{d}{=} N (0, 1)

,

ε_{j} \overset{d}{=} N (0, σ_{ε}^{2})

, we have

Z_{j} \overset{d}{=} N (0, σ_{Z}^{2})

, where

σ_{Z}^{2} = \sum_{l, l^{'} = 1}^{p} β_{l} β_{l^{'}} ϱ^{| l - l^{'} |} + σ_{ε}^{2}

and

Cov (X_{k, j}, Z_{j}) = \sum_{l = 1}^{p} β_{l} ϱ^{| k - l |}

.

Applying Proposition 1(iii) with

ξ_{1 j}^{(k)} = X_{k, j}

,

ξ_{2 j} = Z_{j}

, and

σ_{1}^{(k)} = 1

,

σ_{2, p} = σ_{Z}

,

θ_{k}^{(p)} : = ϱ^{(k)} = σ_{Z}^{- 1} \sum_{l = 1}^{p} β_{l} ϱ^{| k - l |}

, where

ϱ^{(k l)} = ϱ^{| k - l |}

, we obtain that

\begin{matrix} {∥ X^{'} Y ∥}_{2}^{2} & \overset{d}{=} & σ_{2, p}^{2} \sum_{k = 1}^{p} {(θ_{k}^{(p)} W_{n} + \sqrt{1 - {(θ_{k}^{(p)})}^{2}} \sqrt{W_{n}} U_{k})}^{2}, \end{matrix}

where

W_{n} \overset{d}{=} Γ (n / 2, 1 / 2)

and

{(U_{1}, \dots, U_{p})}^{'} \overset{d}{=} N_{p} (0, Σ_{U}^{(p)})

with

Σ_{U}^{(p)} = (σ_{U}^{(k, l)})

defined as (see (16)):

\begin{matrix} σ_{U}^{(k, l)} & = & \frac{ϱ^{| k - l |} - θ_{k}^{(p)} θ_{l}^{(p)}}{\sqrt{1 - {(θ_{k}^{(p)})}^{2}} \sqrt{1 - {(θ_{l}^{(p)})}^{2}}}, k, l = 1, \dots, p . \end{matrix}

(33)

By expanding the square we can write

\begin{matrix} {∥ X^{'} Y ∥}_{2}^{2} & \overset{d}{=} & σ_{2, p}^{2} ({(W_{n} - E W_{n} + E W_{n})}^{2} \sum_{k = 1}^{p} {(θ_{k}^{(p)})}^{2} + 2 W_{n}^{3 / 2} \sum_{k = 1}^{p} θ_{k}^{(p)} \sqrt{1 - {(θ_{k}^{(p)})}^{2}} U_{k} \\ + (W_{n} - E W_{n}) \sum_{k = 1}^{p} (1 - {(θ_{k}^{(p)})}^{2}) U_{k}^{2} + E W_{n} \sum_{k = 1}^{p} (1 - {(θ_{k}^{(p)})}^{2}) U_{k}^{2}) . \end{matrix}

By further rearranging the right-hand side, we have

\begin{matrix} \frac{{∥ X^{'} Y ∥}_{2}^{2}}{n^{3 / 2}} & \overset{d}{=} & I_{1} + I_{2} + I_{3} + I_{4}, \end{matrix}

(34)

where

\begin{matrix} I_{1} & : = & \frac{σ_{2, p}^{2}}{n^{3 / 2}} {(W_{n} - E W_{n})}^{2} \sum_{k = 1}^{p} {(θ_{k}^{(p)})}^{2}, \end{matrix}

(35)

\begin{matrix} I_{2} & : = & \frac{σ_{2, p}^{2}}{n^{3 / 2}} (W_{n} - E W_{n}) (2 E W_{n} \sum_{k = 1}^{p} {(θ_{k}^{(p)})}^{2} + \sum_{k = 1}^{p} (1 - {(θ_{k}^{(p)})}^{2}) U_{k}^{2}), \end{matrix}

(36)

\begin{matrix} I_{3} & : = & \frac{σ_{2, p}^{2}}{n^{3 / 2}} 2 W_{n}^{3 / 2} \sum_{k = 1}^{p} θ_{k}^{(p)} \sqrt{1 - {(θ_{k}^{(p)})}^{2}} U_{k} + \frac{σ_{2, p}^{2}}{n^{3 / 2}} E W_{n} \sum_{k = 1}^{p} ((1 - {(θ_{k}^{(p)})}^{2}) U_{k}^{2} - 1), \end{matrix}

(37)

\begin{matrix} I_{4} & : = & \frac{σ_{2, p}^{2}}{n^{3 / 2}} (p E W_{n} + {(E W_{n})}^{2} \sum_{k = 1}^{p} {(θ_{k}^{(p)})}^{2}) . \end{matrix}

(38)
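The decomposition (34)–(38) is a purely algebraic rearrangement, so it can be verified for arbitrary numerical inputs. The sketch below does this using E W_n = n; the function names are hypothetical, and the inputs need not have the joint distribution required in the proof for the identity to hold.

```python
import numpy as np


def decomposition_terms(w, u, theta, sigma2, n):
    """Return (I_1, I_2, I_3, I_4) from (35)-(38) for given W_n = w and U = u."""
    p = theta.size
    ew = float(n)                                  # E W_n = n
    th2 = np.sum(theta**2)                         # sum_k theta_k^2
    quad = np.sum((1.0 - theta**2) * u**2)         # sum_k (1 - theta_k^2) U_k^2
    lin = np.sum(theta * np.sqrt(1.0 - theta**2) * u)
    scale = sigma2 / n**1.5
    i1 = scale * (w - ew) ** 2 * th2
    i2 = scale * (w - ew) * (2.0 * ew * th2 + quad)
    i3 = scale * 2.0 * w**1.5 * lin + scale * ew * (quad - p)
    i4 = scale * (p * ew + ew**2 * th2)
    return i1, i2, i3, i4


def direct_value(w, u, theta, sigma2, n):
    """Return sigma2 * sum_k (theta_k w + sqrt(1 - theta_k^2) sqrt(w) u_k)^2 / n^{3/2}."""
    terms = theta * w + np.sqrt(1.0 - theta**2) * np.sqrt(w) * u
    return float(sigma2 * np.sum(terms**2) / n**1.5)
```

Summing the four returned terms should reproduce `direct_value` up to floating-point error.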

We will show that, as

p, n \to \infty

,

p / n \to c \in (0, \infty)

, the term

I_{1} = o_{P} (1)

, while the terms

I_{2}

and

I_{3}

are asymptotically normal. More precisely, we will show that

I_{2} \overset{d}{\to} N (0, s_{1}^{2})

and

I_{3} \overset{d}{\to} N (0, s_{2}^{2})

, where

s_{1}^{2}

and

s_{2}^{2}

are given by (44) and (62) below. Here, since

W_{n}

and

{(U_{1}, \dots, U_{p})}^{'}

are mutually independent for each n, it follows that

I_{2} + I_{3} \overset{d}{\to} N (0, s_{1}^{2} + s_{2}^{2})

. Finally, the term

I_{4}

defines the mean of the statistic, i.e.,

\begin{matrix} \frac{{∥ X^{'} Y ∥}_{2}^{2}}{n^{3 / 2}} - I_{4} & \overset{d}{\to} & N (0, s_{1}^{2} + s_{2}^{2}) . \end{matrix}

(39)

Thus, we will conclude by establishing that

I_{4} = \sqrt{n} (κ_{2, p} + p n^{- 1} (κ_{1, p} + σ_{ε}^{2}))

, while

s_{1}^{2} + s_{2}^{2} = s^{2}

, as in the statement of the theorem.

First, consider

I_{1}

defined in (35). We will show that

I_{1} = o_{P} (1)

. Denote

\begin{matrix} c_{2} & : = & lim_{p \to \infty} \sum_{k = 1}^{p} {(θ_{k}^{(p)})}^{2} = {(κ_{1} + σ_{ε}^{2})}^{- 1} κ_{2}, σ_{2}^{2} : = lim_{p \to \infty} σ_{2, p}^{2} = κ_{1} + σ_{ε}^{2} . \end{matrix}

(40)

It is clear that

c_{2} < \infty

and

σ_{2}^{2} < \infty

. Recall that, by CLT,

\begin{matrix} \frac{W_{n} - E W_{n}}{n^{1 / 2}} & \overset{d}{\to} & N (0, 2) . \end{matrix}

(41)

Therefore,

\begin{matrix} I_{1} & = & O (1) n^{- 1 / 2} {(\frac{W_{n} - E W_{n}}{n^{1 / 2}})}^{2} = o (1) O_{P} (1) = o_{P} (1) . \end{matrix}

(42)

Second, consider

I_{2}

, defined in (36). We will show that

\begin{matrix} I_{2} \overset{d}{\to} N (0, s_{1}^{2}) \end{matrix}

(43)

with

s_{1}^{2}

given by

\begin{matrix} s_{1}^{2} & = & 2 σ_{2}^{4} {(2 c_{2} + c)}^{2} = 8 κ_{2}^{2} + 8 c (κ_{1} + σ_{ε}^{2}) κ_{2} + 2 c^{2} {(κ_{1} + σ_{ε}^{2})}^{2} . \end{matrix}

(44)

Rewrite

\begin{matrix} I_{2} & = & σ_{2, p}^{2} \frac{W_{n} - E W_{n}}{n^{1 / 2}} (\frac{2 E W_{n}}{n} \sum_{k = 1}^{p} {(θ_{k}^{(p)})}^{2} + \frac{1}{n} \sum_{k = 1}^{p} (1 - {(θ_{k}^{(p)})}^{2}) U_{k}^{2}) . \end{matrix}

(45)

Applying (40) and (41) for the outer term of (45), we obtain

\begin{matrix} σ_{2, p}^{2} \frac{W_{n} - E W_{n}}{n^{1 / 2}} & \overset{d}{\to} & N (0, 2 σ_{2}^{4}) . \end{matrix}

We will show that the inner term of (45) approaches

2 c_{2} + c

. Since

E W_{n} = n

, by (40) and assumption

p / n \to c

it suffices to prove the convergence

\begin{matrix} \frac{1}{p} \sum_{k = 1}^{p} (1 - {(θ_{k}^{(p)})}^{2}) U_{k}^{2} & \overset{P}{\to} & 1 . \end{matrix}

(46)

Denote matrix

\begin{matrix} A & : = & diag (1 - {(θ_{1}^{(p)})}^{2}, \dots, 1 - {(θ_{p}^{(p)})}^{2}) . \end{matrix}

(47)

To prove (46), we apply Lemma 2 with

V_{j} = \sqrt{1 - {(θ_{j}^{(p)})}^{2}} U_{j}

,

j = 1, \dots, p

, and

Σ_{V}^{(p)} = A^{1 / 2} Σ_{U} A^{1 / 2}

. The conditions of Lemma 2 will hold if

tr ({(A^{1 / 2} Σ_{U} A^{1 / 2})}^{2}) = O (p)

and

p^{- 1} tr (A^{1 / 2} Σ_{U} A^{1 / 2}) \to 1

, as

p \to \infty

. Observe that

\begin{matrix} tr ({(A^{1 / 2} Σ_{U} A^{1 / 2})}^{2}) & = & tr ({(A Σ_{U})}^{2}) \\ = & \sum_{k = 1}^{p} \sum_{k^{'} = 1}^{p} (1 - {(θ_{k}^{(p)})}^{2}) (1 - {(θ_{k^{'}}^{(p)})}^{2}) {(σ_{U}^{(k, k^{'})})}^{2} \\ = & \sum_{k = 1}^{p} \sum_{k^{'} = 1}^{p} (ϱ^{2 | k - k^{'} |} - 2 ϱ^{| k - k^{'} |} θ_{k}^{(p)} θ_{k^{'}}^{(p)} + {(θ_{k}^{(p)})}^{2} {(θ_{k^{'}}^{(p)})}^{2}) \\ = & \sum_{k = 1}^{p} \sum_{k^{'} = 1}^{p} ϱ^{2 | k - k^{'} |} - 2 {(κ_{1, p} + σ_{ε}^{2})}^{- 1} κ_{3, p} + {(κ_{1, p} + σ_{ε}^{2})}^{- 2} κ_{2, p}^{2} \\ = & \sum_{k = 1}^{p} \sum_{k^{'} = 1}^{p} ϱ^{2 | k - k^{'} |} + o (p) \sim p \frac{1 + ϱ^{2}}{1 - ϱ^{2}}, \end{matrix}

(48)

since

κ_{i} < \infty

,

i = 1, 2, 3

and

κ_{1, p} \geq 0

. Here we used (40) and the observation that

\begin{matrix} \sum_{k = 1}^{p} \sum_{k^{'} = 1}^{p} ϱ^{| k - k^{'} |} θ_{k}^{(p)} θ_{k^{'}}^{(p)} & = & \frac{κ_{3, p}}{κ_{1, p} + σ_{ε}^{2}} \to \frac{κ_{3}}{κ_{1} + σ_{ε}^{2}}, as p \to \infty . \end{matrix}

(49)

Similarly, we have

\begin{matrix} \frac{1}{p} tr (A^{1 / 2} Σ_{U} A^{1 / 2}) & = & \frac{1}{p} \sum_{k = 1}^{p} (1 - {(θ_{k}^{(p)})}^{2}) = 1 - \frac{κ_{2, p}}{p (κ_{1, p} + σ_{ε}^{2})} \to 1, \end{matrix}

since, by Lemma A4,

κ_{2, p} = o (p)

, while

κ_{1, p} \geq 0

,

κ_{1} < \infty

. This concludes the proof of (46).
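
The asymptotic relation in the last line of (48) can be checked numerically. The following is a minimal Python sketch, not part of the original derivation; the function names are illustrative, and the double sum is evaluated exactly by grouping its terms according to the lag d = |k - k'|.

```python
def kms_squared_sum(p: int, rho: float) -> float:
    """Return sum_{k,k'=1}^{p} rho^(2|k-k'|), grouping terms by lag d = |k-k'|."""
    r = rho * rho
    total = float(p)  # lag 0 contributes p ones
    power = 1.0
    for d in range(1, p):
        power *= r
        total += 2.0 * (p - d) * power  # (p - d) index pairs at each of the lags +d and -d
    return total


def limit_ratio(rho: float) -> float:
    """Limit of kms_squared_sum(p, rho) / p as p grows, as stated in (48)."""
    r = rho * rho
    return (1.0 + r) / (1.0 - r)


rho = 0.7
for p in (100, 1000, 10000):
    print(p, kms_squared_sum(p, rho) / p, limit_ratio(rho))
```

For a fixed value of ϱ, the printed ratio should approach the limit from below, with a gap of order 1/p.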

Next, consider

I_{3}

, defined by (37). We will show that

\begin{matrix} I_{3} & \overset{d}{\to} & N (0, s_{2}^{2}), \end{matrix}

(50)

with

s_{2}^{2}

defined in (62). Write

\begin{matrix} I_{3} & = & σ_{2, p}^{2} (2 \frac{W_{n}^{3 / 2}}{n^{3 / 2}} b^{'} U + n^{- 1 / 2} (U^{'} A U - p)), \end{matrix}

where

U = {(U_{1}, \dots, U_{p})}^{'}

, A is defined by (47), and

\begin{matrix} b & = & {(θ_{1}^{(p)} \sqrt{1 - {(θ_{1}^{(p)})}^{2}}, \dots, θ_{p}^{(p)} \sqrt{1 - {(θ_{p}^{(p)})}^{2}})}^{'} . \end{matrix}

Observe that

n^{- 3 / 2} W_{n}^{3 / 2} \overset{P}{\to} 1

due to the Law of Large Numbers. Thus, since

W_{n}

and U are independent for any n and

p / n \to c

, it follows that

\begin{matrix} I_{3} & = & σ_{2, p}^{2} (2 b^{'} U + \sqrt{\frac{c}{p}} (U^{'} A U - p)) + o_{P} (1) . \end{matrix}

(51)

First, we consider the inner term of (51) and show that as

p \to \infty

,

\begin{matrix} 2 b^{'} U + \sqrt{\frac{c}{p}} (U^{'} A U - p) & \overset{d}{\to} & V_{2}, \end{matrix}

(52)

where

V_{2} \overset{d}{=} N (0, σ_{2}^{- 4} s_{2}^{2})

. Then, (50) readily follows from (51).

Recall that

U \overset{d}{=} N_{p} (0, Σ_{U})

,

Σ_{U} > 0

. Further, let

\tilde{Z} \overset{d}{=} N_{p} (0, I_{p})

. Clearly, one has that

U \overset{d}{=} Σ_{U}^{1 / 2} \tilde{Z}

, where

Σ_{U}^{1 / 2}

denotes the symmetric square root of

Σ_{U}

. By the Spectral Theorem, we construct

V : = P^{'} \tilde{Z}

, where

V \overset{d}{=} N_{p} (0, I_{p})

and P is an orthogonal matrix that diagonalizes

Σ_{U}^{1 / 2} A Σ_{U}^{1 / 2}

, such that

P^{'} Σ_{U}^{1 / 2} A Σ_{U}^{1 / 2} P = Λ

, with

Λ = diag (λ_{1}^{(p)}, \dots, λ_{p}^{(p)})

comprised of the eigenvalues of

Σ_{U}^{1 / 2} A Σ_{U}^{1 / 2}

. Then,

\begin{matrix} \frac{\sqrt{c}}{\sqrt{p}} (U^{'} A U - p) + 2 b^{'} U & \overset{d}{=} & \frac{\sqrt{c}}{\sqrt{p}} (V^{'} Λ V - p) + 2 b^{'} Σ_{U}^{1 / 2} P V \\ = & \frac{\sqrt{c}}{\sqrt{p}} (\sum_{j = 1}^{p} (λ_{j}^{(p)} (V_{j}^{2} - 1) + g_{j}^{(p)} \sqrt{p} V_{j})) \\ = : & \frac{\sqrt{c}}{\sqrt{p}} \sum_{j = 1}^{p} {\tilde{V}}_{j}^{(p)}, \end{matrix}

(53)

where

(g_{1}^{(p)}, \dots, g_{p}^{(p)}) = 2 c^{- 1 / 2} b^{'} Σ_{U}^{1 / 2} P

, and

\begin{matrix} {\tilde{V}}_{j}^{(p)} & : = & λ_{j}^{(p)} (V_{j}^{2} - 1) + g_{j}^{(p)} \sqrt{p} V_{j}, j = 1, \dots, p . \end{matrix}

(54)

Clearly,

E {\tilde{V}}_{j}^{(p)} = 0

and

E {({\tilde{V}}_{j}^{(p)})}^{2} = 2 {(λ_{j}^{(p)})}^{2} + {(g_{j}^{(p)})}^{2} p

. Therefore, proving the result (52) is equivalent to showing:

\begin{matrix} \frac{\sqrt{c}}{\sqrt{p}} \sum_{j = 1}^{p} {\tilde{V}}_{j}^{(p)} & \overset{d}{\to} & N (0, σ_{2}^{- 4} s_{2}^{2}), \end{matrix}

(55)

where

\begin{matrix} σ_{2}^{- 4} s_{2}^{2} = c lim_{p \to \infty} p^{- 1} \sum_{j = 1}^{p} E {({\tilde{V}}_{j}^{(p)})}^{2} = 2 c lim_{p \to \infty} p^{- 1} \sum_{j = 1}^{p} {(λ_{j}^{(p)})}^{2} + c lim_{p \to \infty} \sum_{j = 1}^{p} {(g_{j}^{(p)})}^{2} . \end{matrix}

(56)
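
The moment identities used in (54)–(56) follow from the first four moments of a standard normal variable. As an illustrative check (a sketch with hypothetical function and variable names, not taken from the paper), the mean and the second moment of the summands can be computed exactly by expanding the quadratic polynomial in V_j:

```python
import math

# Raw moments E[V^m] of a standard normal V for m = 0, ..., 4.
GAUSS_MOMENTS = [1.0, 0.0, 1.0, 0.0, 3.0]


def summand_moments(lam: float, g: float, p: int):
    """Mean and second moment of lam*(V^2 - 1) + g*sqrt(p)*V for V ~ N(0, 1)."""
    # Coefficients of the polynomial c0 + c1*V + c2*V^2.
    coeffs = [-lam, g * math.sqrt(p), lam]
    mean = sum(c * GAUSS_MOMENTS[i] for i, c in enumerate(coeffs))
    second = sum(
        ci * cj * GAUSS_MOMENTS[i + j]
        for i, ci in enumerate(coeffs)
        for j, cj in enumerate(coeffs)
    )
    return mean, second


# Arbitrary illustrative inputs; the second moment should equal 2*lam^2 + g^2*p.
lam, g, p = 1.3, 0.2, 50
print(summand_moments(lam, g, p), 2 * lam**2 + g**2 * p)
```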

We prove (55) by applying Lemma 3 with

ν_{j}^{(p)} = λ_{j}^{(p)}

as the eigenvalues of

Σ_{U}^{1 / 2} A Σ_{U}^{1 / 2}

and

γ_{j}^{(p)} = g_{j}^{(p)}

. By the conditions of Lemma 3, we need to show that the following holds

\begin{matrix} \sum_{j = 1}^{p} {(λ_{j}^{(p)})}^{3} + p \sum_{j = 1}^{p} {(g_{j}^{(p)})}^{2} λ_{j}^{(p)} & = & o ({(\sum_{j = 1}^{p} (2 {(λ_{j}^{(p)})}^{2} + {(g_{j}^{(p)})}^{2} p))}^{3 / 2}) . \end{matrix}

(57)

First, observe that

p^{- 1} \sum_{j = 1}^{p} (2 {(λ_{j}^{(p)})}^{2} + {(g_{j}^{(p)})}^{2} p) \to C \in (0, \infty)

. Indeed, we have that

\sum_{j = 1}^{p} {(g_{j}^{(p)})}^{2} \to C_{g} \in (0, \infty)

, since

\begin{matrix} \sum_{j = 1}^{p} {(g_{j}^{(p)})}^{2} & = & 4 c^{- 1} (b^{'} Σ_{U}^{1 / 2} P) {(b^{'} Σ_{U}^{1 / 2} P)}^{'} = 4 c^{- 1} b^{'} Σ_{U} b \\ = & 4 c^{- 1} \sum_{j = 1}^{p} \sum_{j^{'} = 1}^{p} θ_{j}^{(p)} θ_{j^{'}}^{(p)} \sqrt{1 - {(θ_{j}^{(p)})}^{2}} \sqrt{1 - {(θ_{j^{'}}^{(p)})}^{2}} σ_{U}^{(j, j^{'})} \\ = & 4 c^{- 1} \sum_{j = 1}^{p} \sum_{j^{'} = 1}^{p} θ_{j}^{(p)} θ_{j^{'}}^{(p)} (ϱ^{| j - j^{'} |} - θ_{j}^{(p)} θ_{j^{'}}^{(p)}) \\ \to & 4 c^{- 1} {(κ_{1} + σ_{ε}^{2})}^{- 1} κ_{3} - 4 c^{- 1} {(κ_{1} + σ_{ε}^{2})}^{- 2} κ_{2}^{2} = C_{g} \end{matrix}

(58)

by (40) and (49).

Next, by (48), we find that

p^{- 1} \sum_{j = 1}^{p} {(λ_{j}^{(p)})}^{2} \to C_{λ} \in (0, \infty)

. Indeed, by (48), we have

\begin{matrix} \sum_{j = 1}^{p} {(λ_{j}^{(p)})}^{2} & = & tr ({(Σ_{U}^{1 / 2} A Σ_{U}^{1 / 2})}^{2}) = tr ({(Σ_{U} A)}^{2}) \\ = & \sum_{j = 1}^{p} \sum_{j^{'} = 1}^{p} ϱ^{2 | j - j^{'} |} + o (p) \sim p \frac{1 + ϱ^{2}}{1 - ϱ^{2}} . \end{matrix}

(59)

Thus, by (58) and (59), it follows that

p^{- 1} \sum_{j = 1}^{p} (2 {(λ_{j}^{(p)})}^{2} + {(g_{j}^{(p)})}^{2} p) \to C \in (0, \infty)

and condition (57) reduces to:

\begin{matrix} \sum_{j = 1}^{p} {(λ_{j}^{(p)})}^{3} + p \sum_{j = 1}^{p} {(g_{j}^{(p)})}^{2} λ_{j}^{(p)} = o (p^{3 / 2}) . \end{matrix}

(60)

We show that (60) holds. For the first term of (60), we have

\begin{matrix} \sum_{j = 1}^{p} {(λ_{j}^{(p)})}^{3} & = & tr ({(Σ_{U}^{1 / 2} A Σ_{U}^{1 / 2})}^{3}) = tr ({(Σ_{U} A)}^{3}) \\ = & \sum_{i, j, k = 1}^{p} (1 - {(θ_{i}^{(p)})}^{2}) (1 - {(θ_{k}^{(p)})}^{2}) (1 - {(θ_{j}^{(p)})}^{2}) σ_{U}^{(i, j)} σ_{U}^{(i, k)} σ_{U}^{(k, j)} \\ = & \sum_{i, j, k = 1}^{p} (ϱ^{| i - j |} - θ_{i}^{(p)} θ_{j}^{(p)}) (ϱ^{| i - k |} - θ_{i}^{(p)} θ_{k}^{(p)}) (ϱ^{| k - j |} - θ_{k}^{(p)} θ_{j}^{(p)}) \\ = & o (p^{3 / 2}), \end{matrix}

(61)

where the last equality follows from Lemma A5. For the second term of (60), observe that by Hölder’s inequality and (61),

\begin{matrix} p \sum_{j = 1}^{p} {(g_{j}^{(p)})}^{2} λ_{j}^{(p)} & \leq & p {(\sum_{j = 1}^{p} {| g_{j}^{(p)} |}^{3})}^{2 / 3} {(\sum_{j = 1}^{p} {(λ_{j}^{(p)})}^{3})}^{1 / 3} \\ = & p^{3 / 2} O (1) {(\frac{\sum_{j = 1}^{p} {(λ_{j}^{(p)})}^{3}}{p^{3 / 2}})}^{1 / 3} = o (p^{3 / 2}) . \end{matrix}

This establishes (60), so the conditions of Lemma 3 hold.

Now we can establish the expression for

s_{2}^{2}

. By (40), (56), (58) and (59),

\begin{matrix} s_{2}^{2} & = & σ_{2}^{4} lim_{p \to \infty} \sum_{j = 1}^{p} (2 p^{- 1} c {(λ_{j}^{(p)})}^{2} + c {(g_{j}^{(p)})}^{2}) \\ = & σ_{2}^{4} lim_{p \to \infty} \frac{2 c}{p} (\sum_{k = 1}^{p} \sum_{k^{'} = 1}^{p} ϱ^{2 | k - k^{'} |} + o (p)) + 4 σ_{2}^{4} {(κ_{1} + σ_{ε}^{2})}^{- 1} κ_{3} - 4 σ_{2}^{4} {(κ_{1} + σ_{ε}^{2})}^{- 2} κ_{2}^{2} \\ = & 2 c \frac{1 + ϱ^{2}}{1 - ϱ^{2}} {(κ_{1} + σ_{ε}^{2})}^{2} + 4 (κ_{1} + σ_{ε}^{2}) κ_{3} - 4 κ_{2}^{2} . \end{matrix}

(62)

By (44) and (62), recalling that

s^{2} = s_{1}^{2} + s_{2}^{2}

, we have that

\begin{matrix} s^{2} & = & 4 κ_{2}^{2} + 4 (κ_{1} + σ_{ε}^{2}) (2 κ_{2} c + κ_{3}) + 2 c {(κ_{1} + σ_{ε}^{2})}^{2} (c + \frac{1 + ϱ^{2}}{1 - ϱ^{2}}) . \end{matrix}

(63)
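
The algebra leading from (44) and (62) to (63) can be verified numerically. The sketch below is illustrative only: the function names are hypothetical, and the inputs are arbitrary positive numbers rather than values derived from a particular model.

```python
def s1_sq(k1, k2, c, sig2):
    """Limiting variance of I_2, first form of (44)."""
    sigma2_sq = k1 + sig2   # sigma_2^2 from (40)
    c2 = k2 / sigma2_sq     # c_2 from (40)
    return 2.0 * sigma2_sq**2 * (2.0 * c2 + c) ** 2


def s2_sq(k1, k2, k3, c, sig2, rho):
    """Limiting variance of I_3, last line of (62)."""
    a = k1 + sig2
    ratio = (1.0 + rho**2) / (1.0 - rho**2)
    return 2.0 * c * ratio * a**2 + 4.0 * a * k3 - 4.0 * k2**2


def s_sq(k1, k2, k3, c, sig2, rho):
    """Total limiting variance as written in (63)."""
    a = k1 + sig2
    ratio = (1.0 + rho**2) / (1.0 - rho**2)
    return 4.0 * k2**2 + 4.0 * a * (2.0 * k2 * c + k3) + 2.0 * c * a**2 * (c + ratio)


# Arbitrary illustrative inputs (assumed values, not from the paper).
args = dict(k1=2.0, k2=3.0, k3=5.0, c=2.0, sig2=1.0, rho=0.3)
total = s1_sq(args["k1"], args["k2"], args["c"], args["sig2"]) + s2_sq(**args)
print(total, s_sq(**args))
```

Both printed numbers should coincide up to floating-point rounding.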

Finally, consider

I_{4}

, defined by (38). Since

E W_{n} = n

, we have that

\begin{matrix} I_{4} & = & \frac{κ_{1, p} + σ_{ε}^{2}}{n^{3 / 2}} (n^{2} \frac{κ_{2, p}}{κ_{1, p} + σ_{ε}^{2}} + p n) = \sqrt{n} (κ_{2, p} + \frac{p}{n} (κ_{1, p} + σ_{ε}^{2})) . \end{matrix}

(64)

By (34) and the decomposition into the four terms (35)–(38), it follows from (42), (43), (50) and (62) that (39) holds, with the variance (63) and the centering term (64) as in the statement of the theorem. This concludes the proof. □

Before proceeding with the proof of Theorem 2, we establish the following lemma that ensures

o (p^{- 1 / 2})

convergence rate for

κ_{1, p}

and

κ_{2, p}

, appearing in Theorem 1, under additional restrictions for the parameters

β_{j}

.

Lemma 4.

Assume that

\sum_{j = p + 1}^{\infty} β_{j}^{2} = o (p^{- 1 / 2})

and

{sup}_{j \geq 1} | β_{j} | j^{α} < \infty

,

α > 1 / 2

, and

| ϱ | < 1

. Then,

(i): $\begin{matrix} κ_{1} = κ_{1, p} + o (p^{- 1 / 2}), \end{matrix}$
(ii): $\begin{matrix} κ_{2} = κ_{2, p} + o (p^{- 1 / 2}) . \end{matrix}$

Proof.

For the proof see Supplementary Materials, Section S1. □

Proof of Theorem 2.

Rewrite the left-hand side of (10) as follows:

\begin{matrix} \frac{∥ X^{'} {Y ∥}_{2}^{2} - n^{2} (κ_{2} + c (κ_{1} + σ_{ε}^{2}))}{n^{3 / 2}} & = & \frac{∥ X^{'} {Y ∥}_{2}^{2} - n^{2} κ_{2, p} - p n (κ_{1, p} + σ_{ε}^{2})}{n^{3 / 2}} \\ + \sqrt{n} (κ_{2, p} - κ_{2}) + \sqrt{n} c (κ_{1, p} - κ_{1}) + o (1) . \end{matrix}

It remains to apply Lemma 4 and Theorem 1 in order to conclude the proof of the theorem. □

We end this section by deriving two supporting results that allow us to obtain convenient alternative expressions for the terms

κ_{1}, κ_{2}

and

κ_{3}

. For this, we introduce functions

β (\cdot)

and

b (\cdot)

in Definition 3 below. Under the assumptions of Theorem 1 and for a given structure of the coefficients

β_{j}

, one then only needs to evaluate the terms

β (1), β (ϱ), β (ϱ^{2})

and

b_{1} (ϱ), b_{2} (ϱ)

. Then, due to Lemma 5 below, the expressions for

κ_{1}

,

κ_{2}

and

κ_{3}

easily follow.

Definition 3.

Assume that

\sum_{j = 1}^{\infty} β_{j}^{2} < \infty

and

| ϱ | \leq 1

. Define,

\begin{matrix} β (ϱ) & : = & \sum_{j = 1}^{\infty} β_{j}^{2} ϱ^{j}, \end{matrix}

(65)

\begin{matrix} b_{1} (ϱ) & : = & \sum_{j^{'} = 2}^{\infty} \sum_{j = 1}^{j^{'} - 1} β_{j} β_{j^{'}} ϱ^{j^{'} - j}, \end{matrix}

(66)

\begin{matrix} b_{2} (ϱ) & : = & \sum_{j = 2}^{\infty} \sum_{j^{'} = 1}^{j - 1} β_{j} β_{j^{'}} ϱ^{j + j^{'}}, \end{matrix}

(67)

and define the following quantities which involve derivatives of (65)–(67):

\begin{matrix} β^{(1)} (ϱ) & : = & ϱ \frac{d β (ϱ)}{d ϱ} = \sum_{j = 1}^{\infty} j β_{j}^{2} ϱ^{j}, \end{matrix}

(68)

\begin{matrix} b_{1}^{(1)} (ϱ) & : = & ϱ \frac{d b_{1} (ϱ)}{d ϱ} = \sum_{j^{'} = 2}^{\infty} \sum_{j = 1}^{j^{'} - 1} β_{j} β_{j^{'}} ϱ^{j^{'} - j} (j^{'} - j), \end{matrix}

(69)

\begin{matrix} b_{2}^{(1)} (ϱ) & : = & ϱ \frac{d b_{2} (ϱ)}{d ϱ} = \sum_{j^{'} = 2}^{\infty} \sum_{j = 1}^{j^{'} - 1} β_{j} β_{j^{'}} ϱ^{j^{'} + j} (j^{'} + j), \end{matrix}

(70)

\begin{matrix} b^{(2)} (ϱ) & : = & ϱ^{2} \frac{d^{2} b_{1} (ϱ)}{d ϱ^{2}} + b_{1}^{(1)} (ϱ) = \sum_{j^{'} = 2}^{\infty} \sum_{j = 1}^{j^{'} - 1} β_{j} β_{j^{'}} ϱ^{j^{'} - j} {(j^{'} - j)}^{2} . \end{matrix}

(71)

Note that, by the rules of differentiation of power series, the functions (68)–(71) are well defined.

Lemma 5.

Let the assumptions of Theorem 1 hold. Let

κ_{1}

,

κ_{2}

and

κ_{3}

be given by (3)–(5), respectively. Then, under notation in Definition 3, the following identities hold:

(i): $\begin{matrix} κ_{1} & = β (1) + 2 b_{1} (ϱ), \end{matrix}$
(ii): $\begin{matrix} κ_{2} & = β (1) \frac{1 + ϱ^{2}}{1 - ϱ^{2}} - β (ϱ^{2}) \frac{1}{1 - ϱ^{2}} + 2 (b_{1}^{(1)} (ϱ) + b_{1} (ϱ) \frac{1 + ϱ^{2}}{1 - ϱ^{2}} - b_{2} (ϱ) \frac{1}{1 - ϱ^{2}}), \end{matrix}$
(iii): $\begin{matrix} κ_{3} & = \frac{1}{{(1 - ϱ^{2})}^{2}} ((1 + 4 ϱ^{2} + ϱ^{4}) (β (1) + 2 b_{1} (ϱ)) - (1 + 3 ϱ^{2}) (β (ϱ^{2}) + 2 b_{2} (ϱ))) \\ + \frac{1}{1 - ϱ^{2}} (3 b_{1}^{(1)} (ϱ) (1 + ϱ^{2}) - 2 (b_{2}^{(1)} (ϱ) + β^{(1)} (ϱ^{2}))) + b^{(2)} (ϱ) . \end{matrix}$

Proof.

See the proof in Appendix A.2. □
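
Identities (i) and (ii) of Lemma 5 can be illustrated numerically. The sketch below rests on an assumption that is not restated in this section: that the finite-p quantities are κ_{1,p} = β'Σβ and κ_{2,p} = ‖Σβ‖², with Σ the matrix with entries ϱ^{|k-j|}, which is consistent with how these terms enter (48), (49) and (64). The coefficient vector is taken to be finitely supported so that the series in Definition 3 are finite sums; all names are hypothetical.

```python
def kappa_finite(beta, rho, p):
    """Return (beta' S beta, ||S beta||^2) for S = (rho^|k-j|), k, j = 1..p.

    beta holds the first len(beta) coefficients; the remaining ones are zero.
    """
    m = len(beta)
    s_beta = [sum(beta[j] * rho ** abs(k - j) for j in range(m)) for k in range(p)]
    k1 = sum(beta[k] * s_beta[k] for k in range(m))
    k2 = sum(v * v for v in s_beta)
    return k1, k2


def lemma5_identities(beta, rho):
    """Right-hand sides of Lemma 5(i) and 5(ii) for a finitely supported beta."""
    m = len(beta)
    beta_at_1 = sum(x * x for x in beta)
    beta_at_rho2 = sum(beta[j] ** 2 * rho ** (2 * (j + 1)) for j in range(m))
    b1 = b1_deriv = b2 = 0.0
    for jp in range(1, m):          # zero-based position of j'
        for j in range(jp):         # zero-based position of j < j'
            w = beta[j] * beta[jp]
            d = jp - j
            b1 += w * rho**d
            b1_deriv += w * d * rho**d
            b2 += w * rho ** (j + jp + 2)   # exponent j + j' in one-based indexing
    q = 1.0 - rho**2
    k1 = beta_at_1 + 2.0 * b1
    k2 = (beta_at_1 * (1.0 + rho**2) / q - beta_at_rho2 / q
          + 2.0 * (b1_deriv + b1 * (1.0 + rho**2) / q - b2 / q))
    return k1, k2


beta_demo = [1.0, -0.5, 0.25, 2.0]   # arbitrary illustrative coefficients
print(kappa_finite(beta_demo, 0.5, 200))
print(lemma5_identities(beta_demo, 0.5))
```

With p much larger than the support of the coefficient vector, the truncation error in the second component is geometrically small, so the two printed pairs should agree closely.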

Remark 3.

From the assumptions of Definition 3 it follows that

β (1), | β (ϱ) |, | b_{1} (ϱ) |

,

| b_{2} (ϱ) | < \infty

for

| ϱ | < 1

. Thus, it follows from Lemma 5 that

κ_{i} < \infty

,

i = 1, 2, 3

.

Proof of Remark 3.

Cases for

β (1)

and

β (ϱ)

follow straightforwardly from the assumptions. Consider

b_{1} (ϱ)

. Note that for any p,

\begin{matrix} | b_{1} (ϱ) | & \leq & \sum_{l_{1}, l_{2} = 1}^{\infty} | β_{l_{1}} | | β_{l_{2}} {| | ϱ |}^{| l_{1} - l_{2} |} = \sum_{l_{1}, l_{2} = 1}^{\infty} (| β_{l_{1}} {| | ϱ |}^{| l_{1} - l_{2} | / 2}) (| β_{l_{2}} {| | ϱ |}^{| l_{1} - l_{2} | / 2}) \\ \leq & (1 / 2) \sum_{l_{1}, l_{2} = 1}^{\infty} (β_{l_{1}}^{2} {| ϱ |}^{| l_{1} - l_{2} |} + β_{l_{2}}^{2} {| ϱ |}^{| l_{1} - l_{2} |}) \\ = & \sum_{l_{1} = 1}^{\infty} β_{l_{1}}^{2} \sum_{l_{2} = 1}^{\infty} {| ϱ |}^{| l_{1} - l_{2} |} \leq β (1) \frac{1 + | ϱ |}{1 - | ϱ |} < \infty \end{matrix}

by (S9). In a similar manner, it can be seen that

| b_{2} (ϱ) | \leq β (1) \frac{| ϱ |}{1 - | ϱ |}

. □

6. Approximate Sparsity: An Example

In this section, we study the case when coefficients

β_{j}

decay hyperbolically, i.e.,

β_{j} = j^{- 1}, j \geq 1

. This assumption is analogous to the assumption of approximate sparsity, as defined by [21]. The authors of the aforementioned paper note that for approximately sparse models, the regression function can be well approximated by a linear combination of relatively few important regressors, which is one of the reasons for the popularity of variable selection approaches such as LASSO ([36]) and its modifications (see, e.g., [37,38,39]). At the same time, approximate sparsity allows all coefficients

β_{j}

to be nonzero, which is a more plausible assumption in many real world settings.

In order to derive the quantities in Theorem 2, we apply the results of Lemma 5. For this, we establish the expressions for the quantities in Definition 3.

Define the real dilogarithm function (see, e.g., [40]):

\begin{matrix} {Li}_{2} (x) & = & - \int_{0}^{x} \frac{log (1 - u)}{u} d u, x \leq 1, x \in R . \end{matrix}

(72)

(Here and below,

\int_{0}^{x} = - \int_{x}^{0}

if

x \leq 0

.) For

| x | \leq 1

the real dilogarithm has a series representation,

\begin{matrix} {Li}_{2} (x) & = & \sum_{k = 1}^{\infty} \frac{x^{k}}{k^{2}} . \end{matrix}

(73)

Then,

\begin{matrix} β (1) = \sum_{j = 1}^{\infty} \frac{1}{j^{2}} = \frac{π^{2}}{6}, & β (ϱ) = \sum_{j = 1}^{\infty} \frac{ϱ^{j}}{j^{2}} = {Li}_{2} (ϱ) . \end{matrix}

Additionally, we have

\begin{matrix} \frac{d}{d ϱ} {Li}_{2} (ϱ) & = & - \frac{log (1 - ϱ)}{ϱ} . \end{matrix}

(74)

Thus, by (68) and (74), we establish

\begin{matrix} β^{(1)} (ϱ) & = & ϱ \frac{d}{d ϱ} β (ϱ) = ϱ \frac{d}{d ϱ} {Li}_{2} (ϱ) = - log (1 - ϱ) . \end{matrix}
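
The dilogarithm identities above are easy to check numerically. The following sketch evaluates the series (73) directly (a simple approach that is adequate only for |x| < 1, and an assumption of this example rather than a recommended general-purpose algorithm) and compares ϱ times a central-difference derivative of the dilogarithm with −log(1 − ϱ).

```python
import math


def li2(x, tol=1e-16, max_terms=100000):
    """Real dilogarithm via the power series (73); intended for |x| < 1."""
    total, power = 0.0, 1.0
    for k in range(1, max_terms + 1):
        power *= x
        term = power / (k * k)
        total += term
        if abs(term) < tol:
            break
    return total


def beta1_closed(rho):
    """Closed form of beta^(1)(rho) for beta_j = 1/j."""
    return -math.log(1.0 - rho)


def beta1_numeric(rho, h=1e-5):
    """rho * d/d(rho) Li2(rho), approximated by a central difference."""
    return rho * (li2(rho + h) - li2(rho - h)) / (2.0 * h)


for rho in (0.3, 0.7, -0.5):
    print(rho, li2(rho), beta1_closed(rho), beta1_numeric(rho))
```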

Next, note that

\begin{matrix} b_{1} (ϱ) & = & \sum_{i = 2}^{\infty} \sum_{j = 1}^{i - 1} \frac{ϱ^{i - j}}{i j} = \sum_{i = 2}^{\infty} \sum_{k = 1}^{i - 1} \frac{ϱ^{k}}{i (i - k)} \\ = & \sum_{k = 1}^{\infty} ϱ^{k} \sum_{i = k + 1}^{\infty} \frac{1}{i (i - k)} = \sum_{k = 1}^{\infty} \frac{ϱ^{k}}{k} \sum_{l = 1}^{k} \frac{1}{l} \\ = & \sum_{l = 1}^{\infty} \frac{1}{l} \sum_{k = l}^{\infty} \frac{ϱ^{k}}{k} = \sum_{l = 1}^{\infty} \frac{1}{l} \int_{0}^{ϱ} \frac{x^{l - 1}}{1 - x} d x \\ = & - \int_{0}^{ϱ} \frac{log (1 - x)}{x (1 - x)} d x = \frac{{log}^{2} (1 - ϱ)}{2} + {Li}_{2} (ϱ), \end{matrix}

(75)

where we have used identities

\begin{matrix} \sum_{i = k + 1}^{\infty} \frac{1}{i (i - k)} & = & \frac{1}{k} \sum_{l = 1}^{k} \frac{1}{l}, k \geq 1, \sum_{k = l}^{\infty} \frac{ϱ^{k}}{k} = \int_{0}^{ϱ} \frac{x^{l - 1}}{1 - x} d x \end{matrix}

and (72). Then, by (69), (74) and (75),

\begin{matrix} b_{1}^{(1)} (ϱ) & = & ϱ \frac{d}{d ϱ} b_{1} (ϱ) = - \frac{log (1 - ϱ)}{1 - ϱ}, \end{matrix}

whereas by (71),

\begin{matrix} b^{(2)} (ϱ) & = & ϱ^{2} \frac{d^{2} b_{1} (ϱ)}{d ϱ^{2}} + b_{1}^{(1)} (ϱ) = \frac{ϱ - ϱ log (1 - ϱ)}{{(1 - ϱ)}^{2}} . \end{matrix}

Furthermore, note that

\begin{matrix} b_{2} (ϱ) & = & \sum_{i = 2}^{\infty} \sum_{j = 1}^{i - 1} \frac{ϱ^{i + j}}{i j} = \sum_{i = 2}^{\infty} \frac{ϱ^{i}}{i} \sum_{j = 1}^{i - 1} \frac{ϱ^{j}}{j} = \sum_{i = 2}^{\infty} \frac{ϱ^{i}}{i} \int_{0}^{ϱ} \sum_{j = 1}^{i - 1} x^{j - 1} d x \\ = & \sum_{i = 1}^{\infty} \frac{ϱ^{i + 1}}{i + 1} \int_{0}^{ϱ} \frac{1 - x^{i}}{1 - x} d x \\ = & - log (1 - ϱ) (\sum_{i = 1}^{\infty} \frac{ϱ^{i}}{i} - ϱ) - \int_{0}^{ϱ} (\sum_{i = 1}^{\infty} \frac{ϱ^{i}}{i} \frac{x^{i - 1}}{1 - x} - ϱ \frac{1}{1 - x}) d x \\ = & - log (1 - ϱ) \sum_{i = 1}^{\infty} \frac{ϱ^{i}}{i} - \int_{0}^{ϱ} \sum_{i = 1}^{\infty} \frac{{(ϱ x)}^{i}}{i} \frac{1}{x (1 - x)} d x \\ = & {log}^{2} (1 - ϱ) + \int_{0}^{ϱ} \frac{log (1 - ϱ x)}{x (1 - x)} d x \\ = & \frac{1}{2} ({log}^{2} (1 - ϱ) - {Li}_{2} (ϱ^{2})), \end{matrix}

(76)

where the last equality follows from Lemma A1. Next, by (70), (74) and (76) we have

\begin{matrix} b_{2}^{(1)} (ϱ) & = & log (1 - ϱ^{2}) - \frac{ϱ log (1 - ϱ)}{1 - ϱ} . \end{matrix}
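
As a sanity check, the closed forms obtained so far for the case β_j = 1/j can be compared with truncated versions of the series they represent. In the sketch below (illustrative names; the truncation level is an assumption that suits moderate values of |ϱ|), the single-sum representation of b_1 in terms of harmonic numbers from (75) is used because it converges geometrically.

```python
import math


def li2(x, terms=400):
    """Dilogarithm from the series (73); adequate for |x| well below 1."""
    return sum(x**k / (k * k) for k in range(1, terms + 1))


def closed_forms(rho):
    """Closed-form expressions for b_1, b_1^(1), b^(2), b_2, b_2^(1)."""
    log_term = math.log(1.0 - rho)
    return {
        "b1": 0.5 * log_term**2 + li2(rho),
        "b1_1": -log_term / (1.0 - rho),
        "b_2nd": rho * (1.0 - log_term) / (1.0 - rho) ** 2,
        "b2": 0.5 * (log_term**2 - li2(rho * rho)),
        "b2_1": math.log(1.0 - rho * rho) - rho * log_term / (1.0 - rho),
    }


def series_forms(rho, terms=400):
    """The same quantities from truncated series with beta_j = 1/j."""
    out = dict.fromkeys(("b1", "b1_1", "b_2nd", "b2", "b2_1"), 0.0)
    harmonic = 0.0
    for k in range(1, terms + 1):
        harmonic += 1.0 / k          # H_k = 1 + 1/2 + ... + 1/k
        w = rho**k * harmonic
        out["b1"] += w / k           # sum_k rho^k H_k / k, see (75)
        out["b1_1"] += w             # applying rho d/d(rho) once
        out["b_2nd"] += k * w        # applying rho d/d(rho) twice
    for i in range(2, terms + 1):
        for j in range(1, i):
            w = rho ** (i + j) / (i * j)
            out["b2"] += w
            out["b2_1"] += (i + j) * w
    return out


exact, approx = closed_forms(0.5), series_forms(0.5)
for name in exact:
    print(name, exact[name], approx[name])
```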

Thus, we can apply Lemma 5(i) and arrive at the following expression for

κ_{1}

:

\begin{matrix} κ_{1} & = & \frac{π^{2}}{6} + {log}^{2} (1 - ϱ) + 2 {Li}_{2} (ϱ) . \end{matrix}

(77)

Similarly, for

κ_{2}

, by collecting and simplifying the terms, by Lemmas 5(ii) and A1, we have

\begin{matrix} κ_{2} & = & \frac{1 + ϱ^{2}}{1 - ϱ^{2}} (\frac{π^{2}}{6} + 2 {Li}_{2} (ϱ)) - \frac{2 log (1 - ϱ)}{1 - ϱ} + {log}^{2} (1 - ϱ) \frac{ϱ^{2}}{1 - ϱ^{2}} \\ = & \frac{1}{1 - ϱ^{2}} ((1 + ϱ^{2}) κ_{1} - {log}^{2} (1 - ϱ) - 2 (1 + ϱ) log (1 - ϱ)) . \end{matrix}

(78)

Lastly, for

κ_{3}

, by Lemma 5(iii), through simplification of terms, we get

\begin{matrix} κ_{3} & = & \frac{1}{{(1 - ϱ^{2})}^{2}} ((1 + 4 ϱ^{2} + ϱ^{4}) (\frac{π^{2}}{6} + 2 {Li}_{2} (ϱ)) + {log}^{2} (1 - ϱ) ϱ^{2} (1 + ϱ^{2}) \\ - (3 - ϱ + 4 ϱ^{2}) (1 + ϱ) log (1 - ϱ) + ϱ {(1 + ϱ)}^{2}) \\ = & κ_{2} \frac{1 + 3 ϱ^{2}}{1 - ϱ^{2}} + \frac{1}{{(1 - ϱ^{2})}^{2}} ((- 1 + ϱ + 2 ϱ^{2}) (1 + ϱ) log (1 - ϱ) + ϱ {(1 + ϱ)}^{2} - 2 ϱ^{4} κ_{1}) . \end{matrix}

(79)
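
The closed forms (77)–(79) are straightforward to evaluate. The sketch below (hypothetical function names; the dilogarithm is computed from the series (73), which is an assumption suited to |ϱ| < 1) implements both forms given in (78) and (79) so that they can be compared with each other, and also compares κ_1 with a truncated double sum, under the assumption that κ_{1,p} = β'Σβ with Σ having entries ϱ^{|j-j'|}.

```python
import math


def li2(x, terms=400):
    """Dilogarithm from the series (73); adequate for |x| < 1."""
    return sum(x**k / (k * k) for k in range(1, terms + 1))


def kappas_first_form(rho):
    """kappa_1, kappa_2, kappa_3 from (77) and the first lines of (78), (79)."""
    log_term = math.log(1.0 - rho)
    q = 1.0 - rho**2
    base = math.pi**2 / 6 + 2.0 * li2(rho)
    k1 = base + log_term**2
    k2 = ((1 + rho**2) / q * base - 2.0 * log_term / (1.0 - rho)
          + log_term**2 * rho**2 / q)
    k3 = ((1 + 4 * rho**2 + rho**4) * base
          + log_term**2 * rho**2 * (1 + rho**2)
          - (3 - rho + 4 * rho**2) * (1 + rho) * log_term
          + rho * (1 + rho) ** 2) / q**2
    return k1, k2, k3


def kappas_second_form(rho):
    """The same quantities from the second lines of (78) and (79)."""
    log_term = math.log(1.0 - rho)
    q = 1.0 - rho**2
    k1 = math.pi**2 / 6 + log_term**2 + 2.0 * li2(rho)
    k2 = ((1 + rho**2) * k1 - log_term**2 - 2 * (1 + rho) * log_term) / q
    k3 = (k2 * (1 + 3 * rho**2) / q
          + ((-1 + rho + 2 * rho**2) * (1 + rho) * log_term
             + rho * (1 + rho) ** 2 - 2 * rho**4 * k1) / q**2)
    return k1, k2, k3


def kappa1_truncated(rho, p):
    """Truncated double sum of rho^|j-j'| / (j j') over j, j' = 1..p."""
    total = sum(1.0 / (j * j) for j in range(1, p + 1))
    for jp in range(2, p + 1):
        for j in range(1, jp):
            total += 2.0 * rho ** (jp - j) / (j * jp)
    return total


print(kappas_first_form(0.5))
print(kappas_second_form(0.5))
print(kappa1_truncated(0.5, 400))
```

The truncated sum approaches κ_1 only at a rate of order 1/p, so a visible gap at p = 400 is expected.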

This allows us to apply Theorem 2 under the considered specification of the parameter

β

and conclude with the following corollary.

Corollary 2.

Assume a model (1) with (2) covariance structure and consider

β_{j} : = j^{- 1}

,

j = 1, \dots, p

. Let

p = p_{n}

satisfy

\begin{matrix} p \to \infty, & \frac{p}{n} \to c \in (0, \infty) . \end{matrix}

Then

\begin{matrix} \frac{∥ X^{'} {Y ∥}_{2}^{2} - n^{2} (κ_{2} + c (κ_{1} + σ_{ε}^{2}))}{n^{3 / 2}} & \overset{d}{\to} & N (0, s^{2}), \end{matrix}

(80)

where

\begin{matrix} s^{2} & = & 4 κ_{2}^{2} + 4 (κ_{1} + σ_{ε}^{2}) (2 κ_{2} c + κ_{3}) + 2 c {(κ_{1} + σ_{ε}^{2})}^{2} (c + \frac{1 + ϱ^{2}}{1 - ϱ^{2}}), \end{matrix}

(81)

and

κ_{1}

,

κ_{2}

and

κ_{3}

are defined by (77)–(79), respectively.

In order to illustrate the results of Corollary 2, we end this section with a Monte Carlo simulation study where we generate 1000 independent replications of the statistic

∥ X^{'} {Y ∥}_{2}^{2}

. The data is generated following the assumptions of Corollary 2. We consider the following parameter values:

p = 100, 500, 1000, 1500, 2000, 3000

,

c = 1, 2, 5, 10

,

σ_{ε}^{2} = 1, 2, 4, 10

. Due to the large number of resulting figures, we present only selected cases in Figures 1–9, which demonstrate certain disparities in greater detail. The figures show the empirical cumulative distribution function (CDF) and the empirical probability density function (PDF), together with the limiting CDF and PDF of

N (0, s^{2})

by (80) for different parameter combinations. In addition, we present the corresponding Q-Q plots in order to inspect the tails of the resulting distributions in greater detail.
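
A simulation of this kind can be sketched as follows. This is a minimal illustration, not the authors' code, and it makes several assumptions that are not spelled out in this section: the model is taken to be Y = Xβ + ε with Gaussian ε, the rows of X are generated as independent Gaussian AR(1) paths (which yields the covariance ϱ^{|k-k'|}), and c is replaced by p/n. All function names are hypothetical, and the sizes are kept small because the example uses only the standard library.

```python
import math
import random


def li2(x, terms=400):
    """Dilogarithm from the series (73); adequate for |x| < 1."""
    return sum(x**k / (k * k) for k in range(1, terms + 1))


def limits(rho, c, sig2):
    """Centering constant and variance s^2 from (77)-(81) for beta_j = 1/j."""
    log_term = math.log(1.0 - rho)
    q = 1.0 - rho**2
    k1 = math.pi**2 / 6 + log_term**2 + 2.0 * li2(rho)
    k2 = ((1 + rho**2) * k1 - log_term**2 - 2 * (1 + rho) * log_term) / q
    k3 = (k2 * (1 + 3 * rho**2) / q
          + ((-1 + rho + 2 * rho**2) * (1 + rho) * log_term
             + rho * (1 + rho) ** 2 - 2 * rho**4 * k1) / q**2)
    a = k1 + sig2
    center = k2 + c * a
    s2 = 4 * k2**2 + 4 * a * (2 * k2 * c + k3) + 2 * c * a**2 * (c + (1 + rho**2) / q)
    return center, s2


def squared_norm_xty(n, p, rho, sig2, rng):
    """One draw of ||X'Y||_2^2 under the assumptions stated above."""
    beta = [1.0 / (j + 1) for j in range(p)]
    xty = [0.0] * p
    innov_sd = math.sqrt(1.0 - rho**2)
    eps_sd = math.sqrt(sig2)
    for _obs in range(n):
        row = [rng.gauss(0.0, 1.0)]
        for _k in range(1, p):
            row.append(rho * row[-1] + innov_sd * rng.gauss(0.0, 1.0))
        y = sum(b * x for b, x in zip(beta, row)) + eps_sd * rng.gauss(0.0, 1.0)
        for k in range(p):
            xty[k] += row[k] * y
    return sum(v * v for v in xty)


def simulate(reps, n, p, rho, sig2, seed=0):
    """Replications of the left-hand side of (80), plus the limiting variance."""
    rng = random.Random(seed)
    center, s2 = limits(rho, p / n, sig2)
    draws = [(squared_norm_xty(n, p, rho, sig2, rng) - n**2 * center) / n**1.5
             for _ in range(reps)]
    return draws, s2


draws, s2 = simulate(reps=50, n=60, p=60, rho=0.3, sig2=2.0)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print("sample mean:", mean, "sample variance:", var, "limiting variance:", s2)
```

With such small values of p and n and so few replications, only rough agreement with the limiting variance should be expected; the sizes used in the study are considerably larger.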

We find that for relatively small values of

ϱ

, the observed distribution of the statistic is fairly close to the limiting distribution even for small values of

p, n

and larger

σ_{ε}^{2}, c

(see, e.g., Figures 1–4). However, convergence becomes visibly slower with increasing values of

ϱ

. Furthermore, for moderate values of

ϱ, c, σ_{ε}^{2}

, adequate convergence towards the limiting distribution is observed only for larger values of p (see Figures 5 and 6). Similar behaviour is observed when the relation between the parameters

ϱ, c, σ_{ε}^{2}

is appropriately controlled: e.g., in Figure 7, we see comparable results to those presented by Figure 6, where the effect of the increase in parameter value

ϱ

is countered by a smaller value of

σ_{ε}^{2}

. Alternatively, analogous effects can be achieved by reducing the value of c instead.

Figure 2. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic (80) with the limiting distribution

N (0, s^{2})

by the Corollary 2 (in black) for

ϱ = 0.3

,

c = 1

,

σ_{ε}^{2} = 2

and

p = 1500, 2000, 3000

.

Figure 3. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic (80) with the limiting distribution

N (0, s^{2})

by the Corollary 2 (in black) for

ϱ = 0.3

,

c = 1

,

σ_{ε}^{2} = 10

and

p = 100, 500, 1000

.

Figure 4. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic (80) with the limiting distribution

N (0, s^{2})

by the Corollary 2 (in black) for

ϱ = 0.3

,

c = 1

,

σ_{ε}^{2} = 10

and

p = 1500, 2000, 3000

.

Figure 5. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic (80) with the limiting distribution

N (0, s^{2})

by the Corollary 2 (in black) for

ϱ = 0.7

,

c = 5

,

σ_{ε}^{2} = 4

and

p = 100, 500, 1000

.

Figure 6. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic (80) with the limiting distribution

N (0, s^{2})

by the Corollary 2 (in black) for

ϱ = 0.7

,

c = 5

,

σ_{ε}^{2} = 4

and

p = 1500, 2000, 3000

.

Figure 7. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic (80) with the limiting distribution

N (0, s^{2})

by the Corollary 2 (in black) for

ϱ = 0.9

,

c = 5

,

σ_{ε}^{2} = 1

and

p = 1500, 2000, 3000

.

Finally, slow convergence is observed for large values of

ϱ, c, σ_{ε}^{2}

, as expected (see Figure 8 and Figure 9). In such cases, the simulation results suggest that even larger values of

p, n

would be needed for more accurate results.

Figure 8. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic (80) with the limiting distribution

N (0, s^{2})

by the Corollary 2 (in black) for

ϱ = - 0.95

,

c = 10

,

σ_{ε}^{2} = 4

and

p = 100, 500, 1000

.

Figure 9. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic (80) with the limiting distribution

N (0, s^{2})

by the Corollary 2 (in black) for

ϱ = - 0.95

,

c = 10

,

σ_{ε}^{2} = 4

and

p = 1500, 2000, 3000

.

7. Discussion

In this paper, we consider a specific KMS covariance structure due to its attractive properties and wide application possibilities for working with real world datasets. Moreover, our results could be extended further by considering a wider family of Toeplitz covariance structures. For instance, under specific constraints, one could employ the approaches proposed in [3] in order to extend the application of our results towards more complex covariance structures of the data.

Furthermore, for future work, it would be interesting to expand and examine the results by removing the assumption of independence between the observations

X_{i}

,

i = 1, \dots, n

.

Finally, in this paper we have established both the exact and the asymptotic distributions of the statistic

∥ X^{'} {Y ∥}_{2}^{2}

(see (8), (10) and (34)). Both distributions could be used for estimating

β

,

σ_{ε}^{2}

or related measures (e.g., by applying the method of moments or maximum likelihood estimation) in future research. Such a research direction could open up interesting avenues when compared with popular LASSO-type methods in high-dimensional linear regression. A similar approach is taken by [10], who construct maximum likelihood estimators for the signal strength

{∥ β ∥}_{2}^{2}

in a high-dimensional regression context. Note that the results by [10] are achieved under certain strong restrictions, which are consistent with the related literature (see, e.g., [7,41,42]). In our case, we impose weaker assumptions; therefore, both our asymptotic and exact results could be used in order to extend the approaches in the aforementioned literature.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math10101657/s1, Proof of Lemmas 4, A2, A5, Proof of result (A9) of Lemma 5(ii), Proof of results (A13)–(A14) of Lemma 5(iii).

Author Contributions

Conceptualization, S.J., R.L.; methodology, S.J., R.L.; investigation, S.J., R.L.; writing—original draft preparation, S.J., R.L.; writing—review and editing, S.J., R.L.; visualization, S.J., R.L. All authors have read and agreed to the published version of the manuscript.

Funding

Supported by grant No. S-MIP-20-16 from the Research Council of Lithuania.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the anonymous Referees for their very constructive and detailed comments and suggestions on the first version of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Throughout the proofs we use the notation C to mark generic constants, the specific values of which can change from line to line.

Appendix A.1. Technical Lemmas

Lemma A1.

Assume that

| ϱ | < 1

. Then,

\begin{matrix} \int_{0}^{ϱ} \frac{log (1 - ϱ x)}{x (1 - x)} d x & = & - \frac{1}{2} ({Li}_{2} (ϱ^{2}) + {log}^{2} (1 - ϱ)), \end{matrix}

where

{Li}_{2}

denotes the real dilogarithm function. (Recall that, for

ϱ < 0

, by

\int_{0}^{ϱ}

we denote

- \int_{ϱ}^{0}

.)

Proof.

Write,

\begin{matrix} \int_{0}^{ϱ} \frac{log (1 - ϱ x)}{x (1 - x)} d x & = & \int_{0}^{ϱ} \frac{log (1 - ϱ x)}{x} d x + \int_{0}^{ϱ} \frac{log (1 - ϱ x)}{1 - x} d x . \end{matrix}

By (72), we have

\begin{matrix} \int_{0}^{ϱ} \frac{log (1 - ϱ x)}{x} d x & = & - {Li}_{2} (ϱ^{2}) . \end{matrix}

(A1)

It remains to show that

\begin{matrix} \int_{0}^{ϱ} \frac{log (1 - ϱ x)}{1 - x} d x & = & \frac{1}{2} ({Li}_{2} (ϱ^{2}) - {log}^{2} (1 - ϱ)) . \end{matrix}

(A2)

Indeed, by substitution

v = ϱ - ϱ x

, we have

\begin{matrix} \int_{0}^{ϱ} \frac{log (1 - ϱ x)}{1 - x} d x & = & \int_{ϱ - ϱ^{2}}^{ϱ} \frac{log (1 - ϱ + v)}{v} d v \\ = & \int_{ϱ - ϱ^{2}}^{ϱ} \frac{log (1 + \frac{v}{1 - ϱ})}{v} d v - {log}^{2} (1 - ϱ) . \end{matrix}

(A3)

Further, by the substitution $w = - \frac{v}{1 - ϱ}$, we have

\begin{matrix} \int_{ϱ - ϱ^{2}}^{ϱ} \frac{log (1 + \frac{v}{1 - ϱ})}{v} d v & = & - \int_{- \frac{ϱ}{1 - ϱ}}^{- ϱ} \frac{log (1 - w)}{w} d w \\ & = & {Li}_{2} (- ϱ) - {Li}_{2} (- \frac{ϱ}{1 - ϱ}) \\ & = & {Li}_{2} (- ϱ) + {Li}_{2} (ϱ) + \frac{1}{2} {log}^{2} (1 - ϱ) \end{matrix}

(A4)

\begin{matrix} & = & \frac{1}{2} ({Li}_{2} (ϱ^{2}) + {log}^{2} (1 - ϱ)), \end{matrix}

(A5)

where for (A4) and (A5) we apply the easily verifiable identities (see, e.g., [43]):

\begin{matrix} {Li}_{2} (\frac{x}{x - 1}) & = & - {Li}_{2} (x) - \frac{1}{2} {log}^{2} (1 - x), x < 1, \\ {Li}_{2} (x) + {Li}_{2} (- x) & = & \frac{1}{2} {Li}_{2} (x^{2}), | x | < 1 . \end{matrix}

Thus, (A3) and (A5) imply (A2), which concludes the proof. □
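
The identity of Lemma A1 can also be checked numerically. The sketch below is not part of the original proof: it approximates the integral with a midpoint rule and evaluates ${Li}_{2}$ through its power series, using only the Python standard library. The function names, the number of series terms and the grid size are illustrative assumptions.

```python
import math


def dilog(x, terms=5000):
    """Real dilogarithm Li_2(x) via the power series sum_{k>=1} x^k / k^2 (|x| < 1)."""
    total, power = 0.0, 1.0
    for k in range(1, terms + 1):
        power *= x
        total += power / (k * k)
    return total


def lemma_a1_lhs(rho, n=100_000):
    """Midpoint-rule approximation of int_0^rho log(1 - rho*x) / (x*(1 - x)) dx.

    The step h is signed, so rho < 0 reproduces the convention -int_rho^0.
    Midpoints never hit x = 0, where the integrand has a removable singularity.
    """
    h = rho / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += math.log1p(-rho * x) / (x * (1.0 - x))
    return total * h


def lemma_a1_rhs(rho):
    """Closed form -(Li_2(rho^2) + log^2(1 - rho)) / 2 stated in Lemma A1."""
    return -0.5 * (dilog(rho * rho) + math.log(1.0 - rho) ** 2)


if __name__ == "__main__":
    # Illustrative values of rho; any |rho| < 1 (rho != 0) can be used.
    for rho in (-0.5, 0.3, 0.9):
        print(rho, lemma_a1_lhs(rho), lemma_a1_rhs(rho))
```

For the three illustrative values of $ϱ$ the two columns should agree up to the discretization error of the midpoint rule.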

Lemma A2.

Assume that $\sum_{j = 1}^{\infty} β_{j}^{2} < \infty$ and $| ϱ | < 1$. Then, the following inequalities hold:

(i): $\begin{matrix} | \sum_{l = p + 1}^{\infty} \sum_{l^{'} = l + 1}^{\infty} β_{l} β_{l^{'}} ϱ^{l^{'} - l} | \leq C \sum_{l = p + 1}^{\infty} β_{l}^{2} . \end{matrix}$
(ii): $\begin{matrix} | \sum_{l = p + 1}^{\infty} \sum_{l^{'} = l + 1}^{\infty} β_{l} β_{l^{'}} ϱ^{l^{'} - l} (l^{'} - l) | \leq C \sum_{l = p + 1}^{\infty} β_{l}^{2} . \end{matrix}$
(iii): $\begin{matrix} | \sum_{l = 1}^{p} \sum_{l^{'} = p + 1}^{\infty} β_{l} β_{l^{'}} ϱ^{l^{'} - l} | \leq C \sum_{l = p + 1}^{\infty} β_{l}^{2} . \end{matrix}$
(iv): $\begin{matrix} | \sum_{l = 1}^{p} \sum_{l^{'} = p + 1}^{\infty} β_{l} β_{l^{'}} ϱ^{l^{'} + l} | \leq C \sum_{l = p + 1}^{\infty} β_{l}^{2} . \end{matrix}$

Proof.

See the proof in Supplementary Materials, Section S2. □

Lemma A3.

Assume that ${sup}_{j \geq 1} | β_{j} | j^{α} < \infty$, $α > 1 / 2$, and that $| ϱ | < 1$. Then,

\begin{matrix} | \sum_{j = 1}^{p} β_{j} ϱ^{p - j} | = o (p^{- 1 / 4}) . \end{matrix}

Proof.

We have

\begin{matrix} | \sum_{j = 1}^{p} β_{j} ϱ^{p - j} | & \leq & \sum_{j = 1}^{⌊ \sqrt{p} ⌋} | β_{j} | {| ϱ |}^{p - j} + \sum_{j = ⌊ \sqrt{p} ⌋ + 1}^{p} | β_{j} | {| ϱ |}^{p - j} \\ & \leq & sup_{j \geq 1} | β_{j} | \sum_{j = 1}^{⌊ \sqrt{p} ⌋} {| ϱ |}^{p - j} + p^{- α / 2} \sum_{j = ⌊ \sqrt{p} ⌋ + 1}^{p} | β_{j} | p^{α / 2} {| ϱ |}^{p - j} \\ & \leq & sup_{j \geq 1} | β_{j} | \sum_{j = 1}^{⌊ \sqrt{p} ⌋} {| ϱ |}^{p - j} + p^{- α / 2} sup_{j \geq 1} | β_{j} | j^{α} \sum_{j = ⌊ \sqrt{p} ⌋ + 1}^{p} {| ϱ |}^{p - j} \\ & \leq & C (\sum_{j = 1}^{⌊ \sqrt{p} ⌋} {| ϱ |}^{p - j} + p^{- α / 2} \sum_{j = ⌊ \sqrt{p} ⌋ + 1}^{p} {| ϱ |}^{p - j}) \\ & \leq & C ({| ϱ |}^{p - ⌊ \sqrt{p} ⌋} + p^{- α / 2}) . \end{matrix}

Here we used the bound $p^{α / 2} \leq j^{α}$ for $j \geq ⌊ \sqrt{p} ⌋ + 1$ and the fact that $\sum_{j = ⌊ \sqrt{p} ⌋ + 1}^{p} {| ϱ |}^{p - j} \to {(1 - | ϱ |)}^{- 1} < \infty$ as $p \to \infty$. Thus,

\begin{matrix} p^{1 / 4} | \sum_{j = 1}^{p} β_{j} ϱ^{p - j} | \leq C (p^{1 / 4} {| ϱ |}^{p - ⌊ \sqrt{p} ⌋} + p^{\frac{1}{4} - \frac{α}{2}}) \to 0 . \end{matrix}

(A6)

□
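
The rate in Lemma A3 can be illustrated numerically. The sketch below is an illustration only, not part of the proof: it assumes the hypothetical coefficients $β_{j} = j^{- α}$ with $α = 0.75$ and $ϱ = 0.5$, and evaluates $p^{1 / 4} | \sum_{j = 1}^{p} β_{j} ϱ^{p - j} |$ for growing $p$. The function name and the parameter values are assumptions made for this example.

```python
def scaled_weighted_sum(p, alpha=0.75, rho=0.5):
    """Return p^(1/4) * |sum_{j=1}^p beta_j * rho^(p-j)| for beta_j = j^(-alpha).

    The sum is accumulated with the recursion s_j = rho * s_{j-1} + beta_j,
    which gives s_p = sum_{j=1}^p beta_j * rho^(p-j) without large powers.
    """
    s = 0.0
    for j in range(1, p + 1):
        s = rho * s + j ** (-alpha)
    return p ** 0.25 * abs(s)


if __name__ == "__main__":
    # Hypothetical grid of p values; the printed quantity should shrink with p.
    for p in (10**2, 10**3, 10**4, 10**5):
        print(p, scaled_weighted_sum(p))
```

With these choices the printed values decay roughly like $p^{1 / 4 - α}$, which is faster than the bound $p^{1 / 4 - α / 2}$ in (A6) requires.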

Remark A1.

The assumption ${sup}_{j \geq 1} | β_{j} | j^{α} < \infty$, for $α > 1 / 2$, implies that $\sum_{j = 1}^{\infty} β_{j}^{2} < \infty$:

\begin{matrix} \sum_{j = 1}^{\infty} β_{j}^{2} & = & \sum_{j = 1}^{\infty} β_{j}^{2} j^{2 α} j^{- 2 α} \leq sup_{j \geq 1} β_{j}^{2} j^{2 α} \sum_{k = 1}^{\infty} k^{- 2 α} < \infty . \end{matrix}

Lemma A4.

Assume that the assumptions of Theorem 1 hold. Then,

\begin{matrix} κ_{2, p} = o (p) . \end{matrix}

Proof.

Observe that

\begin{matrix} κ_{2, p} = \sum_{k = 1}^{p} {(\sum_{l = 1}^{p} β_{l} ϱ^{| k - l |})}^{2} & = & \sum_{k = 1}^{p} \sum_{l_{1}, l_{2} = 1}^{p} β_{l_{1}} β_{l_{2}} ϱ^{| k - l_{1} | + | k - l_{2} |} \\ & \leq & \sum_{l_{1}, l_{2} = 1}^{p} | β_{l_{1}} | | β_{l_{2}} | \sum_{k = 1}^{p} {| ϱ |}^{| k - l_{1} | + | k - l_{2} |} \\ & \leq & C {(\sum_{l = 1}^{p} | β_{l} |)}^{2} \\ & = & o (p), \end{matrix}

(A7)

where (A7) follows from (S9). Meanwhile, $\sum_{l = 1}^{p} | β_{l} | = o (p^{1 / 2})$, since

\begin{matrix} \sum_{l = 1}^{p} | β_{l} | & = & \sum_{l = 1}^{⌊ p^{1 / 2} ⌋} | β_{l} | + \sum_{l = ⌊ p^{1 / 2} ⌋ + 1}^{p} | β_{l} | \\ & \leq & p^{1 / 4} {(\sum_{l = 1}^{\infty} β_{l}^{2})}^{1 / 2} + p^{1 / 2} {(\sum_{l = ⌊ p^{1 / 2} ⌋ + 1}^{\infty} β_{l}^{2})}^{1 / 2} = o (p^{1 / 2}) . \end{matrix}

□
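
As a rough numerical companion to Lemma A4, the sketch below computes $κ_{2, p} = \sum_{k = 1}^{p} {(\sum_{l = 1}^{p} β_{l} ϱ^{| k - l |})}^{2}$ in $O (p)$ operations by splitting the inner sum into a forward and a backward recursion. It is an illustration under assumptions that are not taken from the paper: the coefficients $β_{j} = j^{- 0.75}$, the value $ϱ = 0.5$ and all function names are hypothetical choices.

```python
def kappa2(beta, rho):
    """kappa_{2,p} = sum_k (sum_l beta_l * rho^|k-l|)^2, computed in O(p)."""
    p = len(beta)
    # forward[k] = sum over l <= k of beta_l * rho^(k - l)
    forward = [0.0] * p
    acc = 0.0
    for k in range(p):
        acc = rho * acc + beta[k]
        forward[k] = acc
    # backward[k] = sum over l > k of beta_l * rho^(l - k)
    backward = [0.0] * p
    acc = 0.0
    for k in range(p - 2, -1, -1):
        acc = rho * (acc + beta[k + 1])
        backward[k] = acc
    return sum((f + b) ** 2 for f, b in zip(forward, backward))


def kappa2_over_p(p, alpha=0.75, rho=0.5):
    """Ratio kappa_{2,p} / p for the hypothetical choice beta_j = j^(-alpha)."""
    beta = [j ** (-alpha) for j in range(1, p + 1)]
    return kappa2(beta, rho) / p


if __name__ == "__main__":
    for p in (10**2, 10**3, 10**4):
        print(p, kappa2_over_p(p))
```

For this square-summable example the ratio $κ_{2, p} / p$ decreases towards zero as $p$ grows, in line with the statement $κ_{2, p} = o (p)$.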

Lemma A5.

Assume that $\sum_{j = 1}^{\infty} β_{j}^{2} < \infty$ and $| ϱ | < 1$. Define $θ_{k}^{(p)} = \sum_{j = 1}^{p} β_{j} ϱ^{| k - j |}$. Then,

\begin{matrix} | \sum_{i, j, k = 1}^{p} (ϱ^{| i - j |} + θ_{i}^{(p)} θ_{j}^{(p)}) (ϱ^{| i - k |} + θ_{i}^{(p)} θ_{k}^{(p)}) (ϱ^{| k - j |} + θ_{k}^{(p)} θ_{j}^{(p)}) | & = & o (p^{3 / 2}) . \end{matrix}

(A8)

Proof.

See the proof in Supplementary Materials, Section S3. □

Appendix A.2. Proof of Lemma 5

Here and throughout the proof we employ the notation as in Definition 3.

(i) Note that, by (65) and (67), we have

\begin{matrix} κ_{1, p} & = & \sum_{k = 1}^{p} β_{k}^{2} + 2 \sum_{k = 2}^{p} \sum_{l = 1}^{k - 1} β_{k} β_{l} ϱ^{k - l} \to β (1) + 2 b_{1} (ϱ) \quad \text{as} \quad p \to \infty . \end{matrix}

(ii) Write

\begin{matrix} κ_{2, p} & = & \sum_{l = 1}^{p} \sum_{k = 1}^{p} β_{l}^{2} ϱ^{2 | k - l |} + 2 \sum_{l^{'} > l} \sum_{k = 1}^{p} β_{l} β_{l^{'}} ϱ^{| k - l |} ϱ^{| k - l^{'} |} . \end{matrix}

From here, it can be seen that

\begin{matrix} κ_{2, p} & \to & β (1) \frac{1 + ϱ^{2}}{1 - ϱ^{2}} - β (ϱ^{2}) \frac{1}{1 - ϱ^{2}} + 2 (b_{1}^{(1)} (ϱ) + b_{1} (ϱ) \frac{1 + ϱ^{2}}{1 - ϱ^{2}} - b_{2} (ϱ) \frac{1}{1 - ϱ^{2}}) . \end{matrix}

(A9)

Technical details of the proof of (A9) are presented in Supplementary Materials, Section S4.

(iii) Consider

\begin{matrix} κ_{3, p} & = & \sum_{l = 1}^{p} β_{l}^{2} J_{1} (l) + 2 \sum_{l < l^{'}} β_{l} β_{l^{'}} J_{2} (l, l^{'}), \end{matrix}

(A10)

where

\begin{matrix} J_{1} (l) & : = & \sum_{k, k^{'} = 1}^{p} ϱ^{| k - k^{'} |} ϱ^{| k - l |} ϱ^{| k^{'} - l |} 1_{{l = l^{'}}}, \end{matrix}

(A11)

\begin{matrix} J_{2} (l, l^{'}) & : = & \sum_{k, k^{'} = 1}^{p} ϱ^{| k - k^{'} |} ϱ^{| k - l |} ϱ^{| k^{'} - l^{'} |} 1_{{l < l^{'}}} . \end{matrix}

(A12)

Then, as $p \to \infty$, using the notation in Definition 3, we have that

\begin{matrix} \sum_{l = 1}^{p} β_{l}^{2} J_{1} (l) & \to & β (1) \frac{1 + 4 ϱ^{2} + ϱ^{4}}{{(1 - ϱ^{2})}^{2}} - β (ϱ^{2}) \frac{1 + 3 ϱ^{2}}{{(1 - ϱ^{2})}^{2}} - \frac{2}{1 - ϱ^{2}} β^{(1)} (ϱ^{2}), \end{matrix}

(A13)

and

\begin{matrix} \sum_{l^{'} > l} β_{l} β_{l^{'}} J_{2} (l, l^{'}) & \to & \frac{1}{2 {(1 - ϱ^{2})}^{2}} (b^{(2)} (ϱ) {(1 - ϱ^{2})}^{2} + 3 b_{1}^{(1)} (ϱ) (1 - ϱ^{4}) + 2 b_{1} (ϱ) (1 + 4 ϱ^{2} + ϱ^{4}) \\ & & - 2 b_{2}^{(1)} (ϱ) (1 - ϱ^{2}) - 2 b_{2} (ϱ) (1 + 3 ϱ^{2})) . \end{matrix}

(A14)

Technical details of the proof of (A13) and (A14) are omitted here and presented in the Supplementary Materials, Section S5. This concludes the proof.

References

1. Kac, M.; Murdock, W.; Szegö, G. On the eigen-values of certain Hermitian forms. J. Ration. Mech. Anal. 1953, 2, 767–800.
2. Fikioris, G. Spectral properties of Kac–Murdock–Szegö matrices with a complex parameter. Linear Algebra Appl. 2018, 553, 182–210.
3. Yang, Y.; Zhou, J.; Pan, J. Estimation and optimal structure selection of high-dimensional Toeplitz covariance matrix. J. Multivar. Anal. 2021, 184, 104739.
4. Liang, K.Y.; Zeger, S.L. Longitudinal data analysis using generalized linear models. Biometrika 1986, 73, 13–22.
5. Rangan, S. Generalized approximate message passing for estimation with random linear mixing. In Proceedings of the 2011 IEEE International Symposium on Information Theory Proceedings, St. Petersburg, Russia, 31 July–5 August 2011; pp. 2168–2172.
6. Vila, J.P.; Schniter, P. Expectation-maximization Gaussian-mixture approximate message passing. IEEE Trans. Signal Process. 2013, 61, 4658–4672.
7. Dicker, L.H. Variance estimation in high-dimensional linear models. Biometrika 2014, 101, 269–284.
8. Diggle, P.J.; Giorgi, E. Model-Based Geostatistics for Global Public Health: Methods and Applications; Chapman and Hall/CRC: Boca Raton, FL, USA, 2019.
9. Patil, A.R.; Kim, S. Combination of ensembles of regularized regression models with resampling-based lasso feature selection in high dimensional data. Mathematics 2020, 8, 110.
10. Dicker, L.H.; Erdogdu, M.A. Maximum likelihood for variance estimation in high-dimensional linear models. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, Cadiz, Spain, 9–11 May 2016.
11. Carpentier, A.; Verzelen, N. Adaptive estimation of the sparsity in the Gaussian vector model. Ann. Stat. 2019, 47, 93–126.
12. Carpentier, A.; Verzelen, N. Optimal sparsity testing in linear regression model. Bernoulli 2021, 27, 727–750.
13. Gaunt, R.E. Rates of Convergence of Variance-Gamma Approximations via Stein’s Method. Ph.D. Thesis, The Queen’s College, University of Oxford, Oxford, UK, 2013.
14. Gaunt, R.E. Variance-Gamma approximation via Stein’s method. Electron. J. Probab. 2014, 19, 1–33.
15. Gaunt, R.E. Products of normal, beta and gamma random variables: Stein operators and distributional theory. Braz. J. Probab. Stat. 2018, 32, 437–466.
16. Gaunt, R.E. A note on the distribution of the product of zero-mean correlated normal random variables. Stat. Neerl. 2019, 73, 176–179.
17. Ing, C.K. Model selection for high-dimensional linear regression with dependent observations. Ann. Stat. 2020, 48, 1959–1980.
18. Cha, J.; Chiang, H.D.; Sasaki, Y. Inference in high-dimensional regression models without the exact or L^p sparsity. arXiv 2021, arXiv:2108.09520.
19. Shibata, R. Asymptotically Efficient Selection of the Order of the Model for Estimating Parameters of a Linear Process. Ann. Stat. 1980, 8, 147–164.
20. Ing, C.K. Accumulated Prediction Errors, Information Criteria and Optimal Forecasting for Autoregressive Time Series. Ann. Stat. 2007, 35, 1238–1277.
21. Belloni, A.; Chen, D.; Chernozhukov, V.; Hansen, C. Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain. Econometrica 2012, 80, 2369–2429.
22. Javanmard, A.; Montanari, A. Confidence intervals and hypothesis testing for high-dimensional regression. J. Mach. Learn. Res. 2014, 15, 2869–2909.
23. Zhang, C.H.; Zhang, S.S. Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 2014, 76, 217–242.
24. Caner, M.; Kock, A.B. Asymptotically honest confidence regions for high dimensional parameters by the desparsified conservative Lasso. J. Econom. 2018, 203, 143–168.
25. Belloni, A.; Chernozhukov, V.; Chetverikov, D.; Hansen, C.; Kato, K. High-dimensional econometrics and regularized GMM. arXiv 2018, arXiv:1806.01888.
26. Gold, D.; Lederer, J.; Tao, J. Inference for high-dimensional instrumental variables regression. J. Econom. 2020, 217, 79–111.
27. Ning, Y.; Peng, S.; Tao, J. Doubly Robust Semiparametric Difference-in-Differences Estimators with High-Dimensional Data. arXiv 2020, arXiv:2009.03151.
28. Guo, Z.; Ćevid, D.; Bühlmann, P. Doubly Debiased Lasso: High-Dimensional Inference under Hidden Confounding. arXiv 2021, arXiv:2004.03758.
29. Dai, Z.; Li, T.; Yang, M. Forecasting stock return volatility: The role of shrinkage approaches in a data-rich environment. J. Forecast. 2021, 1–17.
30. Dai, Z.; Zhu, H.; Zhang, X. Dynamic spillover effects and portfolio strategies between crude oil, gold and Chinese stock markets related to new energy vehicle. Energy Econ. 2022, 109, 105959.
31. Dai, Z.; Zhu, H. Time-varying spillover effects and investment strategies between WTI crude oil, natural gas and Chinese stock markets related to belt and road initiative. Energy Econ. 2022, 108, 105883.
32. Sánchez García, J.; Cruz Rambaud, S. Machine Learning Regularization Methods in High-Dimensional Monetary and Financial VARs. Mathematics 2022, 10, 877.
33. Yi, J.; Tang, N. Variational Bayesian inference in high-dimensional linear mixed models. Mathematics 2022, 10, 463.
34. Madan, D.B.; Carr, P.P.; Chang, E.C. The Variance Gamma Process and Option Pricing. Rev. Financ. 1998, 2, 79–105.
35. Kotz, S.; Kozubowski, T.; Podgórski, K. The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance; Birkhäuser: Boston, MA, USA, 2001.
36. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 1996, 58, 267–288.
37. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429.
38. Meinshausen, N. Relaxed Lasso. Comput. Stat. Data Anal. 2007, 52, 374–393.
39. Belloni, A.; Chernozhukov, V.; Wang, L. Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 2011, 98, 791–806.
40. Morris, R. The Dilogarithm Function of a Real Argument. Math. Comput. 1979, 33, 778–787.
41. Bayati, M.; Erdogdu, M.A.; Montanari, A. Estimating lasso risk and noise level. In Proceedings of the Advances in Neural Information Processing Systems: 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, NV, USA, 5–10 December 2013; Volume 26.
42. Janson, L.; Barber, R.F.; Candes, E. EigenPrism: Inference for high dimensional signal-to-noise ratios. J. R. Stat. Soc. Ser. B Stat. Methodol. 2017, 79, 1037–1065.
43. Maximon, L.C. The dilogarithm function for complex argument. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 2003, 459, 2807–2819.

Figure 1. Comparison of the PDF and CDF (left) and the corresponding Q-Q plots (right) after 1000 replications from the Monte Carlo simulation of the statistic (80) with the limiting distribution $N (0, s^{2})$ by Corollary 2 (in black) for $ϱ = 0.3$, $c = 1$, $σ_{ε}^{2} = 2$ and $p = 100, 500, 1000$.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
