Yule–Walker Equations Using a Gini Covariance Matrix for the High-Dimensional Heavy-Tailed PVAR Model

Jin Zou; Dong Han

doi:10.3390/math9060614

and

School of Mathematics Science, Shanghai Jiao Tong University, Shanghai 200240, China

^*

Author to whom correspondence should be addressed.

Mathematics2021, 9(6), 614;https://doi.org/10.3390/math9060614

Version Notes

Order Reprints

Review Reports

Abstract

Gini covariance plays a vital role in analyzing the relationship between random variables with heavy-tailed distributions. In this papaer, with the existence of a finite second moment, we establish the Gini–Yule–Walker equation to estimate the transition matrix of high-dimensional periodic vector autoregressive (PVAR) processes, the asymptotic results of estimators have been established. We apply this method to study the Granger causality of the heavy-tailed PVAR process, and the results show that the robust transfer matrix estimation induces sign consistency in the value of Granger causality. Effectiveness of the proposed method is verified by both synthetic and real data.

Keywords:

high dimensional time series; transition matrix estimation; Gini covariance matrix; U-statistics

1. Introduction

The heavy-tailed distribution is graphically thicker in the tails and sharper in the peaks than the normal distribution. Intuitively, this means that the probability of extreme values is greater for these data than for normally distributed data. The heavy-tailed phenomenon has been encountered empirically in various fields: physics, earth sciences, economics and political science, etc. Periodicity is a wave-like or oscillatory movement around a long-term trend presented in a time series. It is well-known that cyclicality caused by business and economic activities is usually different from trend movements, not in a single direction of continuous movement, but in alternating fluctuations between ups and downs. When these components, trend and cyclicality, do not evolve independently, traditional differencing filters may not be suitable (see for example, Franses and Paap [1], Bell et al. [2], Aliat and Hamdi [3]). Periodic time series models are used to model periodically stationary time series. Periodic vector autoregressive (PVAR) models extend the classical vector autoregressive (VAR) models by allowing the parameters to vary with the cyclicality.

X_{i T + v} = \sum_{k = 1}^{q (v)} α_{k} (v) X_{i T + v - k} + ϵ_{i T + v} .

(1)

For fixed v and predetermined value T, the random vector

X_{i T + v}

represents the realization during the vth season, with

v \in {1, \dots, T}

, at year

i, i \in {1, \dots, n}

. The PVAR model order at season v is given by

q (v)

, whereas

α_{k} (v), k = 1, \dots, v

, represent the PVAR model coefficients during season v. The error process

ϵ = \{ϵ_{i T + v}\}

corresponds to a periodic white noise, with

E (ϵ_{i T + v}) = 0

and

var (ϵ_{i T + v}) = σ^{2} (v) I

, where

0

is a zero vector of order d and

I

is a unit matrix of order d.

This paper seeks to establish a PVAR model to simulate the heavy-tailed time series. Let random vectors

X_{1}, . . ., X_{n}

be from a stationary stochastic process

{(X_{k})}_{k = - \infty}^{\infty}

,

X_{k} = {(X_{k T + 1}, \dots, X_{k T + T})}^{⊤},

where the transpose (indicated by ⊤) of a row vector is a column vector.

We consider first-order PVAR model,

X_{k T + v} = A X_{k T + v - 1} + E_{k T + v},

(2)

where

k \in {1, \dots, n}, v \in {2, \dots, T}

, T is the period of

X_{k}

,

E_{k T + v}

denotes the latent heavy-tailed innovation, and A is the transition matrix. In this paper, we assume that the entries of the heavy-tailed innovation

E_{k T + v} = {(E_{k T + v 1}, \dots, E_{k T + v d})}^{⊤}

obey a power-law distribution with

P (| E_{k T + v i} | > x) \sim C x^{- α_{i}}

and

α_{i} \leq 2

, then the second moment is finite and the third moment is infinite. From the Equation (2) we know that, given k, the vector

X_{k}

follows a first-order vector autoregressive (VAR). We call the first-order PVAR process stable if all eigenvalues of transition matrix A have modulus less than 1, the condition is equivalent to

det (I_{d} - A z) \neq 0

for all

| z | \leq 1

(see for example, Section 2.1 in Lütkepohl [4]). The transition matrix characterizes the temporal dependence for sequence data, and plays a fundamental role in forecasting. Moreover, the zero and nonzero entries of the transition matrix is often closely related to Granger causality. This manuscript focuses on estimating the transition matrix of high-dimensional heavy-tailed PVAR processes.

PVAR models have been extensively studied under the Gaussian assumption, the Gaussian PVAR models assume that the latent innovations are independent identity distribution Gaussian random vectors. Under this model, there are two kinds of methods to estimate the transition matrix under high dimensional setting, one is the Lasso-based estimation procedures, see [5,6,7,8], and the other is Dantzig-selector-type estimators, see [9,10,11,12]. Under the non-Gaussian VAR process, Qiu et al. [13] proposed a quantile-based dantzig-selector-type estimator of the transition matrix for elliptical VAR process. Wong et al. [14] provided an alternative proof of the consistency of the Lasso for sparse non-Gaussian VAR models. Maleki et al. [15] extended the multivariate setting of autoregressive process, by considering the multivariate scale mixture of skew-normal distributions for VAR innovations.

The statistical second-order information contained in the data is usually expressed by the variance and covariance, most of the literature dealing with time series measure dependence using the variance and covariance. To investigate the validity of the variance estimates, we need the presence of the fourth order moment of the random variables. For the heavy-tailed nature of financial data, the third-order moments of random variables are usually non-existent. Schechtman and Yitzhaki [16] proposed the concept of Gini covariance, which has been used widely to measure the dependence of heavy-tailed distributions. Let H be the joint distribution of the random variables X and Y with the marginal distribution functions

F_{1}

and

F_{2}

, respectively. The standard Gini covariance is defined as

Gcov (X, Y) = cov (X, F_{2} (Y)) and Gcov (Y, X) = cov (Y, F_{1} (X)),

(3)

assuming the random variables with only a finite first moment. The Gini covariance has an advantage when analyzing bivariate data defined by both variate values and ranks of the values. The representation of Gini covariance

Gcov (X, Y)

indicates that it has mixed properties of the variable X and the rank of the variable Y, and thus complements the usual covariance and rank covariance [16,17,18]. In terms of balance between effciency and robustness, Gini covariance plays an important role in measuring association for variables from heavy-tailed distributions [19].

The Yule–Walker equations arise naturally in the problem of linear prediction of any zero-mean weakly stationary process based on a finite number of contiguous observations. The Yule–Walker equations provide a straightforward connection between the autoregressive model parameters and the covariance function of the process. In this paper, relaxing the strong assumption of the existence of higher order moments of the regressors, we use a non-parametric method to estimate the Gini covariance matrix, establish the Gini–Yule–Walker equation to estimate the sparse transition matrix of stationary PVAR processes. The estimator falls into the category of Dantzig-selector-type estimators. With existence of a finite second moment, we investigate the asymptotic behavior of the estimator in high dimensions.

The paper is orginazed as follows: In Section 2, we establish the Gini–Yule–Walker equation and estimate the sample Gini covariance matrix. In Section 3, we derive the convergence rate of transfer matrix estimation. In Section 4, we discuss the characterization and estimation of Granger causality under the heavy-tailed PVAR model. In Section 5, both synthetic and real data are used to demonstrate the empirical performance of the proposed methodology.

2. Model

In this section, the notations are set. Then, we establish the Gini–Yule–Walker equation, obtain simple non-parametric estimators for Gini covariance matrix and investigate the convergence rate of the sample Gini covariance matrix.

2.1. Notation

Let

v = {(v_{1}, . . ., v_{d})}^{⊤}

be a d-dimensional real vector, and

M = [M_{j i}] \in R^{d_{1} \times d_{2}}

be a

d_{1} \times d_{2}

matrix. For

0 < q < \infty

, we define the vector

l_{q}

norm of v as

{|| v ||}_{q} = {(\sum_{j = 1}^{d} ∣ v_{j} ∣^{q})}^{1 / q}

, and the vector

l_{\infty}

norm of v as

{|| v ||}_{\infty} = \max_{j = 1}^{d} ∣ v_{j} ∣

. Let the matrix

l_{1}

norm of M as

{|| M ||}_{1} = \max_{i} \sum_{j = 1}^{d} ∣ M_{j i} ∣

, the matrix

l_{\infty}

norm

{|| M ||}_{\infty} = \max_{j} \sum_{i = 1}^{d} ∣ M_{j i} ∣

,

X_{k} = {(X_{1 k}, . . ., X_{d k})}^{⊤}

and

Y_{k} = {(Y_{1 k}, . . ., Y_{d k})}^{⊤}

be two random vectors.

2.2. Gini–Yule–Walker Equation

In this paper, we model the time series vector by a stationary PVAR process under the existence of second moment. For each

k, 1 \leq k \leq n

,

X_{1 k}, . . ., X_{T k} \in R^{d}

follow a lag-one VAR process, with

E_{t k}

independent of

X_{(t - 1) k}

, and

E (X_{t k}) = 0, E (E_{t k}) = 0

.

We define

X_{k} = {(X_{1 k}, \cdot \cdot \cdot, X_{(T - 1) k})}^{⊤},

this VAR process may be performed by concatenating the equation systems to analyze the following equation,

y_{k} = Y_{k} β + ε_{k},

(4)

where

Y_{k} = X_{k} \otimes I_{d}

, and

y_{k} = [\begin{matrix} y_{1 k} \\ y_{2 k} \\ ⋮ \\ y_{d k} \end{matrix}], y_{i k} = [\begin{matrix} X_{2 i k} \\ X_{3 i k} \\ ⋮ \\ X_{T i k} \end{matrix}], β = [\begin{matrix} β_{1} \\ β_{2} \\ ⋮ \\ β_{d} \end{matrix}], β_{i} = [\begin{matrix} A_{i 1} \\ A_{i 2} \\ ⋮ \\ A_{i d} \end{matrix}], ε_{k} = [\begin{matrix} ε_{1 k} \\ ε_{2 k} \\ ⋮ \\ ε_{d k} \end{matrix}], ε_{i k} = [\begin{matrix} E_{2 i k} \\ E_{3 i k} \\ ⋮ \\ E_{T i k} \end{matrix}],

the ⊗ is a Kronecker product operator.

Since the matrix

Y_{k}

is a block diagonal matrix, the estimation problem can be decomposed into d independent sub-problems. Let us consider the i-th equation of the system.

[\begin{matrix} X_{2 i k} \\ X_{3 i k} \\ ⋮ \\ X_{T i k} \end{matrix}] = [\begin{matrix} X_{11 k} & X_{12 k} & \dots & X_{1 d k} \\ X_{21 k} & X_{22 k} & \dots & X_{2 d k} \\ ⋮ & ⋮ & ⋮ & ⋮ \\ X_{(T - 1) 1 k} & X_{(T - 1) 2 k} & \dots & X_{(T - 1) d k} \end{matrix}] [\begin{matrix} A_{i 1} \\ A_{i 2} \\ ⋮ \\ A_{i d} \end{matrix}] + [\begin{matrix} E_{2 i k} \\ E_{3 i k} \\ ⋮ \\ E_{T i k} \end{matrix}],

the above equation system can be considered as a multiple regression

X_{t i k} = X_{(t - 1) 1 k} A_{i 1} + X_{(t - 1) 2 k} A_{i 2} + . . . + X_{(t - 1) d k} A_{i d} + ϵ_{t i k}, t \in {2, 3, \dots, T},

this equation system can be abbreviated as

z_{i k} = x_{1 k} A_{i 1} + x_{2 k} A_{i 2} + . . . + x_{d k} A_{i d} + {\tilde{ϵ}}_{i k},

(5)

with

T - 1

samples to estimate the i-th line of transition matrix A.

Let

F (x_{i k})

be the distribution of

x_{i k}

, we assume the independence between

{\tilde{ϵ}}_{i k}

and

F (x_{i k})

, for

i = 1, 2, . . ., d

. Then we get the Gini covariance matrix equation issue from Equation (5),

[\begin{matrix} cov (z_{i k}, F (x_{1 k})) \\ cov (z_{i k}, F (x_{2 k})) \\ ⋮ \\ cov (z_{i k}, F (x_{d k})) \end{matrix}] = [\begin{matrix} cov (x_{1 k}, F (x_{1 k})) & cov (x_{2 k}, F (x_{1 k})) & \dots & cov (x_{d k}, F (x_{1 k})) \\ cov (x_{1 k}, F (x_{2 k})) & cov (x_{2 k}, F (x_{2 k})) & \dots & cov (x_{d k}, F (x_{2 k})) \\ ⋮ & ⋮ & ⋮ & ⋮ \\ cov (x_{1 k}, F (x_{d k})) & cov (x_{2 k}, F (x_{k d})) & \dots & cov (x_{d k}, F (x_{d k})) \end{matrix}] [\begin{matrix} A_{i 1} \\ A_{i 2} \\ ⋮ \\ A_{i d} \end{matrix}] .

From the above equation, we obtain the so called Gini–Yule–Walker Equation

G = \tilde{G} A^{⊤},

(6)

where

G = \frac{1}{n} \sum_{k = 1}^{n} G_{k}

,

\tilde{G} = \frac{1}{n} \sum_{k = 1}^{n} {\tilde{G}}_{k}

. The entries of

G_{k}

are given by

G_{i j k} = Gcov (z_{j k}, x_{i k})

, and the entries of

{\tilde{G}}_{k}

are given by

{\tilde{G}}_{i j k} = Gcov (x_{j k}, x_{i k})

.

2.3. Sample Gini Covariance Matrix

We use a

U -

statistic method to estimate the Gini covariance matrix

G_{k}

and

{\tilde{G}}_{k}

. From Equation (6), the elements of the covariance matrix

G_{k}

and

{\tilde{G}}_{k}

can be divided into two categories: Gini covariance

Gcov ({\tilde{X}}_{k}, {\tilde{Y}}_{k})

and Gini mean difference

Gcov ({\tilde{Y}}_{k}, {\tilde{Y}}_{k})

,

{\tilde{X}}_{k} \neq {\tilde{Y}}_{k}

with

{\tilde{X}}_{k} \in \{z_{1 k}, z_{2 k}, \dots, z_{d k}, x_{1 k}, x_{2 k}, \dots, x_{d k}\}, {\tilde{Y}}_{k} \in \{x_{1 k}, x_{2 k}, \dots, x_{d k}\}

.

For the Gini covariance

Gcov ({\tilde{X}}_{k}, {\tilde{Y}}_{k})

, we have sample space

{({\tilde{X}}_{1 k}, {\tilde{Y}}_{1 k}), ({\tilde{X}}_{2 k}, {\tilde{Y}}_{2 k}), \dots, ({\tilde{X}}_{(T - 1) k}, {\tilde{Y}}_{(T - 1) k})} .

The i-th ordered variable of

{\tilde{X}}_{k}

is expressed by

{\tilde{X}}_{i : (T - 1) k}

and the associated variable of

{\tilde{Y}}_{k}

(matched with

{\tilde{X}}_{i : (T - 1) k}

) is expressed by

{\tilde{Y}}_{[i : (T - 1)] k}

, which is the concomitant of the i-th order statistic. In this set-up, in the context of non-parametric estimation of a regression function, Yang [20] proposed a statistic of the form

R (F_{(T - 1)} ({\tilde{X}}_{k}, {\tilde{Y}}_{k})) = \frac{1}{(T - 1)} \sum_{i = 1}^{(T - 1)} J (\frac{i}{(T - 1)}) h ({\tilde{X}}_{i : (T - 1) k}, {\tilde{Y}}_{[i : (T - 1)] k}),

(7)

where

J (.)

is a bounded smooth function,

h (.)

is a real valued function of

({\tilde{X}}_{k}, {\tilde{Y}}_{k}),

and

F_{T - 1} (.)

is the empirical distribution function corresponding to

F_{{\tilde{X}}_{k}, {\tilde{Y}}_{k}} (.)

. The Gini covariance defined in Equation (3) can be rewritten as

Gcov ({\tilde{X}}_{k}, {\tilde{Y}}_{k}) = \int_{0}^{\infty} \int_{0}^{\infty} \frac{1}{2} \tilde{x} (2 F_{{\tilde{Y}}_{k}} (\tilde{y}) - 1) d F_{{\tilde{X}}_{k}, {\tilde{Y}}_{k}} (\tilde{x}, \tilde{y}) .

Choosing

J (F_{{\tilde{Y}}_{k}} (\tilde{y})) = 2 F_{{\tilde{Y}}_{k}} (\tilde{y}) - 1

and

h (\tilde{x}, \tilde{y}) = \frac{1}{2} \tilde{x},

from Equation (7), we obtain an estimator of

G ({\tilde{X}}_{k}, {\tilde{Y}}_{k})

as

U ({\tilde{X}}_{k}, {\tilde{Y}}_{k}) = \frac{2}{{(T - 1)}^{2}} \sum_{i = 1}^{T - 1} (2 i - T) {\tilde{X}}_{[i : (T - 1)] k} .

(8)

For the Gini mean difference

Gcov ({\tilde{Y}}_{k}, {\tilde{Y}}_{k})

, we have sample space

{{\tilde{Y}}_{1 k}, {\tilde{Y}}_{2 k}, . . ., {\tilde{Y}}_{(T - 1) k}}

. Let

{\tilde{Y}}_{1 k}

and

{\tilde{Y}}_{2 k}

be two independent random variables with distribution function

F_{{\tilde{Y}}_{k}}

, the Gini mean difference

Gcov ({\tilde{Y}}_{k}, {\tilde{Y}}_{k})

can be expressed as

\begin{matrix} Gcov ({\tilde{Y}}_{k}, {\tilde{Y}}_{k}) & = \frac{1}{2} (\int_{0}^{\infty} 2 \tilde{y} F_{{\tilde{Y}}_{k}} (\tilde{y}) d F_{{\tilde{Y}}_{k}} (\tilde{y}) - \int_{0}^{\infty} \tilde{y} d F_{{\tilde{Y}}_{k}} (\tilde{y})) \\ = \frac{1}{2} E (max ({\tilde{Y}}_{1 k}, {\tilde{Y}}_{2 k}) - {\tilde{Y}}_{1 k}) . \end{matrix}

The estimator of

Gcov ({\tilde{Y}}_{k}, {\tilde{Y}}_{k})

based on U-statistics is given by

U ({\tilde{Y}}_{k}, {\tilde{Y}}_{k}) = \frac{1}{(\binom{T - 1}{2})} \sum_{i = 1}^{T - 1} \sum_{i < j, j = 2}^{T - 1} k ({\tilde{Y}}_{i k}, {\tilde{Y}}_{j k}),

where

k ({\tilde{Y}}_{1 k}, {\tilde{Y}}_{2 k}) = \frac{1}{4} (2 max ({\tilde{Y}}_{1 k}, {\tilde{Y}}_{2 k}) - {\tilde{Y}}_{1 k} - {\tilde{Y}}_{2 k})

. After some simplification, we obtain

U ({\tilde{Y}}_{k}, {\tilde{Y}}_{k}) = \frac{1}{2 (T - 1) (T - 2)} \sum_{i = 1}^{T - 1} (2 i - T - 2) {\tilde{Y}}_{i : (T - 1) k},

(9)

where

{\tilde{Y}}_{i : (T - 1) k}

is the i -th order statistic based on the sample space

{{\tilde{Y}}_{1 k}, \dots, {\tilde{Y}}_{(T - 1) k}}

.

2.4. Convergence Rates of the Estimator $U ({\tilde{X}}_{k}, {\tilde{Y}}_{k})$ and $U ({\tilde{Y}}_{k}, {\tilde{Y}}_{k})$

In this subsection, with the truncation method, we use the Bernstein’s inequality to investigate the convergence rates of the estimator

U ({\tilde{X}}_{k}, {\tilde{Y}}_{k})

and

U ({\tilde{Y}}_{k}, {\tilde{Y}}_{k})

. From Equations (8) and (9), we define

γ_{k} = U ({\tilde{X}}_{k}, {\tilde{Y}}_{k}), η_{k} = U ({\tilde{Y}}_{k}, {\tilde{Y}}_{k}), 1 \leq k \leq n

, and

Δ = \max {D (γ_{k}), D (η_{k})}

, where

D (γ_{k})

is the variance of the variable

γ_{k}

.

For analysis, we require the following three assumptions on the time series and the size of variables

(d, T, n)

:

Assumption A1.

From Equation (2), suppose that the entries of the heavy-tailed innovation

E_{k T + v} = {(E_{k T + v 1}, \dots, E_{k T + v d})}^{⊤}

obey a power-law distribution with

P (| E_{k T + v i} | > x) \sim C x^{- α_{i}}

and

α_{i} \leq 2

, then the second moment is finite and the third moment is infinite.

Assumption A2.

Suppose that

\frac{log n}{T} \to 0

,

\frac{d T}{\sqrt{n}} \leq c

, c is a finite constant, for

T \to \infty, n \to \infty

.

Assumption A3.

Suppose

\frac{d}{n} \to 0

, for

n \to \infty .

Lemma 1.

Let

{X_{k}}_{k \in Z}

be a stationary PVAR process from Equation (2), and

X_{1}, \dots, X_{n}

be a sequence of observations from

{X_{k}}_{k \in Z}

. Suppose that Assumptions (A1)–(A2) are satisfied. Then, for T and n large enough, with probability no smaller than

1 - (\frac{8 Δ log n}{d^{2} T} + \frac{2}{n^{2}}),

we have

| \frac{1}{n} \sum_{k = 1}^{n} (γ_{k} - E γ_{k}) | \leq d \sqrt{\frac{8 T log n}{n}},

(10)

| \frac{1}{n} \sum_{k = 1}^{n} (η_{k} - E η_{k}) | \leq d \sqrt{\frac{8 T log n}{n}} .

(11)

Proof.

Assuming that m and

λ

are constants greater than 0, and

{\tilde{γ}}_{k} = (γ_{k} - E γ_{k}) 1_{| γ_{k} - E γ_{k} | \leq m} .

Then,

{\tilde{γ}}_{k}

is a bounded random variable and satisfies the property of independent identical distribution, it follows from the Bernstein’s inequality that

P (| \sum_{k = 1}^{n} {\tilde{γ}}_{k} | \geq λ) \leq 2 exp (- λ^{2} {(2 σ^{2} + \frac{2}{3} m λ)}^{- 1}),

(12)

where

σ^{2} = n D ({\tilde{γ}}_{k}) \leq n Δ

.

Let

m = d \sqrt{\frac{T n}{8 log n}}, δ = d \sqrt{\frac{8 T log n}{n}}

and

λ = n δ

, we have

\begin{matrix} P (| \frac{1}{n} \sum_{k = 1}^{n} (γ_{k} - E γ_{k}) | \geq & δ) \leq n P (| γ_{k} - E γ_{k} | > m) + P (| \sum_{k = 1}^{n} {\tilde{γ}}_{k} | \geq n δ) \\ \leq & \frac{n D (γ_{k})}{m^{2}} + 2 exp (- \frac{8 d^{2} T log n}{2 D (γ_{k}) + \frac{2}{3} d^{2} T}) \\ \leq & \frac{8 Δ log n}{d^{2} T} + \frac{2}{n^{2}} . \end{matrix}

As

T \to \infty

and

n \to \infty

, assuming

\frac{log n}{T} \to 0

and

\frac{d T}{\sqrt{n}} \leq c

, then

\frac{8 Δ log n}{d^{2} T} + \frac{2}{n^{2}} \to 0

.

With similar proof methods, we obtain that

P (| \frac{1}{n} \sum_{k = 1}^{n} (η_{k} - E η_{k}) | \geq δ) \leq \frac{8 Δ log n}{d^{2} T} + \frac{2}{n^{2}} .

This completes the proof. □

From Equations (5) and (6), we define the sample estimation of Gini covariance matrix G as

\hat{G} = ({\hat{G}}_{i j}), {\hat{G}}_{i j} = \frac{1}{n} \sum_{k = 1}^{n} U (z_{j k}, x_{i k})

and the sample estimation of Gini covariance matrix

\tilde{G}

as

\hat{\tilde{G}} = ({\hat{\tilde{G}}}_{i j}), {\hat{\tilde{G}}}_{i j} = \frac{1}{n} \sum_{k = 1}^{n} U (x_{j k}, x_{i k})

,

1 \leq i, j \leq d

.

Next, we investigate the convergence rates of the estimator

\hat{G}

and

\hat{\tilde{G}}

under the

l_{\infty}

norm.

Lemma 2.

Let

{X_{k}}_{k \in Z}

be a stationary PVAR process from Equation (2), and

X_{1}, \dots, X_{n}

be a sequence of observations from

{X_{k}}_{k \in Z}

. Suppose that Assumptions (A1)–(A3) are satisfied. Then, for T and n large enough, with probability no smaller than

1 - (\frac{8 Δ log n}{T} + \frac{2 d^{2}}{n^{2}}),

we have

|| \hat{G} {- G ||}_{\infty} \leq d \sqrt{\frac{8 T log n}{n}},

(13)

|| \hat{\tilde{G}} - \tilde{G} {||}_{\infty} \leq d \sqrt{\frac{8 T log n}{n}} .

(14)

Proof.

Let

δ = d \sqrt{\frac{8 T log n}{n}}

, based on the Lemma 1, we have

\begin{matrix} P \{{∥\hat{G} - G∥}_{\infty} \geq δ\} \leq & \sum_{i, j = 1}^{d} P (| {\hat{G}}_{i j} - G_{i j} | \geq δ) \\ \leq & d^{2} P (| {\hat{G}}_{i j} - G_{i j} | \geq δ) \\ = & d^{2} P (| \frac{1}{n} \sum_{k = 1}^{n} (U (z_{j k}, x_{i k}) - Gcov (z_{j k}, x_{i k})) | \geq δ) \\ \leq & d^{2} P (| \frac{1}{n} \sum_{k = 1}^{n} (γ_{k} - E γ_{k}) | \geq δ) \\ \leq & \frac{8 Δ log n}{T} + \frac{2 d^{2}}{n^{2}} . \end{matrix}

As

T \to \infty

and

n \to \infty

, assuming

\frac{log n}{T} \to 0

,

\frac{d T}{\sqrt{n}} \leq c

and

\frac{d}{n} \to 0

, then

\frac{8 Δ log n}{T} + \frac{2 d^{2}}{n^{2}} \to 0

.

Next, we study the convergence of sample Gini covariance matrix

\hat{\tilde{G}}

.

Let

δ = d \sqrt{\frac{8 T log n}{n}}

, based on the Lemma 1,

\begin{matrix} P (| {\hat{\tilde{G}}}_{j j} - {\tilde{G}}_{j j} | \geq δ) = & P (| \frac{1}{n} \sum_{k = 1}^{n} (U (x_{j k}, x_{j k}) - Gcov (x_{j k}, x_{j k})) | \geq δ) \\ = & P (| \frac{1}{n} \sum_{k = 1}^{n} (η_{k} - E η_{k}) | \geq δ) \\ \leq & \frac{8 Δ log n}{d^{2} T} + \frac{2}{n^{2}} . \end{matrix}

(15)

Now, for the off-diagonal entries, we have

\begin{matrix} P (| {\hat{\tilde{G}}}_{i j} - {\tilde{G}}_{i j} | \geq δ) = & P (| \frac{1}{n} \sum_{k = 1}^{n} (U (x_{j k}, x_{i k}) - Gcov (x_{j k}, x_{i k})) | \geq δ) \\ = & P (| \frac{1}{n} \sum_{k = 1}^{n} (γ_{k} - E γ_{k}) | \geq δ) \\ \leq & \frac{8 Δ log n}{d^{2} T} + \frac{2}{n^{2}} . \end{matrix}

(16)

Combining Equations (15) and (16), we obtain

\begin{matrix} P \{{∥\hat{\tilde{G}} - \tilde{G}∥}_{\infty} \geq δ\} \leq & \sum_{i, j = 1}^{d} P (| {\hat{\tilde{G}}}_{i j} - {\tilde{G}}_{i j} | \geq δ) \\ \leq & d^{2} (\frac{8 Δ log n}{d^{2} T} + \frac{2}{n^{2}}) \\ = & \frac{8 Δ log n}{T} + \frac{2 d^{2}}{n^{2}} . \end{matrix}

This completes the proof. □

3. Theoretical Properties

Using the Gini covariance matrix Equation (6), we propose to estimate A by

\hat{A} = \underset{M \in R^{d \times d}}{arg min} || M_{j k} || s . t . || \hat{G} - \hat{\tilde{G}} M^{⊤} {||}_{\infty} \leq \tilde{λ} .

(17)

The above optimization problem can be further decomposed into d subproblems. The i-th row of transition matrix A by

{\hat{A}}_{i *} = \underset{v \in R^{d \times 1}}{arg min} {|| v ||}_{1} s . t . {|| {\hat{G}}_{* i} - \hat{\tilde{G}} v ||}_{\infty} \leq \tilde{λ} .

(18)

Compared to the lasso-type procedures, the proposed method can be solved in parallel and Equation (17) can be solved efficiency using the parametric simplex method for sparse learning in [21].

Based on Lemma 2, we can further deliver the rates of convergence of

\hat{A}

under the matrix

l_{\infty}

norm and

l_{1}

norm. We start with some additional notation. For

α \in [0, 1), η > 0,

and

M_{d} > 0

that may scale with d, we define the matrix class

M (α, η, M_{d}) = {M \in R^{d \times d} : max_{1 \leq j \leq d} \sum_{k = 1}^{d} | M_{j k} |^{α} \leq {η, || M ||}_{1} \leq M_{d}} .

(19)

M (α, η, M_{d})

requires the transition matrices are sparse in rows. If

α = 0

, then the maximum number of nonzeros in rows of transition matrice is at most

η

.

M (α, η, M_{d})

is also investigated in [12].

Theorem 1.

Let

{X_{k}}_{k \in Z}

be a stationary PVAR process from Equation (2). Suppose that Assumptions (A1)–(A3) are satisfied. The transition matrix

A^{⊤} \in M (α, η, M_{d})

, if we choose the tuning parameter

\tilde{λ} = (1 + M_{d}) d \sqrt{\frac{8 T log n}{n}}

. Then, for T and n large enough, with probability no small than

1 - (\frac{8 Δ log n}{T} + \frac{2 d^{2}}{n^{2}})

, we have

|| {\hat{A}}^{⊤} - A^{⊤} {||}_{\infty} \leq 2 \tilde{λ} {|| {(\tilde{G})}^{- 1} ||}_{1},

(20)

|| {\hat{A}}^{⊤} - A^{⊤} {||}_{1} \leq 4 η (2 \tilde{λ} || {\tilde{G}}^{- 1} {{||}_{1})}^{1 - α} .

(21)

Proof.

We first show that with large probability, A is feasible to the optimization problem. By the Gini–Yule–Walker equation, we have

\begin{matrix} \begin{matrix} || \hat{\tilde{G}} A^{⊤} - \hat{G} {||}_{\infty} & = & || \hat{\tilde{G}} {\tilde{G}}^{- 1} G - \hat{G} {||}_{\infty} \\ = & || \hat{\tilde{G}} {\tilde{G}}^{- 1} G - G + G - \hat{G} {||}_{\infty} \\ \leq & || (\hat{\tilde{G}} {\tilde{G}}^{- 1} - I) {G ||}_{\infty} + {|| G - \hat{G} ||}_{\infty} \\ \leq & || \hat{\tilde{G}} - \tilde{G} {||}_{\infty} || A^{⊤} {||}_{1} + {|| G - \hat{G} ||}_{\infty} \\ \leq & || \hat{\tilde{G}} - \tilde{G} {||}_{\infty} M_{d} + {|| G - \hat{G} ||}_{\infty} . \end{matrix} \end{matrix}

The last inequality is due to

A^{⊤} \in M (α, η, M_{d})

, by Lemma 2, with probability no small than

1 - (\frac{8 Δ log n}{T} + \frac{2 d^{2}}{n^{2}})

that,

|| \hat{\tilde{G}} A^{⊤} - \hat{G} {||}_{\infty} \leq (1 + M_{d}) d \sqrt{\frac{8 T log n}{n}} = \tilde{λ} .

A is feasible in the optimization equation, by checking the Equation (17), we have

|| \hat{A} || \leq ∥A∥

with probability no smaller than

1 - (\frac{8 Δ log n}{T} + \frac{2 d^{2}}{n^{2}})

.

Next, we prove Equation (20). Let

A^{⊤} = {\tilde{G}}^{- 1} G

, we have

\begin{matrix} \begin{matrix} || {\hat{A}}^{⊤} - A^{⊤} {||}_{\infty} & = & || {\hat{A}}^{⊤} - {\tilde{G}}^{- 1} {G ||}_{\infty} \\ = & || {\tilde{G}}^{- 1} (\tilde{G} {\hat{A}}^{⊤} - G) {||}_{\infty} \\ = & || {\tilde{G}}^{- 1} (\tilde{G} {\hat{A}}^{⊤} - \hat{\tilde{G}} {\hat{A}}^{⊤} + \hat{G} - G + \hat{\tilde{G}} {\hat{A}}^{⊤} - \hat{G}) {||}_{\infty} \\ \leq & || {\tilde{G}}^{- 1} {||}_{1} (|| \tilde{G} - \hat{\tilde{G}} {||}_{\infty} || {\hat{A}}^{⊤} {||}_{1} + || \hat{G} {- G ||}_{\infty} + || \hat{\tilde{G}} {\hat{A}}^{⊤} - \hat{G} {) ||}_{\infty}) \\ \leq & 2 \tilde{λ} {|| {\tilde{G}}^{- 1} ||}_{1} \end{matrix} \end{matrix}

with probability no smaller than

1 - (\frac{8 Δ log n}{T} + \frac{2 d^{2}}{n^{2}})

.

Let

λ_{1} \geq 0

and

S_{j} = {m : | A_{j m} | \geq λ_{1}}, j = 1, \dots, d

. Define

\tilde{D} (λ_{1}) = max_{j \leq d} \sum_{j = 1}^{d} min \{| A_{j m}^{⊤} | \land λ_{1}, 1\} .

Then, we have

\begin{matrix} \begin{matrix} || {({\hat{A}}^{⊤} - A^{⊤})}_{j *} {||}_{1} & \leq & || {\hat{A}}_{j, S_{j}^{c}}^{⊤} {||}_{1} + || A_{j, S_{j}^{c}}^{⊤} {||}_{1} + {|| {({\hat{A}}^{⊤} - A^{⊤})}_{j, S_{j}} ||}_{1} \\ = & || A_{j *}^{⊤} {||}_{1} - || {\hat{A}}_{j, S_{j}}^{⊤} {||}_{1} + || A_{j, S_{j}^{c}}^{⊤} {||}_{1} + {|| {({\hat{A}}^{⊤} - A^{⊤})}_{j, S_{j}} ||}_{1} \\ \leq & 2 || A_{j, S_{j}^{c}}^{⊤} {||}_{1} + 2 {|| {({\hat{A}}^{⊤} - A^{⊤})}_{j, S_{j}} ||}_{1} \\ \leq & 2 || A_{j, S_{j}^{c}}^{⊤} {||}_{1} + 2 || {\hat{A}}^{⊤} - A^{⊤} {||}_{\infty} \cdot | S_{j} | \\ \leq & 2 \tilde{D} (λ_{1}) (λ_{1} + 2 \tilde{λ} || {\tilde{G}}^{- 1} {||}_{1}) . \end{matrix} \end{matrix}

Suppose

{max}_{j} \sum_{i = 1}^{d} {|{(A^{⊤})}_{i j}|}^{α} \leq η

and setting

λ_{1} = 2 \tilde{λ} {|| {\tilde{G}}^{- 1} ||}_{1},

we have

λ_{1} S_{j} = max_{1 \leq j \leq d} \sum_{i = 1}^{d} min \{|{(A)}_{i j}|, λ_{1}\} \leq λ_{1} max_{1 \leq j \leq d} \sum_{i = 1}^{d} min \{{|{(A)}_{i j}|}^{α} / λ_{1}^{α}, 1\} \leq λ_{1}^{1 - α} η .

Therefore, we have

|| {({\hat{A}}^{⊤} - A^{⊤})}_{j *} {||}_{1} \leq 4 λ_{1} \tilde{D} (λ_{1}) \leq 4 η λ_{1}^{1 - α} = 4 η (2 \tilde{λ} || {\tilde{G}}^{- 1} {{||}_{1})}^{1 - α} .

Since the above equation holds for any

j = 1, \dots, d

, we complete the proof.

□

4. Granger Causality

In this section, the practical example is conducted to verify the effectiveness of the proposed methods, moreover, the characterization and estimation of Granger causality under the heavy-tailed PVAR model are discussed. Firstly, we give the definition of Granger causality.

Definition 1.(Granger [22]) Let

\{X_{t}\} \in Z

be a stationary process, where

X_{t} = {(X_{t 1}, \dots, X_{t d})}^{T}

. For

j \neq k \in {1, \dots, d},

{\{X_{t k}\}}_{t \in Z}

Granger causes

{\{X_{t j}\}}_{t \in Z}

if and only if there exists a measurable set A such that

P (X_{(t + 1) j} \in A ∣ {\{X_{s}\}}_{s \leq t}) \neq P (X_{(t + 1) j} \in A ∣ {\{X_{s, \ k}\}}_{s \leq t})

for all

t \in Z,

where

X_{s, \ k}

is the subvector obtained by removing

X_{s k}

from

X_{s}

For a Gaussian VAR process

{\{X_{t}\}}_{t \in Z},

we have that

{\{X_{t k}\}}_{t \in Z}

Granger causes

{\{X_{t j}\}}_{t \in Z}

if and only if the

(j, k)

entry of the transition matrix is non-zero [4]. For the heavy-tailed PVAR process, let

{\{X_{k}\}}_{k \in Z}

be a stationary PVAR process from Equation (2), we define

{\overset{ˇ}{X}}_{t} = \frac{1}{n} \sum_{k = 1}^{n} X_{t k}, {\overset{ˇ}{E}}_{t} = \frac{1}{n} \sum_{k = 1}^{n} E_{t k}, 1 \leq t \leq T .

In the next theorem, we show that a similar property holds for the heavy-tailed PVAR process.

Theorem 2.

Let

{\{X_{k}\}}_{k \in Z}

be a stationary PVAR process from Equation (2). Suppose that Assumptions (A1)–(A3) are satisfied, and

Gcov {({\overset{ˇ}{X}}_{t i}, {\overset{ˇ}{X}}_{s i} ∣ {\overset{ˇ}{X}}_{s, \ t})}_{s \leq t} \neq 0

for any

i \in {1, \dots, d} .

Then, for

j \neq i \in {1,

\dots, d},

we have

1.: If $A_{j i} \neq 0,$ then ${\{{\overset{ˇ}{X}}_{t i}\}}_{t \in Z}$ Granger causes ${\{{\overset{ˇ}{X}}_{t j}\}}_{t \in Z}$ .
2.: If we further assume that ${\overset{ˇ}{E}}_{t + 1}$ is independent of ${\{{\overset{ˇ}{X}}_{s}\}}_{s \leq t}$ for any $t \in Z,$ we have that ${\{{\overset{ˇ}{X}}_{t i}\}}_{t \in Z}$ Granger causes ${\{{\overset{ˇ}{X}}_{t j}\}}_{t \in Z}$ if and only if $A_{j i} \neq 0$ .

Proof.

In order to prove Issue

1,

we only need to prove that

{\{{\overset{ˇ}{X}}_{t i}\}}_{t \in Z}

doesn’t Granger cause

{\{{\overset{ˇ}{X}}_{t j}\}}_{t \in Z}

implies

A_{j i} = 0 .

Suppose for some

t \in Z,

we have

P ({\overset{ˇ}{X}}_{(t + 1) j} \in A ∣ {\{{\overset{ˇ}{X}}_{s}\}}_{s ⩽ t}) = P ({\overset{ˇ}{X}}_{(t + 1) j} \in A ∣ {\{{\overset{ˇ}{X}}_{s, \ i}\}}_{s ⩽ t})

for any measurable set A. The above equation implies that conditioning on

{\{{\overset{ˇ}{X}}_{s, \ i}\}}_{s ⩽ t}

,

{\overset{ˇ}{X}}_{(t + 1) j}

is independent of

{\{{\overset{ˇ}{X}}_{s j}\}}_{s ⩽ t} .

Hence, we have

Gcov ({\overset{ˇ}{X}}_{(t + 1) j}, {\overset{ˇ}{X}}_{t i} ∣ {\{{\overset{ˇ}{X}}_{s, \ i}\}}_{s ⩽ t}) = 0 .

Plugging

{\overset{ˇ}{X}}_{(t + 1) j} = \sum_{i = 1}^{d} A_{j i} {\overset{ˇ}{X}}_{t i} + {\overset{ˇ}{E}}_{(t + 1) j}

into the above equation, we have

\begin{matrix} 0 = Gcov (A_{j i} {\overset{ˇ}{X}}_{t i}, {\overset{ˇ}{X}}_{t i} ∣ {\{{\overset{ˇ}{X}}_{s, \ i}\}}_{s ⩽ t}) + \\ Gcov (\sum_{i \neq j} A_{j i} {\overset{ˇ}{X}}_{t i}, {\overset{ˇ}{X}}_{t j} ∣ {\{{\overset{ˇ}{X}}_{s, \ i}\}}_{s ⩽ t}) + \\ Gcov ({\overset{ˇ}{E}}_{(t + 1) j}, {\overset{ˇ}{X}}_{t i} ∣ {\{{\overset{ˇ}{X}}_{s, i}\}}_{s ⩽ t}) . \end{matrix}

The second term on the right hand side is

0,

since given

{\{{\overset{ˇ}{X}}_{s i}\}}_{s ⩽ t}, \sum_{i \neq j} A_{j i} {\overset{ˇ}{X}}_{t i}

is constant. Since

{\overset{ˇ}{E}}_{t k}

and

{({\overset{ˇ}{X}}_{s})}_{s < t}

are independent for any

t \in Z

, we have

Gcov ({\overset{ˇ}{E}}_{(t + 1) j}, {\overset{ˇ}{X}}_{s i}) = 0

for any

s ⩽ t

. Using Theorem 2.18 in Fang et al. [23], we have the third term is also

0 .

Thus, we have

Gcov ({\overset{ˇ}{X}}_{t i}, {\overset{ˇ}{X}}_{t i} ∣ {\{{\overset{ˇ}{X}}_{s, \ i}\}}_{s ⩽ t}) = 0

and hence

A_{j i} = 0 .

This proves Issue 1.

Given Issue

1,

to prove Issue

2,

it remains to prove that

A_{j i} = 0

implies that

{\{{\overset{ˇ}{X}}_{t i}\}}_{t \in Z}

doesn’t Granger cause

{\{{\overset{ˇ}{X}}_{t j}\}}_{t \in Z} .

Since

A_{j i} = 0,

we have

\begin{matrix} p ({\overset{ˇ}{X}}_{(t + 1) j}, {\{{\overset{ˇ}{X}}_{s i}\}}_{s ⩽ t} ∣ {\{{\overset{ˇ}{X}}_{s, \ i}\}}_{s ⩽ t}) \\ = & p ({\overset{ˇ}{X}}_{(t + 1) j} ∣ {\{{\overset{ˇ}{X}}_{s, \ i}\}}_{s ⩽ t}) p ({\{{\overset{ˇ}{X}}_{s i}\}}_{s ⩽ t} ∣ {\{{\overset{ˇ}{X}}_{s, \ i}\}}_{s ⩽ t}) . \end{matrix}

(22)

Here p is the conditional probability density function. The last equation is because

E_{t + 1 k}

is independent of

{\{{\overset{ˇ}{X}}_{s}\}}_{s ⩽ t},

and the fact that

\sum_{i \neq k} A_{j i} {\overset{ˇ}{X}}_{t i}

is constant given

{\{{\overset{ˇ}{X}}_{s} \ i\}}_{s \leq t} .

Hence, we have

\begin{matrix} p ({\overset{ˇ}{X}}_{(t + 1) j}, {\{{\overset{ˇ}{X}}_{s i}\}}_{s ⩽ t} ∣ {\{{\overset{ˇ}{X}}_{s, \ i}\}}_{s ⩽ t}) \\ = & p ({\overset{ˇ}{X}}_{(t + 1) j} ∣ {\{{\overset{ˇ}{X}}_{s, \ i}\}}_{s ⩽ t}) p ({\{{\overset{ˇ}{X}}_{s i}\}}_{s ⩽ t} ∣ {\{{\overset{ˇ}{X}}_{s, \ i}\}}_{s ⩽ t}) \end{matrix}

(23)

and thus

p ({\overset{ˇ}{X}}_{(t + 1) j} ∣ {\{{\overset{ˇ}{X}}_{s}\}}_{s ⩽ t}) = p ({\overset{ˇ}{X}}_{(t + 1) j} ∣ {\{{\overset{ˇ}{X}}_{s, \ i}\}}_{s ⩽ t}) .

□

Remark 1.

The assumption that

Gcov {({\overset{ˇ}{X}}_{t i}, {\overset{ˇ}{X}}_{s \ i} ∣ {\overset{ˇ}{X}}_{s, \ i})}_{s \leq t} \neq 0

requires that

{\overset{ˇ}{X}}_{t i}

cannot be perfectly predictable from the past or from the other observed random variables at time t. Otherwise, we can simply remove

{\{{\overset{ˇ}{X}}_{t i}\}}_{t \in Z}

from the process

{\{{\overset{ˇ}{X}}_{t}\}}_{t \in Z},

since predicting

{\{{\overset{ˇ}{X}}_{t i}\}}_{t \in Z}

is trivial. Assuming that

{\overset{ˇ}{E}}_{t + 1}

is independent of

{\{{\overset{ˇ}{X}}_{s}\}}_{s \leq t}

for any

t \in Z,

the Granger causality relations among the processes

\{{\{{\overset{ˇ}{X}}_{j t}\}}_{t \in Z} : j = 1, \dots, d\}

is characterized by the non-zero entries of

A .

To estimate the Granger causality relations, we define

\tilde{A} = [{\tilde{A}}_{j i}],

where

{\tilde{A}}_{j i} : = {\hat{A}}_{j i} I (|{\hat{A}}_{j i}| \geq γ)

for some threshold parameter γ. To evaluate the consistency between

\tilde{A}

and A regarding sparsity pattern, we define function

sign (x) : = I (x > 0) - I (x < 0)

. For a matrix M, define

sign (M) : = [sign (M_{j i})]

. The next theorem gives the rate of γ such that

\tilde{A}

recovers the sparsity pattern of A with high probability.

Theorem 3.

Let

{\{X_{k}\}}_{k \in Z}

be a stationary PVAR process from Equation (2). Suppose that Assumptions 1–3 are satisfied. The transition matrix

A^{⊤} \in M (α, η, M_{d})

, if we set

γ = 2 \tilde{λ} {|| {(\tilde{G})}^{- 1} ||}_{1},

(24)

then, with probability no smaller than

1 - (\frac{8 Δ log n}{T} + \frac{2 d^{2}}{n^{2}})

, we have

sign (\tilde{A}) = sign (A),

provided that

min_{\{(j, i) : A_{j i} > 0\}} |A_{j i}| \geq 2 γ .

(25)

Proof.

The proof is a consequence of Theorem 1. In detail, if

A_{j i} > 0,

by Equation (25), we have

A_{j i} ⩾ 2 γ .

By Theorem 1, with probability no smaller than

1 - (\frac{8 Δ log n}{T} + \frac{2 d^{2}}{n^{2}})

, we have

|{\hat{A}}_{j i} - A_{j i}| ⩽ γ

. Thus, we have

{\hat{A}}_{j i} ⩾ γ

with probability no smaller than

1 - (\frac{8 Δ log n}{T} + \frac{2 d^{2}}{n^{2}})

. By the definition of A, we have

{\tilde{A}}_{j i} = {\hat{A}}_{j i} ⩾ γ > 0

.

If

A_{j i} < 0,

by Equation (25), we have

A_{j i} ⩽ - 2 γ .

Using Theorem 1, we have

{\hat{A}}_{j i} ⩽ - γ

with probability no smaller than

1 - (\frac{8 Δ log n}{T} + \frac{2 d^{2}}{n^{2}})

, By the definition of

\tilde{A},

we have

{\tilde{A}}_{j i} = {\hat{A}}_{j i} ⩽ - γ < 0

.

If

A_{j i} = 0

, using Theorem 1, we have

{\hat{A}}_{j i} < γ

with probability no smaller than

1 - (\frac{8 Δ log n}{T} + \frac{2 d^{2}}{n^{2}})

, since

P ({\hat{A}}_{j i} = γ) =

0. By the definition of

\tilde{A},

we have

{\tilde{A}}_{j i} = 0

.

□

5. Experiments

This section provides some numerical results on synthetic and real data. We consider Lasso and Dantzig selector for comparison.

Lasso: an $l_{1}$ regularized estimator defined as

$\hat{A} = \underset{M \in R^{d \times d}}{argmin \frac{1}{n} \sum_{k = 1}^{n}} {|| Y_{k} - M X_{k} ||}_{F}^{2} + ρ \sum_{i j} M_{i j},$

where $Y_{k} : = (X_{1 k}, \dots, X_{(T - 1) k}) \in R^{d \times (T - 1)}$ , $X_{k} : = (X_{2 k}, \dots, X_{T k}) \in R^{d \times (T - 1)}$ .
Dantzig selector: the estimator proposed in [12],

$\hat{A} = \underset{M \in R^{d \times d}}{arg min} \sum_{j k} | M_{j k} | s . t . \frac{1}{n} \sum_{k = 1}^{n} {|| S_{k} M - S_{1 k}^{⊤} ||}_{m a x} \leq λ,$

$S_{k}$ and $S_{1 k}$ be the marginal and lag one sample covariance matrices of Equation (2).
G-Dantzig selector: the estimator described in Equation (17).

5.1. Synthetic Data

In this subsection, we compare the performance of our method with the lasso and Dantzig under synthetic data. The “flare” package in R is used to create the transition matrix according to three patterns: cluster, hub, random. We rescale A such that

{|| A ||}_{2} = 0.8

, generate

Σ

such that

{|| Σ ||}_{2} = 2 {|| A ||}_{2}

, then calculate the covariance matrix

Ψ

of the noise vector

E_{t}

as

Ψ = Σ - A^{T} Σ A

. Let the periodical time series length

n = 2000

, the period length

T = 100

and the dimension

d = 150

, with

A, Σ

and

Ψ

, we simulate periodical time series according to the model described in Equation (2). Specifically, we consider three models:

Model 1: Data generated from Equation (2), where the errors $E_{t} \sim N (0, Ψ)$ . There is no outliers under Model 1.
Model 2: Data generated from Equation (2), where the errors $E_{t} \sim t (0, Ψ, 2)$ . No outliers appear and the tail of the error distribution is heavier than that of the normal distribution.
Model 3: Data generated from Equation (2) with the same error distribution as Model 1. Then replace $X_{i}$ by $X_{i} + {(- 1)}^{I (U_{1} < 1 / 2)} (20 + 10 U_{2}) I$ for $i = 1, . . ., T / 5$ , where $U_{1}$ and $U_{2}$ are independently generated from $U [0, 1]$ distribution, there are $T / 5$ outliers that deviate from the majority of the observations.

The tuning parameter

λ

is chosen by cross validation. We construct 20,000 replicates and compare the three methods described above. Table 1 presents averaged estimation errors under three matrix norms. From this table, we have the following findings: Under the Gaussian model (Model 1), G-Dantzig selector has comparable performance as Dantzig selector and out performs Lasso. Under Model 2 and 3, our methods are more stable than Lasso and Dantzig selector. Thus, we conclude that the G-Dantzig selector is robust to the heavy-tailedness of data and the possible presence of outliers.

Table 1. Comparison of estimation errors of three methods under different setups. The standard deviations are shown in parentheses. Here

L_{1}

,

L_{\infty}

,

L_{F}

are the

L_{1}

,

L_{\infty}

and Frobenius matrix norms respectively.

Figure 1 plots the prediction errors against sparsity for the three transition matrix estimators. We observe that G-Dantzig selector achieves smaller prediction errors compared to Lasso and Dantzig selector.

Figure 1. Prediction errors in stock prices plotted against the sparsity of the estimated transition matrix.

5.2. Real Data

We further compared the three methods on a real equity data from Yahoo Finance. We collected the 10-minutes time intervals price in Monday of each week for 50 stocks with the highest volatility that were consistently in the

S & P

500 index from 1 January 2008 to 31 December 2020. Then, the periodical time series length was

n = 672

, the period length was

T = 36

and the dimension was

d = 50

. Since we chose the data points on the Monday of each week, the data points can be seen as independent of each other. Estimations of the transition matrix

{\hat{A}}_{s}

were obtained by the Lasso, Dantzig selector and G-Dantzig selector, where

s \in [0, 1]

is the fraction of non-zero entries of

{\hat{A}}_{s}

and can be controled by the tuning parameters

λ

and

ρ

. We define the prediction error associated with

{\hat{A}}_{s}

to be

ϵ_{s} : = \frac{1}{n} \sum_{k = 1}^{n} (\frac{1}{T - 1} \sum_{t = 2}^{T} {|| X_{t k} - {\hat{A}}_{s} X_{(t - 1) k} ||}_{2}) .

6. Conclusions

In this paper, we developed a Gini–Yule–Walker equation for modeling and estimating the heavy-tailedness data and the possible presence of outliers in high dimensions. Our contributions are three-fold. (i) At the model level, we generalized the Gaussian process to time series with the existence of merely second order moments. (ii) Methodologically, we proposed a Gini–Yule–Walker-based estimator of the transition matrix. Experimental results demonstrate that the proposed estimator is robust to heavy-tailedness of data and the possible presence of outliers. (iii) Theoretically, we proved that the adopted method yields a parametric convergence rate in the matrix

l_{\infty} and l_{1}

norm. In this manuscript, we focused on the stationary vector autoregressive model and our method is designed for such stationary process. The stationary requirement is a common assumption in analysis and is adopted by most recent works, for example, see and [14,24]. We notice that there are works in handling time-varying PVAR models, checking for example in [25]. We would like to explore this problem in the future.

Author Contributions

Conceptualization, J.Z. and D.H.; methodology, D.H. and J.Z.; software, J.Z.; validation, J.Z.; formal analysis, J.Z.; investigation, J.Z.; resources, J.Z.; data curation, J.Z.; writing—original draft preparation, J.Z.; writing—review and editing, J.Z.; visualization, J.Z.; supervision, J.Z. and D.H.; project administration, J.Z.; funding acquisition, D.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by National Natural Science Foundations of China [grant number 11531001].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Franses, P.H.; Paap, R. Periodic Time Series Models; Oxford U Press: Oxford, UK, 2004. [Google Scholar]
Bell, W.R.; Holan, S.H.; McElroy, T.S. Economic Time Series: Modeling and Seasonality; Chapman and Hall/CRC Press: Boca Raton, FL, USA, 2012. [Google Scholar]
Aliat, B.; Hamdi, F. On Markov-switching periodic ARMA models. Commun. Stat. Theory Methods. 2018, 47, 344–364. [Google Scholar] [CrossRef]
Lütkepohl, H. New Introduction to Multiple Time Series Analysis; Springer Science & Business Media: Berlin, Germany, 2005. [Google Scholar]
Baek, C.; Davis, R.A.; Pipiras, V. Sparse seasonal and periodic vector autoregressive modeling. Comput. Stat. Data Anal. 2017, 106, 103–126. [Google Scholar] [CrossRef]
Gao, W.; Yang, H.; Yang, L. Change points detection and parameter estimation for multivariate time series. Soft Comput. 2020, 24, 6395–6407. [Google Scholar] [CrossRef]
Basu, S.; Michailidis, G. Regularized estimation in sparse high-dimensional time series models. Ann. Stat. 2015, 43, 1535–1567. [Google Scholar] [CrossRef]
Bai, P.; Safikhani, A.; Michailidis, G. Multiple Change Points Detection in Low Rank and Sparse High Dimensional Vector Autoregressive Models. IEEE Trans. Signal Process. 2020, 68, 3074–3089. [Google Scholar] [CrossRef]
Han, F.; Liu, H. Transition Matrix Estimation in High Dimensional Vector Autoregressive Models. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 172–180. [Google Scholar]
Hong, D.; Gu, Q.; Whitehouse, K. High-dimensional time series clustering via cross-predictability. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 642–651. [Google Scholar]
Chen, X.; Xu, M.; Wu, W.B. Regularized estimation of linear functionals of precision matrices for high-dimensional time series. IEEE Trans. Signal Process. 2016, 64, 6459–6470. [Google Scholar] [CrossRef]
Han, F.; Lu, H.; Liu, H. A Direct Estimation of High Dimensional Stationary Vector Autoregressions. J. Mach. Learn. Res. 2015, 16, 3115–3150. [Google Scholar]
Qiu, H.; Xu, S.; Han, F.; Liu, H.; Caffo, B. Robust estimation of transition matrices in high dimensional heavy-tailed vector autoregressive processes. In Proceedings of the International Conference on Machine Learning. International Conference on Machine Learning, Lille, France, 7–9 July 2015; pp. 1843–1851. [Google Scholar]
Wong, K.C.; Li, Z.; Tewari, A. Lasso guarantees for mixing heavy-tailed time series. Ann. Stat. 2020, 48, 1124–1142. [Google Scholar] [CrossRef]
Maleki, M.; Wraith, D.; Mahmoudi, M.R.; Contreras-Reyes, J.E. Asymmetric heavy-tailed vector auto-regressive processes with application to financial data. J. Stat. Comput. Simul. 2020, 90, 324–340. [Google Scholar] [CrossRef]
Schezhtman, E.; Yitzhaki, S. A measure of association based on gin’s mean difference. Commun. Stat. Theory Methods 1987, 16, 207–231. [Google Scholar] [CrossRef]
Schechtman, E.; Yitzhaki, S. On the proper bounds of the Gini correlation. Econ. Lett. 1999, 63, 133–138. [Google Scholar] [CrossRef]
Schechtman, E.; Yitzhaki, S. A family of correlation coefficients based on the extended Gini index. J. Econ. Inequal. 2003, 1, 129–146. [Google Scholar] [CrossRef]
Yitzhaki, S.; Schechtman, E. The Gini Methodology: A Primer on a Statistical Methodology; Springer Science & Business Media: Berlin, Germany, 2012. [Google Scholar]
Yang, S.S. Linear functions of concomitants of order statistics with application to nonparametric estimation of a regression function. J. Am. Stat. Assoc. 1981, 76, 658–662. [Google Scholar] [CrossRef]
Pang, H.; Liu, H.; Vanderbei, R. The fastclime package for linear programming and large-scale precision matrix estimation in R. J. Mach. Learn. Res. JMLR 2014, 15, 89–493. [Google Scholar]
Granger, C.W. Testing for causality: A personal viewpoint. J. Econ. Dyn. Control. 1980, 2, 329–352. [Google Scholar] [CrossRef]
Fang, K.W. Symmetric Multivariate and Related Distributions; CRC Press: Boca Raton, FL, USA, 2018. [Google Scholar]
Song, S.; Bickel, P.J. Large vector auto regressions. arXiv 2011, arXiv:1106.3915. [Google Scholar]
Haslbeck, J.M.; Bringmann, L.F.; Waldorp, L.J. A Tutorial on Estimating Time-Varying Vector Autoregressive Models. Multivar. Behav. Res. 2020, 4, 1–30. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Prediction errors in stock prices plotted against the sparsity of the estimated transition matrix.

Table 1. Comparison of estimation errors of three methods under different setups. The standard deviations are shown in parentheses. Here

L_{1}

,

L_{\infty}

,

L_{F}

are the

L_{1}

,

L_{\infty}

and Frobenius matrix norms respectively.

Table 1. Comparison of estimation errors of three methods under different setups. The standard deviations are shown in parentheses. Here

L_{1}

,

L_{\infty}

,

L_{F}

are the

L_{1}

,

L_{\infty}

and Frobenius matrix norms respectively.

		Lasso			Dantzig			Our Method
		$L_{1}$	$L_{\infty}$	$L_{F}$	$L_{1}$	$L_{\infty}$	$L_{F}$	$L_{1}$	$L_{\infty}$	$L_{F}$
Model 1	$c l u s t e r$	1.13	1.62	1.67	1.03	1.55	1.58	1.23	1.74	1.79
		(0.17)	(0.16)	(0.55)	(0.16)	(0.07)	(0.03)	(0.27)	(0.37)	(0.43)
	$h u b$	1.37	2.16	2.39	1.18	1.25	1.36	1.09	1.17	1.28
		(0.34)	(0.45)	(0.22)	(0.17)	(0.07)	(0.08)	(0.46)	(0.28)	(0.09)
	$r a n d o m$	1.41	2.58	2.18	0.97	1.08	1.75	1.46	1.17	1.95
		(0.43)	(0.67)	(0.17)	(0.36)	(0.73)	(0.18)	(0.13)	(0.25)	(0.15)
Model 2	$c l u s t e r$	1.25	1.76	1.79	1.13	1.64	1.77	0.95	0.96	1.18
		(0.64)	(0.45)	(0.37)	(0.21)	(0.36)	(0.27)	(0.38)	(0.46)	(0.47)
	$h u b$	1.51	2.76	1.58	1.58	1.64	1.78	1.18	1.59	1.09
		(0.41)	(0.56)	(0.14)	(0.19)	(0.43)	(0.54)	(0.85)	(1.12)	(0.93)
	$r a n d o m$	1.36	2.24	2.17	1.24	1.45	1.85	1.13	1.06	1.07
		(0.32)	(0.17)	(0.17)	(0.29)	(0.15)	(0.09)	(0.22)	(0.19)	(0.27)
Model 3	$c l u s t e r$	1.54	2.13	1.96	1.12	1.32	1.71	1.05	1.29	1.59
		(0.29)	(0.13)	(0.18)	(0.25)	(0.46)	(0.52)	(0.17)	(0.19)	(0.19)
	$h u b$	1.52	2.67	1.41	1.45	1.91	1.45	1.11	1.29	1.31
		(0.26)	(0.18)	(0.16)	(0.07)	(0.09)	(0.12)	(0.24)	(0.06)	(0.09)
	$r a n d o m$	1.78	2.56	2.21	1.66	1.34	2.15	1.03	1.23	1.76
		(0.22)	(0.26)	(0.54)	(0.18)	(0.09)	(0.12)	(0.11)	(0.18)	(0.06)

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Yule–Walker Equations Using a Gini Covariance Matrix for the High-Dimensional Heavy-Tailed PVAR Model

Abstract

1. Introduction

2. Model

2.1. Notation

2.2. Gini–Yule–Walker Equation

2.3. Sample Gini Covariance Matrix

2.4. Convergence Rates of the Estimator $U ({\tilde{X}}_{k}, {\tilde{Y}}_{k})$ and $U ({\tilde{Y}}_{k}, {\tilde{Y}}_{k})$

3. Theoretical Properties

4. Granger Causality

5. Experiments

5.1. Synthetic Data

5.2. Real Data

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics

Yule–Walker Equations Using a Gini Covariance Matrix for the High-Dimensional Heavy-Tailed PVAR Model

Abstract

1. Introduction

2. Model

2.1. Notation

2.2. Gini–Yule–Walker Equation

2.3. Sample Gini Covariance Matrix

2.4. Convergence Rates of the Estimator U ( X ˜ k , Y ˜ k ) and U ( Y ˜ k , Y ˜ k )

3. Theoretical Properties

4. Granger Causality

5. Experiments

5.1. Synthetic Data

5.2. Real Data

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics

2.4. Convergence Rates of the Estimator $U ({\tilde{X}}_{k}, {\tilde{Y}}_{k})$ and $U ({\tilde{Y}}_{k}, {\tilde{Y}}_{k})$