Yule–Walker Equations Using a Gini Covariance Matrix for the High-Dimensional Heavy-Tailed PVAR Model

Abstract: Gini covariance plays a vital role in analyzing the relationship between random variables with heavy-tailed distributions. In this paper, assuming only the existence of a finite second moment, we establish the Gini–Yule–Walker equation to estimate the transition matrix of high-dimensional periodic vector autoregressive (PVAR) processes, and we derive the asymptotic properties of the resulting estimators. We apply this method to study the Granger causality of the heavy-tailed PVAR process, and the results show that the robust transition matrix estimation induces sign consistency of the Granger causality pattern. The effectiveness of the proposed method is verified on both synthetic and real data.


Introduction
The heavy-tailed distribution has thicker tails and a sharper peak than the normal distribution. Intuitively, this means that the probability of extreme values is greater for such data than for normally distributed data. The heavy-tailed phenomenon has been encountered empirically in various fields: physics, earth sciences, economics, political science, etc. Periodicity is a wave-like or oscillatory movement around a long-term trend present in a time series. It is well known that cyclicality caused by business and economic activities usually differs from trend movements: it is not a continuous movement in a single direction, but an alternation between ups and downs. When these components, trend and cyclicality, do not evolve independently, traditional differencing filters may not be suitable (see, for example, Franses and Paap [1], Bell et al. [2], Aliat and Hamdi [3]). Periodic time series models are used to model periodically stationary time series. Periodic vector autoregressive (PVAR) models extend the classical vector autoregressive (VAR) models by allowing the parameters to vary periodically.
For fixed $v$ and a predetermined period $T$, the random vector $X_{iT+v}$ represents the realization during the $v$th season, $v \in \{1, \ldots, T\}$, of year $i$, $i \in \{1, \ldots, n\}$. The PVAR model order at season $v$ is given by $q(v)$, whereas $\alpha_k(v)$, $k = 1, \cdots, q(v)$, represent the PVAR model coefficients during season $v$. The error process $\epsilon = \{\epsilon_{iT+v}\}$ corresponds to a periodic white noise, with $E(\epsilon_{iT+v}) = 0$ and $\mathrm{var}(\epsilon_{iT+v}) = \sigma^2(v) I$, where $0$ is the zero vector of order $d$ and $I$ is the identity matrix of order $d$.
This paper seeks to establish a PVAR model for heavy-tailed time series. Let random vectors $X_1, \ldots, X_n$ be from a stationary stochastic process $(X_k)_{k=-\infty}^{\infty}$, $X_k = (X_{kT+1}, \ldots, X_{kT+T})'$, where the transpose (indicated by $'$) of a row vector is a column vector.
For random variables $X$ and $Y$ with marginal distribution functions $F_1$ and $F_2$, the Gini covariances are $\mathrm{Gcov}(X, Y) = \mathrm{cov}(X, F_2(Y))$ and $\mathrm{Gcov}(Y, X) = \mathrm{cov}(Y, F_1(X))$, which require the random variables to possess only a finite first moment. The Gini covariance has an advantage when analyzing bivariate data defined by both variate values and ranks of the values. The representation of the Gini covariance $\mathrm{Gcov}(X, Y)$ indicates that it has mixed properties of the variable $X$ and the rank of the variable $Y$, and it thus complements the usual covariance and the rank covariance [16–18]. In terms of the balance between efficiency and robustness, the Gini covariance plays an important role in measuring association for variables from heavy-tailed distributions [19]. The Yule–Walker equations arise naturally in the problem of linear prediction of any zero-mean weakly stationary process from a finite number of contiguous observations, and they provide a straightforward connection between the autoregressive model parameters and the covariance function of the process. In this paper, relaxing the strong assumption of the existence of higher-order moments of the regressors, we use a non-parametric method to estimate the Gini covariance matrix and establish the Gini–Yule–Walker equation to estimate the sparse transition matrix of stationary PVAR processes. The estimator falls into the category of Dantzig-selector-type estimators. Assuming only the existence of a finite second moment, we investigate the asymptotic behavior of the estimator in high dimensions.
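To make the definition concrete, the following Python sketch estimates $\mathrm{Gcov}(X, Y) = \mathrm{cov}(X, F_2(Y))$ by replacing the unknown distribution function $F_2$ with the empirical CDF (scaled ranks). The function name `gini_cov` and the simulated t-distributed data are illustrative choices, not from the paper.

```python
import numpy as np
from scipy.stats import rankdata

def gini_cov(x, y):
    """Plug-in estimate of Gcov(X, Y) = cov(X, F_Y(Y)) from paired samples."""
    n = len(x)
    f_y = rankdata(y) / n          # empirical CDF evaluated at each y_i
    return np.cov(x, f_y, ddof=1)[0, 1]

rng = np.random.default_rng(0)
# Heavy-tailed example: t-distributed pair with 2.5 degrees of freedom,
# so the second moment exists but the ordinary covariance is unstable.
x = rng.standard_t(df=2.5, size=1000)
y = 0.5 * x + rng.standard_t(df=2.5, size=1000)
print(gini_cov(x, y), gini_cov(y, x))  # note: Gcov is asymmetric in its arguments
```

Because only ranks of the second argument enter, extreme values of $Y$ cannot dominate the estimate, which is the robustness property exploited throughout the paper.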
The paper is organized as follows. In Section 2, we establish the Gini–Yule–Walker equation and estimate the sample Gini covariance matrix. In Section 3, we derive the convergence rate of the transition matrix estimator. In Section 4, we discuss the characterization and estimation of Granger causality under the heavy-tailed PVAR model. In Section 5, both synthetic and real data are used to demonstrate the empirical performance of the proposed methodology.

Model
In this section, we first set the notation. We then establish the Gini–Yule–Walker equation, obtain simple non-parametric estimators of the Gini covariance matrix, and investigate the convergence rate of the sample Gini covariance matrix.

Gini-Yule-Walker Equation
In this paper, we model the time series by a stationary PVAR process under the existence of a second moment. For each $k$, $1 \le k \le n$, $X_{1k}, \ldots, X_{Tk} \in \mathbb{R}^d$ follow a lag-one VAR process with transition matrix $A$, with $E_{tk}$ independent of $X_{(t-1)k}$, and $E(X_{tk}) = 0$, $E(E_{tk}) = 0$.
We define $X_k = (X_{1k}, \cdots, X_{(T-1)k})'$; this VAR process may then be analyzed by concatenating the equation systems into a single regression, where $Y_k = X_k \otimes I_d$ and $\otimes$ is the Kronecker product operator. Since the matrix $Y_k$ is block diagonal, the estimation problem can be decomposed into $d$ independent sub-problems. Let us consider the $i$-th equation of the system.
For $t \in \{2, 3, \cdots, T\}$, the above equation system can be considered as a multiple regression with $T - 1$ samples for estimating the $i$-th row of the transition matrix $A$. Let $F(x_{ik})$ be the distribution function of $x_{ik}$; we assume independence between $\tilde\epsilon_{ik}$ and $F(x_{ik})$ for $i = 1, 2, \ldots, d$. Taking Gini covariances in Equation (5), we obtain the so-called Gini–Yule–Walker equation, where $G = \frac{1}{n}\sum_{k=1}^n G_k$ and $\tilde G = \frac{1}{n}\sum_{k=1}^n \tilde G_k$. The entries of $G_k$ are given by $G_k^{ij} = \mathrm{Gcov}(z_{jk}, x_{ik})$, and the entries of $\tilde G_k$ are given by $\tilde G_k^{ij} = \mathrm{Gcov}(x_{jk}, x_{ik})$.

Sample Gini Covariance Matrix
We use a U-statistic method to estimate the Gini covariance matrices $G_k$ and $\tilde G_k$. From Equation (6), the elements of these matrices fall into two categories: the Gini covariance $\mathrm{Gcov}(\tilde X_k, \tilde Y_k)$ and the Gini mean difference $\mathrm{Gcov}(\tilde Y_k, \tilde Y_k)$. The $i$-th order statistic of $\tilde X_k$ is denoted $\tilde X_k^{i:(T-1)}$, and the associated value of $\tilde Y_k$ (matched with $\tilde X_k^{i:(T-1)}$) is denoted $\tilde Y_k^{[i:(T-1)]}$, the concomitant of the $i$-th order statistic. In this set-up, in the context of non-parametric estimation of a regression function, Yang [20] proposed a statistic of the form given in Equation (7), where $J(\cdot)$ is a bounded smooth function, $h(\cdot)$ is a real-valued function of $(\tilde X_k, \tilde Y_k)$, and $F_{T-1}(\cdot)$ is the empirical distribution function corresponding to $F_{\tilde X_k, \tilde Y_k}(\cdot)$. Rewriting the Gini covariance defined in Equation (3) in the form of Equation (7), we obtain an estimator of $G(\tilde X_k, \tilde Y_k)$. For the Gini mean difference $\mathrm{Gcov}(\tilde Y_k, \tilde Y_k)$, we have the sample $\{\tilde Y_{1k}, \tilde Y_{2k}, \ldots, \tilde Y_{(T-1)k}\}$. Letting $\tilde Y_{1k}$ and $\tilde Y_{2k}$ be two independent random variables with distribution function $F_{\tilde Y_k}$, the Gini mean difference $\mathrm{Gcov}(\tilde Y_k, \tilde Y_k)$ admits a U-statistic estimator, which, after some simplification, reduces to a linear combination of the order statistics $\tilde Y_k^{i:(T-1)}$ based on the sample $\{\tilde Y_{1k}, \ldots, \tilde Y_{(T-1)k}\}$.
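As a concrete illustration of the order-statistic form above, the following Python sketch computes a concomitant-based estimate of $\mathrm{Gcov}(X, Y) = \mathrm{cov}(X, F_Y(Y))$. The weights $(2i - m - 1)$ come from the standard U-statistic simplification of the Gini mean difference; the normalization may differ from the paper's Equations (8) and (9) by a fixed constant, and the function names are illustrative.

```python
import numpy as np

def gini_cov_concomitant(x, y):
    """U-statistic-type estimate of Gcov(X, Y) via concomitants.

    Sort the sample by y; x reordered this way gives the concomitants
    x_[i:m] of the order statistics y_(i:m).
    """
    m = len(y)
    x_conc = np.asarray(x)[np.argsort(y, kind="stable")]
    i = np.arange(1, m + 1)
    return np.sum((2 * i - m - 1) * x_conc) / (2 * m * (m - 1))

def gini_mean_diff_cov(y):
    """Gcov(Y, Y): one quarter of the sample Gini mean difference."""
    return gini_cov_concomitant(y, y)
```

Setting $x = y$ recovers the Gini mean difference case: the weighted sum of order statistics equals $\sum_{i<j}(y_{j:m} - y_{i:m})$ up to the stated normalization.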

Convergence Rates of the Estimator
In this subsection, using the truncation method and Bernstein's inequality, we investigate the convergence rates of the estimators $U(\tilde X_k, \tilde Y_k)$ and $U(\tilde Y_k, \tilde Y_k)$ from Equations (8) and (9). For the analysis, we require the following three assumptions on the time series and on the sizes $(d, T, n)$.

Assumption 1. In Equation (2), suppose that the entries of the heavy-tailed innovation $E_{kT+v} = (E_{kT+v,1}, \cdots, E_{kT+v,d})'$ obey a power-law distribution with $P(|E_{kT+v,i}| > x) \sim C x^{-\alpha_i}$ and $2 < \alpha_i \le 3$; then the second moment is finite and the third moment is infinite.
Lemma 1. Let $\{X_k\}_{k\in\mathbb{Z}}$ be a stationary PVAR process from Equation (2), and let $X_1, \cdots, X_n$ be a sequence of observations from $\{X_k\}_{k\in\mathbb{Z}}$. Suppose that Assumptions (A1)-(A2) are satisfied. Then, for $T$ and $n$ large enough, the estimators $U(\tilde X_k, \tilde Y_k)$ and $U(\tilde Y_k, \tilde Y_k)$ concentrate around their population counterparts with high probability.

Proof. Assume that $m$ and $\lambda$ are positive constants and truncate the variables at level $m$. Then $\gamma_k$ is a bounded random variable satisfying the independent identical distribution property, and the claimed tail bound follows from Bernstein's inequality. As $T \to \infty$ and $n \to \infty$, assuming $\frac{\log n}{T} \to 0$, the exceptional probability vanishes. This completes the proof.

From Equations (5) and (6), we define the sample estimate of the Gini covariance matrix $G$ as $\hat G = (\hat G^{ij})$, $\hat G^{ij} = \frac{1}{n}\sum_{k=1}^n U(z_{jk}, x_{ik})$, and the sample estimate of the Gini covariance matrix $\tilde G$ as $\hat{\tilde G} = (\hat{\tilde G}^{ij})$, $\hat{\tilde G}^{ij} = \frac{1}{n}\sum_{k=1}^n U(x_{jk}, x_{ik})$, $1 \le i, j \le d$. Next, we investigate the convergence rates of the estimators $\hat G$ and $\hat{\tilde G}$ under the $\ell_\infty$ norm.

Lemma 2.
Let $\{X_k\}_{k\in\mathbb{Z}}$ be a stationary PVAR process from Equation (2), and let $X_1, \cdots, X_n$ be a sequence of observations from $\{X_k\}_{k\in\mathbb{Z}}$. Suppose that Assumptions (A1)-(A3) are satisfied. Then, for $T$ and $n$ large enough, with probability no smaller than $1 - (\frac{8\Delta \log n}{T} + \frac{2d^2}{n^2})$, the bounds below hold.

Proof. Let $\delta = d\sqrt{\frac{8T \log n}{n}}$. Based on Lemma 1, we bound each entry of $\hat G$. As $T \to \infty$ and $n \to \infty$, assuming $\frac{\log n}{T} \to 0$, $\frac{dT}{\sqrt{n}} \le c$ and $\frac{d}{n} \to 0$, we have $\frac{8\Delta \log n}{T} + \frac{2d^2}{n^2} \to 0$. Next, we study the convergence of the sample Gini covariance matrix $\hat{\tilde G}$.
Let $\delta = d\sqrt{\frac{8T \log n}{n}}$; based on Lemma 1, we bound the diagonal entries. For the off-diagonal entries, an analogous bound holds. Combining Equations (15) and (16), we obtain the stated result. This completes the proof.

Theoretical Properties
Using the Gini covariance matrix Equation (6), we propose to estimate $A$ by the Dantzig-selector-type program in Equation (17). The above optimization problem can be further decomposed into $d$ subproblems, the $i$-th of which estimates the $i$-th row of the transition matrix $A$. Compared to lasso-type procedures, the proposed method can be solved in parallel, and Equation (17) can be solved efficiently using the parametric simplex method for sparse learning in [21].
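Each row subproblem is a linear program. Below is a minimal Python sketch of one row of a Dantzig-selector-type program of the form $\min \|\beta\|_1$ subject to $\|g_i - \tilde G \beta\|_\infty \le \lambda$, which is our reading of Equation (17); it uses SciPy's HiGHS solver as a stand-in for the parametric simplex method of [21], and all variable names (`G_tilde`, `g_i`, `lam`) are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def gyw_dantzig_row(G_tilde, g_i, lam):
    """Solve min ||b||_1  s.t.  ||g_i - G_tilde @ b||_inf <= lam."""
    d = G_tilde.shape[1]
    # Variables: [beta (free), u (>= 0)] with |beta_j| <= u_j.
    c = np.concatenate([np.zeros(d), np.ones(d)])   # minimize sum(u) = ||beta||_1
    I = np.eye(d)
    Z = np.zeros((d, d))
    A_ub = np.block([
        [ I,       -I],   #  beta - u <= 0
        [-I,       -I],   # -beta - u <= 0
        [ G_tilde,  Z],   #  G_tilde beta <= g_i + lam
        [-G_tilde,  Z],   # -G_tilde beta <= lam - g_i
    ])
    b_ub = np.concatenate([np.zeros(2 * d), g_i + lam, lam - g_i])
    bounds = [(None, None)] * d + [(0, None)] * d
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d]
```

Since the $d$ rows share nothing but the matrix $\tilde G$, they can be dispatched to separate workers, which is the parallelism noted above.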
Based on Lemma 2, we can further derive the rates of convergence of $\hat A$ under the matrix $\ell_\infty$ norm and $\ell_1$ norm. We start with some additional notation. For $\alpha \in [0, 1)$, $\eta > 0$, and $M_d > 0$ that may scale with $d$, we define the matrix class $\mathcal{M}(\alpha, \eta, M_d)$, which requires the rows of the transition matrix to be sparse. If $\alpha = 0$, then the maximum number of nonzero entries in each row of the transition matrix is at most $\eta$. The class $\mathcal{M}(\alpha, \eta, M_d)$ was also investigated in [12].
Proof. We first show that, with large probability, $A$ is feasible for the optimization problem. By the Gini–Yule–Walker equation, we obtain a chain of inequalities, where the last inequality is due to $A \in \mathcal{M}(\alpha, \eta, M_d)$. By Lemma 2, with probability no smaller than $1 - (\frac{8\Delta \log n}{T} + \frac{2d^2}{n^2})$, $A$ is feasible for the optimization problem. Checking Equation (17), we then have $\|\hat A\|_1 \le \|A\|_1$ with probability no smaller than $1 - (\frac{8\Delta \log n}{T} + \frac{2d^2}{n^2})$.

Granger Causality
In this section, the characterization and estimation of Granger causality under the heavy-tailed PVAR model are discussed; the practical effectiveness of the proposed methods is verified in Section 5. Firstly, we give the definition of Granger causality.

Definition 1 (Granger [22]). Let $\{X_t\}_{t\in\mathbb{Z}}$ be a stationary process, where $X_t = (X_{t1}, \ldots, X_{td})^T$. For $j \ne k \in \{1, \ldots, d\}$, $\{X_{tk}\}_{t\in\mathbb{Z}}$ Granger causes $\{X_{tj}\}_{t\in\mathbb{Z}}$ if and only if there exists a measurable set $A$ such that for all $t \in \mathbb{Z}$,
$$P(X_{(t+1)j} \in A \mid \{X_s\}_{s \le t}) \ne P(X_{(t+1)j} \in A \mid \{X_{s,\setminus k}\}_{s \le t}),$$
where $X_{s,\setminus k}$ is the subvector obtained by removing $X_{sk}$ from $X_s$.

For a Gaussian VAR process $\{X_t\}_{t\in\mathbb{Z}}$, we have that $\{X_{tk}\}_{t\in\mathbb{Z}}$ Granger causes $\{X_{tj}\}_{t\in\mathbb{Z}}$ if and only if the $(j, k)$ entry of the transition matrix is non-zero [4]. For the heavy-tailed PVAR process, let $\{X_k\}_{k\in\mathbb{Z}}$ be a stationary PVAR process from Equation (2), and define the transformed process $\check X_t$ with innovations $\check E_t$. In the next theorem, we show that a similar property holds for the heavy-tailed PVAR process.
1. If $A_{ji} \ne 0$, then $\{X_{ti}\}_{t\in\mathbb{Z}}$ Granger causes $\{X_{tj}\}_{t\in\mathbb{Z}}$.

2. If we further assume that $\check E_{t+1}$ is independent of $\{X_s\}_{s\le t}$ for any $t \in \mathbb{Z}$, we have that $\{X_{ti}\}_{t\in\mathbb{Z}}$ Granger causes $\{X_{tj}\}_{t\in\mathbb{Z}}$ if and only if $A_{ji} \ne 0$.
Proof. In order to prove Issue 1, we only need to prove that if $\{X_{ti}\}_{t\in\mathbb{Z}}$ does not Granger cause $\{X_{tj}\}_{t\in\mathbb{Z}}$, then $A_{ji} = 0$. Suppose that for any $t \in \mathbb{Z}$ and any measurable set $A$,
$$P(X_{(t+1)j} \in A \mid \{X_s\}_{s \le t}) = P(X_{(t+1)j} \in A \mid \{X_{s,\setminus i}\}_{s \le t}).$$
The above equation implies that, conditioning on $\{X_{s,\setminus i}\}_{s \le t}$, $X_{(t+1)j}$ is independent of $\{X_{si}\}_{s \le t}$. Hence, we have $\mathrm{Gcov}(\check X_{(t+1)j}, \check X_{ti} \mid \{X_{s,\setminus i}\}_{s \le t}) = 0$.

Expanding $\check X_{(t+1)j} = A_{ji}\check X_{ti} + \sum_{i' \ne i} A_{ji'}\check X_{ti'} + \check E_{(t+1)j}$, the second term on the right-hand side contributes $0$, since, given $\{X_{s,\setminus i}\}_{s \le t}$, $\sum_{i' \ne i} A_{ji'}\check X_{ti'}$ is constant. Since $\check E_t$ and $\{X_s\}_{s < t}$ are independent for any $t \in \mathbb{Z}$, we have $\mathrm{Gcov}(\check E_{(t+1)j}, \check X_{si}) = 0$ for any $s \le t$; using Theorem 2.18 in Fang et al. [23], the third term is also $0$. Thus, we have $A_{ji}\,\mathrm{Gcov}(\check X_{ti}, \check X_{ti} \mid \{X_{s,\setminus i}\}_{s \le t}) = 0$, and since this conditional Gini covariance is nonzero (see Remark 1), $A_{ji} = 0$. This proves Issue 1.
Given Issue 1, to prove Issue 2, it remains to prove that $A_{ji} = 0$ implies that $\{X_{ti}\}_{t\in\mathbb{Z}}$ does not Granger cause $\{X_{tj}\}_{t\in\mathbb{Z}}$. Since $A_{ji} = 0$, we have
$$p(X_{(t+1)j}, \{X_{si}\}_{s \le t} \mid \{X_{s,\setminus i}\}_{s \le t}) = p(X_{(t+1)j} \mid \{X_{s,\setminus i}\}_{s \le t})\, p(\{X_{si}\}_{s \le t} \mid \{X_{s,\setminus i}\}_{s \le t}).$$
Here $p$ is the conditional probability density function. The last equation holds because $\check E_{(t+1)j}$ is independent of $\{X_s\}_{s \le t}$, and because $\sum_{i' \ne i} A_{ji'}\check X_{ti'}$ is constant given $\{X_{s,\setminus i}\}_{s \le t}$. Hence, we have
$$p(X_{(t+1)j} \mid \{X_s\}_{s \le t}) = p(X_{(t+1)j} \mid \{X_{s,\setminus i}\}_{s \le t}),$$
which means that $\{X_{ti}\}_{t\in\mathbb{Z}}$ does not Granger cause $\{X_{tj}\}_{t\in\mathbb{Z}}$. This proves Issue 2.

Remark 1. The assumption that $\mathrm{Gcov}(\check X_{ti}, \check X_{ti} \mid \{X_{s,\setminus i}\}_{s \le t}) \ne 0$ requires that $\check X_{ti}$ cannot be perfectly predicted from the past or from the other observed random variables at time $t$. Otherwise, we can simply remove $\{X_{ti}\}_{t\in\mathbb{Z}}$ from the process $\{X_t\}_{t\in\mathbb{Z}}$, since predicting $\{X_{ti}\}_{t\in\mathbb{Z}}$ is trivial. Assuming that $\check E_{t+1}$ is independent of $\{X_s\}_{s \le t}$ for any $t \in \mathbb{Z}$, the Granger causality relations among the processes $\{X_{tj}\}_{t\in\mathbb{Z}}$, $j = 1, \ldots, d$, are completely characterized by the support of the transition matrix $A$.
Define the thresholded estimator $\tilde A$ by $\tilde A_{ji} = \hat A_{ji}\,\mathbb{1}(|\hat A_{ji}| \ge \gamma)$; then, with probability no smaller than $1 - (\frac{8\Delta \log n}{T} + \frac{2d^2}{n^2})$, we have $\mathrm{sign}(\tilde A) = \mathrm{sign}(A)$, provided that $\min_{\{(j,i):\, A_{ji} \ne 0\}} |A_{ji}| \ge 2\gamma$.

Proof. The proof is a consequence of Theorem 1. In detail, if $A_{ji} > 0$, by Equation (25) we have $A_{ji} \ge 2\gamma$. By Theorem 1, with probability no smaller than $1 - (\frac{8\Delta \log n}{T} + \frac{2d^2}{n^2})$, $|\hat A_{ji} - A_{ji}| \le \gamma$. Thus, we have $\hat A_{ji} \ge \gamma$ with probability no smaller than $1 - (\frac{8\Delta \log n}{T} + \frac{2d^2}{n^2})$. By the definition of $\tilde A$, we have $\tilde A_{ji} = \hat A_{ji} \ge \gamma > 0$.

If $A_{ji} < 0$, by Equation (25), we have $A_{ji} \le -2\gamma$. Using Theorem 1, we have $\hat A_{ji} \le -\gamma$ with probability no smaller than $1 - (\frac{8\Delta \log n}{T} + \frac{2d^2}{n^2})$. By the definition of $\tilde A$, we have $\tilde A_{ji} = \hat A_{ji} \le -\gamma < 0$.

If $A_{ji} = 0$, using Theorem 1, we have $|\hat A_{ji}| < \gamma$ with probability no smaller than $1 - (\frac{8\Delta \log n}{T} + \frac{2d^2}{n^2})$, since $P(|\hat A_{ji}| = \gamma) = 0$. By the definition of $\tilde A$, we have $\tilde A_{ji} = 0$. This completes the proof.
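In practice, the estimated Granger causality pattern is read off a hard-thresholded estimate. The following minimal Python sketch implements the thresholding step under our reading of the result above, namely that $\tilde A$ zeroes out entries of $\hat A$ below $\gamma$ in magnitude; the names are illustrative.

```python
import numpy as np

def threshold_signs(A_hat, gamma):
    """Hard-threshold A_hat at level gamma and return the sign pattern.

    Under the sign consistency result, sign(A_tilde) = sign(A) with high
    probability whenever min |A_ji| >= 2 * gamma on the support of A.
    """
    A_tilde = np.where(np.abs(A_hat) >= gamma, A_hat, 0.0)
    return np.sign(A_tilde)
```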

Experiments
This section provides numerical results on synthetic and real data. We consider the Lasso and the Dantzig selector for comparison.

• Lasso: the standard lasso-type estimator of the transition matrix.
• Dantzig selector: the estimator proposed in [12], where $S_k$ and $S_{1k}$ are the marginal and lag-one sample covariance matrices of Equation (2).
• G-Dantzig selector: the estimator described in Equation (17).

Synthetic Data
In this subsection, we compare the performance of our method with the Lasso and the Dantzig selector on synthetic data. The "flare" package in R is used to create the transition matrix according to three patterns: cluster, hub, and random. We rescale $A$ such that $\|A\|_2 = 0.8$, generate $\Sigma$ such that $\|\Sigma\|_2 = 2\|A\|_2$, and then calculate the covariance matrix $\Psi$ of the noise vector $E_t$ as $\Psi = \Sigma - A^T \Sigma A$. Let the periodic time series length be $n = 2000$, the period length $T = 100$, and the dimension $d = 150$; with $A$, $\Sigma$ and $\Psi$, we simulate periodic time series according to the model described in Equation (2). Specifically, we consider three models:
• Model 1: Data generated from Equation (2), where the errors $E_t \sim N(0, \Psi)$. There are no outliers under Model 1.
• Model 2: Data generated from Equation (2), where the errors $E_t \sim t(0, \Psi, 2)$. No outliers appear, and the tail of the error distribution is heavier than that of the normal distribution.
• Model 3: Data generated from Equation (2) with the same error distribution as Model 1. Then $X_i$ is replaced by $X_i + (-1)^{I(U_1 < 1/2)}(20 + 10U_2)I$ for $i = 1, \ldots, T/5$, where $U_1$ and $U_2$ are independently generated from the $U[0, 1]$ distribution, so there are $T/5$ outliers that deviate from the majority of the observations.
A simulation sketch for Model 1 is given below.
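The following Python sketch illustrates the data-generating scheme for Model 1 under our reading of the construction above. The sparse pattern generation (done with the R package "flare" in the paper) is replaced by a generic random sparse matrix, and the recursion uses $A^T$ so that the paper's formula $\Psi = \Sigma - A^T \Sigma A$ is exactly the stationary noise covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_steps = 150, 2000

# Generic sparse transition matrix, rescaled so that ||A||_2 = 0.8.
A = rng.normal(size=(d, d)) * (rng.random((d, d)) < 0.02)
A *= 0.8 / np.linalg.norm(A, 2)

Sigma = np.eye(d)                    # illustrative stationary covariance
Psi = Sigma - A.T @ Sigma @ A        # implied noise covariance (positive definite
L = np.linalg.cholesky(Psi)          # since ||A||_2 < 1 and Sigma = I)

# Iterate X_t = A' X_{t-1} + E_t with Gaussian errors E_t ~ N(0, Psi).
X = np.zeros((n_steps, d))
for t in range(1, n_steps):
    X[t] = A.T @ X[t - 1] + L @ rng.standard_normal(d)
```

For Models 2 and 3, the Gaussian draw would be replaced by a multivariate $t$ draw with 2 degrees of freedom, or followed by the outlier-injection step described above.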
The tuning parameter $\lambda$ is chosen by cross-validation. We construct 20,000 replicates and compare the three methods described above. Table 1 presents the averaged estimation errors under three matrix norms. From this table, we have the following findings. Under the Gaussian model (Model 1), the G-Dantzig selector has performance comparable to the Dantzig selector and outperforms the Lasso. Under Models 2 and 3, our method is more stable than the Lasso and the Dantzig selector. Thus, we conclude that the G-Dantzig selector is robust to heavy-tailedness of the data and the possible presence of outliers. Figure 1 plots the prediction errors against sparsity for the three transition matrix estimators. We observe that the G-Dantzig selector achieves smaller prediction errors compared to the Lasso and the Dantzig selector.

Real Data
We further compared the three methods on real equity data from Yahoo Finance. We collected prices at 10-minute intervals on the Monday of each week for the 50 stocks with the highest volatility that were consistently in the S&P 500 index from 1 January 2008 to 31 December 2020. The periodic time series length was therefore $n = 672$, the period length was $T = 36$, and the dimension was $d = 50$. Since we chose the data points on the Monday of each week, the data points can be treated as independent of each other. Estimates $\hat A_s$ of the transition matrix were obtained by the Lasso, Dantzig selector and G-Dantzig selector, where $s \in [0, 1]$ is the fraction of non-zero entries of $\hat A_s$ and can be controlled by the tuning parameters $\lambda$ and $\rho$. We define the prediction error associated with $\hat A_s$ to be
$$\epsilon_s := \frac{1}{n}\sum_{k=1}^n \frac{1}{T-1}\sum_{t=2}^{T} \left\| X_{tk} - \hat A_s X_{(t-1)k} \right\|_2.$$
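A direct Python transcription of this prediction error may look as follows; the array layout (years × seasons × dimension) and the names `X` and `A_hat` are our own choices for illustration.

```python
import numpy as np

def prediction_error(X, A_hat):
    """Compute eps_s: for each year k, average the one-step-ahead residual
    norms over seasons t = 2..T, then average over the n years.
    X has shape (n, T, d); A_hat is a fitted (d, d) transition matrix."""
    n, T, d = X.shape
    err = 0.0
    for k in range(n):
        resid = X[k, 1:] - X[k, :-1] @ A_hat.T       # X_tk - A_hat X_(t-1)k
        err += np.linalg.norm(resid, axis=1).mean()  # mean over t = 2..T
    return err / n
```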

Conclusions
In this paper, we developed a Gini–Yule–Walker equation for modeling and estimation with heavy-tailed data and the possible presence of outliers in high dimensions. Our contributions are three-fold. (i) At the model level, we generalized the Gaussian process to time series with merely second-order moments. (ii) Methodologically, we proposed a Gini–Yule–Walker-based estimator of the transition matrix. Experimental results demonstrate that the proposed estimator is robust to heavy-tailedness of the data and the possible presence of outliers. (iii) Theoretically, we proved that the adopted method yields a parametric convergence rate in the matrix $\ell_\infty$ and $\ell_1$ norms. In this manuscript, we focused on the stationary vector autoregressive model, and our method is designed for such stationary processes. The stationarity requirement is a common assumption in analysis and is adopted by most recent works; see, for example, [14,24]. We note that there are works handling time-varying PVAR models; see, for example, [25]. We would like to explore this problem in the future.