Article

L1 Regularization for High-Dimensional Multivariate GARCH Models

1
Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA
2
School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
3
Department of Applied Mathematics and Statistics, State University of New York at Stony Brook, Stony Brook, NY 11733, USA
*
Author to whom correspondence should be addressed.
Risks 2024, 12(2), 34; https://doi.org/10.3390/risks12020034
Submission received: 23 December 2023 / Revised: 27 January 2024 / Accepted: 31 January 2024 / Published: 4 February 2024
(This article belongs to the Special Issue Risks Journal: A Decade of Advancing Knowledge and Shaping the Future)

Abstract

The complexity of estimating multivariate GARCH models increases significantly with the increase in the number of asset series. To address this issue, we propose a general regularization framework for high-dimensional GARCH models with BEKK representations, and obtain a penalized quasi-maximum likelihood (PQML) estimator. Under some regularity conditions, we establish some theoretical properties, such as the sparsity and the consistency, of the PQML estimator for the BEKK representations. We then carry out simulation studies to show the performance of the proposed inference framework and the procedure for selecting tuning parameters. In addition, we apply the proposed framework to analyze volatility spillover and portfolio optimization problems, using daily prices of 18 U.S. stocks from January 2016 to January 2018, and show that the proposed framework outperforms some benchmark models.

1. Introduction

Modeling the dynamics of high-dimensional variance–covariance matrices is a challenging problem in high-dimensional time series analysis and has wide applications in financial econometrics. Classical time series models for variance–covariance matrices assume that the number of component time series is small relative to the number of observed samples. However, many financial and economic applications now require modeling the dynamics of high-dimensional variance–covariance matrices. For example, in modern portfolio management, the number of assets can easily run into the thousands and be larger than, or of the same order as, the number of observed historical prices of the assets; in analyzing the movements in the financial markets of different products in different countries, it is critical to understand the interdependence and contagion effects of price movements over thousands of markets, while jointly observed financial data are available for only a few decades.
In this paper, we propose an inference procedure with $L_1$ regularization for high-dimensional BEKK representations and obtain a class of penalized quasi-maximum likelihood (PQML) estimators. The $L_1$ regularization allows us to identify important parameters and shrink the non-essential ones to zero, hence providing an estimate of sparse parameters in BEKK representations. Under some regularity conditions, we establish some theoretical properties, such as the sparsity and the consistency, of the PQML estimator for BEKK representations. The proposed procedure is a fairly general framework that can be applied to a large class of high-dimensional MGARCH models; by applying our regularization techniques, the complexity of making inferences from high-dimensional MGARCH models can be greatly reduced and the intrinsic sparse model structures can be uncovered. We carried out simulation studies to show the performance of the proposed inference framework and the procedure for selecting tuning parameters. In addition, we applied the proposed framework to analyze volatility spillover and portfolio optimization problems, using daily prices of 18 U.S. stocks from January 2016 to January 2018. In the comparison of portfolio optimization based on different MGARCH models, we show that the proposed framework outperforms three benchmark models, i.e., the constant covariance model, the factor MGARCH model, and the dynamic conditional correlation model.
The proposed framework can be viewed as an extension of the literature on regularization techniques from high-dimensional linear models to nonlinear time series models. Since Tibshirani (1996) introduced LASSO for linear regression models, various regularization techniques concerning high-dimensional statistical inference have been studied for various problems in linear models. For example, Fan and Li (2001) proposed the smoothly clipped absolute deviation (SCAD) penalty that generates sparse estimation of regression coefficients with reduced bias and explored the so-called “oracle property”, in which the estimator has asymptotic properties that are equivalent to the maximum likelihood estimator in the non-penalized model. Zou (2006) proposed adaptive LASSO by adding adaptive weights for different parameters in the $L_1$ penalty to obtain better estimator performance. Yuan and Lin (2006) proposed a group LASSO penalty to solve the problem of selecting grouped factors in regression models. Zhang (2010) proposed a minimax concave penalty that gives nearly unbiased variable selection in linear regression. In addition to discussions on regularized estimation in high-dimensional statistics, which relies primarily on independent and identically distributed (i.i.d.) samples and linear models, regularization techniques have also been applied to study inference problems in high-dimensional linear time-series models. For instance, Uematsu (2015) studied a class of penalty functions and showed the oracle properties for the estimators in high-dimensional vector autoregressive (VAR) models. Basu and Michailidis (2015) investigated the theoretical properties of $L_1$-regularized estimates in high-dimensional stochastic regressions with serially correlated errors and transition matrix estimation in high-dimensional VAR models. Sun and Lin (2011) developed a regularization framework for full-factor MGARCH models (Vrontos et al. 2003), in which the dynamics of covariance matrices are determined by the dynamics of univariate GARCH processes for orthogonal factors. Using the group LASSO technique, Poignard (2017) studied the inference problem for MGARCH models with vine structure, an alternative to dynamic conditional correlation MGARCH models.
The proposed regularization framework is also related to the problem of estimating $p \times p$ covariance matrices using various shrinkage and regularization methods. For instance, Ledoit and Wolf (2004) proposed an optimal linear shrinkage method to estimate constant covariance matrices of $p$-dimensional i.i.d. vectors, and, later on, Ledoit and Wolf (2012) extended the method and developed nonlinear shrinkage estimators for high-dimensional covariance matrices. Bickel and Levina (2008) and Cai and Liu (2011) proposed covariance regularization procedures that are based on the thresholding of sample covariance matrices to estimate inverse covariance matrices. Lam and Fan (2009) studied sparsistency and rates of convergence for estimating covariance based on penalized likelihood with nonconcave penalties, and Ravikumar et al. (2011) estimated high-dimensional inverse covariance by minimizing $L_1$-penalized log-determinant divergence. This method is also called graphical LASSO and was studied in Yuan and Lin (2006) and Friedman et al. (2007). We note that all these discussions focus on high-dimensional constant covariance matrices; thus, they do not involve the dynamics of covariance matrices.
The remainder of the paper is organized as follows. Section 2 provides a literature review of MGARCH models and their applications in volatility spillover. Section 3 explains the BEKK model with $L_1$-penalty functions in detail. In Section 4, we provide theoretical properties and implementation procedures for the regularized BEKK model. Simulation results and real data analysis are presented in Section 5 and Section 6, respectively. Section 7 gives concluding remarks.

2. Literature Review

Inspired by the idea of univariate generalized autoregressive conditionally heteroskedastic (GARCH) models (Bollerslev 1986; Engle 1982; Francq and Zakoian 2019; Hafner et al. 2022), various multivariate GARCH (MGARCH) models have been proposed over the last three decades to characterize the dynamics of covariance matrices. Among these MGARCH models, the Baba–Engle–Kraft–Kroner (BEKK) model (Engle and Kroner 1995) uses a general specification to describe the dynamics of covariance matrices of an $n$-dimensional multivariate time series. Since such a specification contains unknown parameters of order $O(n^2)$, inference on the BEKK model becomes complicated, even for moderately large $n$. When $n^2$ grows at the same order as, or a larger order than, the length of the time series, inference on the MGARCH–BEKK representation becomes even more difficult due to “the curse of dimensionality”.
To reduce the complexity of inference procedures for unknown parameters in MGARCH models, other forms of MGARCH specifications were proposed to reduce the number of unknown parameters in the model. An important improvement to MGARCH models is the dynamic conditional correlation (DCC) model (Aielli 2013; Bauwens and Laurent 2005; Boudt et al. 2013; Engle 2002). The DCC model allows for time-varying conditional correlations and reduces the dimensionality by factorizing the conditional covariance matrix into the product of a diagonal matrix of conditional standard deviations and a correlation matrix that evolves dynamically over time. Other forms of MGARCH specifications make more assumptions on structures and dynamics of covariance matrices and include, for example, the MGARCH in mean model (Bollerslev et al. 1988), the constant conditional correlation GARCH model (Bollerslev 1990; Ling and McAleer 2003; McAleer et al. 2009), the time-varying conditional correlation MGARCH model (Tse and Tsui 2002), the orthogonal factor MGARCH model (Hafner and Preminger 2009; Lanne and Saikkonen 2007), and so on. Although these MGARCH models provide relatively simple inference procedures, the assumptions on dynamics of covariance matrices are usually too specific to capture the complexity of dynamics of covariance matrices. Furthermore, these models still fail to address the issue of making inference on high-dimensional MGARCH models.
In addition to modeling the joint behavior of volatilities for a set of returns, another aspect of MGARCH models is to characterize volatility spillover in financial markets. Volatility spillover refers to the process and magnitude by which the instability in one market affects other markets. Volatility spillover is widely observed in equity markets (Hamao et al. 1990), bond markets (Christiansen 2007), futures markets (Pan and Hsueh 1998), exchange markets (Baillie and Bollerslev 1990), markets of equities and exchanges (Apergis and Rezitis 2001), various industries and commodities (Apergis and Rezitis 2003; Kaltenhäuser 2002), and so on. Understanding volatility spillover can provide an insight into financial vulnerabilities, as well as the source and nature of financial exposures, for academic researchers, financial practitioners, and regulatory authorities. For investors, as significant volatility spillover may increase non-systemic risk, understanding volatility spillover can help them diversify the risks associated with their investment. For financial sector regulators, understanding volatility spillover can help them formulate appropriate policies to maintain financial stability, especially when stress from a particular market is transmitted to other markets, such that the risk of systemic instability increases. MGARCH models are generally used to characterize volatility spillover in the markets, which are represented via a low-dimensional multivariate series; see Hamao et al. (1990), Christiansen (2007), Pan and Hsueh (1998), Engle et al. (1990), and Baillie and Bollerslev (1990).
In particular, Theodossiou and Lee (1993) used a multivariate GARCH-in-mean model to study the economic spillover effect across five countries, Worthington and Higgs (2004) applied a BEKK(1,1) model to study the transmission of weekly equity returns and volatility in nine Asian countries from 1988 to 2000, and Hassan and Malik (2007) employed the BEKK(1,1) specification to study three-dimensional US sector indices. Spillover effects have also been explored recently for other financial markets, such as cryptocurrency markets (Billio et al. 2023) and European banks with GARCH models (Giacometti et al. 2023). Additionally, recent studies have investigated spillover effects using network representations derived from GARCH models (Ampountolas 2022; Hong et al. 2023).
The aforementioned studies on spillover effects rely on the foundational structures of the DCC model for analysis (Ampountolas 2022; Shiferaw 2019; Siddiqui and Khan 2018). Although these MGARCH models provide relatively simple inference procedures, the assumptions on dynamics of covariance matrices are usually too specific to capture the complexity of the dynamics of covariance matrices. Moreover, these models still fail to address the issue of making inference on high-dimensional MGARCH models. Under these constraints, the performance and accuracy of these simplified MGARCH models need further investigation in real markets (Engle and Colacito 2006).

3. The MGARCH–BEKK Representations with L1 Regularization

We first introduce the following notation. Given a vector $x$ and a matrix $A$, the $i$th component of $x$ and the $(i,j)$th element of $A$ are written as $x_i$ and $A_{ij}$, respectively. The $j$th column and the $i$th row vectors of $A$ are denoted as $A_{\cdot j}$ and $A_{i \cdot}$, respectively. $\|x\|$ is the Euclidean norm of the vector $x$. $\|x\|_\infty$ is the largest element of $x$ in modulus. $\rho(A)$ is the spectral radius of $A$, i.e., the largest modulus of the eigenvalues of $A$. $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ are the minimum and maximum eigenvalues of $A$, respectively. $\|A\|$ is the spectral norm, i.e., the square root of $\rho(A^\top A)$. $\|A\|_\infty$ represents the operator norm induced by $\|x\|_\infty$, i.e., the largest absolute row sum. For any matrix $A$ and vector $x$ such that $Ax$ is well defined, let $\|A\|_{2,\infty} := \max_{\|x\| = 1} \|Ax\|_\infty$. We use $\mathrm{sign}(x)$ to denote the sign of $x$: $\mathrm{sign}(x) = x/|x|$ if $x \neq 0$, and $\mathrm{sign}(x) = 0$ otherwise.
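The quantities in this notation can be computed directly. The following is a minimal sketch in Python with NumPy; the vector `x` and matrix `A` are arbitrary illustrative values, not taken from the paper.

```python
import numpy as np

# Arbitrary illustrative inputs (assumptions, not from the paper).
x = np.array([3.0, -4.0, 1.0])
A = np.array([[2.0, -1.0],
              [0.5, 3.0]])

euclid_x = np.linalg.norm(x)                        # Euclidean norm of x
max_x = np.max(np.abs(x))                           # largest element of x in modulus
rho_A = np.max(np.abs(np.linalg.eigvals(A)))        # spectral radius of A
spec_A = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))  # spectral norm: sqrt of rho(A^T A)
inf_A = np.max(np.sum(np.abs(A), axis=1))           # largest absolute row sum
sign_x = np.sign(x)                                 # x/|x| for nonzero entries, 0 otherwise

print(euclid_x, max_x, rho_A, spec_A, inf_A, sign_x)
```

Note that the spectral radius never exceeds the spectral norm, which the example can be used to confirm.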

3.1. The MGARCH–BEKK Representation

Let $r_t$ be the vector of returns on $n$ assets in period $t$. Let $\epsilon_t$ be i.i.d. $n$-dimensional standard normal random vectors. Let $\mathcal{F}_t$ be the sigma field generated by the past information from the $r_t$s. Then, $\Sigma_t$ is measurable with respect to $\mathcal{F}_{t-1}$; the distribution of $r_t$ can be specified as
$$ r_t = \Sigma_t^{1/2} \epsilon_t, \qquad \epsilon_t \sim N(0, I_n), \tag{1} $$
where $I_n$ is an $n \times n$ identity matrix. Denote the conditional covariance matrix of $r_t$ given $\mathcal{F}_{t-1}$ as $\Sigma_t$, i.e., $\Sigma_t = \mathrm{Cov}(r_t \mid \mathcal{F}_{t-1})$. Engle and Kroner (1995) proposed the following BEKK$(a, b)$ model to characterize the dynamics of $\Sigma_t$:
$$ \Sigma_t = C C^\top + \sum_{k=1}^{K} \sum_{i=1}^{a} A_{ik} r_{t-i} r_{t-i}^\top A_{ik}^\top + \sum_{k=1}^{K} \sum_{i=1}^{b} B_{ik} \Sigma_{t-i} B_{ik}^\top, \tag{2} $$
where $A_{ik}$ and $B_{ik}$ are $n \times n$ parameter matrices, $C$ is an $n \times n$ triangular matrix, and the summation limit $K$ determines the generality of the process.
To illustrate the idea, we take $K = 1$ in this paper (and BEKK(1,1) in our examples), so that the model can be written as
$$ \Sigma_t = C C^\top + \sum_{i=1}^{a} A_i r_{t-i} r_{t-i}^\top A_i^\top + \sum_{i=1}^{b} B_i \Sigma_{t-i} B_i^\top, \tag{3} $$
in which $A_i$, $B_i$, and $C$ are real $n \times n$ matrices. Without loss of generality, we choose $\Sigma_t^{1/2}$ to be symmetric. For identification purposes, Engle and Kroner (1995) showed the following property for the BEKK model.
Proposition 1.
Suppose that the diagonal elements in $C$, $a_{11}$, and $b_{11}$ are positive. Then, there exists no other $C$, $A$, or $B$ in Model (3) that gives an equivalent representation.
Proposition 1 is also known as the identification condition (Comte and Lieberman 2003).
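The BEKK(1,1) recursion in Model (3) can be simulated step by step. The sketch below is illustrative only: the function names, the start value $\Sigma_1 = CC^\top$, and the parameter values are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def sym_sqrt(S):
    """Symmetric square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

def simulate_bekk11(C, A, B, T, seed=0):
    """Simulate r_t = Sigma_t^{1/2} eps_t with the BEKK(1,1) recursion.

    Assumption: the recursion is started at Sigma_1 = C C^T.
    """
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    CC = C @ C.T
    Sigma = CC
    r = np.zeros((T, n))
    Sigmas = np.zeros((T, n, n))
    for t in range(T):
        if t > 0:
            rp = r[t - 1][:, None]
            Sigma = CC + A @ (rp @ rp.T) @ A.T + B @ Sigma @ B.T
        Sigmas[t] = Sigma
        r[t] = sym_sqrt(Sigma) @ rng.standard_normal(n)
    return r, Sigmas

# Hypothetical parameters: C lower triangular with positive diagonal,
# and a_11 > 0, b_11 > 0 as in the identification condition.
C = np.array([[0.3, 0.0], [0.1, 0.2]])
A = 0.3 * np.eye(2)
B = 0.9 * np.eye(2)
r, Sigmas = simulate_bekk11(C, A, B, T=200)
```

Because $CC^\top$ is positive definite here and the other two terms are positive semi-definite, every simulated $\Sigma_t$ stays positive definite.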
Let vec and vech be the vector operators that stack the columns of a matrix and the lower triangular part of a matrix, respectively. That is, if
$$ Y = \begin{pmatrix} y_{11} & \cdots & y_{1n} \\ \vdots & \ddots & \vdots \\ y_{n1} & \cdots & y_{nn} \end{pmatrix}, $$
then
$$ \mathrm{vec}(Y) = (y_{11}, \ldots, y_{n1}, y_{12}, \ldots, y_{n2}, \ldots, y_{1n}, \ldots, y_{nn})^\top, $$
and
$$ \mathrm{vech}(Y) = (y_{11}, \ldots, y_{n1}, y_{22}, \ldots, y_{n2}, \ldots, y_{ii}, \ldots, y_{ni}, \ldots, y_{nn})^\top. $$
Then, Model (3) can be rewritten in a vector form:
$$ \mathrm{vec}(\Sigma_t) = \mathrm{vec}(C C^\top) + \sum_{i=1}^{a} \mathring{A}_i \, \mathrm{vec}(r_{t-i} r_{t-i}^\top) + \sum_{i=1}^{b} \mathring{B}_i \, \mathrm{vec}(\Sigma_{t-i}), \tag{4} $$
in which $\mathring{A}_i = A_i \otimes A_i$, $\mathring{B}_i = B_i \otimes B_i$, and $\otimes$ is the Kronecker product. Since the covariance matrices $\Sigma_t$ are symmetric, we can also write (3) in the vector-half form:
$$ \mathrm{vech}(\Sigma_t) = \mathrm{vech}(C C^\top) + \sum_{i=1}^{a} \tilde{A}_i \, \mathrm{vech}(r_{t-i} r_{t-i}^\top) + \sum_{i=1}^{b} \tilde{B}_i \, \mathrm{vech}(\Sigma_{t-i}), \tag{5} $$
where $\tilde{A}_i = L_n \mathring{A}_i K_n$, $\tilde{B}_i = L_n \mathring{B}_i K_n$, and $L_n$ and $K_n$ are matrices of dimensions $n(n+1)/2 \times n^2$ and $n^2 \times n(n+1)/2$, respectively, that convert between the vec and vech forms of symmetric matrices. Note that $\dim(\mathrm{vec}(\Sigma_t)) = n^2$ and $\dim(\mathrm{vech}(\Sigma_t)) = n(n+1)/2$. For convenience, we denote by $\theta = (\theta_1, \ldots, \theta_p)^\top$ the parameter vector in Model (3), in which $p = (a+b) n^2 + n(n+1)/2$, so that the matrices $C$, $A_i$, and $B_i$ are functions of $\theta$: $C = C(\theta)$, $A_i = A_i(\theta)$, $B_i = B_i(\theta)$. We denote by $\theta_0$ the true parameter vector of the model.
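The vec and vech rewritings rest on the identity $\mathrm{vec}(A X A^\top) = (A \otimes A)\,\mathrm{vec}(X)$ and on two selection matrices for symmetric inputs. The following sketch checks both numerically; the helper names `elimination_matrix` and `duplication_matrix` are illustrative choices for $L_n$ and $K_n$, and the test matrices are random.

```python
import numpy as np

def vec(M):
    """Stack the columns of M."""
    return M.reshape(-1, order="F")

def vech(M):
    """Stack the lower triangular part of M, column by column."""
    n = M.shape[0]
    return np.concatenate([M[j:, j] for j in range(n)])

def elimination_matrix(n):
    """L_n with vech(S) = L_n vec(S)."""
    pairs = [(i, j) for j in range(n) for i in range(j, n)]
    L = np.zeros((len(pairs), n * n))
    for row, (i, j) in enumerate(pairs):
        L[row, j * n + i] = 1.0   # position of S[i, j] in vec(S)
    return L

def duplication_matrix(n):
    """K_n with vec(S) = K_n vech(S) for symmetric S."""
    pairs = [(i, j) for j in range(n) for i in range(j, n)]
    K = np.zeros((n * n, len(pairs)))
    for col, (i, j) in enumerate(pairs):
        K[j * n + i, col] = 1.0   # S[i, j]
        K[i * n + j, col] = 1.0   # S[j, i]
    return K

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))
X = X @ X.T                        # symmetric test matrix
L_n, K_n = elimination_matrix(n), duplication_matrix(n)

lhs_vec = vec(A @ X @ A.T)
rhs_vec = np.kron(A, A) @ vec(X)
lhs_vech = vech(A @ X @ A.T)
rhs_vech = L_n @ np.kron(A, A) @ K_n @ vech(X)
```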
We assume that $r_t$ in (1) is stationary; then, the following stationarity condition should be imposed for the BEKK$(a, b)$ Model (5) (see Engle and Kroner (1995) and Comte and Lieberman (2003)).
Condition 1 (Stationary Condition).
The $n$-dimensional return series $r_t$ in (1) is stationary if the following conditions hold for Model (3):
(i) $C^*(\theta) = C C^\top$ is a continuous function of $\theta$, and there exists $C_0 > 0$ such that $\det(C^*(\theta)) \ge C_0$, where $\det(\cdot)$ represents the determinant of a matrix;
(ii) For any $\theta$, $\tilde{A}_i(\theta)$ and $\tilde{B}_i(\theta)$ are continuous functions of $\theta$;
(iii) For any $\theta$, $\rho\left(\sum_{i=1}^{a} \tilde{A}_i(\theta) + \sum_{i=1}^{b} \tilde{B}_i(\theta)\right) < 1$, i.e., the largest modulus of the eigenvalues of $\sum_{i=1}^{a} \tilde{A}_i(\theta) + \sum_{i=1}^{b} \tilde{B}_i(\theta)$ is less than 1.
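Part (iii) of the condition can be checked numerically for given parameter matrices. The sketch below builds the vech-form matrices and returns the spectral radius; the function names and the two parameter sets are hypothetical and only serve to illustrate the check.

```python
import numpy as np

def vech_operators(n):
    """Return (L, K) with vech(S) = L vec(S) and vec(S) = K vech(S) for symmetric S."""
    pairs = [(i, j) for j in range(n) for i in range(j, n)]
    L = np.zeros((len(pairs), n * n))
    K = np.zeros((n * n, len(pairs)))
    for c, (i, j) in enumerate(pairs):
        L[c, j * n + i] = 1.0
        K[j * n + i, c] = 1.0
        K[i * n + j, c] = 1.0
    return L, K

def bekk_spectral_radius(A_list, B_list):
    """Spectral radius of sum_i L (A_i kron A_i) K + sum_i L (B_i kron B_i) K."""
    n = A_list[0].shape[0]
    L, K = vech_operators(n)
    M = np.zeros((L.shape[0], L.shape[0]))
    for P in list(A_list) + list(B_list):
        M += L @ np.kron(P, P) @ K
    return float(np.max(np.abs(np.linalg.eigvals(M))))

I2 = np.eye(2)
rho_ok = bekk_spectral_radius([0.3 * I2], [0.9 * I2])     # 0.09 + 0.81
rho_bad = bekk_spectral_radius([0.5 * I2], [0.95 * I2])   # 0.25 + 0.9025
print(rho_ok < 1.0, rho_bad < 1.0)
```

In this toy case the matrices are scalar multiples of the identity, so the spectral radius is just the sum of the squared scalars.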

3.2. Likelihood Function

In this section, we discuss some properties of the likelihood of the BEKK$(a, b)$ model. Assume that $\epsilon_t$ follows a standard $n$-dimensional Gaussian distribution. Ignoring constants, we can write the quasi-log-likelihood as
$$ L_T(\theta) = -\frac{1}{2T} \sum_{t=1}^{T} l_t(\theta), \qquad l_t(\theta) = \log[\det(\Sigma_t)] + r_t^\top \Sigma_t^{-1} r_t. \tag{6} $$
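As an illustration, the quasi-log-likelihood can be evaluated for a BEKK(1,1) model by running the covariance recursion alongside the sum. This is a sketch under stated assumptions: the sign convention is the one in which the quasi-log-likelihood is maximized, the recursion is started at $CC^\top$ unless another start value is supplied, and the function name is hypothetical.

```python
import numpy as np

def bekk11_quasi_loglik(r, C, A, B, Sigma0=None):
    """Quasi-log-likelihood -(1/2T) * sum_t [log det(Sigma_t) + r_t' Sigma_t^{-1} r_t].

    Assumption: Sigma_1 = C C^T when Sigma0 is not given.
    """
    T, n = r.shape
    CC = C @ C.T
    Sigma = CC if Sigma0 is None else Sigma0
    total = 0.0
    for t in range(T):
        if t > 0:
            rp = r[t - 1][:, None]
            Sigma = CC + A @ (rp @ rp.T) @ A.T + B @ Sigma @ B.T
        _, logdet = np.linalg.slogdet(Sigma)
        quad = r[t] @ np.linalg.solve(Sigma, r[t])
        total += logdet + quad
    return -total / (2.0 * T)

# Sanity check on a degenerate case: with A = B = 0 and C = I, Sigma_t = I for all t,
# so the value reduces to -(1/2T) * sum_t r_t' r_t.
rng = np.random.default_rng(1)
r = rng.standard_normal((50, 3))
zeros = np.zeros((3, 3))
value = bekk11_quasi_loglik(r, np.eye(3), zeros, zeros)
expected = -np.sum(r ** 2) / (2.0 * 50)
```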
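Taking the derivative of $\Sigma_t$ with respect to the $i$th element of $\theta$, we obtain
$$ \frac{\partial \Sigma_t}{\partial \theta_i} = \frac{\partial (C C^\top)}{\partial \theta_i} + \sum_{j=1}^{a} \left( \frac{\partial A_j}{\partial \theta_i} r_{t-j} r_{t-j}^\top A_j^\top + A_j r_{t-j} r_{t-j}^\top \frac{\partial A_j^\top}{\partial \theta_i} \right) + \sum_{j=1}^{b} \left( \frac{\partial B_j}{\partial \theta_i} \Sigma_{t-j} B_j^\top + B_j \Sigma_{t-j} \frac{\partial B_j^\top}{\partial \theta_i} + B_j \frac{\partial \Sigma_{t-j}}{\partial \theta_i} B_j^\top \right), \tag{7} $$
which can be computed recursively. The derivative in (7) has the following property (the proof is given in Appendix A).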
Proposition 2.
Let $R_t = \mathrm{vech}(r_t r_t^\top)$; then,
$$ \left\| \frac{\partial \Sigma_t}{\partial \theta_i} \right\| \le \Psi_1 + \Psi_2 \cdot \sup \|R_t\|, $$
where $\Psi_1$ and $\Psi_2$ are two constants.
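Assume that $L_T(\theta)$ is twice continuously differentiable in a neighborhood $\Theta_0 \subset \Theta$ of $\theta_0$. We define the averages of the score vector and Hessian matrix as follows:
$$ S_T(\theta) = T^{-1} \sum_{t=1}^{T} s_t(\theta) \quad \text{and} \quad H_T(\theta) = T^{-1} \sum_{t=1}^{T} h_t(\theta), $$
where $s_t(\theta) = \partial l_t(\theta) / \partial \theta$ and $h_t(\theta) = \partial^2 l_t(\theta) / \partial \theta \partial \theta^\top$. Taking the derivative of (6) with respect to $\theta_i$ yields
$$ \frac{\partial l_t}{\partial \theta_i} = \mathrm{Tr}\left( \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} - r_t r_t^\top \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} \right), $$
$$ \frac{\partial^2 l_t(\theta)}{\partial \theta_j \partial \theta_i} = \mathrm{Tr}\left( \frac{\partial^2 \Sigma_t}{\partial \theta_j \partial \theta_i} \Sigma_t^{-1} - \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_j} \Sigma_t^{-1} + r_t r_t^\top \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_j} \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} - r_t r_t^\top \Sigma_t^{-1} \frac{\partial^2 \Sigma_t}{\partial \theta_j \partial \theta_i} \Sigma_t^{-1} + r_t r_t^\top \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_j} \Sigma_t^{-1} \right), $$
in which $\mathrm{Tr}(\cdot)$ represents the trace of a matrix. Comte and Lieberman (2003) showed the following property for $l_t(\theta)$.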
Proposition 3.
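Under Condition 1, the following properties hold:
(i) When $T \to +\infty$, $H_T^0 := \frac{1}{T} \sum_{t=1}^{T} \frac{\partial^2 l_t(\theta_0)}{\partial \theta \partial \theta^\top} \xrightarrow{P} H$ for a nonrandom positive-definite matrix $H$;
(ii) For the Fisher information matrix $I_0 := E\left( \frac{\partial l_t(\theta_0)}{\partial \theta} \cdot \frac{\partial l_t(\theta_0)}{\partial \theta^\top} \right) = E(S_T^0 (S_T^0)^\top)$, $\|I_0\| < \infty$;
(iii) For $\theta \in \Theta$, $E\left( \sup_{\|\theta - \theta_0\| \le \epsilon} \left| \frac{\partial^3 l_t(\theta)}{\partial \theta_i \partial \theta_j \partial \theta_k} \right| \right)$ is bounded for all $\epsilon > 0$ and $i, j, k = 1, \ldots, p$.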
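The trace formula for the first derivative of $l_t$ can be verified numerically with a finite difference. The sketch below uses a one-parameter family $\Sigma(\theta) = \Sigma_0 + \theta D$, which is an assumption made purely for illustration (it is not the BEKK recursion), so that $\partial \Sigma / \partial \theta = D$.

```python
import numpy as np

def l_value(Sigma, r):
    """l(Sigma) = log det(Sigma) + r' Sigma^{-1} r."""
    return np.linalg.slogdet(Sigma)[1] + r @ np.linalg.solve(Sigma, r)

def l_derivative(Sigma, dSigma, r):
    """Tr(dSigma Sigma^{-1} - r r' Sigma^{-1} dSigma Sigma^{-1})."""
    Sinv = np.linalg.inv(Sigma)
    rr = np.outer(r, r)
    return np.trace(dSigma @ Sinv - rr @ Sinv @ dSigma @ Sinv)

# Hypothetical inputs for the check.
Sigma0 = np.array([[2.0, 0.3], [0.3, 1.0]])
D = np.array([[1.0, 0.2], [0.2, 0.5]])
r = np.array([0.5, -1.0])
theta, h = 0.3, 1e-6

analytic = l_derivative(Sigma0 + theta * D, D, r)
numeric = (l_value(Sigma0 + (theta + h) * D, r)
           - l_value(Sigma0 + (theta - h) * D, r)) / (2.0 * h)
print(analytic, numeric)
```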
In the sparse representation, the majority of the elements of the true parameter vector $\theta_0$ are exactly 0. Hence, we can partition $\theta_0$ into two sub-vectors. Let $U_0$ be the set of indices $\{ j \in \{1, \ldots, p\} : \theta_j^0 \neq 0 \}$ and $\theta_{U_0}^0$ be the $q$-dimensional vector composed of the nonzero elements $\{ \theta_j^0 \neq 0 : j \in U_0 \}$. Similarly, we define $\theta_{U_0^c}^0$ as a $(p - q)$-dimensional zero vector. Without loss of generality, $\theta_0$ is stacked as $\theta_0 = ((\theta_{U_0}^0)^\top, 0^\top)^\top = ((\theta_{U_0}^0)^\top, (\theta_{U_0^c}^0)^\top)^\top$. For convenience, we define the averages of the “score subvector” $S_{U_0, T}(\theta)$ and the “Hessian sub-matrix” $H_{U_0, T}(\theta)$ through $s_{U_0, t}(\theta) = \partial l_t(\theta) / \partial \theta_{U_0}$ and $h_{U_0, t}(\theta) = \partial^2 l_t(\theta) / \partial \theta_{U_0} \partial \theta_{U_0}^\top$. Similarly, we define $S_{U_0^c, T}(\theta)$. We also denote $S_T(\theta_0) = S_T(\theta_{U_0}^0, 0)$ as $S_T^0$.
Proposition 4.
The quasi-log-likelihood function $L_T$ for the BEKK(1,1) has the following properties:
(i) For $i = 1, \ldots, p$, $E|\sqrt{T} \cdot S_{T,i}^0|^4 < \infty$, where $S_{T,i}^0$ is the $i$th element of $S_T^0$;
(ii) For a sufficiently large $T$, $H_{U_0, T}^0$ is almost surely positive definite, and $\lambda_{\min}(H_{U_0, T}^0) = O_p(1)$;
(iii) There exists a neighborhood $\Theta_{U_0}^0 \subset \Theta$ of $\theta_{U_0}^0$ such that, for all $\theta^{(1)}$ and $\theta^{(2)} \in \Theta_{U_0}^0$ and some $K_T = O_p(1)$,
$$ \| H_{U_0, T}(\theta^{(1)}, 0) - H_{U_0, T}(\theta^{(2)}, 0) \| \le K_T \| \theta^{(1)} - \theta^{(2)} \|. $$
Here, $a_T = O_p(1)$ means that $|a_T| \le c$ with probability 1 when $T \to \infty$, where $c$ is a constant. Proposition 4(i) shows that the fourth moment of the score function $S_T$ is always finite. Proposition 4(ii) indicates that $\lambda_{\min}(H_{U_0, T}^0)$ is almost surely positive and bounded away from 0. Hence, when the $L_1$ penalty is combined with the quasi-likelihood function, the concavity around $\theta_0$ can be ensured, so that a local maximizer can be obtained. Proposition 4(iii) is trivial in linear models, but not in our case. The proof of Proposition 4 is given in Appendix A.

3.3. L1 Penalty Function and Penalized Quasi-Likelihood

Before discussing the consistency of the sparse estimator, we introduce the following condition, by following the strong irrepresentable condition for LASSO-regularized linear regression models in Zhao and Yu (2006).
Condition 2 (Irrepresentable condition).
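There exists a neighborhood $\Theta_{U_0}^0 \subset \Theta$ of $\theta_{U_0}^0$, such that
$$ \sup_{\theta^{(1)}, \theta^{(2)} \in \Theta_{U_0}^0} \left\| \left[ (\partial / \partial \theta_{U_0}^\top) S_{U_0^c, T}(\theta^{(1)}, 0) \right] \left[ H_{U_0, T}(\theta^{(2)}, 0) \right]^{-1} \right\|_\infty \le c $$
for a constant $c$ that takes its value in $(0, 1)$ almost surely.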
Definition 1.
The half of the minimum signal $d$ is defined as
$$ d \; (d = d_T) = \frac{1}{2} \min \{ |\theta_j^0| : \theta_j^0 \neq 0 \} = \frac{1}{2} \min_{j \in U_0} |\theta_j^0|. $$
Assume that $p_\lambda(x)$ is an $L_1$ penalty function, i.e., $p_\lambda(|x|) = \lambda |x|$. We consider the following penalized quasi-likelihood (PQL):
$$ Q_T(\theta) = L_T(\theta) - P_T(\theta), \tag{9} $$
in which $P_T(\theta) = \sum_{j=1}^{p} p_\lambda(|\theta_j|) = \lambda \sum_{j=1}^{p} |\theta_j|$ is the penalty term and $\lambda \; (= \lambda_T) \ge 0$ is the regularization parameter determining the size of the model. If $\hat{\theta}$ maximizes the PQL, i.e.,
$$ \hat{\theta} = \arg\max_{\theta \in \Theta} Q_T(\theta), $$
we say that $\hat{\theta}$ is a penalized quasi-maximum likelihood estimator (PQMLE).
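To see why an $L_1$ penalty produces exact zeros, it helps to look at a case where the penalized maximizer has a closed form. The sketch below replaces the BEKK quasi-log-likelihood with a toy concave objective $-\frac{1}{2}\|\theta - z\|^2$ (an assumption for illustration only); for that objective the maximizer of $Q_T$ is the soft-thresholding of $z$ at level $\lambda$. All names and values are hypothetical.

```python
import numpy as np

def penalized_objective(loglik, theta, lam):
    """Q(theta) = L(theta) - lam * sum_j |theta_j|."""
    return loglik(theta) - lam * np.sum(np.abs(theta))

def soft_threshold(z, lam):
    """Closed-form maximizer of -0.5*||theta - z||^2 - lam*||theta||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z = np.array([1.5, 0.2, -0.7, 0.05])          # hypothetical unpenalized maximizer

def toy_loglik(theta):
    return -0.5 * np.sum((theta - z) ** 2)    # toy stand-in for L_T

lam = 0.3
theta_hat = soft_threshold(z, lam)            # small entries are set exactly to 0
q_hat = penalized_objective(toy_loglik, theta_hat, lam)
q_unpenalized = penalized_objective(toy_loglik, z, lam)
print(theta_hat, q_hat, q_unpenalized)
```

Entries of $z$ smaller than $\lambda$ in modulus are set exactly to zero, while the remaining entries are shrunk toward zero by $\lambda$.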
Similar to Fan and Lv (2011) and Uematsu (2015), we add some conditions on the penalty function $p_\lambda(\cdot)$ and the half minimum signal.
Condition 3.
The penalty function $p_\lambda$ satisfies the following properties:
(i) $\lambda = \min \{ O(T^{-\alpha}), o(q^{-1/2} T^{-\gamma} \log T) \}$ for some $\alpha \in (\delta_0 + \gamma, \frac{1}{2} - \frac{\delta_0}{4})$, $\gamma \in (0, \frac{1}{2}]$ and large $T$. Here, $a = O(f(T))$ means $|a / f(T)|$ is bounded by a constant and $b = o(g(T))$ means $|b / g(T)| \to 0$ when $T \to \infty$;
(ii) $d \ge T^{-\gamma} \log T$ for some $\gamma \in (0, \frac{1}{2}]$ and large $T$, where $d$ is the half-minimum signal we defined before.

4. Properties of the PQML Estimator and Implementation

This section studies the sparsity and the consistency of the PQML estimator and discusses some implementation issues.

4.1. Sparsity of the PQML Estimator

First, we introduce three lemmas whose proofs are given in Appendix A. For convenience, we denote $\hat{U} := \mathrm{supp}(\hat{\theta})$, the set of indices corresponding to all nonzero components of $\hat{\theta}$, where supp denotes the support set and $\hat{\theta}_{\hat{U}}$ is the subvector of $\hat{\theta}$ formed by its restriction to $\hat{U}$. Then, $\hat{U}^c$ represents the set of indices corresponding to all zero components of $\hat{\theta}$. We also denote by $\odot$ the Hadamard product.
Lemma 1.
When the penalty function $p_\lambda$ satisfies Condition 3, $\hat{\theta}$ is a strict local maximizer of the $L_1$-PQL $Q_T(\theta)$ defined in (9) if
$$ S_{\hat{U}, T}(\hat{\theta}) - \lambda_T \mathbf{1} \odot \mathrm{sign}(\hat{\theta}_{\hat{U}}) = 0, $$
$$ \| S_{\hat{U}^c, T}(\hat{\theta}) \|_\infty < \lambda_T, $$
$$ \lambda_{\min}[H_{\hat{U}, T}(\hat{\theta})] > 0, $$
in which $\mathbf{1}$ represents the vector with all elements equal to 1 and $\mathrm{sign}(\cdot)$ is as defined at the beginning of Section 3.
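Conditions of this type (stationarity on the active set, a strict bound on the inactive set, and positive curvature on the active set) can be checked mechanically for a candidate estimate. The sketch below is generic and uses its own sign convention, stated as an assumption: `grad` is the gradient of the unpenalized objective being maximized and `neg_hess` is its negative Hessian. It is demonstrated on a toy quadratic objective, not on the BEKK likelihood, and the function name is hypothetical.

```python
import numpy as np

def check_l1_optimality(grad, neg_hess, theta_hat, lam, tol=1e-8):
    """Check three sufficient conditions for a strict local maximizer of
    L(theta) - lam * ||theta||_1.

    Assumed convention: grad = dL/dtheta at theta_hat, neg_hess = -d2L/dtheta2.
    """
    active = np.flatnonzero(theta_hat != 0)
    inactive = np.flatnonzero(theta_hat == 0)
    stationarity = bool(np.all(
        np.abs(grad[active] - lam * np.sign(theta_hat[active])) <= tol))
    strict_bound = bool(np.all(np.abs(grad[inactive]) < lam))
    sub = neg_hess[np.ix_(active, active)]
    curvature = bool(sub.size == 0 or np.linalg.eigvalsh(sub).min() > 0)
    return stationarity, strict_bound, curvature

# Toy objective L(theta) = -0.5 * ||theta - z||^2, so grad = z - theta, neg_hess = I.
z = np.array([1.5, 0.2, -0.7, 0.05])
lam = 0.3
theta_soft = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)   # penalized maximizer
result_soft = check_l1_optimality(z - theta_soft, np.eye(4), theta_soft, lam)
result_plain = check_l1_optimality(z - z, np.eye(4), z, lam)  # unpenalized maximizer
print(result_soft, result_plain)
```

The soft-thresholded point passes all three checks, whereas the unpenalized maximizer fails the stationarity check because its gradient does not balance the penalty.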
To show the weak oracle property of the PQML estimator, we also need the following lemma.
Lemma 2.
Let $w_t$ be a martingale difference sequence with $E|w_t|^m \le C_w$ for all $t$, where $m > 2$ and $C_w$ is a constant. Then, we have
$$ T^{-m/2} \, E\left| \sum_{t=1}^{T} w_t \right|^m < \infty. $$
Then, the weak oracle property of the PQML estimator can be established by the following theorem, whose proof is provided in Appendix A.
Theorem 1.
($L_1$-PQML estimator) Under Conditions 2 and 3, for the $L_1$ penalty function $P_T(\theta) = \lambda \sum_{i=1}^{p} |\theta_i|$, in which $p = O(T^\delta)$ and $q = O(T^{\delta_0})$, if
$$ \delta \in [0, 4(\tfrac{1}{2} - \alpha)), \qquad 0 < \delta_0 < \min\{ \tfrac{2}{3}(\tfrac{1}{2} - \gamma), \gamma \}, $$
with $\alpha \in (\delta_0 + \gamma, \frac{1}{2} - \frac{\delta_0}{4})$, $\gamma \in (0, \frac{1}{2}]$, and $\delta > \delta_0$, then there exists a local maximizer $\hat{\theta} = ((\hat{\theta}_{U_0})^\top, (\hat{\theta}_{U_0^c})^\top)^\top$ for $Q_T(\theta)$, such that the following properties are satisfied:
(i) (Sparsity) $\hat{\theta}_{U_0^c} = 0$ with probability approaching one;
(ii) (Rate of convergence) $\| \hat{\theta}_{U_0} - \theta_{U_0}^0 \|_\infty = O_p(T^{-\gamma} \log T)$.
$p = O(T^\delta)$ is equivalent to $p T^{-\delta} \le c$ when $T \to \infty$. The growth rate of $p$ is controlled by $T^\delta$, and that of $q$ by the slower rate $T^{\delta_0}$. For example, to make the growth rate of $q$ much slower than that of $p$, we can find a set of values, $\delta = \frac{3}{2}$, $\delta_0 = \frac{1}{20}$, $\gamma = \frac{1}{30}$, and $\alpha = \frac{1}{5}$, that satisfy the conditions above. Since, in our case, $p = O(n^2)$, we have $n = O(T^{3/4})$ and, hence, it is possible for $n$ to exceed the sample size $T$. Although the difference between the rates of $p$ and $n$ is not as large as that in Fan and Lv (2011), in which $\log p = O(T^{1 - 2\alpha})$ and $q = o(T)$, it is enough to be applied in most cases in practice.

4.2. Implementation and Selection of λ

To compute the whole regularization path of $L_1$-PQML estimators, we note that several algorithms have been proposed to solve penalized optimization problems. For example, Efron et al. (2004) proposed the least-angle regression (LARS) algorithm to compute an efficient solution to the optimization problem for LASSO. Later on, pathwise coordinate descent methods were proposed to solve the LASSO-type problem efficiently; see Friedman et al. (2007) and Wu and Lange (2008). For the PQML estimator, we used an algorithm inspired by the BLasso algorithm (Zhao and Yu 2006, 2007) with some necessary modifications, since the BLasso algorithm does not need to explicitly calculate the first and second derivatives of the likelihood function, which are complicated in our case. We note that the original BLasso algorithm uses 0 as the initial value for all parameters, but the diagonal elements of $A$ and $B$ are positive by definition, so we make the following modification. We set 0 as the initial value for all off-diagonal elements in $A$, $B$, and $C$, and use the estimates obtained by fitting a univariate GARCH model to each component series as the initial values of the diagonal elements in the parameter matrices $A$, $B$, and $C$.
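The appeal of a derivative-free scheme can be illustrated with a much simpler procedure than BLasso itself. The sketch below is a fixed-$\lambda$ coordinate search with a constant step size: at each iteration it tries moving one coordinate by $\pm\varepsilon$ and keeps the single move that most reduces the penalized loss. This is a simplification written for illustration, not the BLasso algorithm and not the authors' implementation; the function name, the `penalized` mask (which allows some parameters, such as diagonal elements, to be left unpenalized), and the toy loss are all assumptions.

```python
import numpy as np

def coordinate_search(neg_loglik, theta0, lam, penalized, eps=0.01, max_iter=10000):
    """Derivative-free search on neg_loglik(theta) + lam * sum |theta_j| over penalized j.

    Only function evaluations are used; no gradients or Hessians are needed.
    """
    theta = np.array(theta0, dtype=float)

    def objective(th):
        return neg_loglik(th) + lam * np.sum(np.abs(th[penalized]))

    best = objective(theta)
    for _ in range(max_iter):
        best_move = None
        for j in range(theta.size):
            for step in (eps, -eps):
                cand = theta.copy()
                cand[j] += step
                val = objective(cand)
                if val < best - 1e-12:          # keep the best improving move so far
                    best, best_move = val, (j, step)
        if best_move is None:                   # no single step improves: stop
            break
        theta[best_move[0]] += best_move[1]
    return theta

# Toy loss standing in for the negative quasi-log-likelihood.
z = np.array([1.0, 0.05, -0.6])

def toy_neg_loglik(theta):
    return 0.5 * np.sum((theta - z) ** 2)

mask = np.array([True, True, True])             # penalize every coordinate here
theta_fit = coordinate_search(toy_neg_loglik, np.zeros(3), lam=0.2, penalized=mask)
print(theta_fit)
```

Starting from zero, coordinates whose signal is weaker than the penalty never move, which is how exact zeros are retained without any thresholding step.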
Another issue in the implementation is the selection of the tuning parameter $\lambda$, which leads to the problem of model selection. The tuning parameter $\lambda$ can be chosen by several criteria. For example, it is common to use the Akaike information criterion (AIC), the small-sample corrected AIC (AICC), or the Bayesian information criterion (BIC) to select the tuning parameter. In addition, Wang et al. (2009) proposed a modified BIC criterion and Fan and Tang (2013) extended it for the case $p > T$. Sun et al. (2013) proposed using Cohen’s kappa coefficient, which measures the agreement between two sets. Another method for model selection is to use cross-validation (CV). Zhang and Yang (2015) used CV to choose the best model among model selection procedures such as AIC and BIC. In our study, we apply the AIC and BIC criteria on the testing data and select the best tuning parameters. Note that, since our data are ordered, $k$-fold CV is not applicable here and the data are split in time order.

5. Simulation

In this section, we study the performance of the regularized BEKK models on some simulated examples. Consider Model (3) with $n = 4$ and $a = b = 1$. Note that we then have $p = 42$ parameters, as matrix $C$ is lower triangular. We assume that the parameter matrices satisfy the stationary condition, Condition 1, and, for identification purposes, we assume that the diagonal elements in $C$ are positive, $a_{11} > 0$, and $b_{11} > 0$. We consider two cases for matrices $A$, $B$, and $C$, which are summarized in Table 1. In both cases, the indices of nonzero elements in coefficient matrices $A$, $B$, and $C$ are randomly generated. To ensure that the matrices satisfy Condition 1, values of the diagonal elements in $A$ and $B$ are randomly generated from the uniform distribution $U(-0.45, 0.45)$, and the off-diagonal nonzero elements in $A$ and $B$ are generated from $U(-0.5, 0.5)$. All the nonzero elements in $C$ are generated from $U(-0.1, 0.1)$.
For each case, we simulate the data $r_t$ ($1 \le t \le T$) with $T = 600$, and then use the proposed regularized procedure to make inference on the model. Since the diagonal elements in $A$, $B$, and $C$ cannot be zero, we do not shrink the diagonal elements in $A$, $B$, and $C$. Additionally, we set the estimates of parameters in univariate GARCH models for each component series as the initial values of diagonal elements in $A$, $B$, and $C$.
To demonstrate the performance of our estimates, we consider three measurements. The first is the success rate in estimating zero and nonzero elements in $\theta$ or parameter matrices:
$$ \tau_0 = \frac{\sum_{i=1}^{p} I(\theta_i^0 = 0, \; \hat{\theta}_i = 0)}{\sum_{i=1}^{p} I(\theta_i^0 = 0)}, \qquad \tau_0^C = \frac{\sum_{i=1}^{p} I(\theta_i^0 \neq 0, \; \hat{\theta}_i \neq 0)}{\sum_{i=1}^{p} I(\theta_i^0 \neq 0)}. $$
The second measure is the root of the squared error, which is defined as $\nu = \| \theta_0 - \hat{\theta}_\lambda \|_2$. The third measure is the Kullback–Leibler information, which is given by
$$ \kappa = \frac{1}{2T} \sum_{t=1}^{T} \left( | \Sigma_t \hat{\Sigma}_t^{-1} | - \log | \Sigma_t \hat{\Sigma}_t^{-1} | \right), $$
where $\hat{\Sigma}_t = \hat{C} \hat{C}^\top + \hat{A} r_{t-1} r_{t-1}^\top \hat{A}^\top + \hat{B} \Sigma_{t-1} \hat{B}^\top$. We run $N = 500$ simulations for each case, and present the performance measures and their standard errors (in parentheses) for different values of $\lambda$ in Table 2.
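The first two performance measures are simple to compute from a true and an estimated parameter vector. The sketch below uses a made-up pair of vectors purely to show the arithmetic; the function name and the inputs are hypothetical and are not simulation results from the paper.

```python
import numpy as np

def selection_measures(theta_true, theta_hat):
    """Return (tau_0, tau_0^C, nu).

    tau_0   : share of true zeros that are estimated as zero
    tau_0^C : share of true nonzeros that are estimated as nonzero
    nu      : Euclidean distance between the true and estimated vectors
    """
    zero = theta_true == 0
    tau0 = float(np.mean(theta_hat[zero] == 0))
    tau0c = float(np.mean(theta_hat[~zero] != 0))
    nu = float(np.linalg.norm(theta_true - theta_hat))
    return tau0, tau0c, nu

# Hypothetical inputs for illustration only.
theta_true = np.array([0.5, 0.0, 0.0, -0.3, 0.0, 0.2])
theta_hat = np.array([0.45, 0.0, 0.1, -0.25, 0.0, 0.0])
tau0, tau0c, nu = selection_measures(theta_true, theta_hat)
print(tau0, tau0c, nu)
```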
To select the tuning parameter $\lambda$, we use the first 500 samples as the training data and the last 100 samples as the test data. The training data are used to estimate model parameters $\theta_\lambda$ for a given $\lambda$, and the test data are used to choose the best $\lambda$, i.e., the one that gives the minimum AIC and BIC. That is,
$$ \hat{\lambda}_{\mathrm{BIC}} = \arg\min_\lambda \mathrm{BIC}_\lambda, \qquad \hat{\lambda}_{\mathrm{AIC}} = \arg\min_\lambda \mathrm{AIC}_\lambda, $$
in which $\mathrm{AIC}_\lambda$ and $\mathrm{BIC}_\lambda$ are defined as
$$ \mathrm{BIC}_\lambda = -2 L_{T_{test}}(\hat{\theta}_\lambda) + k \frac{\log(T_{test})}{T_{test}}, \qquad \mathrm{AIC}_\lambda = -2 L_{T_{test}}(\hat{\theta}_\lambda) + \frac{2k}{T_{test}}, $$
where, in this case, $k = \sum_{i=1}^{p} I(\hat{\theta}_i^\lambda \neq 0)$, $T_{test} = 100$, and
$$ L_{T_{test}}(\theta) = -\frac{1}{2 T_{test}} \sum_{t=501}^{600} \left( \log[\det(\Sigma_t)] + r_t^\top \Sigma_t^{-1} r_t \right). $$
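The selection step amounts to a time-ordered split followed by an argmin over a grid of $\lambda$ values. The sketch below shows that bookkeeping; the grid, the test log-likelihood values, and the counts of nonzero parameters are hypothetical placeholders (in practice they would come from fitting the model on the training block for each $\lambda$), and the function names are illustrative.

```python
import numpy as np

def time_ordered_split(r, n_train):
    """Split a series in time order: first n_train rows for training, the rest for testing."""
    return r[:n_train], r[n_train:]

def information_criteria(loglik_test, k, T_test):
    """BIC and AIC on the test block, with k the number of nonzero estimates."""
    bic = -2.0 * loglik_test + k * np.log(T_test) / T_test
    aic = -2.0 * loglik_test + 2.0 * k / T_test
    return bic, aic

r = np.zeros((600, 4))                               # placeholder return series
r_train, r_test = time_ordered_split(r, 500)

# Hypothetical values for illustration only, one entry per lambda.
lambdas = np.array([0.3, 0.5, 1.0, 2.0, 4.0])
loglik_test = np.array([-5.10, -5.00, -4.95, -5.10, -5.40])
k = np.array([40, 30, 22, 18, 14])

bic, aic = information_criteria(loglik_test, k, T_test=r_test.shape[0])
lam_bic = lambdas[np.argmin(bic)]
lam_aic = lambdas[np.argmin(aic)]
print(lam_bic, lam_aic)
```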
Figure 1 shows the histograms of selected λ s via BIC and AIC with CV for Cases 1 and 2. In general, we can see from Figure 1 that λ is favored by BIC and AIC when its value is between 0.64 and 2. However, slight differences between these two cases can be found. For Case 1, λ s around 1 are most favored by both BIC and AIC, while, for Case 2, λ s around 1 and 2 are most favored by BIC and AIC, respectively.
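The selection rule can be sketched in a few lines. This is only an illustration under stated assumptions: the criteria are taken to be $-2L_{T_{test}} + 2k/T_{test}$ and $-2L_{T_{test}} + k\log(T_{test})/T_{test}$ with $L_{T_{test}}$ the average test log-likelihood, and the candidate $\lambda$ values, log-likelihoods, and nonzero counts in `fits` are made-up numbers, not results from the paper.

```python
import math

def information_criteria(avg_loglik, k, t_test):
    """Return (AIC, BIC) from the average test log-likelihood and the
    number k of nonzero estimated parameters."""
    aic = -2.0 * avg_loglik + 2.0 * k / t_test
    bic = -2.0 * avg_loglik + k * math.log(t_test) / t_test
    return aic, bic

# Hypothetical fits: lambda -> (average test log-likelihood, nonzero count).
fits = {0.5: (-1.95, 30), 1.0: (-2.00, 18), 2.0: (-2.15, 12)}
t_test = 100

scores = {lam: information_criteria(ll, k, t_test) for lam, (ll, k) in fits.items()}
lambda_aic = min(scores, key=lambda lam: scores[lam][0])
lambda_bic = min(scores, key=lambda lam: scores[lam][1])
```

Because the BIC penalty per parameter, $\log(T_{test})/T_{test}$, exceeds the AIC penalty $2/T_{test}$ when $T_{test} = 100$, BIC never selects a less sparse fit than AIC on the same grid.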

6. Real Data Applications

In this section, we use the regularized BEKK representation to study the volatility spillover effect and find optimal Markowitz’s mean–variance portfolios. The data we studied consist of daily log-returns of 18 stocks during the period 4 January 2016–31 January 2018, which are listed in Table 3 (NASDAQ Stock Symbols n.d.). Figure 2 shows the time series of these 18 stocks and Table 4 summarizes the sample mean, the sample standard deviation, the sample skewness, the sample kurtosis, and the correlations of these 18 series. All the correlations are positive for every two stocks in the selected period, and, except for IPG, all the stocks have a positive mean. The sample kurtosis for some stocks is much larger than 3, which indicates that we cannot simply assume that those returns follow normal distributions individually. Hence, it is natural to employ a suitable time series model to examine the data.

6.1. Volatility Spillovers

To use the MGARCH–BEKK representation to analyze the market, consisting of 18 stocks, we should realize that certain types of regularization or shrinkage are necessary, due to the complexity of the volatility dynamics. In particular, we use the proposed $L_1$-regularized BEKK(1,1) model and procedure to study the volatility spillover among the 18 stocks. We first compute the PQML estimates of the model for different $\lambda$s. Figure 3 shows the structures of the estimated coefficient matrices $\hat{A}_\lambda$ and $\hat{B}_\lambda$ for $\lambda = 4, 2, 1, 0.5, 0.3$, in which the nonzero values of $\hat{A}_\lambda$ and $\hat{B}_\lambda$ are represented as the directional lines among stocks. Since matrices $A$ and $B$ in the model are not symmetric before the quadratic forms, we use the directional lines to distinguish the nonzero elements between upper-diagonal and lower-diagonal elements. Specifically, if $a_{ij} \neq 0$, the directional line progresses from $i$ to $j$. As the PQML estimates $\hat{A}_\lambda$ and $\hat{B}_\lambda$ tell us the significant interdependence and contagion effects of the 18 stocks, the network structures in Figure 3 provide a clear representation of volatility spillover. Furthermore, we notice that, for some moderate values of $\lambda$, for example, $\lambda = 0.5$, $\hat{A}_\lambda$ is very sparse, whereas $\hat{B}_\lambda$ demonstrates more interdependence among stocks. When larger values of $\lambda$ are used in the regularization procedure, the PQML estimates $\hat{A}_\lambda$ are quickly shrunk into diagonal matrices, and $\hat{B}_\lambda$ also becomes more sparse than for the case $\lambda = 0.5$.
Using the PQML estimates of $\hat{A}_\lambda$, $\hat{B}_\lambda$, and $\hat{C}_\lambda$ and the BEKK(1,1) representation, we compute the estimated volatilities and the dynamic correlations among 18 stocks. Figure 4 shows the volatilities estimated by the regularized BEKK(1,1) model with $\lambda = 2, 0.5$ and univariate GARCH models. Note that most volatility series estimated by the three models are similar, except for stocks NFLX, ORCL, and TIF. We also show the estimated dynamic correlations among 18 stocks in a regularized BEKK(1,1) model with $\lambda = 1$ in Figure 5. We note that most correlations among the 18 stocks are positive during the sample period.
To show the overall volatility spillover, we extend the idea of the spillover index in Diebold and Yilmaz (2009). Specifically, note that $E[\epsilon_{t+1} \epsilon_{t+1}'] = \Sigma_{t+1} = \Sigma_{t+1}^{1/2} (\Sigma_{t+1}^{1/2})'$, where $\Sigma_{t+1}^{1/2}$ is the unique lower-triangular Cholesky factor of $\Sigma_{t+1}$. We denote elements of $\Sigma_t^{1/2}$ by $\sigma_{\frac{1}{2},i,j,t}$; then, the Spillover Index $S_{t+1}$ is defined as
$$ S_{t+1} = \frac{\sum_{i,j=1,\, i \neq j}^{n} \hat{\sigma}_{\frac{1}{2},i,j,t+1}^{2}}{\mathrm{trace}(\hat{\Sigma}_{t+1})} \times 100\%, $$
where $n$ is the number of stocks, which is equal to 18. We plot the daily spillover indices of 18 stocks for $\lambda = 2$ and $0.5$. The spillover indices during the sample period vary between 5% and 80%, and smaller $\lambda$s seem to generate more correlations among stocks. In particular, three big spikes can be found on 4 February 2016, 24 June 2016, and 9 November 2016. In addition to finding the PQML estimates for different $\lambda$s, we also find the whole $L_1$ regularization path. Note that the number of parameters in the BEKK(1,1) model for 18 stocks is $p = 819$, and we only show the regularized path for the $819 - 18 \times 3 = 765$ off-diagonal elements in $\hat{A}_\lambda$, $\hat{B}_\lambda$, and $\hat{C}_\lambda$. Both plots are shown in Figure 6.
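The spillover index can be computed directly from a covariance matrix through its Cholesky factor. The sketch below is illustrative: the function name `spillover_index` and the 2 × 2 example matrix are assumptions, and a real application would pass the predicted $\hat{\Sigma}_{t+1}$ from the fitted model.

```python
import numpy as np

def spillover_index(sigma):
    """Share (in percent) of trace(sigma) carried by the squared
    off-diagonal entries of the lower-triangular Cholesky factor."""
    chol = np.linalg.cholesky(sigma)           # sigma = chol @ chol.T
    total_sq = np.sum(chol ** 2)               # equals trace(sigma)
    diag_sq = np.sum(np.diag(chol) ** 2)
    return 100.0 * (total_sq - diag_sq) / np.trace(sigma)

# Two unit-variance assets with correlation 0.5 (hypothetical example).
sigma_example = np.array([[1.0, 0.5], [0.5, 1.0]])
index_example = spillover_index(sigma_example)

# A diagonal covariance matrix has no cross terms, hence no spillover.
index_diagonal = spillover_index(np.diag([1.0, 2.0, 3.0]))
```

Since the squared entries of the Cholesky factor sum to the trace of $\Sigma$, the index always lies between 0% and 100%. It also depends on the ordering of the assets, because the Cholesky factorization does.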

6.2. Portfolio Optimization

We further apply the regularized BEKK model to Markowitz mean–variance portfolio optimization (Markowitz 1952). Using portfolio variance as a measure of the risk, Markowitz portfolio optimization theory provides an optimal trade-off between the profit and the risk. Since the means and covariance matrix of assets are assumed to be known in the theory, they need to be estimated before being plugged into the framework. For high-dimensional portfolios, regularized methods are commonly used to achieve better performance. For instance, Brodie et al. (2009) and Fastrich et al. (2015) used an $L_1$ penalty function for sparse portfolios, and Di Lorenzo et al. (2012) used a concave optimization-based approach to estimate the optimal portfolio. In our case, we use the regularized BEKK model to predict the covariance matrices in the next period, and then apply Markowitz portfolio theory to find the optimal portfolios.
In particular, we assume that the portfolio consists of $n = 18$ risky assets and denote $\mu_t$ and $\Sigma_t$ as the mean and covariance matrix, respectively, of the $n$ risky assets at time $t$. Let $\mathbf{1} = (1, \ldots, 1)'$ be an $n$-dimensional vector of ones. Markowitz mean–variance portfolio theory minimizes the variance of the portfolio, $\min_{w_t} w_t' \Sigma_t w_t$, subject to the constraints $w_t' \mathbf{1} = 1$ and $w_t' \mu_t = \mu_*$, where $\mu_*$ is the target return. When short selling is allowed, the efficient portfolio can be explicitly expressed as
$$ w_{\mathrm{effi},t} = \frac{\tilde{b}}{\tilde{d}} \Sigma_t^{-1} \mathbf{1} - \frac{\tilde{a}}{\tilde{d}} \Sigma_t^{-1} \mu_t + \mu_* \left( \frac{\tilde{c}}{\tilde{d}} \Sigma_t^{-1} \mu_t - \frac{\tilde{a}}{\tilde{d}} \Sigma_t^{-1} \mathbf{1} \right), $$
where $\tilde{a} = \mu_t' \Sigma_t^{-1} \mathbf{1}$, $\tilde{b} = \mu_t' \Sigma_t^{-1} \mu_t$, $\tilde{c} = \mathbf{1}' \Sigma_t^{-1} \mathbf{1}$, and $\tilde{d} = \tilde{b}\tilde{c} - \tilde{a}^2$. When the target return $\mu_*$ is chosen to minimize the variance of the efficient portfolio, we obtain the global minimum variance (GMV) portfolio:
$$ w_{\mathrm{minvar},t} = \Sigma_t^{-1} \mathbf{1} / (\mathbf{1}' \Sigma_t^{-1} \mathbf{1}). $$
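The two closed-form portfolios can be evaluated with a few linear solves. The sketch below is a minimal illustration: the function names, the 3-asset covariance matrix, the mean vector, and the target return are hypothetical, and linear systems are solved instead of forming $\Sigma_t^{-1}$ explicitly, which is a numerical choice rather than anything prescribed by the text.

```python
import numpy as np

def gmv_weights(sigma):
    """Global minimum variance weights: Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    ones = np.ones(sigma.shape[0])
    x = np.linalg.solve(sigma, ones)
    return x / (ones @ x)

def efficient_weights(sigma, mu, target):
    """Minimum-variance weights with w'1 = 1 and w'mu = target
    (short selling allowed)."""
    ones = np.ones(len(mu))
    si_one = np.linalg.solve(sigma, ones)   # Sigma^{-1} 1
    si_mu = np.linalg.solve(sigma, mu)      # Sigma^{-1} mu
    a = mu @ si_one
    b = mu @ si_mu
    c = ones @ si_one
    d = b * c - a ** 2
    return (b * si_one - a * si_mu) / d + target * (c * si_mu - a * si_one) / d

# Hypothetical inputs for three assets (daily scale).
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
mu = np.array([0.0010, 0.0020, 0.0015])
target = 0.0015

w_gmv = gmv_weights(sigma)
w_eff = efficient_weights(sigma, mu, target)
```

Both weight vectors sum to one by construction, and the efficient portfolio attains the target mean exactly. Its variance cannot fall below that of the GMV portfolio, since it satisfies one more constraint.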
For comparison purposes, we also use another three multivariate volatility models to predict the covariance matrices of $n = 18$ stocks. The first is very simple, and it assumes a constant covariance matrix for $n$ stocks. The second is a factor-GARCH model (Alexander 2000; Engle 1990; van der Weide 2002; Vrontos et al. 2003), which assumes the following for asset return vector $r_t$, factors $f_t$, and volatilities of $k$ independent factors:
$$ r_t = W f_t, \qquad \mathrm{Cov}(f_t) = \Sigma_t = \mathrm{diag}\{\sigma_{1t}^2, \sigma_{2t}^2, \ldots, \sigma_{kt}^2\}, $$
$$ \sigma_{it}^2 = 1 + \beta_i x_{i,t-1}^2 + \gamma_i \sigma_{i,t-1}^2, $$
where $W$ is a $k \times k$ lower-triangular matrix with diagonal elements equal to 1 and $x_t = (x_{1t}, \ldots, x_{kt})'$ is a vector of $k$ independent factors. The third covariance model is a dynamic conditional correlation GARCH (DCC–GARCH) model (Engle 2002), which has the form
$$ r_t = \Sigma_t^{1/2} \epsilon_t, \qquad \epsilon_t \sim N(0, I_n), \qquad \Sigma_t = D_t R_t D_t, $$
$$ Q_t = (1 - \alpha - \beta) C + \alpha s_{t-1} s_{t-1}' + \beta Q_{t-1}, \qquad R_t = \mathrm{diag}(Q_t)^{-1/2} Q_t \, \mathrm{diag}(Q_t)^{-1/2}, $$
where $D_t = \mathrm{diag}(d_{1t}, \ldots, d_{nt})$, $s_{i,t} = r_{i,t} / d_{i,t}$, $s_t = (s_{1,t}, \ldots, s_{n,t})'$, and $R_t$ is the conditional correlation matrix at time $t$, that is, $R_t = \mathrm{Corr}(r_t \mid \mathcal{F}_{t-1})$. Here, $C$ is the unconditional correlation matrix, i.e., $C = E(R_t)$. The matrix $Q_t$ can be interpreted as a conditional covariance matrix of devolatilized residuals. For the dynamics of the univariate volatilities, the $d_{i,t}$s are assumed to follow a GARCH(1,1) process:
$$ d_{i,t}^2 = \omega_i + a_i r_{i,t-1}^2 + b_i d_{i,t-1}^2, $$
where $(\omega_i, a_i, b_i)$ are GARCH(1,1) parameters.
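The correlation recursion of the DCC benchmark can be sketched as below. This is an illustration only: it takes the devolatilized residuals $s_t$ as given (random placeholders here), initializes $Q$ at the unconditional correlation matrix (an assumption, since the text does not state the starting value), and uses hypothetical values for $\alpha$ and $\beta$. The names `dcc_correlations` and `c_bar` are not from the paper.

```python
import numpy as np

def dcc_correlations(s, c_bar, alpha, beta):
    """Run Q_t = (1 - alpha - beta) C + alpha s_{t-1} s_{t-1}' + beta Q_{t-1}
    and rescale each Q_t to a correlation matrix R_t."""
    T, n = s.shape
    q = c_bar.copy()                       # assumed starting value Q_0 = C
    r = np.empty((T, n, n))
    for t in range(T):
        if t > 0:
            q = (1.0 - alpha - beta) * c_bar + alpha * np.outer(s[t - 1], s[t - 1]) + beta * q
        scale = 1.0 / np.sqrt(np.diag(q))  # diag(Q_t)^{-1/2}
        r[t] = q * np.outer(scale, scale)
    return r

# Placeholder standardized residuals for three series (hypothetical data).
rng = np.random.default_rng(1)
s = rng.standard_normal((50, 3))
c_bar = np.corrcoef(s, rowvar=False)
R = dcc_correlations(s, c_bar, alpha=0.05, beta=0.90)
```

The rescaling step is what guarantees unit diagonals in every $R_t$; the covariance prediction is then $D_t R_t D_t$ with the univariate GARCH(1,1) volatilities on the diagonal of $D_t$.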
Let t = 2 January 2018, ⋯, 31 January 2018; we first fit 4 covariance models to the returns of 18 stocks from 4 January 2016 to t, and then compute the 1-day-ahead prediction of covariance matrices. Using the predicted covariance matrices, we compute the efficient portfolios w minvar , t + 1 and w effi , t + 1 for μ * = 0.15 % , 0.10 % , and 0.05 % . Table 5 shows the means, standard deviations (SD), and the information ratios (IR, i.e., ratio of means and standard deviations) for realized portfolio returns in the month of January 2018. As argued by Engle and Colacito (2006) and Engle et al. (2019), these statistics are good measurements of the out-of-sample performance of Markowitz portfolios. As DeMiguel et al. (2007) claimed that it is difficult to outperform equally weighted portfolios in terms of the out-of-sample mean for Markowitz portfolios, we also include the performance of equally weighted portfolios as a benchmark in Table 5. We note that all the means generated from four covariance models are smaller than that from equally weighted portfolios (0.430%), and the standard deviations of covariance models, except the factor GARCH, are smaller than that of equally weighted portfolios. Notably, the regularized BEKK model consistently maintains the second-best mean performances at 0.39%, 0.352%, 0.382%, and 0.416% for GMV, and μ * values of 0.15%, 0.10%, and 0.05%. However, the information ratio of the regularized BEKK model surpasses that of all other portfolios. It achieves the highest values across all scenarios—0.601, 0.540, 0.654, and 0.657—for GMV and μ * = 0.15 % , 0.10 % , and 0.05 % . These results show the robustness and efficiency of the regularized BEKK model in portfolio optimization, consistently delivering competitive mean performance and superior risk-adjusted returns compared to other covariance models.
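For concreteness, the summary statistics reported for each portfolio can be computed as in the sketch below. The realized returns are made-up numbers, the function name is hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not say which version is used.

```python
import statistics

def information_ratio(realized_returns):
    """Ratio of the mean to the sample standard deviation of realized returns."""
    return statistics.mean(realized_returns) / statistics.stdev(realized_returns)

# Hypothetical realized daily portfolio returns (in percent).
realized = [0.01, 0.03]
mean_return = statistics.mean(realized)
sd_return = statistics.stdev(realized)
ir = information_ratio(realized)
```

A portfolio can therefore rank first on the information ratio without having the highest mean, provided that its realized returns are sufficiently less volatile.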

7. Discussion and Conclusive Remarks

Modeling the dynamics of high-dimensional covariance matrices is an interesting and challenging problem in both financial econometrics and high-dimensional time series analysis. To address this issue, this paper proposes an inference procedure with $L_1$ regularization for the sparse representation of high-dimensional BEKK models and obtains a class of penalized quasi-maximum likelihood estimators. The proposed regularization allows us to find significant parameters in the BEKK representation and shrink the non-essential ones to zero, hence providing a sparse estimate of the BEKK representations. We show that the sparse BEKK representation has suitable theoretical properties and is promising for applications in portfolio optimization and volatility spillover.
The proposed sparse BEKK representation also contributes to the application of machine learning methods in time series modeling. As most discussion on applying regularization methods to time series modeling focuses on regularizing high-dimensional vector autoregressive models and their variants (Nicholson et al. 2017; Sánchez García and Cruz Rambaud 2022), it seems that the sparse representation of dynamics of high-dimensional variance–covariance matrices has been ignored in the literature. Obtaining a sparse representation of the dynamics within high-dimensional variance–covariance matrices is crucial to enhancing interpretability in time series modeling, and our study bridges this gap by considering a basic $L_1$ regularization method. One obvious extension from our current study is to replace the $L_1$ penalty with other types of penalty for high-dimensional MGARCH models, for instance, the SCAD penalty (Fan and Li 2001), the adaptive LASSO (Zou 2006), and the group LASSO (Yuan and Lin 2006). With different types of penalty functions, one can regularize the assets in the model with different requirements, hence causing the estimates to have different kinds of asymptotic properties.
As the proposed sparse BEKK representation simplifies the dynamics of covariance matrices of high-dimensional time series, it has advantages over existing MGARCH models in some financial applications. In particular, the sparse BEKK representation can capture significant volatility spillover effects in high-dimensional financial time series, which usually cannot be analyzed using other MGARCH models. Since significant volatility spillover is captured, the proposed method also improves the performance of portfolio optimization based on the dynamics of high-dimensional covariance matrices. The proposed procedure can certainly be extended to incorporate more empirical aspects of financial time series. Taking the leverage effect as an example, one may modify the regularization procedure to obtain sparse representation of high-dimensional multivariate exponential or threshold GARCH models.
Although the proposed framework shows advantages in modeling dynamics of high-dimensional covariance matrices, the computational challenge is not completely resolved. The main reason is that the proposed inference procedure involves a step of computing derivatives via the Kronecker product of parameter matrices. Since the Kronecker product turns two n × n matrices into an n 2 × n 2 matrix, the requirement for computational memory resources increases significantly. Hence, the proposed procedure is suitable for problems in which the number of component time series ranges from several to 100. If the number of assets progresses beyond 200, the computational cost is still a major concern. One possible remedy for this is training a neural network to approximate the regularized likelihood of the high-dimensional model. In such a way, the proposed regularization using the high-dimensional MGARCH model can be extended to characterize the dynamics of covariance matrices of larger size.
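The growth in memory can be made concrete with a short calculation. The sketch below counts the BEKK(1,1) parameters (two full $n \times n$ matrices plus a lower-triangular $C$) and the storage needed for one dense $n^2 \times n^2$ Kronecker product. The function names are hypothetical, and 8 bytes per entry (double precision) is an assumption.

```python
def bekk11_param_count(n):
    """Parameters in A and B (n*n each) plus lower-triangular C (n(n+1)/2)."""
    return 2 * n * n + n * (n + 1) // 2

def kron_bytes(n, bytes_per_entry=8):
    """Storage for one dense (n^2 x n^2) Kronecker product of two n x n matrices."""
    return (n * n) ** 2 * bytes_per_entry

counts = {n: bekk11_param_count(n) for n in (4, 18, 100, 200)}
gigabytes = {n: kron_bytes(n) / 1e9 for n in (18, 100, 200)}
```

With these assumptions, $n = 4$ and $n = 18$ give 42 and 819 parameters, matching the counts quoted in the simulation and real-data sections. A single dense Kronecker product takes about 0.8 GB at $n = 100$ and about 12.8 GB at $n = 200$, which is consistent with the practical range suggested above.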

Author Contributions

Conceptualization, H.X.; methodology, H.X., H.Z. and S.Y.; software, S.Y.; validation, S.Y., H.Z. and H.X.; formal analysis, S.Y.; investigation, S.Y., H.X. and H.Z.; resources, S.Y.; data curation, S.Y.; writing—original draft preparation, S.Y., H.X. and H.Z.; writing—review and editing, H.X.; visualization, S.Y.; supervision, H.X.; project administration, H.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available by request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AIC: Akaike information criterion
BEKK: Baba–Engle–Kraft–Kroner
BIC: Bayesian information criterion
CV: Cross-validation
DCC: Dynamic conditional correlation
GARCH: Generalized autoregressive conditionally heteroskedastic
GMV: Global minimum variance
IR: Information ratio
LARS: Least-angle regression
LASSO: Least absolute shrinkage and selection operator
MGARCH: Multivariate GARCH
PQL: Penalized quasi-likelihood
PQML: Penalized quasi-maximum likelihood
SCAD: Smoothly clipped absolute deviation
SD: Standard deviation

Appendix A. Proofs of Propositions, Lemmas, and Theorems

Appendix A.1. Proof of Proposition 2

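Let $R_t$, $\mathcal{C}$, and $\Sigma_t^*$ be defined by
$$ R_t = (\mathrm{vech}(r_t r_t')', \ldots, \mathrm{vech}(r_{t-m+1} r_{t-m+1}')')', \qquad \Sigma_t^* = (\mathrm{vech}(\Sigma_t)', \ldots, \mathrm{vech}(\Sigma_{t-m+1})')', $$
where $m = \max(a, b)$, and let
$$ \mathcal{C} = (\mathrm{vech}(CC')', 0, \ldots, 0)', $$
with dimensions $mn(n+1)/2$.
Define
$$ \mathcal{A} = \begin{pmatrix} \tilde{A}_1 & \cdots & \cdots & \tilde{A}_m \\ I & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & I & 0 \end{pmatrix} \quad \text{and} \quad \mathcal{B} = \begin{pmatrix} \tilde{B}_1 & \cdots & \cdots & \tilde{B}_m \\ I & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & I & 0 \end{pmatrix}, $$
with convention $\tilde{A}_i = 0$ if $i > a$ and $\tilde{B}_i = 0$ if $i > b$. Then, the model can be written as
$$ \Sigma_t^* = \mathcal{C} + \mathcal{A} R_{t-1} + \mathcal{B} \Sigma_{t-1}^* = \sum_{k=0}^{t-1} [\mathcal{B}^k(\theta) \mathcal{C}(\theta)] + \mathcal{B}^t(\theta) \Sigma_0^* + \sum_{k=0}^{t-1} \mathcal{B}^k(\theta) \mathcal{A}(\theta) L^k R_{t-1}(\theta_0), $$
where $L$ is the backshift operator, $L r_t = r_{t-1}$. Here, $\Sigma_0^*$ is fixed and $R_t$ depends on $\theta_0$ but is not a function of $\theta$. Then, we have
$$ \frac{\partial \Sigma_t^*}{\partial \theta_i} = \frac{\partial}{\partial \theta_i} \Big( \sum_{k=0}^{t-1} \tilde{B}^k \mathcal{C} \Big) + \frac{\partial}{\partial \theta_i} (\tilde{B}^t) \Sigma_0^* + \frac{\partial}{\partial \theta_i} \Big( \sum_{k=0}^{t-1} \tilde{B}^k \tilde{A} L^k \Big) R_{t-1}. \tag{A1} $$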
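Since
$$ \frac{\partial \tilde{B}^k}{\partial \theta_i} = \sum_{j=0}^{k-1} \tilde{B}^j \frac{\partial \tilde{B}}{\partial \theta_i} \tilde{B}^{k-1-j}, $$
we have
$$ \Big\| \mathcal{B}^j \frac{\partial \mathcal{B}}{\partial \theta_i} \mathcal{B}^{k-1-j} \Big\| \le \| \mathcal{B}^j \| \, \Big\| \frac{\partial \mathcal{B}}{\partial \theta_i} \Big\| \, \| \mathcal{B}^{k-1-j} \|, \qquad j = 0, \ldots, k-1. $$
Applying Lemma A.3. from Comte and Lieberman (2003), $\| \mathcal{B}^k \| \le \Psi k^{n_0} \rho_0^k$ for all $k$, we have
$$ \Big\| \mathcal{B}^j \frac{\partial \mathcal{B}}{\partial \theta_i} \mathcal{B}^{k-1-j} \Big\| \le \Psi^2 k^{n_0} \rho_0^k \Big\| \frac{\partial \mathcal{B}}{\partial \theta_i} \Big\|, $$
in which $n_0$ is a fixed number, $\Psi$ is a constant independent of $\theta$, and $-1 < \rho_0 < 1$.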
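To bound (A1), there are three terms to bound:
$$ \begin{aligned} \Big\| \frac{\partial}{\partial \theta_i} \Big( \sum_{k=0}^{t-1} \tilde{B}^k \mathcal{C} \Big) \Big\| &= \Big\| \sum_{k=1}^{t-1} \frac{\partial \tilde{B}^k}{\partial \theta_i} \mathcal{C} + \sum_{k=0}^{t-1} \tilde{B}^k \frac{\partial \mathcal{C}}{\partial \theta_i} \Big\| \\ &\le \sum_{k=1}^{t-1} \Big\| \frac{\partial \tilde{B}^k}{\partial \theta_i} \Big\| \, \| \mathcal{C} \| + \sum_{k=0}^{t-1} \| \tilde{B}^k \| \, \Big\| \frac{\partial \mathcal{C}}{\partial \theta_i} \Big\| \\ &\le \Psi^2 \| \mathcal{C} \| \sum_{k=1}^{t-1} k^{n_0} \rho_0^k \Big\| \frac{\partial \tilde{B}}{\partial \theta_i} \Big\| + \Psi \Big\| \frac{\partial \mathcal{C}}{\partial \theta_i} \Big\| \sum_{k=0}^{t-1} k^{n_0} \rho_0^k \\ &\le \pi(n_0) \Psi \Big( \Psi \| \mathcal{C} \| \cdot \Big\| \frac{\partial \tilde{B}}{\partial \theta_i} \Big\| + \Big\| \frac{\partial \mathcal{C}}{\partial \theta_i} \Big\| \Big), \end{aligned} $$
using $\sum_{k=0}^{t-1} k^{n_0} \rho_0^k \le \sum_{k=0}^{t} k^{n_0} \rho_0^{k-1} \le \sum_{k=0}^{\infty} k^{n_0} \rho_0^{k-1} = \pi(n_0)$, where $\pi(n_0)$ is a constant that only depends on $n_0$. If $\rho_0 = 0$, this term is easily bounded because $\tilde{B}$ is nilpotent and all sums are finite. In the same way,
$$ \Big\| \frac{\partial}{\partial \theta_i} (\tilde{B}^t) \Sigma_0^* \Big\| \le \Psi \pi(n_0) \Big\| \frac{\partial \tilde{B}}{\partial \theta_i} \Big\| \, \| \Sigma_0^* \|. $$
Finally,
$$ \Big\| \frac{\partial}{\partial \theta_i} \Big( \sum_{k=0}^{t-1} \tilde{B}^k L^k \tilde{A} \Big) R_{t-1} \Big\| \le \Big\| \sum_{k=0}^{t-1} \Big( \frac{\partial}{\partial \theta_i} \tilde{B}^k \Big) L^k \tilde{A} R_{t-1} \Big\| + \Big\| \sum_{k=0}^{t-1} \tilde{B}^k L^k \Big( \frac{\partial}{\partial \theta_i} \tilde{A} \Big) R_{t-1} \Big\|. $$
Denoting the first and second sums on the right-hand side of the inequality as $T_1$ and $T_2$, respectively, we have
$$ \begin{aligned} \| T_1 \| &\le \Psi^2 \sum_{k=1}^{t-1} k^{n_0+1} \rho_0^{k-1} \| \tilde{A} \| \cdot \| R_{t-k-1} \| \cdot \Big\| \frac{\partial \tilde{B}}{\partial \theta_i} \Big\| \\ &\le \Psi^2 \| \tilde{A} \| \cdot \Big\| \frac{\partial \tilde{B}}{\partial \theta_i} \Big\| \cdot \sum_{k=1}^{t-1} k^{n_0+1} \rho_0^{k-1} \cdot \sup_t \| R_t \| \\ &\le \pi(n_0+1) \Psi^2 \| \tilde{A} \| \cdot \Big\| \frac{\partial \tilde{B}}{\partial \theta_i} \Big\| \cdot \sup_t \| R_t \|, \end{aligned} $$
and
$$ \| T_2 \| \le \Psi^2 \pi(n_0) \Big\| \frac{\partial \tilde{A}}{\partial \theta_i} \Big\| \sup_t \| R_t \|. $$
By our assumption, $\| \mathcal{C} \|$, $\| \partial \mathcal{C} / \partial \theta_i \|$, $\| \tilde{A} \|$, $\| \partial \tilde{A} / \partial \theta_i \|$, $\| \partial \tilde{B} / \partial \theta_i \|$, and $\| \Sigma_0^* \|$ are all bounded. There also exists a constant $w$ such that $\| \partial \Sigma_t^* / \partial \theta_i \| = m \| \partial \Sigma_t / \partial \theta_i \|$. Hence,
$$ \Big\| \frac{\partial \Sigma_t}{\partial \theta_i} \Big\| \le \Psi_1 + \Psi_2 \sup_t \| R_t \|, $$
where $\Psi_1 = \Psi \pi(n_0) ( \Psi \| \mathcal{C} \| \cdot \| \partial \tilde{B} / \partial \theta_i \| + \| \partial \mathcal{C} / \partial \theta_i \| ) + \Psi \pi(n_0) \| \partial \tilde{B} / \partial \theta_i \| \cdot \| \Sigma_0^* \|$, and $\Psi_2 = \Psi^2 \pi(n_0+1) \| \tilde{A} \| \cdot \| \partial \tilde{B} / \partial \theta_i \| + \Psi^2 \pi(n_0) \| \partial \tilde{A} / \partial \theta_i \|$. □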

Appendix A.2. Proof of Proposition 4

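As $\frac{\partial l_t(\theta)}{\partial \theta_i} = \mathrm{Tr}\big( \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} - r_t r_t' \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} \big)$, where $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix, and $E[r_t r_t' \mid \mathcal{F}_{t-1}] = \Sigma_t$, we have $E[\frac{\partial l_t(\theta)}{\partial \theta_i} \mid \mathcal{F}_{t-1}] = 0$, which means that $\frac{\partial l_t(\theta)}{\partial \theta_i}$ is a martingale difference. Then, we want to prove that $E[ | T^{1/2} T^{-1} \sum_{t=1}^{T} \frac{\partial l_t(\theta_0)}{\partial \theta_i} |^m ] = E[ | T^{-1/2} \sum_{t=1}^{T} \frac{\partial l_t(\theta_0)}{\partial \theta_i} |^m ] < \infty$ holds for $m = 4$. By Lemma 2, this proof is thus completed if we show that $E[ | \frac{\partial l_t(\theta_0)}{\partial \theta_i} |^4 ] < \infty$. By Proposition 2, $\| \frac{\partial \Sigma_t}{\partial \theta_i} \| \le \Psi_1 + \Psi_2 \sup \| \mathrm{vech}(r_t r_t') \|$. Since
$$ \Big\| \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} - r_t r_t' \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} \Big\| = \Big\| (I - r_t r_t' \Sigma_t^{-1}) \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} \Big\| \le \| I - r_t r_t' \Sigma_t^{-1} \| \, \Big\| \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} \Big\| \le \| I - r_t r_t' \Sigma_t^{-1} \| \, \Big\| \frac{\partial \Sigma_t}{\partial \theta_i} \Big\| \, \| \Sigma_t^{-1} \|, $$
it is equivalent to show that
$$ E \Big| \frac{\partial l_t(\theta_0)}{\partial \theta_i} \Big|^4 = E \Big[ \mathrm{Tr}^4 \Big( \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} - r_t r_t' \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} \Big) \Big] < \infty. $$
Since $\mathrm{Tr}(AB) \le \| A \| \cdot \| B \|$ and $\| \Sigma_t^{-1} \|$ is finite, there exists a constant $M$ such that $\| \Sigma_t^{-1} \| \le M$ for all $t$. Additionally, $\| I - r_t r_t' \Sigma_t^{-1} \| \le \| I \| + \| r_t r_t' \| \cdot \| \Sigma_t^{-1} \| \le 1 + M \| r_t r_t' \|$; then
$$ \mathrm{Tr} \Big( \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} - r_t r_t' \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} \Big) \le \| I - r_t r_t' \Sigma_t^{-1} \| \, \| \Sigma_t^{-1} \| \, \Big\| \frac{\partial \Sigma_t}{\partial \theta_i} \Big\|, $$
$$ E \Big[ \mathrm{Tr}^4 \Big( \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} - r_t r_t' \Sigma_t^{-1} \frac{\partial \Sigma_t}{\partial \theta_i} \Sigma_t^{-1} \Big) \Big] \le E \big[ (1 + M \sup \| r_t r_t' \|)^4 (\Psi_1 + \Psi_2 \sup_t \| \mathrm{vech}(r_t r_t') \|)^4 \big]. $$
Because $\| A \| \le \| \mathrm{vech}(A) \| \le \mathrm{rank}(A) \| A \|$, there exists a constant $k$ such that $\| r_t r_t' \| = k \| \mathrm{vech}(r_t r_t') \|$. Hence, if we let $\| \mathrm{vech}(r_t r_t') \| = \| R_t \|$,
$$ E \big[ (1 + M \sup_t \| r_t r_t' \|)^4 (\Psi_1 + \Psi_2 \sup_t \| R_t \|)^4 \big] = E \big[ (1 + kM \sup_t \| R_t \|)^4 (\Psi_1 + \Psi_2 \sup_t \| R_t \|)^4 \big] = E \Big( \sum_{i=0}^{8} a_i \| R_t \|^i \Big), $$
where the $a_i$s are constants. Since $r_t = \Sigma_t^{1/2} \epsilon_t$, where the $\epsilon_t$s follow a normal distribution, the $r_t$s admit moments of order 16. Hence, $E \| R_t \|^i < \infty$ for $i$ from 0 to 8, so that $E ( \sum_{i=0}^{8} a_i \| R_t \|^i ) < \infty$ and, therefore, $E[ | \frac{\partial l_t(\theta_0)}{\partial \theta_i} |^4 ] < \infty$.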
Next, we check (c) and (d). (c) is clear, as we said before. By (III) in Lemma 1, the derivative of $H_{U_0,T}(\theta)$ is bounded. By the mean-value theorem,
$$ \mathrm{vec} \big( H_{U_0,T}(\theta^{(1)}, 0) - H_{U_0,T}(\theta^{(2)}, 0) \big) = \frac{\partial H_{U_0,T}(\theta, 0)}{\partial \theta'} \Big|_{\theta = \theta^*} \cdot (\theta^{(1)} - \theta^{(2)}). $$
Hence,
$$ \begin{aligned} \| H_{U_0,T}(\theta^{(1)}, 0) - H_{U_0,T}(\theta^{(2)}, 0) \| &\le \| \mathrm{vec} ( H_{U_0,T}(\theta^{(1)}, 0) - H_{U_0,T}(\theta^{(2)}, 0) ) \| \\ &= \Big\| \frac{\partial H_{U_0,T}(\theta, 0)}{\partial \theta'} \Big|_{\theta = \theta^*} \cdot (\theta^{(1)} - \theta^{(2)}) \Big\| \\ &\le \Big\| \frac{\partial H_{U_0,T}(\theta, 0)}{\partial \theta'} \Big|_{\theta = \theta^*} \Big\| \cdot \| \theta^{(1)} - \theta^{(2)} \| \le \tilde{K} \| \theta^{(1)} - \theta^{(2)} \|, \end{aligned} $$
where $\tilde{K}$ is bounded by (iii) in Proposition 3. Hence, $\tilde{K} = O_p(1)$ and $\theta^*$ lies between $\theta^{(1)}$ and $\theta^{(2)}$.
Next, we verify (e) with $\beta = \delta_0 / 2$. For every $i \in \{1, \ldots, p\}$, it is sufficient to show that $\max_{\| v \| = 1} | (H_{i1,T}^0, \ldots, H_{iq,T}^0) v | = O_p(T^{\delta_0 / 2})$ for a vector $v \in \mathbb{R}^q$. Using the Cauchy–Schwarz inequality and property of the norm, the left-hand side is bounded by $\| (H_{i1,T}^0, \ldots, H_{iq,T}^0) \| \le q^{1/2} \max_{1 \le j \le q} | H_{ij,T} |$. Since, from (I) and (II) in Lemma 1, $H_{ij,T}^0 = O_p(1)$ and $q = O(T^{\delta_0})$, the result follows. □

Appendix A.3. Proof of Lemma 1

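First, consider the PQL $Q_T(\theta)$, as defined in (5), in the constrained $\| \hat{\theta} \|_0$-dimensional subspace $\mathcal{S} := \{ \theta \in \mathbb{R}^p : \theta_c = 0 \}$ of $\mathbb{R}^p$, where $\theta_c$ denotes the subvector of $\theta$ formed by the components in $\hat{U}^c$. It follows from (12) that $Q_T(\theta)$ is strictly concave in a ball $N_0 \subset \mathcal{S}$ centered at $\hat{\theta}$. This, along with (10), entails that $\hat{\theta}$, as a critical point of $Q_T(\theta)$ in $\mathcal{S}$, is the unique maximizer of $Q_T(\theta)$ in $N_0$.
Now, we show that $\hat{\theta}$ is indeed a strict local maximizer of $Q_T(\theta)$ on the whole space $\mathbb{R}^p$. Take a small ball $N_1 \subset \mathbb{R}^p$ centered at $\hat{\theta}$ such that $N_1 \cap \mathcal{S} \subset N_0$. We then need to show that $Q_T(\hat{\theta}) > Q_T(\gamma_1)$ for any $\gamma_1 \in N_1 \setminus N_0$. Let $\gamma_2$ be the projection of $\gamma_1$ onto $\mathcal{S}$, such that $\gamma_2 \in N_0$. Thus, it suffices to prove that $Q_T(\gamma_2) > Q_T(\gamma_1)$. By the mean value theorem, we have
$$ Q_T(\gamma_1) - Q_T(\gamma_2) = \frac{\partial Q_T(\gamma_0)}{\partial \gamma^{T}} (\gamma_1 - \gamma_2), $$
where the vector $\gamma_0$ lies between $\gamma_1$ and $\gamma_2$. Note that the components of $\gamma_1 - \gamma_2$ are zero for their indices in $\hat{U}$ and $\mathrm{sgn}(\gamma_{0j}) = \mathrm{sgn}(\gamma_{1j})$ for $j \in \hat{U}^c$. Therefore, we have
$$ \frac{\partial Q_T(\gamma_0)}{\partial \gamma^{T}} (\gamma_1 - \gamma_2) = S_T(\gamma_0)^{T} (\gamma_1 - \gamma_2) - \lambda_T [\mathbf{1} \circ \mathrm{sgn}(\gamma_0)]^{T} (\gamma_1 - \gamma_2) = S_{\hat{U}^c T}(\gamma_0)^{T} \gamma_{1 \hat{U}^c} - \lambda_T \sum_{j \in \hat{U}^c} | \gamma_{1j} |, \tag{A2} $$
where $\gamma_{1 \hat{U}^c}$ is a subvector of $\gamma_1$ formed by the components in $\hat{U}^c$. By (10), there exists some $\delta > 0$ such that, for any $\theta$ in a ball in $\mathbb{R}^p$ centered at $\hat{\theta}$ with radius $\delta$,
$$ \| S_{\hat{U}^c T}(\theta) \| < \lambda_T. \tag{A3} $$
We further shrink the radius of ball $N_1$ to less than $\delta$, so that $| \gamma_{0j} | \le | \gamma_{1j} | < \delta$ for $j \in \hat{U}^c$ and (A3) holds for any $\theta \in N_1$. Since $\gamma_0 \in N_1$, it follows from (A3) that (A2) is strictly less than
$$ \lambda_T \| \gamma_{1 \hat{U}^c} \|_1 - \lambda_T \| \gamma_{1 \hat{U}^c} \|_1 = 0. $$
Since $\| S_{\hat{U}^c T}(\gamma_0) \| < \lambda_T$, $S_{\hat{U}^c T}(\gamma_0)^{T} \gamma_{1 \hat{U}^c} \le \lambda_T \| \gamma_{1 \hat{U}^c} \|_1$, and
$$ \lambda_T \sum_{j \in \hat{U}^c} | \gamma_{1j} | \ge \lambda_T \sum_{j \in \hat{U}^c} | \gamma_{1j} | = \lambda_T \| \gamma_{1 \hat{U}^c} \|_1, $$
we have $\frac{\partial Q_T(\gamma_0)}{\partial \gamma^{T}} (\gamma_1 - \gamma_2) \le 0$ and $Q_T(\gamma_1) \le Q_T(\gamma_2)$. □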

Appendix A.4. Proof of Lemma 2

A Marcinkiewicz–Zygmund inequality for martingales (Rio 2017) states that
$$ E \Big| \sum_{t=1}^{T} w_t \Big|^m \le \{ 4m(m-1) \}^{m/2} \, T^{(m-2)/2} \sum_{t=1}^{T} E | w_t |^m $$
holds for $m > 2$. Because $E | w_t |^m \le C_w$ for all $t$, we have
$$ T^{-m/2} E \Big| \sum_{t=1}^{T} w_t \Big|^m \le \{ 4m(m-1) \}^{m/2} \, T^{-1} \sum_{t=1}^{T} E | w_t |^m \le \{ 4m(m-1) \}^{m/2} C_w. $$
Thus, the result follows. □

Appendix A.5. Proof for Theorem 1

For notational simplicity, we write, for example, $Q_T(((\theta_{U_0})', (\theta_{U_0^c})')')$ as $Q_T(\theta_{U_0}, \theta_{U_0^c})$. Consider events
$$ E_{T1} = \{ \| S_{U_0,T}^0 \| \le (q^{1/2} / T)^{1/2} \log^{1/4} T \}, \qquad E_{T2} = \{ \| S_{U_0^c,T}^0 \| \le \lambda \log^{-1} T \}, $$
where $q = O(T^{\delta_0})$ and $\lambda = O(T^{-\alpha})$. It follows from Bonferroni’s inequality and Markov’s inequality, together with Proposition 4(i), that
$$ \begin{aligned} P(E_{T1} \cap E_{T2}) &\ge 1 - \sum_{i \in U_0} P \big( | T^{1/2} S_{i,T}^0 | > q^{1/4} (\log T)^{1/4} \big) - \sum_{i \in U_0^c} P \big( | T^{1/2} S_{i,T}^0 | > T^{1/2 - \alpha} (\log T)^{-1} \big) \\ &\ge 1 - q \cdot \frac{\max_{i \in U_0} E ( | T^{1/2} S_{i,T}^0 |^4 )}{q \log T} - (p - q) \frac{\max_{i \in U_0^c} E ( | T^{1/2} S_{i,T}^0 |^4 ) (\log T)^4}{T^{4(1/2 - \alpha)}} \\ &= 1 - O(\log^{-1} T) - O \big( T^{\delta - 4(1/2 - \alpha)} (\log T)^4 \big), \end{aligned} \tag{A6} $$
where the last two terms are $o(1)$ because of the condition $\delta < 4(1/2 - \alpha)$. Under the event $E_{T1} \cap E_{T2}$, we will show that there exists a solution $\hat{\theta} \in \mathbb{R}^p$ to (10)–(12) with $\mathrm{sgn}(\hat{\theta}) = \mathrm{sgn}(\theta^0)$ and $\| \hat{\theta} - \theta^0 \| = O(T^{-\gamma} \log T)$ for some $\gamma \in (0, 1/2]$.
First, we prove that, for a sufficiently large $T$, Equation (10) has a solution $\hat{\theta}_{U_0}$ inside the hypercube $\mathcal{N} = \{ \theta_{U_0} \in \mathbb{R}^q : \| \theta_{U_0} - \theta_{U_0}^0 \| = T^{-\gamma} \log T \}$, when we suppose $\hat{U} = U_0$. Define the function $\Psi : \mathbb{R}^q \to \mathbb{R}^q$ by
$$ \Psi(\theta_{U_0}) = S_{U_0,T}(\theta_{U_0}, 0) - \lambda \mathbf{1} \circ \mathrm{sgn}(\theta_{U_0}). \tag{A7} $$
Then, (10) is equivalent to $\Psi(\hat{\theta}_{U_0}) = 0$. To show that the solution is in the hypercube $\mathcal{N}$, we expand $\Psi(\theta_{U_0})$ around $\theta_{U_0}^0$. Function (A7) is written as
$$ \begin{aligned} \Psi(\theta_{U_0}) &= S_{U_0,T}^0 + H_{U_0,T}(\theta_{U_0}^*, 0) (\theta_{U_0} - \theta_{U_0}^0) - \lambda \mathbf{1} \circ \mathrm{sgn}(\theta_{U_0}) \\ &= H_{U_0,T}^0 (\theta_{U_0} - \theta_{U_0}^0) + [ S_{U_0,T}^0 - \lambda \mathbf{1} \circ \mathrm{sgn}(\theta_{U_0}) ] + [ H_{U_0,T}(\theta_{U_0}^*, 0) - H_{U_0,T}^0 ] (\theta_{U_0} - \theta_{U_0}^0) \\ &= H_{U_0,T}^0 (\theta_{U_0} - \theta_{U_0}^0) + v_T + w_T, \end{aligned} \tag{A8} $$
where $\theta_{U_0}^*$ lies on the line segment that joins $\theta_{U_0}$ and $\theta_{U_0}^0$. Since the matrix $H_{U_0}^0$ is invertible by Proposition 4(ii), (A8) is further written as
$$ \tilde{\Psi}(\theta_{U_0}) := (H_{U_0,T}^0)^{-1} \Psi(\theta_{U_0}) = \theta_{U_0} - \theta_{U_0}^0 + (H_{U_0,T}^0)^{-1} v_T + (H_{U_0,T}^0)^{-1} w_T = \theta_{U_0} - \theta_{U_0}^0 + \tilde{v}_T + \tilde{w}_T. $$
We now derive bounds for the last two terms in (A8). We consider $\tilde{v}_T$ first. For any $\theta_{U_0} \in \mathcal{N}$,
$$ \min_{j \in U_0} | \theta_j | \ge \min_{j \in U_0} | \theta_j^0 | - d_T = d_T \ge T^{-\gamma} \log T $$
by Condition 3(ii), and $\mathrm{sgn}(\theta_{U_0}) = \mathrm{sgn}(\theta_{U_0}^0)$. Using Condition 3(i), we have
$$ \| \lambda \mathbf{1} \circ \mathrm{sgn}(\theta_{U_0}) \| = \lambda \le o(q^{-1/2} T^{-\gamma} \log T). $$
This, along with the property of matrix norms and Proposition 4(ii), entails that, during the event $E_{T1}$,
$$ \begin{aligned} \| \tilde{v}_T \| &= \| (H_{U_0,T}^0)^{-1} [ S_{U_0,T}^0 - \lambda \mathbf{1} \circ \mathrm{sgn}(\theta_{U_0}) ] \| \le \| (H_{U_0,T}^0)^{-1} \| \, \| S_{U_0,T}^0 - \lambda \mathbf{1} \circ \mathrm{sgn}(\theta_{U_0}) \| \\ &\le q^{1/2} \| (H_{U_0,T}^0)^{-1} \| \big( \| S_{U_0,T}^0 \| + \| \lambda \mathbf{1} \circ \mathrm{sgn}(\theta_{U_0}) \| \big) \\ &\le q^{1/2} O_p(1) \big( (q^{2/4} / T)^{1/2} (\log T)^{1/2} + o(q^{-1/2} T^{-\gamma} \log T) \big) = o_p(T^{-\gamma} \log T), \end{aligned} $$
where the last equality follows from $q = O(T^{\delta_0})$ and $\delta_0 < \frac{2}{3}(1 - 2\gamma)$. Next, we consider $\tilde{w}_T$. By the property of norms and Propositions 4(ii) and (iii),
$$ \begin{aligned} \| \tilde{w}_T \| &= \| (H_{U_0,T}^0)^{-1} [ H_{U_0,T}(\theta_{U_0}^*, 0) - H_{U_0,T}^0 ] (\theta_{U_0} - \theta_{U_0}^0) \| \\ &\le q^{1/2} \| (H_{U_0,T}^0)^{-1} \| \, \| [ H_{U_0,T}(\theta_{U_0}^*, 0) - H_{U_0,T}^0 ] (\theta_{U_0} - \theta_{U_0}^0) \| \\ &\le q \, O_p(1) \, \| H_{U_0,T}(\theta_{U_0}^*, 0) - H_{U_0,T}^0 \| \, \| \theta_{U_0} - \theta_{U_0}^0 \| \\ &\le q \, O_p(1) \, K_T \, \| \theta_{U_0}^* - \theta_{U_0}^0 \| \, \| \theta_{U_0} - \theta_{U_0}^0 \|. \end{aligned} $$
Since $K_T = O_p(1)$ and $q = O(T^{\delta_0})$ with $\delta_0 < \gamma$,
$$ \| \tilde{w}_T \| = q \, O_p(T^{-2\gamma} (\log T)^2) = o_p(T^{-\gamma} \log T), $$
with $\theta_i - \theta_i^0 = T^{-\gamma} \log T$ for all $i \in U_0$. By (A9), (A11), and (A12), for sufficiently large $T$, and for all $i \in U_0$,
$$ \tilde{\Psi}_i(\theta_{U_0}) \ge T^{-\gamma} \log T - \| \tilde{v}_T \| - \| \tilde{w}_T \| \ge 0, \tag{A13} $$
if $\theta_i - \theta_i^0 = T^{-\gamma} \log T$, and
$$ \tilde{\Psi}_i(\theta_{U_0}) \le -T^{-\gamma} \log T + \| \tilde{v}_T \| + \| \tilde{w}_T \| \le 0, \tag{A14} $$
if $\theta_i - \theta_i^0 = -T^{-\gamma} \log T$.
By the continuity of $\tilde{\Psi}$ and inequalities (A13) and (A14), an application of Miranda’s existence theorem tells us that $\tilde{\Psi}(\theta_{U_0}) = 0$ has a solution $\hat{\theta}_{U_0}$ in $\mathcal{N}$. Clearly, $\hat{\theta}_{U_0}$ also solves the equation $\Psi(\theta_{U_0}) = 0$ with regard to the first equality in (A8). Thus, we have shown that (10) indeed has a solution in $\mathcal{N}$.
Second, let $\hat{\theta} = (\hat{\theta}_{U_0}, \hat{\theta}_{U_0^c}) \in \mathbb{R}^p$, with $\hat{\theta}_{U_0} \in \mathcal{N}$ as a solution to (10), and $\hat{\theta}_{U_0^c} = 0$. Next, we show that $\hat{\theta}$ satisfies (11) for the event $E_{T2}$. By the triangle inequality and mean value theorem, we have
$$ \lambda^{-1} \| S_{U_0^c T}(\hat{\theta}) \| \le \lambda^{-1} \| S_{U_0^c T}^0 \| + \lambda^{-1} \| S_{U_0^c T}(\hat{\theta}) - S_{U_0^c T}^0 \| \le (\log T)^{-1} + \lambda^{-1} \| (\partial / \partial \theta_{U_0}') S_{U_0^c T}(\hat{\theta}_{U_0}^{**}, 0) (\hat{\theta}_{U_0} - \theta_{U_0}^0) \|, \tag{A15} $$
where $\hat{\theta}_{U_0}^{**}$ lies on the line segment joining $\hat{\theta}_{U_0}$ and $\theta_{U_0}^0$. The first term of the upper bound in (A15) is negligible, so that it suffices to show that the second term is less than $g(0+) = 1$. Since $\hat{\theta}_{U_0}$ solves the equation $\Psi(\theta_{U_0}) = 0$ in (12), we obtain
$$ S_{U_0 T}^0 + H_{U_0 T}(\hat{\theta}_{U_0}^*, 0) (\hat{\theta}_{U_0} - \theta_{U_0}^0) - \lambda \mathbf{1} \circ \mathrm{sgn}(\hat{\theta}_{U_0}) = 0, $$
with $\hat{\theta}_{U_0}^*$ lying between $\hat{\theta}_{U_0}$ and $\theta_{U_0}^0$. From Proposition 4(ii),(iii) and Condition 1, the last term in (A15) can be expressed as
$$ \begin{aligned} &\lambda^{-1} \| (\partial / \partial \theta_{U_0}') S_{U_0^c T}(\hat{\theta}_{U_0}^{**}, 0) [ H_{U_0 T}(\hat{\theta}_{U_0}^*, 0) ]^{-1} [ S_{U_0 T}^0 - \lambda \mathbf{1} \circ \mathrm{sgn}(\hat{\theta}_{U_0}) ] \| \\ &\quad \le \lambda^{-1} \sup_{\theta, \theta' \in \mathcal{N}} \| (\partial / \partial \theta_{U_0}') S_{U_0^c T}(\theta, 0) [ H_{U_0 T}(\theta', 0) ]^{-1} \| \, ( \| S_{U_0 T}^0 \| + \lambda ) \\ &\quad \le \lambda^{-1} c \big[ (q^{1/2} / T)^{1/2} \log^{1/2} T + \lambda \big] = \lambda^{-1} c \, (q^{1/2} / T)^{1/2} \log^{1/2} T + c. \end{aligned} \tag{A16} $$
By Condition 3(i), the first term in the last equation of (A16) is $o_p(1)$; hence, (A16) is eventually less than 1. This verifies (11).
Finally, (12) is guaranteed by Lemma 1: we have $\hat{\theta}$ as a strict local maximizer of $Q_T(\theta)$ with $\| \hat{\theta} - \theta^0 \| = O(T^{-\gamma} \log T)$ and $\hat{\theta}_{U_0^c} = 0$ in the event $E_{T1} \cap E_{T2}$. Thus, the proofs of Theorems 1(a) and (b) are complete, by (A6). □

References

  1. Aielli, Gian Piero. 2013. Dynamic conditional correlation: On properties and estimation. Journal of Business and Economic Statistics 31: 282–99.
  2. Alexander, Carol. 2000. Orthogonal methods for generating large positive semi-definite covariance matrices. In ICMA Centre Discussion Papers in Finance icma-dp2000-06. London: Henley Business School, Reading University.
  3. Ampountolas, Apostolos. 2022. Cryptocurrencies intraday high-frequency volatility spillover effects using univariate and multivariate GARCH models. International Journal of Financial Studies 10: 51.
  4. Apergis, Nicholas, and Anthony Rezitis. 2001. Asymmetric cross-market volatility spillovers: Evidence from daily data on equity and foreign exchange markets. The Manchester School 69: 81–96.
  5. Apergis, Nicholas, and Anthony Rezitis. 2003. An examination of Okun’s law: Evidence from regional areas in Greece. Applied Economics 35: 1147–51.
  6. Baillie, Richard T., and Tim Bollerslev. 1990. A multivariate generalized ARCH approach to modeling risk premia in forward foreign exchange rate markets. Journal of International Money and Finance 9: 309–24.
  7. Basu, Sumanta, and George Michailidis. 2015. Regularized estimation in sparse high-dimensional time series model. The Annals of Statistics 43: 1535–67.
  8. Bauwens, Luc, and Sébastien Laurent. 2005. A new class of multivariate skew densities, with application to generalized autoregressive conditional heteroscedasticity models. Journal of Business and Economic Statistics 23: 346–54.
  9. Bickel, Peter J., and Elizaveta Levina. 2008. Covariance regularization by thresholding. The Annals of Statistics 36: 2577–604.
  10. Billio, Monica, Massimiliano Caporin, Lorenzo Frattarolo, and Loriana Pelizzon. 2023. Networks in risk spillovers: A multivariate GARCH perspective. Econometrics and Statistics 28: 1–29.
  11. Bollerslev, Tim. 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31: 307–27.
  12. Bollerslev, Tim. 1990. Modelling the coherence in short-run nominal exchange rates: A multivariate generalized ARCH model. The Review of Economics and Statistics 72: 498–505.
  13. Bollerslev, Tim, Robert Engle, and Jeffrey Wooldridge. 1988. A capital asset pricing model with time-varying covariances. Journal of Political Economy 96: 116–31.
  14. Boudt, Kris, Jon Danielsson, and Sébastien Laurent. 2013. Robust forecasting of dynamic conditional correlation GARCH models. International Journal of Forecasting 29: 244–57.
  15. Brodie, Joshua, Ingrid Daubechies, Christine De Mol, Domenico Giannone, and Ignace Loris. 2009. Sparse and stable Markowitz portfolios. Proceedings of the National Academy of Sciences of the United States of America 106: 12267–72.
  16. Cai, Tony, and Weidong Liu. 2011. Adaptive thresholding for sparse covariance matrix estimation. Journal of the American Statistical Association 106: 672–84.
  17. Christiansen, Charlotte. 2007. Volatility-spillover effects in European bond markets. European Financial Management 13: 923–48.
  18. Comte, Fabienne, and Offer Lieberman. 2003. Asymptotic theory for multivariate GARCH processes. Journal of Multivariate Analysis 84: 61–84.
  19. DeMiguel, Victor, Lorenzo Garlappi, and Raman Uppal. 2007. Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? The Review of Financial Studies 22: 1915–53.
  20. Diebold, Francis X., and Kamil Yilmaz. 2009. Measuring financial asset return and volatility spillovers, with application to global equity markets. Economic Journal 119: 158–71.
  21. Di Lorenzo, David, Giampaolo Liuzzi, Francesco Rinaldi, Fabio Schoen, and Marco Sciandrone. 2012. A concave optimization-based approach for sparse portfolio selection. Optimization Methods and Software 27: 983–1000.
  22. Efron, Bradley, Trevor Hastie, and Robert Tibshirani. 2004. Least angle regression. The Annals of Statistics 32: 407–99.
  23. Engle, Robert. 1982. Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50: 987–1007.
  24. Engle, Robert. 1990. Asset pricing with a factor-ARCH covariance structure: Empirical estimates for treasury bills. Journal of Econometrics 45: 213–37.
  25. Engle, Robert. 2002. Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business and Economic Statistics 20: 339–50. [Google Scholar] [CrossRef]
  26. Engle, Robert, and Kenneth Kroner. 1995. Multivariate simultaneous generalized arch. Econometric Theory 11: 122–50. [Google Scholar] [CrossRef]
  27. Engle, Robert, and Riccardo Colacito. 2006. Testing and valuing dynamic correlations for asset allocation. Journal of Business and Economic Statistics 24: 238–53. [Google Scholar] [CrossRef]
  28. Engle, Robert, Olivier Ledoit, and Michael Wolf. 2019. Large dynamic covariance matrices. Journal of Business and Economic Statistics 37: 363–75. [Google Scholar] [CrossRef]
  29. Engle, Robert, Takatoshi Ito, and Wen-Ling Lin. 1990. Meteor showers or heat waves? Heteroskedastic intra-daily volatility in the foreign exchange market. Econometrica 58: 525–42. [Google Scholar] [CrossRef]
  30. Fan, Jianqing, and Jinchi Lv. 2011. Noncave penalized likelihood with np-dimensionality. IEEE Transactions on Information Theory 57: 5467–84. [Google Scholar] [CrossRef]
  31. Fan, Jianqing, and Runze Li. 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96: 1348–60. [Google Scholar] [CrossRef]
  32. Fan, Yingying, and Cheng Yong Tang. 2013. Tuning parameter selection in high dimensional penalized likelihood. Journal of the Royal Statistical Society Series B: Statistical Methodology 75: 531–52. [Google Scholar] [CrossRef]
  33. Fastrich, Björn, Sandra Paterlini, and Peter Winker. 2015. Constructing optimal sparse portfolios using regularization methods. Computational Management Science 12: 417–34. [Google Scholar] [CrossRef]
  34. Francq, Christian, and Jean-Michel Zakoian. 2019. GARCH Models: Structure, Statistical Inference and Financial Applications. Hoboken: John Wiley & Sons. [Google Scholar]
  35. Friedman, Jerome, Trevor Hastie, Holger Höfling, and Robert Tibshirani. 2007. Pathwise coordinate optimization. The Annals of Applied Statistics 1: 302–32. [Google Scholar] [CrossRef]
  36. Giacometti, Rosella, Gabriele Torri, Kamonchai Rujirarangsan, and Michela Cameletti. 2023. Spatial Multivariate GARCH Models and Financial Spillovers. Journal of Risk and Financial Management 16: 397. [Google Scholar] [CrossRef]
  37. Hamao, Yasushi, Ronald W. Masulis, and Victor Ng. 1990. Correlations in price changes and volatility across international stock markets. The Review of Financial Studies 3: 281–307. [Google Scholar] [CrossRef]
  38. Hafner, Christian M., and Arie Preminger. 2009. Asymptotic theory for a factor GARCH model. Econometric Theory 25: 336–63. [Google Scholar] [CrossRef]
  39. Hafner, Christian M., Helmut Herwartz, and Simone Maxand. 2022. Identification of structural multivariate GARCH models. Journal of Econometrics 227: 212–27. [Google Scholar] [CrossRef]
  40. Hassan, Syed Aun, and Farooq Malik. 2007. Multivariate garch modeling of sector volatility transmission. The Quarterly Review of Economics and Finance 47: 470–80. [Google Scholar] [CrossRef]
  41. Hong, Junping, Yi Yan, Ercan Engin Kuruoglu, and Wai Kin Chan. 2023. Multivariate Time Series Forecasting With GARCH Models on Graphs. IEEE Transactions On Signal And Information Processing Over Networks 9: 557–68. [Google Scholar] [CrossRef]
  42. Kaltenhäuser, Bernd. 2002. Return and Volatility Spillovers to Industry Returns: Does EMU Play a Role? CFS Working Paper Series 2002/05. Frankfurt a. M.: Center for Financial Studies (CFS). [Google Scholar]
  43. Lam, Clifford, and Jianqing Fan. 2009. Sparsistency and rates of convergence in large covariance matrix estimation. The Annals of Statistics 37: 4254–78. [Google Scholar] [CrossRef] [PubMed]
  44. Lanne, Markku, and Pentti Saikkonen. 2007. A multivariate generalized orthogonal factor GARCH model. Journal of Business & Economic Statistics 25: 61–75. [Google Scholar]
  45. Ledoit, Olivier, and Michael Wolf. 2004. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis 88: 365–411. [Google Scholar] [CrossRef]
  46. Ledoit, Olivier, and Michael Wolf. 2012. Nonlinear shrinkage estimation of large-dimensional covariance matrices. The Annals of Statistics 40: 1024–60. [Google Scholar] [CrossRef]
  47. Ling, Shiqing, and Michael McAleer. 2003. Asymptotic theory for a vector arma-garch model. Econometric Theory 19: 280–310. [Google Scholar] [CrossRef]
  48. Markowitz, Harry. 1952. Portfolio selection. The Journal of Finance 7: 77–91. [Google Scholar]
  49. McAleer, Michael, Suhejia Hoti, and Felix Chan. 2009. Structure and asymptotic theory for multivariate asymmetric conditional volatility. Econometric Reviews 28: 422–40. [Google Scholar] [CrossRef]
  50. NASDAQ Stock Symbols. n.d. Stock Symbol. Available online: https://www.nasdaq.com/market-activity/stocks/ (accessed on 24 January 2024).
  51. Nicholson, William B., David S. Matteson, and Jacob Bien. 2017. VARX-L: Structured regularization for large vector autoregressions with exogenous variables. International Journal of Forecasting 33: 627–51. [Google Scholar] [CrossRef]
  52. Pan, Ming-Shiun, and L. Paul Hsueh. 1998. Transmission of stock returns and volatility between the U.S. and Japan: Evidence from the stock index futures markets. Asia-Pacific Financial Markets 5: 211–25. [Google Scholar] [CrossRef]
  53. Poignard, Benjamin. 2017. New Approaches for High-Dimensional Multivariate Garch Models. General Mathematics [math.GM]. Ph.D. thesis, Université Paris Sciences et Lettres, Paris, France. [Google Scholar]
  54. Ravikumar, Pradeep, Martin J. Wainwright, Garvesh Raskutti, and Bin Yu. 2011. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics 5: 935–80.
  55. Rio, Emmanuel. 2017. Asymptotic Theory of Weakly Dependent Random Processes. Berlin: Springer Nature.
  56. Sánchez García, Javier, and Salvador Cruz Rambaud. 2022. Machine Learning Regularization Methods in High-Dimensional Monetary and Financial VARs. Mathematics 10: 877.
  57. Shiferaw, Yegnanew A. 2019. Time-varying correlation between agricultural commodity and energy price dynamics with Bayesian multivariate DCC-GARCH models. Physica A: Statistical Mechanics and Its Applications 526: 120807.
  58. Siddiqui, Taufeeque Ahmad, and Mazia Fatima Khan. 2018. Analyzing spillovers in international stock markets: A multivariate GARCH approach. IMJ 10: 57–63.
  59. Sun, Wei, Junhui Wang, and Yixin Fang. 2013. Consistent selection of tuning parameters via variable selection stability. Journal of Machine Learning Research 14: 3419–40.
  60. Sun, Yan, and Xiaodong Lin. 2011. Regularization for stationary multivariate time series. Quantitative Finance 12: 573–86.
  61. Theodossiou, Panayiotis, and Unro Lee. 1993. Mean and volatility spillovers across major national stock markets: Further empirical evidence. The Journal of Financial Research 16: 337–50.
  62. Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) 58: 267–88.
  63. Tse, Yiu Kuen, and Albert K. C. Tsui. 2002. A multivariate generalized autoregressive conditional heteroscedasticity model with time-varying correlations. Journal of Business & Economic Statistics 20: 351–62.
  64. Uematsu, Yoshimasa. 2015. Penalized likelihood estimation in high-dimensional time series models and its application. arXiv arXiv:1504.06706.
  65. van der Weide, Roy. 2002. GO-GARCH: A multivariate generalized orthogonal GARCH model. Journal of Applied Econometrics 17: 549–64.
  66. Vrontos, Ioannis, Petros Dellaportas, and Dimitris N. Politis. 2003. A full-factor multivariate GARCH model. The Econometrics Journal 6: 312–34.
  67. Wang, Hansheng, Bo Li, and Chenlei Leng. 2009. Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 71: 671–83.
  68. Worthington, Andrew, and Helen Higgs. 2004. Transmission of equity returns and volatility in Asian developed and emerging markets: A multivariate GARCH analysis. International Journal of Finance & Economics 9: 71–80.
  69. Wu, Tong Tong, and Kenneth Lange. 2008. Coordinate descent algorithms for lasso penalized regression. Annals of Applied Statistics 2: 224–44.
  70. Yuan, Ming, and Yi Lin. 2006. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68: 49–67.
  71. Zhang, Cun-Hui. 2010. Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics 38: 894–942.
  72. Zhang, Yongli, and Yuhong Yang. 2015. Cross-validation for selecting a model selection procedure. Journal of Econometrics 187: 95–112.
  73. Zhao, Peng, and Bin Yu. 2006. On model selection consistency of lasso. Journal of Machine Learning Research 7: 2541–67.
  74. Zhao, Peng, and Bin Yu. 2007. Stagewise lasso. Journal of Machine Learning Research 8: 2701–26.
  75. Zou, Hui. 2006. The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101: 1418–29.
Figure 1. Histograms of selected λ in Cases 1 (top) and 2 (bottom) via BIC (left) and AIC (right).
Figure 2. Daily returns of 18 stocks from 4 January 2016 to 31 January 2018.
Figure 3. The network structure of estimated matrices A (top) and B (bottom) under different values of λ.
Figure 4. Estimated volatilities by regularized BEKK(1,1) with λ = 2 (red lines), λ = 0.5 (blue lines), and univariate GARCH models (green lines).
Figure 5. Daily estimated conditional correlations when λ = 1.
Figure 6. Daily spillover index (top) and regularization paths of the estimated off-diagonal parameters in the regularized BEKK model, with each path shown in a different color (bottom).
Table 1. Parameter matrices in simulations.

| Matrix | Row | Case 1 | Case 2 |
|---|---|---|---|
| A | 1 | 0.1268  0  0.0358  0.0618 | 0.4040  0.0200  0  0 |
| A | 2 | 0  0.1737  0  0 | 0  0.4434  0.0752  0 |
| A | 3 | 0  0  0.2621  0 | 0  0  0.0406  0.0684 |
| A | 4 | 0  0  0  0.4096 | 0  0  0  0.2226 |
| B | 1 | 0.4257  0  0  0 | 0.2453  0  0  0 |
| B | 2 | 0  0.3008  0  0 | 0  0.2401  0  0 |
| B | 3 | 0.0912  0  0.2868  0 | 0.2398  0  0.2157  0 |
| B | 4 | 0  0  0  0.0372 | 0  0  0  0.3996 |
| C | 1 | 0.0324  0  0  0 | 0.0804  0  0  0 |
| C | 2 | 0  0.0681  0  0 | 0  0.0473  0  0 |
| C | 3 | 0  0.0349  0.0469  0 | 0  0  0.0521  0 |
| C | 4 | 0.0728  0  0  0.0739 | 0.0200  0  0  0.0628 |
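To make the role of the Table 1 matrices concrete, the following is a minimal sketch of simulating a BEKK(1,1) process from the Case 1 values. It rests on assumptions that this section does not confirm: the recursion is taken to be H_t = CC' + A ε_{t-1} ε_{t-1}' A' + B H_{t-1} B' (the paper may place the transposes differently), the innovations are Gaussian, and the matrices are read row by row. The helper names `matmul`, `transpose`, `cholesky`, and `simulate_bekk` are hypothetical.

```python
import math
import random

# Case 1 matrices, read row by row from Table 1 (row-major layout is assumed).
A = [[0.1268, 0.0, 0.0358, 0.0618],
     [0.0, 0.1737, 0.0, 0.0],
     [0.0, 0.0, 0.2621, 0.0],
     [0.0, 0.0, 0.0, 0.4096]]
B = [[0.4257, 0.0, 0.0, 0.0],
     [0.0, 0.3008, 0.0, 0.0],
     [0.0912, 0.0, 0.2868, 0.0],
     [0.0, 0.0, 0.0, 0.0372]]
C = [[0.0324, 0.0, 0.0, 0.0],
     [0.0, 0.0681, 0.0, 0.0],
     [0.0, 0.0349, 0.0469, 0.0],
     [0.0728, 0.0, 0.0, 0.0739]]


def matmul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def cholesky(S):
    """Lower-triangular L with L L' = S, for a positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def simulate_bekk(A, B, C, n_steps, seed=0):
    """Simulate (eps_t, H_t) pairs under the assumed BEKK(1,1) recursion."""
    rng = random.Random(seed)
    n = len(A)
    omega = matmul(C, transpose(C))      # intercept term CC'
    H = [row[:] for row in omega]        # initial covariance set to CC'
    eps = [0.0] * n                      # initial shock set to zero
    path = []
    for _ in range(n_steps):
        col = [[e] for e in eps]
        outer = matmul(col, transpose(col))
        arch = matmul(matmul(A, outer), transpose(A))
        garch = matmul(matmul(B, H), transpose(B))
        H = [[omega[i][j] + arch[i][j] + garch[i][j] for j in range(n)]
             for i in range(n)]
        L = cholesky(H)
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        eps = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        path.append((eps, H))
    return path


if __name__ == "__main__":
    eps_last, H_last = simulate_bekk(A, B, C, n_steps=500, seed=1)[-1]
    print("last shock:", eps_last)
    print("last conditional variances:", [H_last[i][i] for i in range(4)])
```

Because C has a strictly positive diagonal, CC' is positive definite, so every H_t in this sketch admits a Cholesky factor; swapping in the Case 2 matrices only requires changing the three constants.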
Table 2. Performance measures in two cases.

| Case | Measure | λ = 6 | λ = 5 | λ = 4 | λ = 3 | λ = 2 | λ = 1 | λ = 0.64 | λ = 0.32 | λ = 0.16 | λ = 0.08 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | τ_0 | 0.988 (0.031) | 0.988 (0.031) | 0.984 (0.035) | 0.981 (0.037) | 0.974 (0.040) | 0.954 (0.046) | 0.934 (0.051) | 0.890 (0.058) | 0.841 (0.066) | 0.756 (0.083) |
| 1 | τ_0^C | 0.755 (0.028) | 0.755 (0.033) | 0.767 (0.035) | 0.786 (0.036) | 0.814 (0.024) | 0.821 (0.014) | 0.822 (0.013) | 0.834 (0.026) | 0.864 (0.034) | 0.897 (0.038) |
| 1 | ν | 0.151 (0.029) | 0.151 (0.029) | 0.150 (0.029) | 0.147 (0.029) | 0.142 (0.030) | 0.133 (0.032) | 0.132 (0.032) | 0.131 (0.031) | 0.128 (0.030) | 0.125 (0.029) |
| 1 | 100κ | 3.104 (6.463) | 3.533 (7.124) | 2.370 (5.471) | 1.611 (4.158) | 0.735 (1.051) | 0.461 (0.460) | 0.359 (0.356) | 0.325 (0.294) | 0.289 (0.242) | 0.282 (0.257) |
| 2 | τ_0 | 0.985 (0.030) | 0.985 (0.030) | 0.983 (0.034) | 0.977 (0.034) | 0.960 (0.043) | 0.918 (0.048) | 0.889 (0.052) | 0.841 (0.052) | 0.805 (0.054) | 0.767 (0.065) |
| 2 | τ_0^C | 0.740 (0.029) | 0.742 (0.029) | 0.745 (0.028) | 0.752 (0.025) | 0.759 (0.019) | 0.766 (0.011) | 0.765 (0.014) | 0.767 (0.013) | 0.768 (0.014) | 0.782 (0.030) |
| 2 | ν | 0.151 (0.022) | 0.151 (0.022) | 0.151 (0.022) | 0.149 (0.021) | 0.145 (0.021) | 0.136 (0.023) | 0.136 (0.023) | 0.135 (0.022) | 0.132 (0.021) | 0.126 (0.022) |
| 2 | 100κ | 2.004 (8.525) | 2.040 (9.024) | 1.818 (7.710) | 0.330 (1.381) | 0.323 (1.936) | 0.369 (1.829) | 0.402 (2.388) | 0.278 (0.163) | 0.275 (0.163) | 0.209 (0.180) |
Table 3. Full names of 18 tickers.

| Ticker | Company | Ticker | Company |
|---|---|---|---|
| GOOG | Alphabet Inc., Mountain View, CA, USA | GWW | W.W. Grainger, Inc., Lake Forest, IL, USA |
| IBM | International Business Machines Corporation, Armonk, NY, USA | JPM | JPMorgan Chase & Co., New York, NY, USA |
| MSFT | Microsoft Corporation, Redmond, WA, USA | NKE | Nike Inc., Beaverton, OR, USA |
| ORCL | Oracle Corporation, Austin, TX, USA | TIF | Tiffany & Co., New York, NY, USA |
| IPG | The Interpublic Group of Companies, New York, NY, USA | MAS | Masco Corporation, Livonia, MI, USA |
| MCD | McDonald’s Corp., Chicago, IL, USA | NFLX | Netflix, Inc., Los Gatos, CA, USA |
| RL | Ralph Lauren Corporation, New York, NY, USA | TXT | Textron Inc., Providence, RI, USA |
| LNC | Lincoln National Corporation, Radnor, PA, USA | MRO | Marathon Oil Corporation, Houston, TX, USA |
| TGT | Target Corporation, Minneapolis, MN, USA | WMT | Walmart Inc., Bentonville, AR, USA |
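Table 4. Correlation and statistical features of 18 stocks for 2016–2017.

| | GOOG | GWW | IBM | JPM | MSFT | NKE | ORCL | TIF | IPG | MAS | MCD | NFLX | RL | TXT | LNC | MRO | TGT | WMT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 7.037 × 10⁻⁴ | 5.164 × 10⁻⁴ | 4.297 × 10⁻⁴ | 1.150 × 10⁻³ | 1.033 × 10⁻³ | 1.420 × 10⁻⁴ | 6.439 × 10⁻⁴ | 8.143 × 10⁻⁴ | −7.648 × 10⁻⁵ | 1.029 × 10⁻³ | 8.891 × 10⁻⁴ | 1.304 × 10⁻³ | 1.610 × 10⁻⁴ | 7.103 × 10⁻⁴ | 1.148 × 10⁻³ | 1.217 × 10⁻³ | 5.523 × 10⁻⁵ | 1.116 × 10⁻³ |
| Std. Dev. | 0.013 | 0.016 | 0.011 | 0.013 | 0.012 | 0.014 | 0.012 | 0.015 | 0.015 | 0.014 | 0.009 | 0.023 | 0.021 | 0.015 | 0.019 | 0.034 | 0.017 | 0.012 |
| Skewness | −0.440 | 0.124 | 0.291 | 0.348 | 0.040 | 0.509 | 0.097 | −0.391 | −1.779 | −0.456 | 0.005 | 0.635 | −2.332 | −1.438 | −0.779 | 0.640 | −1.108 | 2.291 |
| Kurtosis | 6.120 | 16.399 | 14.446 | 8.907 | 8.926 | 11.683 | 15.022 | 6.274 | 16.842 | 9.409 | 9.640 | 17.196 | 33.455 | 17.970 | 10.830 | 7.383 | 13.600 | 24.910 |
| GOOG | | 0.04 | 0.24 | 0.26 | 0.67 | 0.28 | 0.36 | 0.22 | 0.28 | 0.33 | 0.29 | 0.44 | 0.15 | 0.23 | 0.27 | 0.11 | 0.06 | 0.13 |
| GWW | | | 0.38 | 0.34 | 0.13 | 0.17 | 0.16 | 0.24 | 0.29 | 0.27 | 0.03 | 0.06 | 0.09 | 0.34 | 0.28 | 0.21 | 0.16 | 0.09 |
| IBM | | | | 0.38 | 0.30 | 0.15 | 0.42 | 0.27 | 0.30 | 0.31 | 0.12 | 0.13 | 0.15 | 0.39 | 0.37 | 0.19 | 0.15 | 0.14 |
| JPM | | | | | 0.37 | 0.23 | 0.38 | 0.41 | 0.31 | 0.44 | 0.23 | 0.19 | 0.28 | 0.52 | 0.78 | 0.36 | 0.16 | 0.09 |
| MSFT | | | | | | 0.26 | 0.45 | 0.29 | 0.29 | 0.38 | 0.35 | 0.36 | 0.19 | 0.32 | 0.33 | 0.20 | 0.08 | 0.12 |
| NKE | | | | | | | 0.20 | 0.23 | 0.30 | 0.30 | 0.18 | 0.18 | 0.33 | 0.23 | 0.30 | 0.12 | 0.27 | 0.16 |
| ORCL | | | | | | | | 0.34 | 0.26 | 0.36 | 0.26 | 0.26 | 0.15 | 0.30 | 0.35 | 0.19 | 0.09 | 0.12 |
| TIF | | | | | | | | | 0.23 | 0.33 | 0.14 | 0.18 | 0.25 | 0.33 | 0.42 | 0.26 | 0.27 | 0.11 |
| IPG | | | | | | | | | | 0.38 | 0.09 | 0.12 | 0.21 | 0.26 | 0.32 | 0.15 | 0.21 | 0.10 |
| MAS | | | | | | | | | | | 0.28 | 0.23 | 0.26 | 0.36 | 0.46 | 0.20 | 0.24 | 0.12 |
| MCD | | | | | | | | | | | | 0.14 | 0.13 | 0.12 | 0.16 | 0.07 | 0.09 | 0.16 |
| NFLX | | | | | | | | | | | | | 0.10 | 0.22 | 0.19 | 0.07 | 0.02 | 0.09 |
| RL | | | | | | | | | | | | | | 0.22 | 0.35 | 0.16 | 0.31 | 0.09 |
| TXT | | | | | | | | | | | | | | | 0.53 | 0.26 | 0.17 | 0.09 |
| LNC | | | | | | | | | | | | | | | | 0.39 | 0.19 | 0.05 |
| MRO | | | | | | | | | | | | | | | | | 0.08 | 0.04 |
| TGT | | | | | | | | | | | | | | | | | | 0.36 |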
Table 5. Performance of portfolios using different covariance models.

| Model | Mean (%) | SD (%) | IR | Mean (%) | SD (%) | IR |
|---|---|---|---|---|---|---|
| Equally weighted | 0.430 | 0.761 | 0.565 | | | |
| | GMV | | | μ* = 0.15% | | |
| Regularized BEKK | 0.390 | 0.650 | 0.601 | 0.352 | 0.652 | 0.540 |
| Factor GARCH | 0.326 | 0.885 | 0.368 | 0.223 | 1.200 | 0.186 |
| DCC–GARCH | 0.244 | 0.665 | 0.367 | 0.302 | 0.677 | 0.446 |
| Constant covariance | 0.261 | 0.658 | 0.397 | 0.165 | 0.777 | 0.212 |
| | μ* = 0.10% | | | μ* = 0.05% | | |
| Regularized BEKK | 0.382 | 0.585 | 0.654 | 0.416 | 0.633 | 0.657 |
| Factor GARCH | 0.210 | 1.169 | 0.180 | 0.221 | 1.321 | 0.167 |
| DCC–GARCH | 0.286 | 0.631 | 0.452 | 0.316 | 0.660 | 0.479 |
| Constant covariance | 0.219 | 0.669 | 0.327 | 0.273 | 0.668 | 0.409 |
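The IR column of Table 5 can be cross-checked against the other two columns. The sketch below assumes that IR is the ratio of the mean return to its standard deviation, a definition this section does not state but that the reported figures match to within about 0.001, presumably because of rounding in the published means and SDs. The names `TABLE5`, `information_ratio`, and `max_abs_gap` are hypothetical.

```python
# Rows transcribed from Table 5:
# (portfolio, model, mean %, SD %, reported IR).
TABLE5 = [
    ("Equally weighted", "n/a", 0.430, 0.761, 0.565),
    ("GMV", "Regularized BEKK", 0.390, 0.650, 0.601),
    ("GMV", "Factor GARCH", 0.326, 0.885, 0.368),
    ("GMV", "DCC-GARCH", 0.244, 0.665, 0.367),
    ("GMV", "Constant covariance", 0.261, 0.658, 0.397),
    ("mu*=0.15%", "Regularized BEKK", 0.352, 0.652, 0.540),
    ("mu*=0.15%", "Factor GARCH", 0.223, 1.200, 0.186),
    ("mu*=0.15%", "DCC-GARCH", 0.302, 0.677, 0.446),
    ("mu*=0.15%", "Constant covariance", 0.165, 0.777, 0.212),
    ("mu*=0.10%", "Regularized BEKK", 0.382, 0.585, 0.654),
    ("mu*=0.10%", "Factor GARCH", 0.210, 1.169, 0.180),
    ("mu*=0.10%", "DCC-GARCH", 0.286, 0.631, 0.452),
    ("mu*=0.10%", "Constant covariance", 0.219, 0.669, 0.327),
    ("mu*=0.05%", "Regularized BEKK", 0.416, 0.633, 0.657),
    ("mu*=0.05%", "Factor GARCH", 0.221, 1.321, 0.167),
    ("mu*=0.05%", "DCC-GARCH", 0.316, 0.660, 0.479),
    ("mu*=0.05%", "Constant covariance", 0.273, 0.668, 0.409),
]


def information_ratio(mean_pct, sd_pct):
    """Mean return divided by its standard deviation (assumed IR definition)."""
    return mean_pct / sd_pct


def max_abs_gap(rows):
    """Largest absolute difference between recomputed and reported IR."""
    return max(abs(information_ratio(m, s) - ir) for _, _, m, s, ir in rows)


if __name__ == "__main__":
    for portfolio, model, mean_pct, sd_pct, reported in TABLE5:
        recomputed = information_ratio(mean_pct, sd_pct)
        print(f"{portfolio:16s} {model:20s} "
              f"recomputed={recomputed:.3f} reported={reported:.3f}")
    print("largest gap:", round(max_abs_gap(TABLE5), 4))
```

Under this reading, the regularized BEKK rows have the highest ratio within each of the four portfolio settings.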
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Yao, S.; Zou, H.; Xing, H. L1 Regularization for High-Dimensional Multivariate GARCH Models. Risks 2024, 12, 34. https://doi.org/10.3390/risks12020034