De-Biased Graphical Lasso for High-Frequency Data

This paper develops a new statistical inference theory for the precision matrix of high-frequency data in a high-dimensional setting. The focus is not only on point estimation but also on interval estimation and hypothesis testing for entries of the precision matrix. To accomplish this purpose, we establish an abstract asymptotic theory for the weighted graphical Lasso and its de-biased version without specifying the form of the initial covariance estimator. We also extend the scope of the theory to the case that a known factor structure is present in the data. The developed theory is applied to the concrete situation where we can use the realized covariance matrix as the initial covariance estimator, and we obtain a feasible asymptotic distribution theory to construct (simultaneous) confidence intervals and (multiple) testing procedures for entries of the precision matrix.


Introduction
In high-frequency financial econometrics, covariance matrix estimation of asset returns has been studied extensively over the past two decades. High-frequency financial data are commonly modeled as a discretely observed semimartingale, for which the quadratic covariation matrix plays the role of the covariance matrix, so their treatment often differs from that in a standard i.i.d. setting. In recent years, motivated by applications to portfolio allocation and risk management in a large-scale asset universe, the high-dimensionality problem has attracted much attention in this area. Since the 2000s, great progress has been made in high-dimensional covariance estimation from i.i.d. data, so researchers are naturally led to apply the techniques developed there to the context of high-frequency data. For example, Wang and Zou [1] applied the entry-wise shrinkage methods considered in [2,3] to estimating the covariance matrix of high-frequency data which are asynchronously observed with noise. See also [4][5][6][7] for further developments of this approach. In the meantime, it is well recognized that the factor structure is an important ingredient, both theoretically and empirically, for financial data. In the context of high-dimensional covariance estimation from high-frequency data, this perspective was first taken into account by Fan et al. [8] and subsequently built upon by, among others, [9][10][11]. Other methods commonly used in i.i.d. settings have also been investigated in the literature on high-frequency financial econometrics. Hautsch et al. [12] and Morimoto and Nagata [13] formally apply eigenvalue regularization methods based on random matrix theory to high-frequency data. Lam et al. [14] accommodate the non-linear shrinkage estimator of [15] to a high-frequency data setting with the help of the spectral distribution theory for the realized covariance matrix developed in [16]. Brownlees et al. [17] employ the $\ell_1$-penalized Gaussian MLE, known as the graphical Lasso, to estimate the precision matrix (the inverse of the covariance matrix) of high-frequency data. The last approach is closely related to the methodology we focus on. Despite the recent advances in this topic described above, most studies in this area focus only on point estimation of covariance and precision matrices, and there is little work on interval estimation and hypothesis testing for these objects, with only a few exceptions.

Notation 1. Throughout the paper, we assume $d \geq 2$. $^\top$ stands for the transpose of a matrix. For a vector $x \in \mathbb{R}^d$, we write the $i$-th component of $x$ as $x^i$ for $i = 1, \dots, d$. For two vectors $x, y \in \mathbb{R}^d$, the statement $x \leq y$ means $x^i \leq y^i$ for all $i = 1, \dots, d$. The identity matrix of size $d$ is denoted by $E_d$. We write $\mathbb{R}^{l \times k}$ for the set of all $l \times k$ matrices. $\mathcal{S}_d$ denotes the set of all $d \times d$ symmetric matrices. $\mathcal{S}_d^+$ denotes the set of all $d \times d$ positive semidefinite matrices. $\mathcal{S}_d^{++}$ denotes the set of all $d \times d$ positive definite matrices. For an $l \times k$ matrix $A$, the $(i,j)$-th entry of $A$ is denoted by $A_{ij}$. Also, $A_{i\cdot}$ and $A_{\cdot j}$ denote the $i$-th row vector and the $j$-th column vector, respectively (both are regarded as column vectors). We write $\operatorname{vec}(A)$ for the vectorization of $A$: $\operatorname{vec}(A) := (A_{11}, \dots, A_{l1}, A_{12}, \dots, A_{l2}, \dots, A_{1k}, \dots, A_{lk})^\top \in \mathbb{R}^{lk}$.
For every $w \in [1, \infty]$, we set $\|A\|_w := \|\operatorname{vec}(A)\|_w$ for a matrix $A$, where $\|x\|_w$ denotes the $\ell_w$-norm of a vector $x$. Also, we write $|||A|||_w$ for the $\ell_w$-operator norm of $A$: $|||A|||_w := \sup_{\|x\|_w = 1} \|Ax\|_w$. It is well known that $|||A|||_1 = \max_{1 \leq j \leq k} \sum_{i=1}^l |A_{ij}|$ and $|||A|||_\infty = \max_{1 \leq i \leq l} \sum_{j=1}^k |A_{ij}|$. When $l = k$, $\operatorname{diag}(A)$ denotes the diagonal matrix with the same diagonal entries as $A$, and we set $A^- := A - \operatorname{diag}(A)$. If $A$ is symmetric, we denote by $\Lambda_{\max}(A)$ and $\Lambda_{\min}(A)$ the maximum and minimum eigenvalues of $A$, respectively. For two matrices $A$ and $B$, $A \otimes B$ denotes their Kronecker product. When $A$ and $B$ have the same size, we write $A \bullet B$ for their Hadamard product.

Estimators and Abstract Results
Given a stochastic basis $\mathcal{B} = (\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in [0,1]}, P)$, we consider a $d$-dimensional semimartingale $Y = (Y_t)_{t \in [0,1]}$ defined on it. We assume that $\Sigma_Y = [Y, Y]_1$ is a.s. invertible. In this paper, we consider an asymptotic theory in which the dimension $d$ possibly depends on a parameter $n \in \mathbb{N}$ so that $d = d_n \to \infty$ as $n \to \infty$. Consequently, both $\mathcal{B}$ and $Y$ may also depend on $n$. However, following the custom of the literature, we omit the index $n$ from these objects and many others appearing below.
Our aim is to estimate the precision matrix $\Theta_Y = \Sigma_Y^{-1}$ when we have an estimator $\hat{\Sigma}_n$ for $\Sigma_Y$; as a corollary, we can also estimate $\Sigma_Y$ itself. We assume that $\hat{\Sigma}_n$ is an $\mathcal{S}_d^+$-valued random variable all of whose diagonal entries are a.s. positive, but we do not specify the form of $\hat{\Sigma}_n$ because the asymptotic theory developed in this section depends on the properties of $\hat{\Sigma}_n$ rather than its construction. This is convenient because the construction of the estimator depends heavily on the observation scheme for $Y$ (with or without noise, synchronous or not, continuous or discontinuous, and so on; see [33] for details). In Section 4 we illustrate how to apply the abstract theory developed in this and the next sections to a concrete situation.
We use the weighted graphical Lasso to estimate $\Theta_Y$ (cf. [24]). The weighted graphical Lasso estimator $\hat{\Theta}_\lambda$ with penalty parameter $\lambda > 0$ based on $\hat{\Sigma}_n$ is defined by
$$\hat{\Theta}_\lambda := \arg\min_{\Theta \in \mathcal{S}_d^{++}} \left\{ \operatorname{tr}(\Theta \hat{\Sigma}_n) - \log\det\Theta + \lambda \sum_{i \neq j} \hat{V}_{n,ii} \hat{V}_{n,jj} |\Theta_{ij}| \right\}, \qquad (1)$$
where $\hat{V}_n := \operatorname{diag}(\hat{\Sigma}_n)^{1/2}$. According to the proof of [34] (Lemma 1), the optimization problem in Equation (1) has a unique solution when $\lambda > 0$, $\hat{\Sigma}_n$ is positive semidefinite and all the diagonal entries of $\hat{\Sigma}_n$ are positive, so $\hat{\Theta}_\lambda$ is a.s. well-defined in our setting. In the following we allow $\lambda$ to be a random variable because we typically select $\lambda$ in a data-driven way.
To analyze the theoretical properties of $\hat{\Theta}_\lambda$, it is convenient to consider the graphical Lasso estimator $\hat{K}_\lambda$ based on the correlation matrix estimator $\hat{R}_n := \hat{V}_n^{-1} \hat{\Sigma}_n \hat{V}_n^{-1}$ as follows:
$$\hat{K}_\lambda := \arg\min_{K \in \mathcal{S}_d^{++}} \left\{ \operatorname{tr}(K \hat{R}_n) - \log\det K + \lambda \sum_{i \neq j} |K_{ij}| \right\}.$$
We can easily check that $\hat{\Theta}_\lambda = \hat{V}_n^{-1} \hat{K}_\lambda \hat{V}_n^{-1}$.
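To make the rescaling device concrete, the following minimal R sketch (not the paper's implementation) computes the weighted graphical Lasso from an initial covariance estimate via the correlation-based problem; it assumes the glassoFast package mentioned in Section 5, and all function and variable names are ours:

```r
# Minimal sketch of the weighted graphical Lasso via the correlation-based
# graphical Lasso, assuming an initial PSD estimate Sigma_hat with positive
# diagonal (e.g., a realized covariance matrix) is already computed.
library(glassoFast)

weighted_glasso <- function(Sigma_hat, lambda) {
  v <- sqrt(diag(Sigma_hat))              # V_hat = diag(Sigma_hat)^{1/2}
  R_hat <- Sigma_hat / outer(v, v)        # correlation estimate V^{-1} Sigma V^{-1}
  fit <- glassoFast(R_hat, rho = lambda)  # graphical Lasso on the correlation scale
  K_hat <- fit$wi                         # estimated precision of the correlation
  K_hat / outer(v, v)                     # Theta_hat = V^{-1} K_hat V^{-1}
}
```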

Remark 1.
As pointed out in Rothman et al. [35] and Janková and van de Geer [24], the graphical Lasso based on correlation matrices is theoretically preferable to that based on covariance matrices (so the weighted graphical Lasso is also preferable). In particular, we do not need to impose the so-called irrepresentability condition on Σ Y to derive the theoretical properties of our estimators, which contrasts with Brownlees et al. [17] (see Assumption 2 in [17]). See also Remark 2 for an additional discussion.
We introduce some notation related to the sparsity assumptions we will impose on $\Theta_Y$. Let $A \in \mathcal{S}_d$. We set $S(A) := \{(i,j) : i \neq j, A_{ij} \neq 0\}$, $d_j(A) := \#\{i \neq j : A_{ij} \neq 0\}$ for $j = 1, \dots, d$, $d(A) := \max_{1 \leq j \leq d} d_j(A)$ and $s(A) := \frac{1}{2}\#S(A)$. These quantities have a clear interpretation when the matrix $A$ represents the edge structure of some graph so that $A_{ij} \neq 0$ is equivalent to the presence of an edge between vertices $i$ and $j$ for $i \neq j$; in this case, $d_j(A)$ is the number of edges adjacent to vertex $j$ (which is called the degree of vertex $j$) and $s(A)$ is the total number of edges contained in the graph.
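These quantities are easy to compute in practice; a small R sketch under the convention above (the function name is ours):

```r
# Degrees, maximum degree d(A) and edge count s(A) of a symmetric matrix A,
# interpreting nonzero off-diagonal entries as edges of a graph.
sparsity_stats <- function(A, tol = 0) {
  adj <- abs(A) > tol
  diag(adj) <- FALSE           # the diagonal never contributes edges
  degrees <- colSums(adj)      # d_j(A), j = 1, ..., d
  list(d_max = max(degrees),   # d(A)
       n_edges = sum(adj) / 2) # s(A): each edge counted once
}
```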
To derive our asymptotic results, we will impose the following structural assumptions on Σ Y .
[A2] states that the sparsity of Θ Y is controlled by the deterministic sequence s n ; we will require the growth rate of s n to be moderate.
[A3] is another sparsity assumption on Θ Y . It is weaker than [A2] in the sense that it always holds true with d n = s n under [A2]. However, we can generally take d n smaller than s n .

Consistency
Proposition 1. Let $(\lambda_n)_{n=1}^\infty$ be a sequence of positive-valued random variables satisfying the following conditions:

[B1] $\lambda_n^{-1} \|\hat{\Sigma}_n - \Sigma_Y\|_\infty \to^p 0$ as $n \to \infty$;
[B2] $s_n \lambda_n \to^p 0$ as $n \to \infty$.

Then we have $\|\hat{\Theta}_{\lambda_n} - \Theta_Y\|_\infty = O_p(\lambda_n)$ and $|||\hat{\Theta}_{\lambda_n} - \Theta_Y|||_w = O_p(s_n \lambda_n)$ as $n \to \infty$ for any $w \in [1, \infty]$.
Proposition 1 is essentially a rephrasing of Theorem 14.1.3 in [24]. To get a better convergence rate in Proposition 1, we should choose $\lambda_n$ as small as possible, where a lower bound on $\lambda_n$ is determined via [B1] by the convergence rate of $\hat{\Sigma}_n$ in the $\ell_\infty$-norm. One typically derives this convergence rate by establishing entry-wise concentration inequalities for $\hat{\Sigma}_n$. Such inequalities have already been established for various covariance estimators used in high-frequency financial econometrics; see Theorems 1-2 and Lemma 3 in [36], Theorem 1 in [4], Theorem 1 in [37], and Theorem 2 in [17] for example. We note, however, that $\hat{\Sigma}_n$ should be positive semidefinite to ensure that the graphical Lasso has a unique solution. This property is not necessarily ensured by many covariance estimators used in this area. In this regard, we mention that pre-averaging and realized kernel estimators have versions that ensure this property, for which relevant bounds are available in [6] (Theorem 2) and [11] (Lemma 1).

Remark 2 (Comparison with Brownlees et al. [17]). Compared with [17] (Theorem 1), Proposition 1 has two major theoretical improvements. First, Proposition 1 does not assume the so-called irrepresentability condition, which is imposed in [17] (Theorem 1) as Assumption 2. In fact, under the assumptions of Proposition 1, the unweighted graphical Lasso estimator adopted in [17] would have the convergence rate $(s_n + d)\lambda_n$ (rather than $s_n \lambda_n$ in our case) for estimating $\Theta_Y$ in the norm $|||\cdot|||_w$, in view of [24] (Theorem 14.1.2). This means that we would need to select $\lambda_n$ so that $d\lambda_n \to^p 0$ as $n \to \infty$ to ensure consistency, which is much stronger than the corresponding assumption [B2] in our setting. Since $\lambda_n$ typically converges to 0 no faster than $1/\sqrt{n}$ with $n$ being the sample size (cf. Section 4), the condition $d\lambda_n \to^p 0$ excludes high-dimensional settings such that $d \gg n$. Second, Proposition 1 gives consistency in the $\ell_w$-operator norm for all $w \in [1, \infty]$, while [17] (Theorem 1) only shows consistency in the $\ell_\infty$-norm. We remark that consistency in matrix operator norms is important in applications. For example, the consistency of $\hat{\Theta}_{\lambda_n}$ in the $\ell_2$-operator norm implies that the eigenvalues of $\hat{\Theta}_{\lambda_n}$ consistently estimate the corresponding eigenvalues of $\Theta_Y$, and an analogous implication holds for consistency in the $\ell_\infty$-operator norm.
On the other hand, unlike [17] (Theorem 1), we do not show selection consistency (i.e., $P(S(\hat{\Theta}_{\lambda_n}) = S(\Theta_Y)) \to 1$ as $n \to \infty$) under our assumptions. Indeed, in the linear regression setting, it is known that an irrepresentability-type condition is necessary for the selection consistency of the Lasso; see [22] (Section 7.5.3) for more details. This suggests that our estimator would not have the oracle property in the sense of [38] in general. However, we remark that the asymptotic mixed normality of the de-biased estimator stated below can be used to construct an estimator with selection consistency via thresholding, as in e.g., [39] (Section 3.1) and [40] (Section 4.2). See Corollary 2 and the subsequent discussion for details.

Asymptotic Mixed Normality
The following lemma states that $\hat{\Theta}_{\lambda_n} - \Theta_Y$ is asymptotically linear in $\hat{\Sigma}_n - \Sigma_Y$ after bias correction when $\Theta_Y$ is sufficiently sparse.
Lemma 1 is an almost straightforward consequence of Equation (4) and the Karush-Kuhn-Tucker (KKT) conditions for the optimization problem in Equation (1). As a consequence of this lemma, we obtain the following result, which states that the "de-biased" weighted graphical Lasso estimator $\hat{\Theta}_{\lambda_n} - \hat{\Gamma}_n$, with bias-correction term $\hat{\Gamma}_n := -(\hat{\Theta}_{\lambda_n} - \hat{\Theta}_{\lambda_n}\hat{\Sigma}_n\hat{\Theta}_{\lambda_n})$, inherits the asymptotic mixed normality of $\hat{\Sigma}_n$.
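In R, the de-biased estimator is a one-line matrix computation; the following sketch (names are ours) uses the identity $\hat{\Theta}_{\lambda_n} - \hat{\Gamma}_n = 2\hat{\Theta}_{\lambda_n} - \hat{\Theta}_{\lambda_n}\hat{\Sigma}_n\hat{\Theta}_{\lambda_n}$:

```r
# De-biased weighted graphical Lasso: Theta_hat - Gamma_hat with
# Gamma_hat = -(Theta_hat - Theta_hat %*% Sigma_hat %*% Theta_hat),
# which simplifies to 2 * Theta_hat - Theta_hat Sigma_hat Theta_hat.
debias_glasso <- function(Theta_hat, Sigma_hat) {
  2 * Theta_hat - Theta_hat %*% Sigma_hat %*% Theta_hat
}
```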

Proposition 2.
Suppose that the assumptions of Lemma 1 are satisfied. For every $n \in \mathbb{N}$, let $a_n > 0$, let $C_n$ be a $d^2 \times d^2$ positive semidefinite random matrix and let $J_n$ be an $m \times d^2$ random matrix, where $m = m_n$ may depend on $n$. Assume $a_n |||J_n|||_\infty \lambda_n^2 s_n d_n \sqrt{\log(m+1)} \to^p 0$ as $n \to \infty$. Assume also that
$$\lim_{n \to \infty} \sup_{y \in \mathbb{R}^m} \left| P\left( a_n \bar{J}_n \operatorname{vec}(\hat{\Sigma}_n - \Sigma_Y) \leq y \right) - P\left( \bar{J}_n C_n^{1/2} \zeta_n \leq y \right) \right| = 0 \qquad (5)$$
and
$$\lim_{b \downarrow 0} \limsup_{n \to \infty} \sup_{y \in \mathbb{R}^m} P\left( \left\| \bar{J}_n C_n^{1/2} \zeta_n - y \right\|_\infty \leq b \right) = 0 \qquad (6)$$
as $n \to \infty$, where $\bar{J}_n := -J_n(\Theta_Y \otimes \Theta_Y)$ and $\zeta_n$ is a $d^2$-dimensional standard Gaussian vector independent of $\mathcal{F}$, which is defined on an extension of the probability space $(\Omega, \mathcal{F}, P)$ if necessary. Then
$$\sup_{y \in \mathbb{R}^m} \left| P\left( a_n J_n \operatorname{vec}(\hat{\Theta}_{\lambda_n} - \hat{\Gamma}_n - \Theta_Y) \leq y \right) - P\left( \bar{J}_n C_n^{1/2} \zeta_n \leq y \right) \right| \to 0$$
as $n \to \infty$.

In a standard i.i.d. setting such that $\Theta_Y$ is non-random, we can usually verify Equation (5) by the classical Lindeberg central limit theorem when $m = 1$ and $J_n$ is non-random, because $a_n \bar{J}_n \operatorname{vec}(\hat{\Sigma}_n - \Sigma_Y)$ can be written as a sum of independent random variables; see the proof of [25] (Theorem 1) for example. By contrast, $\Theta_Y$ is generally random and not independent of $\hat{\Sigma}_n - \Sigma_Y$ in our setting, so $a_n \bar{J}_n \operatorname{vec}(\hat{\Sigma}_n - \Sigma_Y)$ may not be a martingale even if $\operatorname{vec}(\hat{\Sigma}_n - \Sigma_Y)$ is a martingale. In the case that $d$ is fixed, we typically resolve this issue by proving stable convergence in law of $\operatorname{vec}(\hat{\Sigma}_n - \Sigma_Y)$; see e.g., [26] for details. However, extending this approach to the case that $d \to \infty$ as $n \to \infty$ is far from trivial, as discussed at the beginning of [20] (Section 3). For this reason, [20] gives a result to directly establish convergence of the type in Equation (5) in a high-dimensional setting. This result will be used in Section 4 to apply our abstract theory to a more concrete setting.

Remark 3. Proposition 2 also allows $m$ to diverge as $n \to \infty$, which is necessary when we need to derive an asymptotic approximation of the joint distribution of $\operatorname{vec}(\hat{\Theta}_{\lambda_n} - \hat{\Gamma}_n - \Theta_Y)$. Such an approximation can be used to make simultaneous inference for entries of $\Theta_Y$; see [40] for example.

Factor Structure
In financial applications, it is often important to take account of the factor structure of asset prices. In fact, many empirical studies have documented the existence of common factors in financial markets (e.g., [41] (Section 6.5)). Also, factor models play a dominant role in asset pricing theory (cf. [21] (Chapter 9)). When common factors are present across asset returns, the precision matrix cannot be sparse because all pairs of assets are partially correlated given the other assets through the common factors. Therefore, in such a situation, it is common practice to impose a sparsity assumption on the precision matrix of the residual process obtained after removing the co-movements induced by the factors (see e.g., [17] (Section 4.2) and [42] (Section 4.2)). In this section, we adapt the theory developed in Section 2 to such an application.
Specifically, suppose that we have an $r$-dimensional known factor process $X$, and consider the following continuous-time factor model:
$$Y_t = \beta X_t + Z_t, \qquad t \in [0,1]. \qquad (7)$$
Here, $\beta$ is a non-random $d \times r$ matrix and $Z$ is a $d$-dimensional semimartingale such that $[Z, X]_1 = 0$. $\beta$ and $Z$ represent the factor loading matrix and the residual process of the model, respectively. This model is widely used in high-frequency financial econometrics; see [8,9,11] in the context of high-dimensional covariance matrix estimation. One restriction of the model in Equation (7) is that the factor loading $\beta$ is assumed to be constant, but there is empirical evidence that $\beta$ may be regarded as constant over short time intervals (one week or less); see [18,43] for instance. We are interested in estimating $\Sigma_Y$ based on observation data for $X$ and $Y$ while taking account of the factor structure given by Equation (7). Suppose that we have generic estimators $\hat{\Sigma}_{Y,n}$, $\hat{\Sigma}_{X,n}$ and $\hat{\Sigma}_{YX,n}$ for $\Sigma_Y$, $\Sigma_X := [X, X]_1$ and $\Sigma_{YX} := [Y, X]_1$, respectively. $\hat{\Sigma}_{Y,n}$, $\hat{\Sigma}_{X,n}$ and $\hat{\Sigma}_{YX,n}$ are assumed to be random variables taking values in $\mathcal{S}_d$, $\mathcal{S}_r^+$ and $\mathbb{R}^{d \times r}$, respectively. Now, by assumption we have
$$\Sigma_Y = \beta \Sigma_X \beta^\top + \Sigma_Z, \qquad \Sigma_{YX} = \beta \Sigma_X. \qquad (8)$$
Assume that $\Sigma_X$ is a.s. invertible. Then $\beta$ can be written as $\beta = \Sigma_{YX} \Sigma_X^{-1}$. Therefore, we can naturally estimate $\beta$ by $\hat{\beta}_n := \hat{\Sigma}_{YX,n} \hat{\Sigma}_{X,n}^{-1}$, provided that $\hat{\Sigma}_{X,n}$ is invertible. In practical applications, the invertibility of $\hat{\Sigma}_{X,n}$ is usually not problematic because the number of factors $r$ is sufficiently small compared to the sample size. However, it is theoretically convenient to (formally) define $\hat{\beta}_n$ also in the case that $\hat{\Sigma}_{X,n}$ is singular. For this reason, we take an $\mathcal{S}_r^{++}$-valued random variable $\hat{\Sigma}_{X,n}^\dagger$ such that $\hat{\Sigma}_{X,n}^\dagger = \hat{\Sigma}_{X,n}^{-1}$ on the event where $\hat{\Sigma}_{X,n}$ is invertible, and redefine $\hat{\beta}_n$ as $\hat{\beta}_n := \hat{\Sigma}_{YX,n} \hat{\Sigma}_{X,n}^\dagger$. This does not affect the asymptotic properties of our estimators because $\hat{\Sigma}_{X,n}$ is asymptotically invertible under the assumptions we will impose. Now, in view of Equation (8), $\Sigma_Z$ is estimated by
$$\hat{\Sigma}_{Z,n} := \hat{\Sigma}_{Y,n} - \hat{\beta}_n \hat{\Sigma}_{X,n} \hat{\beta}_n^\top.$$
Since $\hat{\Sigma}_{Z,n}$ might be a poor estimator for $\Sigma_Z$ because $d$ can be extremely large in our setting, we apply the weighted graphical Lasso to $\hat{\Sigma}_{Z,n}$ in order to estimate $\Sigma_Z$. Specifically, we construct the weighted graphical Lasso estimator $\hat{\Theta}_{Z,\lambda}$ based on $\hat{\Sigma}_{Z,n}$ as follows:
$$\hat{\Theta}_{Z,\lambda} := \arg\min_{\Theta \in \mathcal{S}_d^{++}} \left\{ \operatorname{tr}(\Theta \hat{\Sigma}_{Z,n}) - \log\det\Theta + \lambda \sum_{i \neq j} \hat{V}_{Z,n,ii} \hat{V}_{Z,n,jj} |\Theta_{ij}| \right\}, \qquad (10)$$
where $\hat{V}_{Z,n} := \operatorname{diag}(\hat{\Sigma}_{Z,n})^{1/2}$. Then $\Sigma_Z$ is estimated by the inverse of $\hat{\Theta}_{Z,\lambda}$. Hence our final estimator for $\Sigma_Y$ is constructed as
$$\hat{\Sigma}_{Y,\lambda} := \hat{\beta}_n \hat{\Sigma}_{X,n} \hat{\beta}_n^\top + \hat{\Theta}_{Z,\lambda}^{-1}. \qquad (11)$$

Remark 5. Although we will impose assumptions which guarantee that the optimization problem in Equation (10) asymptotically has a unique solution with probability 1, it may have no solution for a fixed $n$. Thus, we formally define $\hat{\Theta}_{Z,\lambda}$ as an $\mathcal{S}_d^{++}$-valued random variable such that $\hat{\Theta}_{Z,\lambda}$ is given by Equation (10) on the event where the optimization problem in Equation (10) has a unique solution.
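The pipeline of Equations (7)-(11) can be prototyped in a few lines of R; the sketch below assumes the initial estimates are given and reuses the hypothetical weighted_glasso() from Section 2:

```r
# Factor-adjusted weighted graphical Lasso: beta_hat, residual covariance,
# sparse residual precision, and the recombined covariance estimator (11).
factor_wglasso <- function(Sigma_Y, Sigma_X, Sigma_YX, lambda) {
  beta_hat <- Sigma_YX %*% solve(Sigma_X)                 # beta = Sigma_YX Sigma_X^{-1}
  Sigma_Z <- Sigma_Y - beta_hat %*% Sigma_X %*% t(beta_hat)
  # assumes Sigma_Z stays positive semidefinite (cf. condition [D3])
  Theta_Z <- weighted_glasso(Sigma_Z, lambda)             # sparse residual precision
  Sigma_Y_lambda <- beta_hat %*% Sigma_X %*% t(beta_hat) + solve(Theta_Z)
  list(beta = beta_hat, Theta_Z = Theta_Z, Sigma_Y = Sigma_Y_lambda)
}
```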
We will impose the following structural assumptions [C1]-[C6] on the model.
[C1]-[C3] are natural structural assumptions on the model and standard in the literature; see e.g., Assumptions 2.1 and 3.3 in [44].
[C4]-[C5] are sparsity assumptions on the precision matrix of the residual process and necessary for our application of the (weighted) graphical Lasso. [C6] requires the factors to have non-negligible impact on almost all assets and is also standard in the context of covariance matrix estimation based on a factor model; see e.g., Assumption 3.5 in [44] and Assumption 6 in [8].
The following result establishes the consistency of the residual precision matrix estimator $\hat{\Theta}_{Z,\lambda}$.
(a) [D1] and [D2] impose convergence rates on the initial estimators, which are typically derived from entry-wise concentration inequalities for $\hat{\Sigma}_{X,n}$, $\hat{\Sigma}_{YX,n}$ and $\hat{\Sigma}_{Y,n}$. (b) [D3] ensures that $\hat{\Sigma}_{Z,n}$ is asymptotically positive semidefinite. This is necessary for guaranteeing that the optimization problem in Equation (10) asymptotically has a unique solution with probability 1.
From Proposition 3 we can also derive the convergence rates of the estimators $\hat{\Sigma}_{Y,\lambda_n}$ and $\hat{\Sigma}_{Y,\lambda_n}^{-1}$ in appropriate norms, which may be seen as counterparts of Theorems 1-2 in [8].
Next, we present the high-dimensional asymptotic mixed normality of the de-biased version of $\hat{\Theta}_{Z,\lambda}$.

Proposition 6.
Suppose that the assumptions of Proposition 3 and [C5] are satisfied. For every $n \in \mathbb{N}$, let $a_n > 0$, let $C_n$ be a $d^2 \times d^2$ positive semidefinite random matrix and let $J_n$ be an $m \times d^2$ random matrix, where $m = m_n$ may depend on $n$. Assume $a_n |||J_n|||_\infty \lambda_n^2 s_n d_n \sqrt{\log(m+1)} \to^p 0$ as $n \to \infty$. Assume also that
$$\lim_{n \to \infty} \sup_{y \in \mathbb{R}^m} \left| P\left( a_n \bar{J}_{Z,n} \operatorname{vec}(\tilde{\Sigma}_{Z,n} - \Sigma_Z) \leq y \right) - P\left( \bar{J}_{Z,n} C_n^{1/2} \zeta_n \leq y \right) \right| = 0 \qquad (12)$$
as $n \to \infty$, where $\tilde{\Sigma}_{Z,n}$ denotes the infeasible analogue of $\hat{\Sigma}_{Z,n}$ constructed with the true factor loading matrix $\beta$, $\bar{J}_{Z,n} := -J_n(\Theta_Z \otimes \Theta_Z)$, and $\zeta_n$ is a $d^2$-dimensional standard Gaussian vector independent of $\mathcal{F}$, which is defined on an extension of the probability space $(\Omega, \mathcal{F}, P)$ if necessary. Then
$$\sup_{y \in \mathbb{R}^m} \left| P\left( a_n J_n \operatorname{vec}(\hat{\Theta}_{Z,\lambda_n} - \hat{\Gamma}_{Z,n} - \Theta_Z) \leq y \right) - P\left( \bar{J}_{Z,n} C_n^{1/2} \zeta_n \leq y \right) \right| \to 0$$
as $n \to \infty$, where $\hat{\Gamma}_{Z,n} := -(\hat{\Theta}_{Z,\lambda_n} - \hat{\Theta}_{Z,\lambda_n} \hat{\Sigma}_{Z,n} \hat{\Theta}_{Z,\lambda_n})$.

Remark 8. It is worth mentioning that condition Equation (12) is stated for $\tilde{\Sigma}_{Z,n}$ rather than $\hat{\Sigma}_{Z,n}$. In other words, for deriving the asymptotic distribution, we do not need to take account of the effect of plugging $\hat{\beta}_n$ in for $\beta$, at least to first order. This is thanks to Lemma A11.
Although it is generally difficult to derive the asymptotic mixed normality of (the de-biased version of) $\hat{\Sigma}_{Y,\lambda_n}^{-1}$, this is possible when $d$ is sufficiently large. In fact, in such a situation, the entry-wise behavior of $\Sigma_Y^{-1}$ is dominated by $\Theta_Z$, as described by Lemma 2. Consequently, we obtain the following result.

Proposition 7.
Suppose that the assumptions of Proposition 6 and [C6] are satisfied. Suppose also that $a_n |||J_n|||_\infty r d_n \sqrt{\log(m+1)}/d \to 0$ as $n \to \infty$. Then the conclusion of Proposition 6 remains valid with $\Theta_Z$ replaced by $\Sigma_Y^{-1}$.

Application to Realized Covariance Matrix
In this section, we apply the abstract theory developed above to the simplest situation, where the processes have no jumps and are observed at equidistant times without noise. Specifically, we consider the continuous-time factor model in Equation (7) and assume that both $Y$ and $X$ are observed at the equidistant time points $h/n$, $h = 0, 1, \dots, n$. In this case, $\Sigma_Y = [Y, Y]_1$ is naturally estimated by the realized covariance matrix:
$$\hat{\Sigma}_{Y,n} := [Y, Y]_1^n = \sum_{h=1}^n (Y_{h/n} - Y_{(h-1)/n})(Y_{h/n} - Y_{(h-1)/n})^\top.$$
Analogously, we define $\hat{\Sigma}_{X,n} := [X, X]_1^n$ and $\hat{\Sigma}_{YX,n} := [Y, X]_1^n$. In addition, we assume that $Z$ and $X$ are respectively $d$-dimensional and $r$-dimensional continuous Itô semimartingales given by
$$Z_t = Z_0 + \int_0^t \mu_s\,ds + \int_0^t \sigma_s\,dW_s, \qquad X_t = X_0 + \int_0^t \bar{\mu}_s\,ds + \int_0^t \bar{\sigma}_s\,dW_s, \qquad t \in [0,1],$$
where $W$ is a $d'$-dimensional standard Wiener process and $\mu$, $\bar{\mu}$, $\sigma$ and $\bar{\sigma}$ are progressively measurable processes taking values in $\mathbb{R}^d$, $\mathbb{R}^r$, $\mathbb{R}^{d \times d'}$ and $\mathbb{R}^{r \times d'}$, respectively. To apply the convergence rate results to this setting, we impose the following assumptions:

[E1] For all $n, \nu \in \mathbb{N}$, we have an event $\Omega_n(\nu) \in \mathcal{F}$ and $(\mathcal{F}_t)$-progressively measurable processes $\mu(\nu)$, $\bar{\mu}(\nu)$, $\sigma(\nu)$ and $\bar{\sigma}(\nu)$ on $[0,1]$ which take values in $\mathbb{R}^d$, $\mathbb{R}^r$, $\mathbb{R}^{d \times d'}$ and $\mathbb{R}^{r \times d'}$, respectively, and they satisfy localization and boundedness conditions of the usual type.

[E1] is a local boundedness assumption on the coefficient processes and is typical in the literature: For example, [E1] is satisfied when $\mu$, $\bar{\mu}$, $\sigma$ and $\bar{\sigma}$ are all bounded by some locally bounded process independent of $n$. This latter condition is imposed in [8], among others. [E2] restricts the growth rates of $d$ and $r$; it is indeed an adaptation of [D1] to the present setting.
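Returning to the realized covariance matrix defined above, it is computed from the increments in one line; a minimal R sketch assuming the log-prices are stored as an (n+1) x d matrix (our own data layout):

```r
# Realized covariance from an (n+1) x d matrix of log-prices observed at
# times h/n, h = 0, ..., n: the sum of outer products of the n increments.
realized_cov <- function(logprice) {
  inc <- diff(logprice)   # n x d matrix of increments Y_{h/n} - Y_{(h-1)/n}
  crossprod(inc)          # t(inc) %*% inc = sum_h inc_h inc_h^T
}
```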
Theorem 1. Let $(\lambda_n)$ be a sequence of positive-valued random variables such that $\lambda_n^{-1}\sqrt{(\log d)/n} \to^p 0$ and $(s_n + r)\lambda_n \to^p 0$ as $n \to \infty$. Then $\|\hat{\Theta}_{Z,\lambda_n} - \Theta_Z\|_\infty = O_p(\lambda_n)$ and $|||\hat{\Theta}_{Z,\lambda_n} - \Theta_Z|||_w = O_p(s_n \lambda_n)$ as $n \to \infty$ for any $w \in [1, \infty]$. Moreover, under additional regularity conditions, analogous convergence rates hold for the induced estimators of $\Sigma_Y$ and $\Sigma_Y^{-1}$.

Remark 9 (Optimal convergence rate). From Theorem 1, the convergence rate of $\hat{\Theta}_{Z,\lambda_n}$ to $\Theta_Z$ in the $\ell_w$-operator norm for any $w \in [1, \infty]$ can be made arbitrarily close to $s_n\sqrt{(\log d)/n}$, which is similar to the rate in a standard i.i.d. setting (cf. Theorem 14.1.3 in [24]). On the other hand, in the Gaussian i.i.d. setting without factor structure, the minimax optimal rate for this problem is known to be $d(\Theta_Z)\sqrt{(\log d)/n}$ (see [45] (Theorem 1.1) and [46] (Theorem 5)), which can be faster than $s_n\sqrt{(\log d)/n}$. In a standard i.i.d. setting, this rate can be attained by node-wise penalized regression (see e.g., [46] (Section 3.1)), so it would be interesting to study the convergence rate of such a method in our setting. We leave this to future research. In the meantime, such a method does not ensure the positive definiteness of the estimated precision matrix in general, so our estimator would be preferable for some practical applications such as portfolio allocation.
Next we derive the asymptotic mixed normality of the de-biased estimator in the present setting. As announced, we accomplish this purpose with the help of Malliavin calculus. In the following we will freely use standard concepts and notation from Malliavin calculus. We refer to [47,48] (Chapter 1) for detailed treatments of this subject.
We consider the Malliavin calculus with respect to $W$. For any real number $p \geq 1$ and any integer $k \geq 1$, $\mathbb{D}^{k,p}$ denotes the stochastic Sobolev space of random variables which are $k$ times differentiable in the Malliavin sense and whose derivatives up to order $k$ have finite moments of order $p$. If $F \in \mathbb{D}^{k,p}$, we denote by $D^k F$ the $k$-th Malliavin derivative of $F$, which is a random variable taking values in $L^2([0,1]^k; (\mathbb{R}^{d'})^{\otimes k})$. Here, we identify the space $(\mathbb{R}^{d'})^{\otimes k}$ with the set of all $d'$-dimensional $k$-way arrays, i.e., real-valued functions on $\{1, \dots, d'\}^k$. Since $D^k F$ is a random function on $[0,1]^k$, we can consider the value $D^k F(t_1, \dots, t_k)$ evaluated at $(t_1, \dots, t_k) \in [0,1]^k$. We denote this value by $D_{t_1, \dots, t_k} F$. Moreover, since $D_{t_1, \dots, t_k} F$ takes values in $(\mathbb{R}^{d'})^{\otimes k}$, we can consider the value $D_{t_1, \dots, t_k} F(a_1, \dots, a_k)$ evaluated at $(a_1, \dots, a_k) \in \{1, \dots, d'\}^k$. This value is denoted by $D^{(a_1, \dots, a_k)}_{t_1, \dots, t_k} F$. We remark that the variable $D_{t_1, \dots, t_k} F$ is defined only a.e. on $[0,1]^k \times \Omega$ with respect to the measure $\mathrm{Leb}_k \times P$, where $\mathrm{Leb}_k$ denotes the Lebesgue measure on $[0,1]^k$. Therefore, if $D_{t_1, \dots, t_k} F$ satisfies some property a.e. on $[0,1]^k \times \Omega$ with respect to $\mathrm{Leb}_k \times P$, by convention we will always take a version of $D_{t_1, \dots, t_k} F$ satisfying that property everywhere. The Malliavin calculus for $(\mathbb{R}^{d'})^{\otimes k}$-valued random variables is defined in an analogous way. Finally, for any $(\mathbb{R}^{d'})^{\otimes k}$-valued random variable $F$ and $p \in (0, \infty]$, we set $\|F\|_p := \max_{a \in \{1, \dots, d'\}^k} \|F(a)\|_{L^p(P)}$.

We also need to define some variables related to the "asymptotic" covariance matrices of the estimators. In terms of $c_s := \sigma_s \sigma_s^\top$, we define a $d^2 \times d^2$ random matrix $C_n$ and set $V_n := (\Theta_Z \otimes \Theta_Z) C_n (\Theta_Z \otimes \Theta_Z)$. For each $\nu \in \mathbb{N}$, we define $C_n(\nu)$ similarly to $C_n$ with $\sigma$ replaced by $\sigma(\nu)$. $C_n$ and $V_n$ play the roles of the asymptotic covariance matrices of $\hat{\Sigma}_{Z,n}$ and $\hat{\Theta}_{Z,\lambda_n}$, respectively.

We impose the assumptions [F1]-[F3] on the model and give a few remarks on them. First, [F1] imposes (local) Malliavin differentiability on the coefficient processes of the residual process $Z$ and local boundedness on their Malliavin derivatives. Such an assumption is necessary for the application of the high-dimensional mixed normal limit theorem of [20] to our setting (see Lemma A16). Please note that we do not need to impose this type of assumption on the factor process $X$. We also remark that analogous assumptions are sometimes used in the literature on high-frequency financial econometrics even in low-dimensional settings; see e.g., [49,50]. Second, [F2] is clearly understood when we consider a Gaussian graphical model associated with $\Sigma_Z$: The non-randomness of $Q_Z$ implies that the edge structure of this Gaussian graphical model is determined in a non-random manner (by conditioning, it is indeed sufficient that the edge structure is determined independently of the driving Wiener process $W$). Also, we remark that the condition $d(Q_Z) = O(1)$ is equivalent to [C5] with $d_n = 1$. It is seemingly possible to relax this condition so that it allows a diverging sequence $d_n$ as long as $d_n (\log d)^{\kappa}/n \to 0$ for an appropriate constant $\kappa > 0$. However, to determine the precise value of $\kappa$, we would need to carefully revise the proof of Lemma A16 so that it allows the quantity inside $\sup_{n \in \mathbb{N}}$ in (A7) to diverge as $n \to \infty$. To avoid such additional complexity, we restrict our attention to the case $d_n = 1$. Third, the condition $(\log d)^{13}/n \to 0$ in [F3] is used again for applying the high-dimensional CLT of [20]. Now we are ready to state our result.
Theorem 2. Let $(\lambda_n)$ be a sequence of positive-valued random variables such that $\lambda_n^{-1}\sqrt{(\log d)/n} \to^p 0$, $(s_n + r)\lambda_n \to^p 0$ and $\lambda_n^2 s_n \sqrt{n \log d} \to^p 0$ as $n \to \infty$. Then we have
$$\sup_{y \in \mathbb{R}^{d^2}} \left| P\left( \sqrt{n}\,\operatorname{vec}(\hat{\Theta}_{Z,\lambda_n} - \hat{\Gamma}_{Z,n} - \Theta_Z) \leq y \right) - P\left( V_n^{1/2} \zeta_n \leq y \right) \right| \to 0$$
and
$$\sup_{y \in \mathbb{R}^{d^2}} \left| P\left( \sqrt{n}\, S_n^{-1}\operatorname{vec}(\hat{\Theta}_{Z,\lambda_n} - \hat{\Gamma}_{Z,n} - \Theta_Z) \leq y \right) - P\left( S_n^{-1} V_n^{1/2} \zeta_n \leq y \right) \right| \to 0$$
as $n \to \infty$, where $S_n := \operatorname{diag}(V_n)^{1/2}$.
Remark 10. $\lambda_n$ is typically chosen as close to the order $\sqrt{(\log d)/n}$ as possible, so $\lambda_n^2 s_n \sqrt{n \log d} \to^p 0$ is almost equivalent to $s_n (\log d)^{3/2}/\sqrt{n} \to 0$. This is stronger than the condition $s_n (\log d)/\sqrt{n} \to 0$ which is used to derive the asymptotic normality of the de-biased weighted graphical Lasso estimator in [24] (Theorem 14.1.6) (note that we assume $d(\Theta_Z) = O_p(1)$). This is because Theorem 2 derives approximations of the joint distributions of the de-biased estimator and its Studentization, while [24] (Theorem 14.1.6) focuses only on approximations of their marginal distributions.
Theorem 2 is statistically infeasible in the sense that $V_n$ is unobservable. Thus, we need to estimate it from the data. Since $\Theta_Z$ is naturally estimated by $\hat{\Theta}_{Z,\lambda_n}$, we construct an estimator for $C_n$. Define the $d^2$-dimensional random vectors $\hat{\chi}_h$, $h = 1, \dots, n$, by
$$\hat{\chi}_h := \operatorname{vec}\left( (\hat{Z}_{h/n} - \hat{Z}_{(h-1)/n})(\hat{Z}_{h/n} - \hat{Z}_{(h-1)/n})^\top \right),$$
where $\hat{Z}_{h/n} := Y_{h/n} - \hat{\beta}_n X_{h/n}$. Then we set $\hat{C}_n$ to be the sample analogue of $C_n$ constructed from $\hat{\chi}_1, \dots, \hat{\chi}_n$.

Lemma 3. Suppose that the assumptions of Theorem 2 are satisfied. Suppose also that $r^2 (\log d)/n = O(1)$ as $n \to \infty$ and that there is a constant $\gamma \in (0, \frac{1}{2})$ such that … .

Let us set $\hat{V}_n := (\hat{\Theta}_{Z,\lambda_n} \otimes \hat{\Theta}_{Z,\lambda_n}) \hat{C}_n (\hat{\Theta}_{Z,\lambda_n} \otimes \hat{\Theta}_{Z,\lambda_n})$ and $\hat{S}_n := \operatorname{diag}(\hat{V}_n)^{1/2}$.

Corollary 1.
Under the assumptions of Lemma 3, we have the following results: (a) Assume $s_n \lambda_n \sqrt{\log d} \to^p 0$ and $r(\log d)^{7/2}/\sqrt{n} + n^{-\gamma} \log d \to 0$ as $n \to \infty$. Then the convergences in Theorem 2 remain valid with $V_n$ and $S_n$ replaced by $\hat{V}_n$ and $\hat{S}_n$, respectively.

Corollary 1(a) particularly implies that
$$\sup_{z \in \mathbb{R}} \left| P\left( \sqrt{n}\,\big(\hat{\Theta}_{Z,\lambda_n}^{ij} - \hat{\Gamma}_{Z,n}^{ij} - \Theta_Z^{ij}\big)\big/\hat{s}_n^{ij} \leq z \right) - \Phi(z) \right| \to 0, \qquad (21)$$
where $\hat{s}_n^{ij}$ denotes the diagonal entry of $\hat{S}_n$ corresponding to the $(i,j)$-th entry under the vectorization and $\Phi$ is the standard normal distribution function. This result can be used to construct entry-wise confidence intervals for $\Theta_Z$. Meanwhile, combining Corollary 1(b) with [20] (Proposition 3.2), we can estimate the quantiles of $\max_{k \in K}(\hat{V}_n^{1/2} \zeta_n)_k$ and $\max_{k \in K}(\hat{S}_n^{-1} \hat{V}_n^{1/2} \zeta_n)_k$ for a given set of indices $K \subset \{1, \dots, d^2\}$ by simulation. Such a result can be used to construct simultaneous confidence intervals and to control the family-wise error rate in multiple testing for entries of $\Theta_Z$; see Sections 2.3-2.4 of [51] for details.
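As an illustration, an entry-wise confidence interval based on Equation (21) can be assembled as follows; this R sketch uses hypothetical inputs (Theta_deb is the de-biased estimate and s_hat the matrix of Studentizing scales):

```r
# Level-(1 - alpha) confidence interval for Theta_Z[i, j] based on the
# normal approximation (21): de-biased estimate +/- z * s_hat / sqrt(n).
ci_entry <- function(Theta_deb, s_hat, i, j, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  est <- Theta_deb[i, j]
  c(lower = est - z * s_hat[i, j] / sqrt(n),
    upper = est + z * s_hat[i, j] / sqrt(n))
}
```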
As announced, another application of our result is to construct an estimator with selection consistency via thresholding. This is carried out using the following result (Corollary 2), in which the support of $\Theta_Z$ is estimated by the set $\hat{S}_n(\Theta_Z)$ of index pairs whose Studentized de-biased estimates exceed in absolute value the normal quantile associated with a significance level $\alpha_n$. Under the assumptions of Corollary 1(a), the support estimator is consistent provided that the minimal non-zero entry of $\Theta_Z$ does not shrink too quickly; note that the latter condition is satisfied if $\min_{(i,j) \in S(\Theta_Z)} |\Theta_Z^{ij}|$ is bounded away from zero, because $n/\log d \to \infty$ under our assumptions. Taking the sequence $\alpha_n$ so that $\alpha = 0$ in Corollary 2, we can asymptotically recover the support of $\Theta_Z$. In this case, if we define $\tilde{\Theta}_{Z,\lambda_n} = (\tilde{\Theta}_{Z,\lambda_n}^{ij})_{1 \leq i,j \leq d}$ by keeping the entries of the de-biased estimator on $\hat{S}_n(\Theta_Z)$ and on the diagonal, and setting $\tilde{\Theta}_{Z,\lambda_n}^{ij} := 0$ otherwise, then $\tilde{\Theta}_{Z,\lambda_n}$ will be oracle in the sense of [38]. However, we note that the estimator $\tilde{\Theta}_{Z,\lambda_n}$ would not be continuous in the data, so it would not satisfy the third desirable property in [38] (p. 1349). To construct an oracle estimator for $\Theta_Z$ which is continuous in the data, we would need to consider a non-concave penalized estimator as in [52]. This is left to future research.
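A sketch of such a thresholding rule in R, with all names hypothetical:

```r
# Support recovery by entry-wise thresholding of the Studentized de-biased
# estimate at significance level alpha_n (two-sided normal quantile).
threshold_support <- function(Theta_deb, s_hat, n, alpha_n = 0.05) {
  z <- qnorm(1 - alpha_n / 2)
  keep <- sqrt(n) * abs(Theta_deb) > z * s_hat   # entrywise test of Theta_ij = 0
  diag(keep) <- TRUE                             # always keep the diagonal
  Theta_deb * keep
}
```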

Implementation
To implement the proposed estimation procedure, we need to solve the optimization problem in Equation (10). Among the many existing algorithms for this problem, we employ the GLASSOFAST algorithm of [53], which is an improved implementation of the popular GLASSO algorithm of [54] and is available in the R package glassoFast.
The remaining problem is how to select the penalty parameter $\lambda$. Following [17,55], we select it by minimizing the following formally defined Bayesian information criterion (BIC), which combines the Gaussian quasi-log-likelihood with a $\log n$ penalty on the number of non-zero entries of $\hat{\Theta}_{Z,\lambda}$:
$$\mathrm{BIC}(\lambda) := \operatorname{tr}(\hat{\Sigma}_{Z,n}\hat{\Theta}_{Z,\lambda}) - \log\det\hat{\Theta}_{Z,\lambda} + \frac{\log n}{n}\,\#\{(i,j) : i \leq j,\ \hat{\Theta}_{Z,\lambda}^{ij} \neq 0\}.$$
The minimization is carried out by grid search. The grid $\{\lambda_1, \dots, \lambda_m\}$ is constructed analogously to the R package glmnet (see Section 2.5 of [56] for details): First, as the maximum value $\lambda_{\max}$ of the grid, we take the smallest value for which all the off-diagonal entries of $\hat{\Theta}_{Z,\lambda_{\max}}$ are zero; in our case, $\lambda_{\max}$ is given by the maximum modulus of the off-diagonal entries of $\hat{\Sigma}_{Z,n}$ (cf. [57] (Corollary 1)). Next, we take a constant $\varepsilon > 0$ and set $\lambda_{\min} := \varepsilon\lambda_{\max}$ as the minimum value of the grid. Finally, we construct the values $\lambda_1, \dots, \lambda_m$ increasing from $\lambda_{\min}$ to $\lambda_{\max}$ equally spaced on the log scale. We use $\varepsilon = \sqrt{(\log d)/n}$ and $m = 10$ in our experiments. (The computation procedure of the weighted graphical Lasso described here is implemented in the R package yuima as the function cce.factor with the option regularize="glasso" since version 1.9.2.)
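The selection procedure can be summarized in a compact R sketch (again reusing the hypothetical weighted_glasso() from above; the BIC takes the form of the display above):

```r
# Penalty selection by grid search over a log-spaced grid, minimizing BIC.
select_lambda <- function(Sigma_Z, n, m = 10, eps = NULL) {
  d <- ncol(Sigma_Z)
  if (is.null(eps)) eps <- sqrt(log(d) / n)
  lam_max <- max(abs(Sigma_Z[upper.tri(Sigma_Z)]))  # all off-diagonals vanish here
  grid <- exp(seq(log(eps * lam_max), log(lam_max), length.out = m))
  bic <- sapply(grid, function(lam) {
    Theta <- weighted_glasso(Sigma_Z, lam)
    df <- sum(Theta[upper.tri(Theta, diag = TRUE)] != 0)
    sum(diag(Sigma_Z %*% Theta)) -
      as.numeric(determinant(Theta)$modulus) + log(n) / n * df
  })
  grid[which.min(bic)]
}
```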

Simulation Design
We basically follow the setting of [8]. We simulate the model in Equation (7) with the following specification. For the factor process $X$, we set $r = 3$ and let each factor follow a stochastic volatility model as in [8], with the initial value of the $j$-th spot variance process drawn from its stationary law, i.e., the gamma distribution with shape $2\kappa_j\theta_j/\eta_j^2$ and rate $2\kappa_j/\eta_j^2$. The entries of the loading matrix $\beta$ are independently drawn as in [8]. Finally, as the residual process $Z$, we take a $d$-dimensional Wiener process with covariance matrix $Q$. We consider the following two designs for $Q$; in Design 2, $Q^{-1}$ is built from $D$ and $A$, respectively the degree and adjacency matrices of a random graph $G$. Formally, given a weight vector $w \in \mathbb{R}^d$ with $w \geq 0$, $A$ is defined as a $d \times d$ symmetric random matrix such that all the diagonal entries of $A$ are equal to 0 and the off-diagonal upper triangular entries are generated by independent Bernoulli variables so that $P(A_{ij} = 1) = 1 - P(A_{ij} = 0) = w_i w_j / \sum_{k=1}^d w_k$ for $i < j$. Then, $D$ is defined as the diagonal matrix whose $j$-th diagonal entry is given by the degree $\sum_{i \neq j} A_{ij}$ of vertex $j$. Design 1 is the same as in [8]. Design 2 is motivated by the recent work of Barigozzi et al. [42], which reports that several characteristics of the residual precision matrix of the S&P 500 assets exhibit power-law behaviors and are well described by the power-law partial correlation network model proposed in [42]; the specification in Design 2 is the same as in the simulation study of [42]. A sketch of the random-graph mechanism is given below.
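For reference, the random-graph mechanism of Design 2 can be simulated directly; a short R sketch (the function name is ours):

```r
# Adjacency and degree matrices of the random graph G: edge (i, j), i < j,
# occurs independently with probability w_i * w_j / sum(w).
random_graph <- function(w) {
  d <- length(w)
  p <- outer(w, w) / sum(w)            # edge probabilities w_i w_j / sum_k w_k
  A <- matrix(0, d, d)
  up <- upper.tri(A)
  A[up] <- rbinom(sum(up), 1, pmin(p[up], 1))
  A <- A + t(A)                        # symmetrize; the diagonal stays 0
  list(A = A, D = diag(rowSums(A)))    # D: degree matrix
}
```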

Results
We begin by assessing the estimation accuracy of the proposed estimator in various norms. For comparison, we consider the following five different methods to estimate $\Sigma_Y$: the realized covariance matrix (RC), the graphical Lasso (glasso), the weighted graphical Lasso (wglasso), and the graphical Lasso and weighted graphical Lasso taking the factor structure into account (f-glasso and f-wglasso).
In addition, for Design 1, we also consider the estimator proposed in [8]: Assuming that we know which entries of $\Sigma_Z$ are zero, we estimate $\Sigma_Y$ by $\hat{\beta}_n \hat{\Sigma}_{X,n} \hat{\beta}_n^\top + (\hat{\Sigma}_{Z,n}^{ij} 1_{\{\Sigma_Z^{ij} \neq 0\}})_{1 \leq i,j \leq d}$. We label this method f-thr. Since the estimates of RC and f-thr are not always regular, we use their Moore-Penrose generalized inverses to estimate $\Sigma_Y^{-1}$ when they are singular. Please note that the methods glasso and f-glasso correspond to those proposed in [17], while wglasso and f-wglasso are those proposed in this paper. We report the simulation results in Tables 1 and 2.
We first focus on the accuracy of estimating the precision matrix $\Sigma_Y^{-1}$. The tables reveal the excellent performance of the graphical Lasso-based methods. In particular, they outperform f-thr in Design 1 except for the case $n = 780$, even when we ignore the factor structure of the model. Nevertheless, the tables also show an apparent benefit of taking the factor structure into account when constructing the graphical Lasso type estimators. When we compare the weighted graphical Lasso estimators with the unweighted versions, the weighted ones tend to outperform the unweighted ones as $n$ increases, especially when the factor structure is taken into account. This is more pronounced in Design 2. It is also worth mentioning that the estimation errors for $\Sigma_Y^{-1}$ in the method RC are greater at $n = 390, 780$ than those at $n = 78, 130, 195$. This is presumably due to a "resonance" effect between the sample size $n$ and the dimension $d$ coming from the use of the Moore-Penrose generalized inverse, which is well known in multivariate analysis (see e.g., [58]): The estimation error for the precision matrix by the generalized inverse of the sample covariance matrix drastically increases as $n$ approaches $d$. Theoretically, this occurs because the smallest non-zero eigenvalue of the sample covariance matrix tends to 0 as $n$ approaches $d$.
Turning to the estimation accuracy for $\Sigma_Y$ in terms of the $\ell_\infty$-norm, we find little advantage in using the graphical Lasso type methods over the realized covariance matrix: f-glasso and f-wglasso tend to outperform RC at small values of $n$, but the differences in performance become less clear as $n$ increases. From a theoretical point of view, this is not surprising because the realized covariance matrix is a consistent estimator for $\Sigma_Y$ in the $\ell_\infty$-norm with the convergence rate $\sqrt{(\log d)/n}$; this can be seen from e.g., Lemma A15. Meanwhile, in Design 1, f-thr performs the best in terms of estimating $\Sigma_Y$ at all values of $n$.
Next we assess the accuracy of the mixed normal approximation for the de-biased estimator. For this purpose, we construct entry-wise confidence intervals for $\Theta_Z$ based on Equation (21) (taking the factor structure into account) and evaluate their empirical coverages. Table 3 reports these coverages averaged over the zero and the non-zero entries of $\Theta_Z$, respectively. We see from the table that the asymptotic approximation works perfectly for constructing confidence intervals for zero entries of $\Theta_Z$. By contrast, confidence intervals for non-zero entries of $\Theta_Z$ tend to over-cover, especially in Design 1. However, these coverage distortions become more moderate at larger values of $n$, which suggests that the normal approximation starts to work at relatively large sample sizes. Note. RC: realized covariance matrix; glasso: graphical Lasso; wglasso: weighted graphical Lasso; f-glasso: graphical Lasso taking the factor structure into account; f-wglasso: weighted graphical Lasso taking the factor structure into account; f-thr: location-based thresholding taking the factor structure into account (the method of [8]). The results are based on 10,000 Monte Carlo iterations.

Empirical Application
To illustrate the applicability of the proposed method to real data analysis, we conduct a simple empirical study using high-frequency financial data. We take 1 March 2018 as the observation interval $[0,1]$ and the log-price processes of the component stocks of the S&P 500 index as the process $Y$. In addition, as is often done in the literature, we regard the SPDR S&P 500 ETF (SPY) as the observable factor process $X$. We use 5-minute returns to compute the estimators presented in Section 4. The dataset is provided by Bloomberg. Our setting implies $d = 504$ and $n = 77$, yielding the high-dimensional setting considered in this paper (note that our dataset does not contain observations at the market opening).
The selection procedure presented in Section 5.1 suggests $\lambda_n \approx 0.272$. We then estimate the support $S(\Theta_Z)$ of $\Theta_Z$ by the estimator $\hat{S}_n(\Theta_Z)$ with $\alpha_n = 0.05$ from Corollary 2. Figure 1 shows the partial correlation network induced by $\hat{S}_n(\Theta_Z)$, drawn with the R package igraph. Specifically, it depicts the undirected graph whose vertices are the S&P 500 component stocks and whose edge set is given by $\hat{S}_n(\Theta_Z)$. To illuminate the relationship between the network and sector structures, we color the vertices according to their Global Industry Classification Standard (GICS) sectors. We find strong interconnections within several sectors such as Consumer Staples, Energy, Real Estate and Utilities. The figure also suggests that the network has some characteristics commonly observed in scale-free networks: It consists of a giant component with several hubs and a few small components. This is consistent with an observation made in [42]. Indeed, the authors of [42] have proposed a model for $\Theta_Z$ that induces a scale-free partial correlation network. According to their model, the decay of the largest eigenvalues of $\Theta_Z$ also exhibits power-law behavior. More precisely, letting $\Lambda_1 \geq \cdots \geq \Lambda_d$ be the ordered eigenvalues of $\Theta_Z$, we have $\Lambda_i \asymp i^{-\alpha}$ with some $\alpha > 0$ for moderate $i$ and large $d$. It is then interesting to check whether this is the case in our dataset. Figure 2 shows the log-log size-rank plot for the 50 largest eigenvalues of $\hat{\Theta}_{Z,\lambda_n}$. We see that, except for the three largest eigenvalues, they clearly display power-law behavior.
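The size-rank diagnostic of Figure 2 is simple to reproduce from any estimated precision matrix; an R sketch with a hypothetical Theta_hat:

```r
# Log-log size-rank plot of the 50 largest eigenvalues: a straight line
# with slope -alpha indicates power-law decay Lambda_i ~ i^(-alpha).
ev <- sort(eigen(Theta_hat, symmetric = TRUE)$values, decreasing = TRUE)[1:50]
plot(log(1:50), log(ev), xlab = "log rank", ylab = "log eigenvalue")
abline(lm(log(ev) ~ log(1:50)), lty = 2)  # fitted slope estimates -alpha
```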

Conclusions
In this paper, we have developed a generic asymptotic theory for estimating the high-dimensional precision matrix of high-frequency data using the weighted graphical Lasso. We have shown that the consistency of the weighted graphical Lasso estimator in matrix operator norms follows from the consistency of the initial estimator in the $\ell_\infty$-norm, while the asymptotic mixed normality of its de-biased version follows from that of the initial estimator, where the asymptotic mixed normality has been formulated appropriately for the high-dimensional setting considered here. Our theory also encompasses the situation where a known factor structure is present in the data. In such a situation, we have applied the weighted graphical Lasso to the residual process obtained after removing the effect of the factors.
We have applied the developed theory to the concrete situation where we can use the realized covariance matrix as the initial covariance estimator. We have derived the desirable asymptotic mixed normality of the realized covariance matrix by an application of the recent high-dimensional central limit theorem obtained in [20], where Malliavin calculus resolves the main theoretical difficulties caused by the high-dimensionality. Consequently, we have obtained a feasible asymptotic distribution theory to conduct inference for entries of the precision matrix. A Monte Carlo study has shown the good finite sample performance of our asymptotic theory.
A natural direction for future work is to apply the developed theory to a more complex situation where the process is asynchronously observed with noise and/or jumps. To accomplish this purpose, we need to establish the high-dimensional asymptotic mixed normality of relevant covariance estimators.

Acknowledgments:
The author thanks two anonymous referees for their constructive comments that substantially improved the original version of this paper.

Conflicts of Interest:
The author declares no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

i.i.d.: independent and identically distributed
MLE: maximum likelihood estimation
KKT: Karush-Kuhn-Tucker
CLT: central limit theorem

Appendix A. Matrix Inequalities
This appendix collects some elementary (but less trivial) inequalities for matrices used in the proofs of the main results.

Proof. See Theorem 14 in [59] (Chapter 11).
Lemma A2. Let $A \in \mathcal{S}_d^+$ and $B \in \mathbb{R}^{d \times r}$. Then …

Proof. Let $x$ be an eigenvector associated with $\Lambda_{\max}(A)$ such that $\|x\|_2 = 1$. Then, by Theorem 4 in [59], … Therefore, we obtain the first inequality. The second one can be shown analogously.

Lemma A4. For any …

Proof. This is a straightforward consequence of the Schwarz inequality.

Lemma A5. Let $A, B \in \mathbb{R}^{r \times r}$. If $A$ is invertible and $|||A^{-1}(B - A)|||_w < 1$ for some $w \in [1, \infty]$, then $B$ is invertible and
$$|||B^{-1} - A^{-1}|||_w \leq \frac{|||A^{-1}|||_w\, |||A^{-1}(B - A)|||_w}{1 - |||A^{-1}(B - A)|||_w}.$$

Proof. See pages 381-382 of [60].
Lemma A6. Let $A \in \mathcal{S}_r$ and $B, C \in \mathbb{R}^{d \times r}$. Then …

Proof. This result has essentially been shown in [61] (Lemma A.7). Since $A$ is symmetric, there is an orthogonal matrix $U \in \mathbb{R}^{r \times r}$ such that $\Lambda := U^\top A U$ is a diagonal matrix. Now, for any $i, j \in \{1, \dots, d\}$, … This yields the desired result.
Lemma A7. Let $A, B, C \in \mathbb{R}^{d \times d}$. Then, for any $i, j = 1, \dots, d$, …

Proof. This is a straightforward consequence of the Schwarz inequality.

Appendix B. Proofs for Section 2
Appendix B.1. Proof of Proposition 1

The following result has essentially been proven in [24] and gives an estimate for the "deterministic part" of oracle inequalities for graphical Lasso type estimators.

Proposition A1. Let $A_0, A \in \mathcal{S}_d$ and assume $\|A - A_0\|_\infty \leq \lambda_0$ for some $\lambda_0 > 0$. Assume also that there are numbers $L > 1$ and $\lambda > 0$ such that … Then it holds that …

We first prove Proposition A1 under an additional assumption:

Lemma A8. Proposition A1 holds true if we additionally have $\|B - B_0\|_2 \leq 1/(2L)$.
Proof of Proposition 1. Thanks to Lemma 7.2 of [45], it suffices to consider the case $w = 1$. For any $L, n \in \mathbb{N}$, we define the set $\Omega_{n,L} \subset \Omega$ by … where $c_L := 8L^2$. Then we have $\lim_{L \to \infty} \limsup_{n \to \infty} P(\Omega_{n,L}^c) = 0$.

Appendix B.3. Proof of Proposition 2
In light of Lemma 3.1 of [20], it is enough to prove … as $n \to \infty$. Please note that $\operatorname{vec}(ABC) = (C^\top \otimes A)\operatorname{vec}(B)$ for any $d \times d$ matrices $A, B, C$ (cf. Theorem 2 in [59] (Chapter 2)). Thus, we obtain the desired result once we prove … as $n \to \infty$. This follows from Lemma 1 and the assumptions of the proposition.

Appendix C.2. Proof of Proposition 4
We first establish some asymptotic properties of $\hat{\beta}_n$ which are necessary for the subsequent proofs.
Lemma A12. Under the assumptions of Proposition 3, we have the following results: …

(b) By Lemma A9, on the event $\Omega_n$ we have $\hat{\beta}_n = \hat{\Sigma}_{YX,n} \hat{\Sigma}_{X,n}^{-1}$. Hence, for every $i = 1, \dots, d$, …, and we also obtain $\max_{1 \leq i \leq d} \|\hat{\beta}_n^{i\cdot}\|_\infty = O_p(\sqrt{r})$. (c) This follows from (a)-(b) and $r\lambda_n = o_p(1)$. (d) This is a direct consequence of (b). (e) This follows from (b) and the Schwarz inequality.
Proof of Proposition 4. Since $\|A\|_\infty \leq |||A|||_2$ for any matrix $A$, in view of Proposition 3 it suffices to prove $\lambda_n^{-1}$… Therefore, the desired result follows from Lemmas A9 and A12(b) and the assumptions.
Appendix C.4. Proof of Proposition 6

We apply Proposition 2 to $\hat{\Sigma}_{Z,n}$. From the arguments in the proof of Proposition 3, it remains to check condition Equation (5). More precisely, we need to prove
$$\lim_{n \to \infty} \sup_{y \in \mathbb{R}^m} \left| P\left( a_n \bar{J}_{Z,n} \operatorname{vec}(\hat{\Sigma}_{Z,n} - \Sigma_Z) \leq y \right) - P\left( \bar{J}_{Z,n} C_n^{1/2} \zeta_n \leq y \right) \right| = 0.$$
Thanks to Lemma 3.1 in [20] and Equation (12), this claim follows once we prove $\sqrt{\log(m+1)}\, \| a_n \bar{J}_{Z,n} \operatorname{vec}(\hat{\Sigma}_{Z,n} - \Sigma_Z) - a_n \bar{J}_{Z,n} \operatorname{vec}(\tilde{\Sigma}_{Z,n} - \Sigma_Z) \|_\infty \to^p 0$. Since we have $\| a_n \bar{J}_{Z,n} \operatorname{vec}(\hat{\Sigma}_{Z,n} - \Sigma_Z) - a_n \bar{J}_{Z,n} \operatorname{vec}(\tilde{\Sigma}_{Z,n} - \Sigma_Z) \|_\infty \leq$ … by Lemma A4, the desired result follows from Lemma A10 and the assumptions.
Appendix C.5. Proof of Lemma 2 and Proposition 7

We use the same notation as in Appendix C.3. By the Sherman-Morrison-Woodbury formula we have … where the second inequality follows from Lemma A6. Since $|||\Theta_Z|||_2$ …

Remark A1. Similar estimates to Lemma A14 have already been obtained in the literature (see e.g., [36] (Lemma 3), [4] (Lemma 10) and [8] (Lemma A.1)). Since we use slightly different assumptions from the existing ones, we give its proof in Appendix E for the sake of completeness.
To apply Lemma A16 to the present setting, we prove some auxiliary results.
Proof. This follows from a straightforward computation.