Article

A Log-Det Heuristics for Covariance Matrix Estimation: The Analytic Setup

Enrico Bernardi and Matteo Farnè
Dipartimento di Scienze Statistiche, Università di Bologna, 40126 Bologna, Italy
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Stats 2022, 5(3), 606-616; https://doi.org/10.3390/stats5030037
Submission received: 4 June 2022 / Revised: 1 July 2022 / Accepted: 2 July 2022 / Published: 5 July 2022
(This article belongs to the Special Issue Multivariate Statistics and Applications)

Abstract

This paper studies a new nonconvex optimization problem aimed at recovering high-dimensional covariance matrices with a low rank plus sparse structure. The objective is composed of a smooth nonconvex loss and a nonsmooth composite penalty. A number of structural analytic properties of the new heuristics are presented and proven, thus providing the necessary framework for further investigating the statistical applications. In particular, the first and second derivatives of the smooth loss are obtained, its local convexity range is derived, and the Lipschitzianity of its gradient is shown. This opens the path to solving the described problem via a proximal gradient algorithm.

1. Introduction

The estimation of large covariance or precision matrices is a relevant challenge nowadays, due to the increasing availability, in many fields, of datasets composed of a large number of variables $p$ compared to the sample size $n$. The urgency of this topic is attested by several recent books [1,2,3] and comprehensive reviews [4,5,6]. In this paper, we assume for the $p \times p$ covariance matrix $\Sigma^*$ a low rank plus sparse decomposition, that is,
$$\Sigma^* = L^* + S^* = B B' + S^*,$$
where $L^* = B B' = U_L \Lambda_L U_L'$, $U_L$ is a $p \times r$ matrix such that $U_L' U_L = I_r$, $\Lambda_L$ is an $r \times r$ diagonal matrix, and $S^*$ is element-wise sparse, i.e., it contains only $s \ll \frac{p(p-1)}{2}$ off-diagonal non-zero elements. Since [7] proposed their approximate factor model, structure (1) has become the reference model for many high-dimensional covariance matrix estimators, like POET [8].
The recovery of structure (1) is a statistical problem of primary relevance. Ref. [7] proposed to consistently estimate $L^*$ (as $p \to \infty$) by means of principal component analysis (PCA, see [9]), assuming that the eigenvalues of $L^*$ diverge with the dimension $p$ while the eigenvalues of $S^*$ remain bounded. Ref. [8] proposes to estimate $L^*$ by the top $r$ principal components of the sample covariance matrix $\Sigma_n$ (as $p \to \infty$) and to estimate $S^*$ by thresholding their orthogonal complement. In [10], $L^*$ and $S^*$ are recovered by nuclear norm plus $l_1$ penalization, that is, by computing
$$(\hat L, \hat S) = \arg\min_{L \succeq 0,\ S \succ 0}\ \mathcal{L}(L,S) + \mathcal{P}(L,S),$$
where $\mathcal{L}(L,S)$ is a smooth loss function, $\mathcal{P}(L,S)$ is a nonsmooth penalty function, $L \succeq 0$ denotes positive semidefiniteness for $L$, and $S \succ 0$ denotes positive definiteness for $S$. In particular, denoting by $\lambda_i(M)$, $i = 1, \ldots, p$, the eigenvalues of a $p \times p$ matrix $M$ sorted in descending order, $\mathcal{L}(L,S) = \frac12\|\Sigma_n - (L+S)\|_F^2$ and $\mathcal{P}(L,S) = \psi\|L\|_* + \rho\|S\|_1$, where $\|L\|_* = \sum_{i=1}^p \lambda_i(L)$ (the nuclear norm of $L$), $\|S\|_1 = \sum_{i=1}^{p}\sum_{j=i}^{p}|S_{ij}|$ (the $l_1$ norm of $S$), and $\psi$ and $\rho$ are non-negative threshold parameters.
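For concreteness, the objective in (2) can be evaluated in a few lines of NumPy. The following sketch is ours, not the authors' code; the function name and the default values of $\psi$ and $\rho$ are purely illustrative, and the $l_1$ term follows the upper-triangular convention written above.

```python
import numpy as np

def alce_objective(L, S, Sigma_n, psi=0.1, rho=0.1):
    """Illustrative evaluation of 0.5*||Sigma_n-(L+S)||_F^2 + psi*||L||_* + rho*||S||_1."""
    frob = 0.5 * np.linalg.norm(Sigma_n - (L + S), "fro") ** 2
    nuclear = np.linalg.svd(L, compute_uv=False).sum()   # ||L||_*: sum of singular values
    iu = np.triu_indices(S.shape[0])                     # upper-triangular l1 norm, as above
    l1 = np.abs(S[iu]).sum()
    return frob + psi * nuclear + rho * l1
```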
The nuclear norm was first proposed in [11] as an alternative to PCA. Ref. [12] furnishes a proof that $\psi\|L\|_* + \rho\|S\|_1$ is the tightest convex relaxation of the original non-convex penalty $\psi\,\mathrm{rk}(L) + \rho\|S\|_0$. Ref. [13] proves that $l_1$ norm minimization provides the sparsest solution to most large underdetermined linear systems, while [14] proves that nuclear norm minimization provides guaranteed rank minimization under a set of linear equality constraints. Ref. [15] shows that $l_1$ norm minimization selects the best linear model in a wide range of situations. The nuclear norm has also been used to solve large matrix completion problems, as in [16,17,18], and [19]. Nuclear norm plus $l_1$ norm minimization was first exploited in [20] to provide a robust version of PCA under grossly corrupted or missing data.
The pair of estimators (2) derived in [10] is named ALCE (ALgebraic Covariance Estimator). Although ALCE has many desirable statistical properties, there is room to further improve it by replacing $\frac12\|\Sigma_n - (L+S)\|_F^2$ with a different loss. The Frobenius loss in fact optimizes the entry-by-entry performance of $\hat\Sigma$, while a loss able to explicitly control the quality of spectrum estimation may be desirable. In this paper, we consider the loss
$$\mathcal{L}(L,S) = \frac12\log\det(I_p + \Delta_n\Delta_n'),$$
where $\Delta_n = \Sigma - \Sigma_n$ and $\Sigma = L + S$. Heuristics (3) is controlled by the individual singular values of $\Delta_n$, because
$$\log\det(I_p + \Delta_n\Delta_n') = \log\prod_{i=1}^{p}\lambda_i(I_p + \Delta_n\Delta_n') \le \sum_{i=1}^{p}\bigl(1 + \lambda_i(\Delta_n)^2\bigr) = p + \sum_{i=1}^{p}\lambda_i(\Delta_n)^2,$$
and, therefore, it is better suited for the estimation of the underlying spectrum.
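The spectral reading of the loss can be checked numerically. The following NumPy fragment (ours, for illustration only) evaluates (3) both from the matrix and from the singular values of $\Delta_n$, and verifies the bound displayed above.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
Delta = rng.standard_normal((p, p))
Delta = (Delta + Delta.T) / 2                   # a symmetric error matrix Delta_n

loss = 0.5 * np.linalg.slogdet(np.eye(p) + Delta @ Delta.T)[1]
sv = np.linalg.svd(Delta, compute_uv=False)
loss_from_spectrum = 0.5 * np.log1p(sv ** 2).sum()

print(np.isclose(loss, loss_from_spectrum))     # True: the loss only depends on the spectrum
print(2 * loss <= p + (sv ** 2).sum())          # True: log det(I + DD') <= p + sum lambda_i^2
```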
To the best of our knowledge, the mathematical properties of (3) have not been extensively studied. Analogously to the univariate context ($p = 1$), (3) is not a convex function. According to recent works like [21], nonconvex problems may be approached either by searching for approximate solutions instead of global solutions, or by exploiting the geometric structure of the objective function. In this case, the idea of restricting the analysis to the convexity region of the objective, a region that may be indefinitely extended (see the concept of Extendable Local Strong Convexity in [22]), is the key to applying, for instance, existing proximal gradient algorithms for convex functions (see [23]). For this reason, in this paper we calculate the first and second derivatives of (3), we derive its range of local convexity, and we prove the Lipschitzianity of (3) and of its gradient. This opens the path to using standard proximal gradient algorithms (see [23]) to solve problem (2) with $\mathcal{L}(L,S)$ as in (3).

2. Analytic Setup

We consider the objective function
$$\phi(L,S) = \mathcal{L}(L,S) + \mathcal{P}(L,S),$$
where $\mathcal{L}(L,S) = \frac12\log\det(I_p + \Delta_n\Delta_n')$ is the smooth part of $\phi(L,S)$ and $\mathcal{P}(L,S) = \psi\|L\|_* + \rho\|S\|_1$ is the non-smooth (but convex) part of $\phi(L,S)$. First, we calculate the derivative of the smooth component $\mathcal{L}(L,S)$ with respect to $L$ and $S$, which is
$$\frac{\delta\,\frac12\log\det(I_p + \Delta_n\Delta_n')}{\delta L} = \frac{\delta\,\frac12\log\det(I_p + \Delta_n\Delta_n')}{\delta S} = (I_p + \Delta_n\Delta_n')^{-1}\Delta_n.$$
Proof. 
Let us consider two generic $p \times p$ matrices $L$ and $S$, their sum $\Sigma = L + S$, and the matrix $\Delta_n = \Sigma - \Sigma_n$. Let us define the matrix function $\varphi(\Sigma) = I_p + \Delta_n\Delta_n'$ and the function $\phi(\Sigma) = \log\det\varphi(\Sigma)$. We denote by $e_i$ the $i$-th canonical basis vector, by $e_i^l = \delta_{il}$ its $l$-th element, and by $\sigma_{ij}$ the $ij$ entry of $\Sigma$. Then, following [24], for each $i, j = 1, \ldots, p$, we can write
$$\frac{\partial\Sigma}{\partial\sigma_{ij}} = m_{ij} = e_i e_j', \qquad \frac{\partial}{\partial\sigma_{ij}}\log\det\varphi(\Sigma) = \operatorname{Tr}\!\left[\varphi^{-1}(\Sigma)\,\frac{\partial\varphi}{\partial\sigma_{ij}}\right], \qquad \frac{\partial\varphi}{\partial\sigma_{ij}} = \Delta_n\,\frac{\partial\Delta_n'}{\partial\sigma_{ij}} + \frac{\partial\Delta_n}{\partial\sigma_{ij}}\,\Delta_n', \qquad \frac{\partial\Delta_n}{\partial\sigma_{ij}} = m_{ij}.$$
Therefore,
$$\frac{\partial}{\partial\sigma_{ij}}\log\det\varphi(\Sigma) = \operatorname{Tr}\!\left[\varphi^{-1}(\Sigma)\bigl(\Delta_n m_{ji} + m_{ij}\Delta_n'\bigr)\right] = \operatorname{Tr}\!\left[\varphi^{-1}(\Sigma)\,\Delta_n m_{ji}\right] + \operatorname{Tr}\!\left[\varphi^{-1}(\Sigma)\,m_{ij}\Delta_n'\right].$$
Since for A , B , C conformable matrices
$$\operatorname{Tr}(ABC) = \operatorname{Tr}\bigl((ABC)'\bigr) = \operatorname{Tr}(C'B'A') = \operatorname{Tr}(A'C'B'),$$
we get
$$\frac{\partial}{\partial\sigma_{ij}}\log\det\varphi(\Sigma) = 2\operatorname{Tr}\!\left[\varphi^{-1}(\Sigma)\,\Delta_n m_{ji}\right] = 2\sum_{\nu}\bigl[\varphi^{-1}(\Sigma)\,\Delta_n m_{ji}\bigr]_{\nu\nu} = 2\sum_{\nu,\rho}\bigl[\varphi^{-1}(\Sigma)\bigr]_{\nu\rho}\bigl[\Delta_n m_{ji}\bigr]_{\rho\nu} = 2\sum_{\nu,\rho,\sigma}\bigl[\varphi^{-1}(\Sigma)\bigr]_{\nu\rho}\,\Delta_{n,\rho\sigma}\,\bigl[m_{ji}\bigr]_{\sigma\nu}.$$
Finally, considering that
$$\bigl[m_{ji}\bigr]_{\sigma\nu} = \bigl[e_j e_i'\bigr]_{\sigma\nu} = \delta_{j\sigma}\,\delta_{i\nu},$$
we get
$$\frac{\partial}{\partial\sigma_{ij}}\log\det\varphi(\Sigma) = 2\sum_{\nu,\rho,\sigma}\bigl[\varphi^{-1}(\Sigma)\bigr]_{\nu\rho}\,\Delta_{n,\rho\sigma}\,\delta_{j\sigma}\,\delta_{i\nu} = 2\sum_{\rho}\bigl[\varphi^{-1}(\Sigma)\bigr]_{i\rho}\,\Delta_{n,\rho j} = 2\bigl[\varphi^{-1}(\Sigma)\,\Delta_n\bigr]_{ij}.$$
To sum up,
$$\frac{\partial}{\partial\Sigma}\,\frac12\log\det\varphi(\Sigma) = \frac{\partial}{\partial L}\,\frac12\log\det\varphi(\Sigma) = \frac{\partial}{\partial S}\,\frac12\log\det\varphi(\Sigma) = \varphi^{-1}(\Sigma)\,\Delta_n.$$
   □
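As a sanity check (ours, not part of the paper), the gradient formula (6) can be compared with entry-wise finite differences of the loss; the random symmetric matrices below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
p, eps = 4, 1e-6
Sigma_n = rng.standard_normal((p, p)); Sigma_n = Sigma_n @ Sigma_n.T
Sigma   = rng.standard_normal((p, p)); Sigma   = Sigma @ Sigma.T

def loss(Sig):
    D = Sig - Sigma_n
    return 0.5 * np.linalg.slogdet(np.eye(p) + D @ D.T)[1]

Delta = Sigma - Sigma_n
grad = np.linalg.solve(np.eye(p) + Delta @ Delta.T, Delta)   # phi^{-1}(Sigma) Delta_n

num = np.zeros((p, p))
for i in range(p):
    for j in range(p):
        E = np.zeros((p, p)); E[i, j] = eps
        num[i, j] = (loss(Sigma + E) - loss(Sigma - E)) / (2 * eps)

print(np.allclose(num, grad, atol=1e-5))   # True: finite differences match the formula
```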
In the following, we make explicit the second derivative of $\mathcal{L}(L,S) = \frac12\log\det\varphi(\Sigma)$, with $\varphi(\Sigma) = I_p + \Delta_n\Delta_n'$ and $\Sigma = L + S$:
$$\frac{\partial^2}{\partial\sigma_{ij}\,\partial\sigma_{hk}}\,\frac12\log\det\varphi(\Sigma) = \frac12\bigl[\operatorname{Hess}\log\det\varphi(\Sigma)\bigr]_{ijhk} = \delta_{jk}\bigl[\varphi^{-1}(\Sigma)\bigr]_{ih} - \sum_{\mu,\sigma}\bigl[\varphi^{-1}(\Sigma)\bigr]_{h\mu}\,\Delta_{n,\mu j}\,\bigl[\varphi^{-1}(\Sigma)\bigr]_{i\sigma}\,\Delta_{n,\sigma k} - \bigl[\varphi^{-1}(\Sigma)\bigr]_{ih}\sum_{\mu,\lambda}\bigl[\varphi^{-1}(\Sigma)\bigr]_{\lambda\mu}\,\Delta_{n,\mu j}\,\Delta_{n,\lambda k}.$$
Moreover, if $\Sigma = \Sigma_n$, we get
$$\frac12\bigl[\operatorname{Hess}\log\det\varphi(\Sigma)\bigr]_{ijhk} = \delta_{jk}\,\delta_{ih} = \bigl[I_p \otimes I_p\bigr]_{ijhk},$$
that is,
$$\frac12\operatorname{Hess}\log\det\varphi(\Sigma) = I_p \otimes I_p.$$
Proof. 
From [24], we write
$$\frac{\partial^2}{\partial\sigma_{ij}\,\partial\sigma_{hk}}\,\frac12\log\det\varphi(\Sigma) = \frac12\left\{\operatorname{Tr}\!\left[\varphi^{-1}(\Sigma)\,\frac{\partial^2\varphi(\Sigma)}{\partial\sigma_{ij}\,\partial\sigma_{hk}}\right] - \operatorname{Tr}\!\left[\varphi^{-1}(\Sigma)\,\frac{\partial\varphi(\Sigma)}{\partial\sigma_{ij}}\,\varphi^{-1}(\Sigma)\,\frac{\partial\varphi(\Sigma)}{\partial\sigma_{hk}}\right]\right\},$$
and we recall that
$$\frac{\partial^2\varphi(\Sigma)}{\partial\sigma_{ij}\,\partial\sigma_{hk}} = \frac{\partial\Delta_n}{\partial\sigma_{hk}}\,\frac{\partial\Delta_n'}{\partial\sigma_{ij}} + \frac{\partial\Delta_n}{\partial\sigma_{ij}}\,\frac{\partial\Delta_n'}{\partial\sigma_{hk}} = m_{hk}\,m_{ji} + m_{ij}\,m_{kh}.$$
Then, we can calculate
$$\operatorname{Tr}\!\left[\varphi^{-1}(\Sigma)\,\frac{\partial^2\varphi(\Sigma)}{\partial\sigma_{ij}\,\partial\sigma_{hk}}\right] = \operatorname{Tr}\!\left[\varphi^{-1}(\Sigma)\bigl(m_{hk}m_{ji} + m_{ij}m_{kh}\bigr)\right] = \operatorname{Tr}\!\left[\varphi^{-1}(\Sigma)\,m_{hk}m_{ji}\right] + \operatorname{Tr}\!\left[\varphi^{-1}(\Sigma)\,m_{ij}m_{kh}\right] = 2\operatorname{Tr}\!\left[\varphi^{-1}(\Sigma)\,m_{hk}m_{ji}\right] = 2\sum_{\nu}\bigl[\varphi^{-1}(\Sigma)\,m_{hk}m_{ji}\bigr]_{\nu\nu} = 2\,\delta_{jk}\bigl(\varphi^{-1}(\Sigma)\bigr)_{ih}.$$
The second summand
$$\operatorname{Tr}\!\left[\varphi^{-1}(\Sigma)\,\frac{\partial\varphi(\Sigma)}{\partial\sigma_{ij}}\,\varphi^{-1}(\Sigma)\,\frac{\partial\varphi(\Sigma)}{\partial\sigma_{hk}}\right]$$
can be derived from (12) as
$$2\operatorname{Tr}\!\left[\varphi^{-1}(\Sigma)\,\Delta_n m_{ji}\,\varphi^{-1}(\Sigma)\,\Delta_n m_{kh}\right] + 2\operatorname{Tr}\!\left[\varphi^{-1}(\Sigma)\,\Delta_n m_{ji}\,\varphi^{-1}(\Sigma)\,m_{hk}\Delta_n'\right].$$
Equation (9) is consequently proved.    □
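Again as an illustrative check (ours), finite differences of the gradient at $\Sigma = \Sigma_n$ reproduce the identity Hessian $\delta_{jk}\delta_{ih}$, i.e., $I_p \otimes I_p$, derived above.

```python
import numpy as np

rng = np.random.default_rng(2)
p, eps = 3, 1e-5
Sigma_n = rng.standard_normal((p, p)); Sigma_n = Sigma_n @ Sigma_n.T

def grad(Sigma):
    D = Sigma - Sigma_n
    return np.linalg.solve(np.eye(p) + D @ D.T, D)        # gradient from (6)

hess = np.zeros((p, p, p, p))
for h in range(p):
    for k in range(p):
        E = np.zeros((p, p)); E[h, k] = eps
        hess[:, :, h, k] = (grad(Sigma_n + E) - grad(Sigma_n - E)) / (2 * eps)

target = np.einsum("ih,jk->ijhk", np.eye(p), np.eye(p))   # delta_ih * delta_jk
print(np.allclose(hess, target, atol=1e-4))               # True at Sigma = Sigma_n
```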

3. Local Convexity

The aim of this section is to determine the range of convexity of $\mathcal{L}(L,S) = \frac12\log\det(I_p + \Delta_n\Delta_n')$, $\Delta_n = \Sigma - \Sigma_n$, with respect to the positive semidefinite matrix $\Delta_n\Delta_n'$. In the univariate context, the function $\frac12\ln(1 + x^2)$ is convex if and only if $|x| < 1/2$. In the multivariate context, it is therefore reasonable to suppose that a similar condition on $\Delta_n\Delta_n'$ ensures local convexity. A proof can be given by showing the positive definiteness of the Hessian of $\mathcal{L}(L,S)$ for some range of $\Delta_n\Delta_n'$. In other words, we need to show that there exists a positive $\delta$ such that, whenever $\|\Delta_n\Delta_n'\| < \delta$, the function $\frac12\log\det(I_p + \Delta_n\Delta_n')$ is convex.
Lemma 1.
Given $0 < \mu \le \frac{1}{3p}$, we have that the function
$$\log\det\bigl(I_p + AA^*\bigr)$$
is convex on the set $C_\mu = \{A \mid A \text{ is a real } p \times p \text{ matrix},\ \|A\|_2 \le \mu\}$, where $\|A\|_2$ denotes the spectral norm of $A$.
Proof. 
We proceed using the criterion of convexity, estimating the second derivative with respect to $t$ of
$$\phi(t) = \log\det\bigl(I_p + (tA + (1-t)B)(tA + (1-t)B)^*\bigr).$$
Let us recall that
$$\frac{d}{dt}\log\det G(t) = \operatorname{Tr}\bigl(G(t)^{-1}G'(t)\bigr),$$
where $G(t)$ is a differentiable square matrix-valued function and (15) holds for those values of $t$ for which $G(t)$ is invertible.
Furthermore, we have
$$\frac{d}{dt}\operatorname{Tr}\bigl(A(t)\bigr) = \operatorname{Tr}\!\Bigl(\frac{d}{dt}A(t)\Bigr), \qquad \frac{d}{dt}A^{-1} = -A^{-1}\Bigl(\frac{d}{dt}A\Bigr)A^{-1},$$
for any differentiable square matrix-valued function $A(t)$ (see, e.g., [25] or [26] for how to prove these identities).
Calling $G(t) = I_p + (tA + (1-t)B)(tA + (1-t)B)^*$, we see that $G'(t) = 2t\Lambda\Lambda^* + R$ and $G''(t) = 2\Lambda\Lambda^*$, where
$$\Lambda = A - B, \qquad R = B\Lambda^* + \Lambda B^*,$$
and $G(t) = I_p + t^2\Lambda\Lambda^* + tR + BB^*$.
Thus, applying (15) and (16) we get
$$\phi'(t) = \operatorname{Tr}\bigl(G(t)^{-1}G'(t)\bigr)$$
and
$$\phi''(t) = \frac{d}{dt}\operatorname{Tr}\bigl(G(t)^{-1}G'(t)\bigr) = \operatorname{Tr}\bigl(G^{-1}G'' - G^{-1}G'G^{-1}G'\bigr).$$
Convexity will follow once we have proven that (17) is non-negative for every $t \in [0,1]$ and every $A$ and $B$ in $C_\mu$.
Due to the circularity of the trace function we also have
$$\phi''(t) = \operatorname{Tr}\bigl(G^{-1/2}G''G^{-1/2} - G^{-1/2}G'G^{-1}G'G^{-1/2}\bigr),$$
that is
$$\phi''(t) = \operatorname{Tr}\bigl(G^{-1/2}G''G^{-1/2}\bigr) - \operatorname{Tr}\bigl(G^{-1/2}G'G^{-1/2}\,G^{-1/2}G'G^{-1/2}\bigr).$$
This can be written as
$$\phi''(t) = 2\operatorname{Tr}\bigl(G^{-1/2}\Lambda\Lambda^*G^{-1/2}\bigr) - \operatorname{Tr}\bigl(G^{-1/2}(2t\Lambda\Lambda^* + R)G^{-1/2}\,G^{-1/2}(2t\Lambda\Lambda^* + R)G^{-1/2}\bigr).$$
We recall that $G(t)$ is self-adjoint, so that, denoting by $H$ the matrix $G^{-1/2}\Lambda$ and by $K$ the matrix $G^{-1/2}G'G^{-1/2}$, we get that (19) can be written as
$$\phi''(t) = 2\operatorname{Tr}(HH^*) - \operatorname{Tr}(KK^*).$$
We also recall that $\operatorname{Tr}(AB^*)$ induces a scalar product to which the trace norm is attached:
$$\|A\|_{\mathrm{tr}} = \operatorname{Tr}\sqrt{AA^*} = \sum_i \sigma_i(A),$$
where $\sigma_i(A)$ are the singular values of $A$. In particular, we have $\|A\|_2 \le \|A\|_{\mathrm{tr}} \le p\,\|A\|_2$ for every $A$. Now, from (21), convexity can be checked as
$$\phi''(t) = 2\|H\|_{\mathrm{tr}} - \|K\|_{\mathrm{tr}} \ge 0.$$
Let us consider
$$\|K\|_2 = \bigl\|G^{-1/2}G'G^{-1/2}\bigr\|_2 = \bigl\|G^{-1/2}(2t\Lambda\Lambda^* + R)G^{-1/2}\bigr\|_2.$$
We have
$$\|K\|_2 \le 2t\,\bigl\|G^{-1/2}\Lambda\Lambda^*G^{-1/2}\bigr\|_2 + \bigl\|G^{-1/2}RG^{-1/2}\bigr\|_2.$$
Notice that the spectral norm $\|\cdot\|_2$ is self-adjoint, that is, $\|M^*\|_2 = \|M\|_2$ for every $M$ (see, e.g., [27]). Then
$$\|K\|_2 \le 2t\,\|HH^*\|_2 + \bigl\|G^{-1/2}(B\Lambda^* + \Lambda B^*)G^{-1/2}\bigr\|_2,$$
that is
$$\|K\|_2 \le 2\,\|H\|_2\,\bigl\|G^{-1/2}\Lambda\bigr\|_2 + 2\,\bigl\|G^{-1/2}\Lambda\bigr\|_2\,\bigl\|G^{-1/2}B\bigr\|_2.$$
Thus,
$$\|K\|_2 \le 2\,\|H\|_2\Bigl(\bigl\|G^{-1/2}\Lambda\bigr\|_2 + \bigl\|G^{-1/2}B\bigr\|_2\Bigr).$$
Assume now that $A, B \in C_\mu$: $\|A\|_2 \le \mu$, $\|B\|_2 \le \mu$. We deduce that $\|\Lambda\|_2 \le 2\mu$ and, due to the structure of $G(t) = I_p + Q(t)Q(t)^*$, we also have
$$\bigl\|G^{-1/2}\Lambda\bigr\|_2 \le 2\mu, \qquad \bigl\|G^{-1/2}B\bigr\|_2 \le \mu.$$
Finally, we have
$$\|K\|_2 \le 6\mu\,\|H\|_2.$$
Going back to (22) we have
$$\phi''(t) = 2\|H\|_{\mathrm{tr}} - \|K\|_{\mathrm{tr}} \ge 2\|H\|_2 - p\,\|K\|_2 \ge 2(1 - 3p\mu)\,\|H\|_2 \ge 0,$$
since $0 < \mu \le \frac{1}{3p}$.    □
By means of a simple change of variable, the following result can be proven.
Lemma 2.
For any $\delta > 0$ the function
$$\log\det\bigl(\delta^2 I_p + AA^*\bigr)$$
is convex on the closed ball $C_\delta = \bigl\{A \mid A \text{ is a real } p \times p \text{ matrix},\ \|A\|_2 \le \tfrac{\delta}{3p}\bigr\}$.
In conclusion, even though the function $\log\det(I_p + A)$ is always concave, Lemma 2 shows that the function $\log\det(\delta^2 I_p + AA^*)$ can be made locally convex on any ball centered at $0$, just by choosing a suitable $\delta$.
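A quick numerical illustration of Lemma 1 (ours, under the stated assumptions): drawing random matrices with spectral norm at most $\mu = 1/(3p)$, the function is midpoint convex along the corresponding segments.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 4
mu = 1.0 / (3 * p)

def f(A):
    return np.linalg.slogdet(np.eye(p) + A @ A.T)[1]       # log det(I + A A*)

def rescale(M, radius):
    return M * (radius / np.linalg.norm(M, 2))             # force spectral norm = radius

ok = True
for _ in range(200):
    A = rescale(rng.standard_normal((p, p)), mu * rng.uniform())
    B = rescale(rng.standard_normal((p, p)), mu * rng.uniform())
    ok &= f(0.5 * (A + B)) <= 0.5 * (f(A) + f(B)) + 1e-12   # midpoint convexity on C_mu
print(ok)   # True on these draws
```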

4. Lipschitz-Continuity

In this section, we prove the Lipschitzianity of the smooth function $\mathcal{L}(L,S) = \frac12\ln\det(I_p + \Delta_n\Delta_n')$ and of its gradient function, $\frac{\delta\mathcal{L}(L,S)}{\delta L} = \frac{\delta\mathcal{L}(L,S)}{\delta S} = (I_p + \Delta_n\Delta_n')^{-1}\Delta_n$ (see (6)).
Lemma 3.
The function $\mathcal{L}(L,S) = \frac12\ln\det(I_p + \Delta_n\Delta_n')$ is Lipschitz continuous in Euclidean norm with Lipschitz constant equal to 1:
$$\bigl|\log\det\varphi(\Sigma_1) - \log\det\varphi(\Sigma_2)\bigr| \le \|\Sigma_1 - \Sigma_2\|_2.$$
Proof. 
Let us recall that $L$ and $S$ are two generic $p \times p$ matrices, $\Sigma = L + S$ is their sum, and $\Delta_n = \Sigma - \Sigma_n$. We reconsider the matrix function $\varphi(\Sigma) = I_p + \Delta_n\Delta_n'$ and the function $\phi(\Sigma) = \log\det\varphi(\Sigma)$. We recall from (6) that
$$\frac{\partial}{\partial\Sigma}\,\frac12\log\det\varphi(\Sigma) = \varphi^{-1}(\Sigma)\,\Delta_n.$$
Given two vectors $u, v \in \mathbb{R}^p$, let us define the Euclidean inner product $\langle u, v\rangle = u'v$. We consider
$$\bigl\langle\varphi(\Sigma)v, v\bigr\rangle = \bigl\langle(I_p + \Delta_n\Delta_n')v, v\bigr\rangle = |v|^2 + |\Delta_n'v|^2,$$
where $|v|$ is the Euclidean norm of $v \in \mathbb{R}^p$. Then we have
$$|v|^2 + |\Delta_n'v|^2 \le \frac12\Bigl(\frac{1}{\delta^2}\,|\varphi(\Sigma)v|^2 + \delta^2\,|v|^2\Bigr),$$
via Cauchy–Schwarz, for any $\delta \in \mathbb{R}$. Now choose $\delta = \sqrt{2}$; then we have
$$4\,|\Delta_n'v|^2 \le |\varphi(\Sigma)v|^2$$
for every $v \in \mathbb{R}^p$. Noticing that $\varphi(\Sigma)$ is invertible and plugging $v = \varphi(\Sigma)^{-1}w$, $w \in \mathbb{R}^p$, into the previous inequality, we obtain
$$\max_{w \ne 0}\frac{\bigl|\Delta_n'\varphi(\Sigma)^{-1}w\bigr|}{|w|} \le \frac12.$$
Now recall (see [25], p. 312) that the spectral norm of a matrix $A$, $\|A\|_2$, can be computed via the equality
$$\|A\|_2 = \max_{x \ne 0}\frac{|Ax|}{|x|},$$
and that the spectral norm is self-adjoint (again see [25], p. 309), that is, $\|A'\|_2 = \|A\|_2$. Summing up, we have proved that
$$\bigl\|\varphi(\Sigma)^{-1}\Delta_n\bigr\|_2 \le \frac12.$$
This means that the gradient of $\log\det\varphi(\Sigma)$ is uniformly bounded and, since $\Sigma \mapsto \log\det\varphi(\Sigma)$ is a smooth function, the Lipschitz condition is satisfied with Lipschitz constant equal to 1:
$$\bigl|\log\det\varphi(\Sigma_1) - \log\det\varphi(\Sigma_2)\bigr| \le \|\Sigma_1 - \Sigma_2\|_2.$$
   □
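The key bound can also be read off the spectrum: each singular value $\sigma$ of $\Delta_n$ is mapped to $\sigma/(1+\sigma^2) \le 1/2$, so $\|\varphi(\Sigma)^{-1}\Delta_n\|_2 \le 1/2$. The following NumPy fragment (ours, illustrative) checks this on random draws.

```python
import numpy as np

rng = np.random.default_rng(4)
p = 6
worst = 0.0
for _ in range(500):
    Delta = rng.standard_normal((p, p)) * rng.uniform(0.1, 10)   # random scales of Delta_n
    G = np.linalg.solve(np.eye(p) + Delta @ Delta.T, Delta)      # phi^{-1}(Sigma) Delta_n
    worst = max(worst, np.linalg.norm(G, 2))
print(worst <= 0.5 + 1e-12)   # True; the bound is attained when some singular value equals 1
```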
We have proven that the function
$$\frac{\partial^2}{\partial\sigma_{ij}\,\partial\sigma_{hk}}\,\frac12\log\det\varphi(\Sigma) = \delta_{jk}\bigl[\varphi^{-1}(\Sigma)\bigr]_{ih} - \sum_{\mu,\sigma}\bigl[\varphi^{-1}(\Sigma)\bigr]_{h\mu}\,\Delta_{n,\mu j}\,\bigl[\varphi^{-1}(\Sigma)\bigr]_{i\sigma}\,\Delta_{n,\sigma k} - \bigl[\varphi^{-1}(\Sigma)\bigr]_{ih}\sum_{\mu,\lambda}\bigl[\varphi^{-1}(\Sigma)\bigr]_{\lambda\mu}\,\Delta_{n,\mu j}\,\Delta_{n,\lambda k} = \delta_{jk}\bigl[\varphi^{-1}(\Sigma)\bigr]_{ih} - \bigl(\varphi^{-1}(\Sigma)\Delta_n\bigr)_{hj}\bigl(\varphi^{-1}(\Sigma)\Delta_n\bigr)_{ik} - \bigl[\varphi^{-1}(\Sigma)\bigr]_{ih}\bigl\langle\varphi^{-1}(\Sigma)\Delta_{n,\cdot j},\ \Delta_{n,\cdot k}\bigr\rangle$$
is Lipschitz continuous.
Now, we prove that the gradient function $\frac{\delta\mathcal{L}(L,S)}{\delta L} = \frac{\delta\mathcal{L}(L,S)}{\delta S} = (I_p + \Delta_n\Delta_n')^{-1}\Delta_n$ is Lipschitz continuous.
Lemma 4.
The function $\frac{\delta\mathcal{L}(L,S)}{\delta L} = \frac{\delta\mathcal{L}(L,S)}{\delta S} = (I_p + \Delta_n\Delta_n')^{-1}\Delta_n$ is Lipschitz continuous with Lipschitz constant equal to $\frac54$:
$$\bigl\|F(\Delta_n + \epsilon H) - F(\Delta_n)\bigr\|_2 \le \frac54\,\epsilon\,\|H\|_2 + O(\epsilon^2),$$
with $F(\Delta_n) = \varphi^{-1}(\Sigma)\,\Delta_n = (I_p + \Delta_n\Delta_n')^{-1}\Delta_n$, for any $\epsilon > 0$ and any $p \times p$ perturbation matrix $H$.
Proof. 
Let us call $F(\Delta_n) = \varphi^{-1}(\Sigma)\,\Delta_n = (I_p + \Delta_n\Delta_n')^{-1}\Delta_n$ and fix $\epsilon > 0$.
Let us compute
$$F(\Delta_n + \epsilon H) - F(\Delta_n) = \bigl(I_p + (\Delta_n + \epsilon H)(\Delta_n + \epsilon H)'\bigr)^{-1}(\Delta_n + \epsilon H) - (I_p + \Delta_n\Delta_n')^{-1}\Delta_n.$$
We have
$$\bigl(I_p + (\Delta_n + \epsilon H)(\Delta_n + \epsilon H)'\bigr)^{-1}(\Delta_n + \epsilon H) = (I_p + \Delta_n\Delta_n' + \Lambda)^{-1}(\Delta_n + \epsilon H),$$
with $\Lambda = \epsilon\Lambda_0 + \epsilon^2\Lambda_1$ and
$$\Lambda_0 = H\Delta_n' + \Delta_n H', \qquad \Lambda_1 = HH'.$$
Calling $\Psi = I_p + \Delta_n\Delta_n'$, we have
$$(\Psi + \Lambda)^{-1} = \bigl(\Psi(I_p + \Psi^{-1}\Lambda)\bigr)^{-1} = (I_p + \Psi^{-1}\Lambda)^{-1}\Psi^{-1},$$
so that we have
$$F(\Delta_n + \epsilon H) - F(\Delta_n) = (I_p + \Psi^{-1}\Lambda)^{-1}\Psi^{-1}(\Delta_n + \epsilon H) - \Psi^{-1}\Delta_n.$$
Recalling that
$$(I_p + G)^{-1} = I_p - G + (I_p + G)^{-1}G^2,$$
whenever $I_p + G$ is invertible, we have
$$F(\Delta_n + \epsilon H) - F(\Delta_n) = \bigl(I_p - \Psi^{-1}\Lambda + (I_p + \Psi^{-1}\Lambda)^{-1}(\Psi^{-1}\Lambda)^2\bigr)\Psi^{-1}(\Delta_n + \epsilon H) - \Psi^{-1}\Delta_n.$$
We expand in powers of $\epsilon$:
$$F(\Delta_n + \epsilon H) - F(\Delta_n) = \Bigl(I_p - \Psi^{-1}(\epsilon\Lambda_0 + \epsilon^2\Lambda_1) + \bigl(I_p + \Psi^{-1}(\epsilon\Lambda_0 + \epsilon^2\Lambda_1)\bigr)^{-1}\bigl(\Psi^{-1}(\epsilon\Lambda_0 + \epsilon^2\Lambda_1)\bigr)^2\Bigr)\Psi^{-1}(\Delta_n + \epsilon H) - \Psi^{-1}\Delta_n.$$
A tedious but simple computation yields
$$F(\Delta_n + \epsilon H) - F(\Delta_n) = \epsilon\,T_1 + O(\epsilon^2),$$
with
$$T_1 = \Psi^{-1}H - \Psi^{-1}\Lambda_0\Psi^{-1}\Delta_n,$$
that is
$$T_1 = \Psi^{-1}H\bigl(I_p - \Delta_n'\Psi^{-1}\Delta_n\bigr) - \Psi^{-1}\Delta_n H'\Psi^{-1}\Delta_n.$$
The previous computations for the Lipschitzianity of $\log\det(I_p + \Delta_n\Delta_n')$ gave us (see (28)) that
$$\bigl\|\Psi^{-1}\Delta_n\bigr\|_2 \le \frac12.$$
It is also easy to check that
$$\bigl\|\Psi^{-1}\bigr\|_2 \le 1,$$
and that
$$\bigl\|I_p - \Delta_n'\Psi^{-1}\Delta_n\bigr\|_2 \le 1.$$
Putting everything together, we get
$$\bigl\|F(\Delta_n + \epsilon H) - F(\Delta_n)\bigr\|_2 \le \frac54\,\epsilon\,\|H\|_2 + O(\epsilon^2),$$
so we have proven that the directional derivative of the gradient is bounded in every direction by $\frac54$, i.e., the gradient is Lipschitz as a function from $M(p)$ to $M(p)$, the vector space of $p \times p$ real matrices.    □
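An illustrative numerical check of Lemma 4 (ours): for a small $\epsilon$ and random directions $H$, the change in $F(\Delta_n) = (I_p + \Delta_n\Delta_n')^{-1}\Delta_n$ stays below $\frac54\,\epsilon\,\|H\|_2$.

```python
import numpy as np

rng = np.random.default_rng(5)
p, eps = 5, 1e-4

def F(D):
    return np.linalg.solve(np.eye(p) + D @ D.T, D)   # (I + D D')^{-1} D

ok = True
for _ in range(200):
    Delta = rng.standard_normal((p, p))
    H = rng.standard_normal((p, p))
    lhs = np.linalg.norm(F(Delta + eps * H) - F(Delta), 2)
    ok &= lhs <= 1.25 * eps * np.linalg.norm(H, 2) + 1e-12   # Lipschitz constant 5/4
print(ok)   # True on these draws
```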

5. Discussion

In this paper, we have proved that the loss $\frac12\log\det(I_p + \Delta_n\Delta_n')$ has good analytic properties for the purpose of optimization, provided that the matrix $\Delta_n$ fulfills certain conditions. As a consequence, by [23,28] and the supplement of [10], it follows that our analytic setup can provide a numerical solution to the problem
$$\min_{L \succeq 0,\ S \succ 0}\ \frac12\log\det(I_p + \Delta_n\Delta_n') + \psi\|L\|_* + \rho\|S\|_1,$$
by using proximal gradient algorithms. The local convexity of $\log\det(I_p + \Delta_n\Delta_n')$ is the key to applying first-order methods to solve (33). Following [23,28] and the supplement of [10], we derive the following solution Algorithm 1.
Such an algorithm may be applied in many fields, like economics, finance, biology, genetics, health, climatology, and the social sciences, among others. In future research, we plan to properly develop the selection of the threshold parameters, to study how local convexity may cope with the random nature of the sample error matrix $\Delta_n$, and to establish the consistency of the solution pair of (33).
Algorithm 1 Pseudocode to solve problem (33) given any input covariance matrix $\Sigma_n$.
(1) Set $(L_0, S_0) = \frac12\bigl(\operatorname{diag}(\Sigma_n), \operatorname{diag}(\Sigma_n)\bigr)$, $\eta_0 = 1$.
(2) Initialize $Y_0 = L_0$ and $Z_0 = S_0$. Set $t = 1$.
(3) For $t \ge 1$, repeat:
(i) calculate $\Delta_{t,n} = Y_{t-1} + Z_{t-1} - \Sigma_n$;
(ii) compute $\frac{\partial}{\partial Y_{t-1}}\,\frac12\log\det\bigl(I_p + \Delta_{t,n}\Delta_{t,n}'\bigr) = \frac{\partial}{\partial Z_{t-1}}\,\frac12\log\det\bigl(I_p + \Delta_{t,n}\Delta_{t,n}'\bigr) = (I_p + \Delta_{t,n}\Delta_{t,n}')^{-1}\Delta_{t,n}$;
(iii) apply the singular value thresholding (SVT, [29]) operator $T_\psi$ to $E_{Y,t} = Y_{t-1} - \frac{1}{c}\,(I_p + \Delta_{t,n}\Delta_{t,n}')^{-1}\Delta_{t,n}$, with $c = 10^4$, and set $L_t = T_\psi(E_{Y,t}) = \hat U \hat D_\psi \hat U'$;
(iv) apply the soft-thresholding operator [30] $T_\rho$ to $E_{Z,t} = Z_{t-1} - \frac{1}{c}\,(I_p + \Delta_{t,n}\Delta_{t,n}')^{-1}\Delta_{t,n}$, with $c = 10^4$, and set $S_t = T_\rho(E_{Z,t})$;
(v) set $(Y_t, Z_t) = (L_t, S_t) + \frac{\eta_{t-1} - 1}{\eta_t}\,\bigl\{(L_t, S_t) - (L_{t-1}, S_{t-1})\bigr\}$, where
$$\eta_t = \frac12 + \frac12\sqrt{1 + 4\eta_{t-1}^2};$$
(vi) stop if the convergence criterion $\frac{\|L_t - L_{t-1}\|_F}{1 + \|L_{t-1}\|_F} + \frac{\|S_t - S_{t-1}\|_F}{1 + \|S_{t-1}\|_F} \le \varepsilon$ is met.
(4) Set $\hat L_{\mathrm{ALCE}} = Y_t$ and $\hat S_{\mathrm{ALCE}} = Z_t$.
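To make the pseudocode concrete, here is a minimal NumPy sketch of Algorithm 1 under simplifying assumptions: the function names, the eigendecomposition-based form of the SVT step, the iteration cap, and the default tolerance are ours, not prescribed by the paper; the step constant $c = 10^4$ mirrors the value in steps (iii)–(iv).

```python
import numpy as np

def svt(M, psi):
    """Eigendecomposition-based singular value thresholding for a symmetric matrix."""
    w, U = np.linalg.eigh((M + M.T) / 2)
    return (U * np.maximum(w - psi, 0.0)) @ U.T

def soft(M, rho):
    """Element-wise soft-thresholding."""
    return np.sign(M) * np.maximum(np.abs(M) - rho, 0.0)

def logdet_alce(Sigma_n, psi, rho, c=1e4, tol=1e-6, max_iter=500):
    p = Sigma_n.shape[0]
    L = S = 0.5 * np.diag(np.diag(Sigma_n))                # step (1)
    Y, Z, eta = L.copy(), S.copy(), 1.0                    # step (2)
    for _ in range(max_iter):                              # step (3)
        Delta = Y + Z - Sigma_n                            # (i)
        grad = np.linalg.solve(np.eye(p) + Delta @ Delta.T, Delta)   # (ii)
        L_new = svt(Y - grad / c, psi)                     # (iii)
        S_new = soft(Z - grad / c, rho)                    # (iv)
        eta_new = 0.5 + 0.5 * np.sqrt(1.0 + 4.0 * eta ** 2)
        Y = L_new + (eta - 1.0) / eta_new * (L_new - L)    # (v)
        Z = S_new + (eta - 1.0) / eta_new * (S_new - S)
        crit = (np.linalg.norm(L_new - L, "fro") / (1 + np.linalg.norm(L, "fro"))
                + np.linalg.norm(S_new - S, "fro") / (1 + np.linalg.norm(S, "fro")))
        L, S, eta = L_new, S_new, eta_new
        if crit <= tol:                                    # (vi)
            break
    return Y, Z                                            # step (4)
```

A call such as logdet_alce(Sigma_n, psi=0.1, rho=0.05) returns the pair of step (4); in practice the threshold parameters $\psi$ and $\rho$ must be tuned, which the paper leaves to future research.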

Author Contributions

Conceptualization, M.F.; Investigation, E.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pourahmadi, M. High-Dimensional Covariance Estimation: With High-Dimensional Data; John Wiley & Sons: Hoboken, NJ, USA, 2013; Volume 882. [Google Scholar]
  2. Wainwright, M.J. High-Dimensional Statistics: A Non-Asymptotic Viewpoint; Cambridge University Press: Cambridge, UK, 2019; Volume 48. [Google Scholar]
  3. Zagidullina, A. High-Dimensional Covariance Matrix Estimation: An Introduction to Random Matrix Theory; Springer Nature: Berlin/Heidelberg, Germany, 2021. [Google Scholar]
  4. Fan, J.; Liao, Y.; Liu, H. An overview of the estimation of large covariance and precision matrices. Econom. J. 2016, 19, C1–C32. [Google Scholar] [CrossRef]
  5. Lam, C. High-dimensional covariance matrix estimation. Wiley Interdiscip. Rev. Comput. Stat. 2020, 12, e1485. [Google Scholar] [CrossRef]
  6. Ledoit, O.; Wolf, M. Shrinkage estimation of large covariance matrices: Keep it simple, statistician? J. Multivar. Anal. 2021, 186, 104796. [Google Scholar] [CrossRef]
  7. Chamberlain, G.; Rothschild, M. Arbitrage, factor structure, and mean-variance analysis on large asset markets. Econometrica 1983, 51, 1281. [Google Scholar] [CrossRef] [Green Version]
  8. Fan, J.; Liao, Y.; Mincheva, M. Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2013, 75, 603–680. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  9. Abdi, H.; Williams, L.J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 433–459. [Google Scholar] [CrossRef]
  10. Farnè, M.; Montanari, A. A large covariance matrix estimator under intermediate spikiness regimes. J. Multivar. Anal. 2020, 176, 104577. [Google Scholar] [CrossRef] [Green Version]
  11. Fazel, M.; Hindi, H.; Boyd, S.P. A rank minimization heuristic with application to minimum order system approximation. In Proceedings of the American Control Conference, Arlington, VA, USA, 25–27 June 2001; Volume 6, pp. 4734–4739. [Google Scholar]
  12. Fazel, M. Matrix Rank Minimization with Applications. Ph.D. Thesis, Stanford University, Stanford, CA, USA, 2002. [Google Scholar]
  13. Donoho, D.L. For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution. Commun. Pure Appl. Math. J. Issued Courant Inst. Math. Sci. 2006, 59, 797–829. [Google Scholar] [CrossRef]
  14. Recht, B.; Fazel, M.; Parrilo, P.A. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 2010, 52, 471–501. [Google Scholar] [CrossRef] [Green Version]
  15. Candès, E.J.; Plan, Y. Near-ideal model selection by l1 minimization. Ann. Stat. 2009, 37, 2145–2177. [Google Scholar] [CrossRef]
  16. Candès, E.J.; Tao, T. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inf. Theory 2010, 56, 2053–2080. [Google Scholar] [CrossRef] [Green Version]
  17. Mazumder, R.; Hastie, T.; Tibshirani, R. Spectral regularization algorithms for learning large incomplete matrices. J. Mach. Learn. Res. 2010, 11, 2287–2322. [Google Scholar] [PubMed]
  18. Srebro, N.; Rennie, J.; Jaakkola, T.S. Maximum-margin matrix factorization. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2005; pp. 1329–1336. [Google Scholar]
  19. Hastie, T.; Mazumder, R.; Lee, J.D.; Zadeh, R. Matrix completion and low-rank svd via fast alternating least squares. J. Mach. Learn. Res. 2015, 16, 3367–3402. [Google Scholar] [PubMed]
  20. Candès, E.J.; Li, X.; Ma, Y.; Wright, J. Robust principal component analysis? J. ACM (JACM) 2011, 58, 11. [Google Scholar] [CrossRef]
  21. Danilova, M.; Dvurechensky, P.; Gasnikov, A.; Gorbunov, E.; Guminov, S.; Kamzolov, D.; Shibaev, I. Recent theoretical advances in non-convex optimization. arXiv 2020, arXiv:2012.06188. [Google Scholar]
  22. Dey, D.; Mukhoty, B.; Kar, P. Agglio: Global optimization for locally convex functions. In Proceedings of the 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD), Bangalore, India, 8–10 January 2022; pp. 37–45. [Google Scholar]
  23. Nesterov, Y. Gradient methods for minimizing composite functions. Math. Program. 2013, 140, 125–161. [Google Scholar] [CrossRef]
  24. Harville, D.A. Matrix Algebra from A Statistician’s Perspective; Springer: New York, NY, USA, 1997. [Google Scholar]
  25. Horn, R.A.; Johnson, C.R. Matrix Analysis; Cambridge University Press: Cambridge, UK, 2012. [Google Scholar]
  26. Graham, A. Kronecker Products and Matrix Calculus: With Applications; Ellis Horwood Limited: London, UK, 1981. [Google Scholar]
  27. Lax, P.D. Linear Algebra and Its Applications, 2nd ed.; Pure and Applied Mathematics (Hoboken); Wiley-Interscience (John Wiley & Sons): Hoboken, NJ, USA, 2007. [Google Scholar]
  28. Luo, X. High dimensional low rank and sparse covariance matrix estimation via convex minimization. arXiv 2011, arXiv:1111.1133. [Google Scholar]
  29. Cai, J.-F.; Candès, E.J.; Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 2010, 20, 1956–1982. [Google Scholar] [CrossRef]
  30. Daubechies, I.; Defrise, M.; De Mol, C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 2004, 57, 1413–1457. [Google Scholar] [CrossRef] [Green Version]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
