Article

Differentially Private Sparse Covariance Matrix Estimation under Lower-Bounded Moment Assumption

Department of Mathematics, Beijing University of Technology, Beijing 100124, China
*
Author to whom correspondence should be addressed.
Mathematics 2023, 11(17), 3670; https://doi.org/10.3390/math11173670
Submission received: 14 June 2023 / Revised: 2 August 2023 / Accepted: 23 August 2023 / Published: 25 August 2023
(This article belongs to the Section Probability and Statistics)

Abstract

This paper investigates the problem of sparse covariance matrix estimation when the sampling set contains sensitive information, and both the differentially private algorithm and the locally differentially private algorithm are adopted to preserve privacy. It is worth noting that the only distributional requirement in our work is the existence of a bounded $4+\varepsilon$ ($\varepsilon>0$) moment. Meanwhile, we reduce the error bounds by modifying the threshold of the existing differentially private algorithms. Finally, numerical simulations and results from a real data application are presented to support our theoretical claims.

1. Introduction

The covariance matrix plays an important role in modern multivariate analysis. Many methodologies, including regression analysis, principal component analysis and discriminant analysis, rely on the estimation of the covariance matrices. Such estimation problems arise in various fields of science; for example, risk management [1], gene expressions [2], social networks [3], among others. Up to now, many methods have been developed to estimate the covariance matrix. One of the most popular approaches is to impose the sparse structure on the covariance matrix (see [4,5]).
In the big data era, sampling databases usually include personal financial or health information, such as those used in social science, biomedicine and genomics, so it is inevitable that one must deal with sensitive data. In recent years, the differentially private algorithm (DPA) and locally differentially private algorithm (LDPA) have become widely used methods that can prevent privacy leakage and defend against differential attacks, and these methods have been developed in the real world by Apple [6], Microsoft [7] and Google [8]. The DPA and LDPA aim at hiding the true information while keeping the basic property of the whole dataset. A popular idea to achieve this goal is to add some special noise into the original model [9,10,11].
This paper focuses on efficient estimation of a general sparse covariance matrix based on both the DPA and LDPA in order to protect sensitive information. Although analyzing datasets under the DPA and LDPA brings extra difficulties, specialized techniques make it possible to overcome this bottleneck. In this paper, we adopt the Gaussian mechanism to keep the differentially private property of the algorithms, and use the thresholding method to deal with the covariance matrix.

1.1. Related Work and Our Contributions

There are several papers developing the theory of covariance matrix estimation based on the DPA or LDPA to provide privacy protection. Jiang et al. [12] added Wishart-distributed noise to construct a DPA and applied it to preserve privacy while estimating the covariance matrix; however, the Wishart-noise assumption is rather strong. Amin et al. [13] used the Laplace mechanism to construct a DPA and utilized it to protect sensitive samples while estimating the covariance matrix; nonetheless, their method only covers the low-dimensional case. Kamath et al. [14] considered high-dimensional covariance matrix estimation with a DPA to protect privacy, but they did not assume a sparse structure for the covariance matrix, which leads to a large error bound; moreover, their distribution assumption is Gaussian. Recently, Wang and Xu [10] relaxed the distribution assumption from Gaussian to sub-Gaussian and imposed a sparse structure on the covariance matrix, which yields a smaller error bound, while using a DPA and an LDPA to preserve privacy.
Our contributions are as follows: (i) For the random vector $X=(X_1,\dots,X_p)^T\in\mathbb{R}^p$, the only requirement on the distribution of $X_i$ is the existence of a bounded $4+\varepsilon$ ($\varepsilon>0$) moment. In contrast, previous work requires that $X$ is Gaussian [14] or sub-Gaussian [10], which is equivalent to every moment of $X_i$ ($i=1,\dots,p$) being bounded (see [15]). Moreover, if $X_i$ ($i=1,\dots,p$) follows a heavy-tailed distribution, i.e., the distribution of $X_i$ satisfies $\int_{\mathbb{R}}e^{tx}\,dF_i(x)=\infty$ for any $t>0$, then this distribution is covered by our assumption but cannot be handled by the existing literature. (ii) We adjust the threshold in the original algorithms of Wang and Xu [10], which makes the error bounds smaller under both the DPA and LDPA (see Remarks 3 and 7). (iii) We measure the error bounds not only by the spectral norm and the matrix operator norm, but also by the Frobenius norm, which, as far as we know, has not previously been considered. (iv) The sparse structure of the covariance matrix in the present paper contains that of Wang and Xu, under an extra condition, as a special case (see Remark 1).

1.2. Notations

For a random variable $Z$, $\mathbb{E}Z$ and $\mathbb{D}Z$ denote the expectation and variance of $Z$, respectively. For a vector $X=(X_1,\dots,X_p)^T\in\mathbb{R}^p$, we define its $l_\omega$ norm by $\|X\|_{l_\omega}=\left(\sum_{i=1}^p|X_i|^\omega\right)^{1/\omega}$ with $\omega\in[1,\infty)$ and $\|X\|_{l_\infty}=\max_i|X_i|$. For a matrix $A=(a_{ij})\in\mathbb{R}^{p\times p}$, the spectral norm is defined as $\|A\|_2=\sup_{\|X\|_{l_2}\le1}\|AX\|_{l_2}$; the matrix $l_1$ norm and the Frobenius norm are defined by $\|A\|_1=\max_j\sum_{i=1}^p|a_{ij}|$ and $\|A\|_F=\sqrt{\sum_{i,j=1}^p|a_{ij}|^2}$, respectively. Moreover, the matrix $l_\omega$ operator norm is given by $\|A\|_\omega=\sup_{\|X\|_{l_\omega}\le1}\|AX\|_{l_\omega}$. For two sequences of real numbers $\{a_n\}_{n\ge1}$ and $\{b_n\}_{n\ge1}$, “$a_n=O(b_n)$” stands for $a_n\le Cb_n$ for some constant $C>0$ independent of $n$.
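As a quick illustration (a minimal sketch with made-up inputs; the array names are ours, not from the paper), these norms can be evaluated with NumPy:

```python
import numpy as np

A = np.array([[2.0, 0.5], [0.5, 1.0]])   # an illustrative symmetric matrix
x = np.array([1.0, -2.0])                # an illustrative vector

l2_vec   = np.linalg.norm(x, 2)               # vector l_2 norm
linf_vec = np.linalg.norm(x, np.inf)          # vector l_infinity norm
spectral = np.linalg.norm(A, 2)               # spectral norm ||A||_2 (largest singular value)
l1_mat   = np.max(np.sum(np.abs(A), axis=0))  # matrix l_1 norm: maximum absolute column sum
frob     = np.linalg.norm(A, 'fro')           # Frobenius norm
```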

1.3. Organization of this Paper

The remainder of this paper is organized as follows. Section 2 introduces some important definitions and lemmas. In Section 3, we first establish the sparse covariance matrix estimation based on the DPA, where each component of the random vector is only required to have a bounded $4+\varepsilon$ ($\varepsilon>0$) moment; the estimation problem is then extended to the LDPA. Finally, results from several numerical experiments and a real data example are presented in Section 4 to support our theoretical results. All proofs are relegated to Appendix A.

2. Preliminaries

2.1. Sparse Covariance Estimation

In this paper, we always assume that the real random vector $X=(X_1,\dots,X_p)^T\in\mathbb{R}^p$ satisfies the following conditions, denoted as $X\sim\mathcal{P}(K,\tau_0,s_{n,p},M)$.
Condition (i) For a polynomial-type moment, assume $\mathbb{E}X=0$, $\|X\|_{l_2}=1$, and for some positive constants $\gamma,\tau_0,K$ and $\varepsilon=c(\gamma)$, it holds that $p\le n^{\gamma}$,
$$\mathbb{E}\left|X_i/\sqrt{\mathbb{D}X_i}\right|^{4+\varepsilon}\le K\quad\text{for }1\le i\le p$$
and $\min_{1\le i,j\le p}\mathbb{D}(X_iX_j)/(\mathbb{D}X_i\,\mathbb{D}X_j)\ge\tau_0$.
Condition (ii) The covariance matrix $\Sigma$ of the random vector $X$ belongs to the $l_q$-sparse space
$$\mathcal{U}_q(s_{n,p},M):=\left\{\Sigma=(\sigma_{ij})_{p\times p}>0:\ \max_j\sum_{i=1}^p|\sigma_{ij}|^q\le s_{n,p},\ \max_j\sigma_{jj}\le M\right\}.$$
Remark 1.
Condition (i) mainly means that the distribution of any component of $X$ has a bounded $4+\varepsilon$ ($\varepsilon>0$) moment. Note that if $X$ follows a multivariate normal distribution $N_p(\mu,\Sigma)$ or a multivariate t-distribution $T_p(\alpha;\mu,\Sigma)$ with $\alpha\ge4+\varepsilon$ degrees of freedom, then Condition (i) holds. Moreover, the relationship $p\le n^{\gamma}$ is necessary for proving Lemma 1. Meanwhile, the parameter $\varepsilon$ controls the order of the moment; $K$ and $M$ can be determined once the distribution of $X$ and the covariance matrix $\Sigma$ are given.
Condition (ii) is a common assumption for the high-dimensional covariance matrix $\Sigma$ of $X$ (see [4] and [5]). Moreover, we find that the parameter space defined in Wang and Xu [10] is
$$\mathcal{G}_0(s_{n,p}):=\left\{\Sigma=(\sigma_{ij})_{p\times p}>0:\ \max_j\sum_{i\ne j}|\sigma_{ij}|^0\le s_{n,p}\right\}.$$
Then, it is obvious that there exists a constant $C>0$ such that
$$\mathcal{G}_0(s_{n,p},M):=\left\{\Sigma\in\mathcal{G}_0(s_{n,p}),\ 0<\sigma_{jj}\le M\ (j=1,\dots,p)\right\}\subseteq\mathcal{U}_0(Cs_{n,p},M),$$
i.e., the sparse space (1) with the extra condition $0<\sigma_{jj}\le M$ $(j=1,\dots,p)$ can be seen as a special case of $\mathcal{U}_0(Cs_{n,p},M)$.
Let $X_1,\dots,X_n\overset{\text{i.i.d.}}{\sim}\mathcal{P}(K,\tau_0,s_{n,p},M)$, where $X_i=(X_{1i},\dots,X_{pi})^T$; then the sample covariance matrix is given as
$$\hat\Sigma:=(\hat\sigma_{ij})_{p\times p}=\frac1n\sum_{i=1}^nX_iX_i^T.$$
The following conclusion about the difference between $\hat\sigma_{ij}$ and $\sigma_{ij}$ was obtained in [4].
Lemma 1.
For any $\eta\ge2$ and some $\varepsilon>0$,
$$\mathbb{P}\left\{|\hat\sigma_{ij}-\sigma_{ij}|\ge\eta\sqrt{(\hat\theta_{ij}\log p)/n},\ 1\le i,j\le p\right\}=O\left((\log p)^{-1/2}p^{-\eta+2}+n^{-\varepsilon}\right),$$
where $\hat\theta_{ij}=\frac1n\sum_{k=1}^n(X_{ik}X_{jk}-\hat\sigma_{ij})^2$.

2.2. Differential Privacy: Gaussian Mechanism and Post-Processing Property

This paper adopts the differentially private mechanism to protect sensitive data; such a mechanism requires that the outcome does not change significantly if a single data point of the dataset varies.
Definition 1
([9]). Two datasets $X,X'\in\mathcal{X}^p$ are called neighbors if they differ in only one entry, denoted as $X\sim X'$.
A similar explanation can be found in Cai et al. [11].
Definition 2
([13]). A randomized algorithm M is ( ϵ , δ ) -differentially private (DP) if
$$\mathbb{P}(M(X)\in O)\le e^{\epsilon}\,\mathbb{P}(M(X')\in O)+\delta$$
holds for every pair of neighbors $X,X'\in\mathcal{X}^p$, where $O$ is any measurable event in the output space of $M$.
This paper adopts the Gaussian mechanism to guarantee that the algorithm is $(\epsilon,\delta)$-DP.
Definition 3
([16]). For any algorithm $f$ mapping a dataset in $\mathcal{X}^p$ to $\mathbb{R}^p$, the Gaussian mechanism is defined as
$$M_G(X):=f(X)+Y,$$
where $Y=(Y_1,\dots,Y_p)^T$ with $Y_i\overset{\text{i.i.d.}}{\sim}N\left(0,\,2(\Delta_2(f)/\epsilon)^2\log(1.25/\delta)\right)$ and $\Delta_2(f)=\sup_{X\sim X'}\|f(X)-f(X')\|_{l_2}$.
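As a minimal sketch of Definition 3 (the function name and arguments are illustrative, not from [16]; the caller supplies the sensitivity bound), the Gaussian mechanism can be written as:

```python
import numpy as np

def gaussian_mechanism(f_value, l2_sensitivity, eps, delta, rng=None):
    """Return f(X) + Y with Y_i i.i.d. N(0, 2*(Delta_2(f)/eps)^2 * log(1.25/delta)).

    f_value:        the non-private output f(X), a 1-D array.
    l2_sensitivity: an upper bound on sup_{X ~ X'} ||f(X) - f(X')||_2.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity / eps
    return f_value + rng.normal(0.0, sigma, size=np.shape(f_value))
```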
In fact, post-processing a differentially private algorithm preserves privacy.
Lemma 2.
Post-processing property [11]: If $M_1$ is an $(\epsilon,\delta)$-differentially private algorithm and $M_2$ is any deterministic algorithm, then the composition $M_2(M_1(\cdot))$ is $(\epsilon,\delta)$-differentially private.

3. Two Methods for Privacy Protection

3.1. Differentially Private Algorithm

Firstly, we introduce the $(\epsilon,\delta)$-differentially private algorithm, which post-processes the perturbed sample covariance matrix with a modified threshold. After thresholding, the negative eigenvalues are truncated to zero to guarantee positive semi-definiteness. For details, see the description of Algorithm 1 below. Moreover, we provide Theorem 1 to bound the errors between $\Sigma\in\mathcal{U}_q(s_{n,p},M)$ and the output $\tilde\Sigma_+^\tau$ of Algorithm 1.
Algorithm 1 Modified DP-Thresholding
Input: $X_1,\dots,X_n\overset{\text{i.i.d.}}{\sim}\mathcal{P}(K,\tau_0,s_{n,p},M)$, $\epsilon,\delta\in(0,1)$, $\varepsilon,K,M>0$ and $\eta\ge2$
1:
Compute
$$\tilde\Sigma=(\tilde\sigma_{ij})_{1\le i,j\le p}=\frac1n\sum_{i=1}^nX_iX_i^T+N,$$
where $N=(n_{ij})_{1\le i,j\le p}$ is a symmetric matrix and, for $i\le j$, $n_{ij}\overset{\text{i.i.d.}}{\sim}N(0,\sigma_1^2)$ with $\sigma_1^2=4\log(1.25/\delta)/(n\epsilon)^2$.
2:
Define the thresholding estimator $\tilde\Sigma^\tau=(\tilde\sigma_{ij}^\tau)_{1\le i,j\le p}$ with
$$\tilde\sigma_{ij}^\tau=\tilde\sigma_{ij}\cdot I\left\{|\tilde\sigma_{ij}|>\frac{4\sqrt2}{3}\eta K^{\frac2{4+\varepsilon}}M\sqrt{\frac{\log p}{n}}+4\sigma_1(\log p)^{1/2}\right\}.$$
3:
Compute the eigen-decomposition $\tilde\Sigma^\tau=\sum_{i=1}^p\tilde\lambda_iv_iv_i^T$ and $\tilde\lambda_i^+=\max\{\tilde\lambda_i,0\}$, then let $\tilde\Sigma_+^\tau=\sum_{i=1}^p\tilde\lambda_i^+v_iv_i^T$.
4:
return $\tilde\Sigma_+^\tau$.
In Algorithm 1, the parameters ϵ and δ control the level of privacy protection, and the privacy constraint becomes more stringent as ϵ and δ tend to 0. The parameter η is usually chosen by cross-validation in the simulation experiment in order to decrease the error.
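The following is a minimal NumPy sketch of Algorithm 1 (illustrative only, not the authors' implementation): the threshold is passed in as a single number, whereas Algorithm 1 computes it from $\eta$, $K$, $M$, $\varepsilon$ and $\sigma_1$, and in the experiments it is tuned by cross-validation.

```python
import numpy as np

def dp_thresholding(X, eps, delta, threshold, rng=None):
    """Sketch of Modified DP-Thresholding. X has shape (n, p), rows are samples."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape

    # Step 1: perturb the sample covariance with a symmetric Gaussian noise matrix.
    sigma1 = 2.0 * np.sqrt(np.log(1.25 / delta)) / (n * eps)  # sigma_1^2 = 4 log(1.25/delta)/(n eps)^2
    noise = rng.normal(0.0, sigma1, size=(p, p))
    noise = np.triu(noise) + np.triu(noise, 1).T              # symmetrize N
    sigma_tilde = X.T @ X / n + noise

    # Step 2: entrywise hard thresholding.
    sigma_tau = sigma_tilde * (np.abs(sigma_tilde) > threshold)

    # Step 3: truncate negative eigenvalues to obtain a positive semi-definite estimate.
    eigval, eigvec = np.linalg.eigh(sigma_tau)
    eigval = np.maximum(eigval, 0.0)
    return (eigvec * eigval) @ eigvec.T
```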
Remark 2.
Algorithm 1 is $(\epsilon,\delta)$-differentially private. Theorem 1 in [17] states that Step 1 of Algorithm 1 preserves $(\epsilon,\delta)$-differential privacy; this, along with Lemma 2, implies that Algorithm 1 is $(\epsilon,\delta)$-differentially private. Note that the process of verifying differential privacy is long and elementary [18], so we omit it here.
Theorem 1.
For any $\eta\ge2$ and some $\varepsilon>0$, there exist positive constants $C_1:=C_1(q,\eta,K,M,\varepsilon)$ and $C_2:=C_2(q,\eta,K,M,\varepsilon)$ such that the output $\tilde\Sigma_+^\tau$ of Algorithm 1 satisfies
$$(i)\ \inf_{P\in\mathcal{P}(K,\tau_0,s_{n,p},M)}\mathbb{P}_{X|P}\left\{\|\tilde\Sigma_+^\tau-\Sigma\|_2\le C_1s_{n,p}\left(\sqrt{\frac{\log p}{n}}+\sqrt{\frac{\log\delta^{-1}\log p}{n^2\epsilon^2}}\right)^{1-q}\right\}\ge1-O\left((\log p)^{-1/2}p^{-\eta+2}+n^{-\varepsilon}+p^{-5/2}\right);$$
$$(ii)\ \inf_{P\in\mathcal{P}(K,\tau_0,s_{n,p},M)}\mathbb{P}_{X|P}\left\{\frac1{\sqrt p}\|\tilde\Sigma_+^\tau-\Sigma\|_F\le C_2s_{n,p}\left(\sqrt{\frac{\log p}{n}}+\sqrt{\frac{\log\delta^{-1}\log p}{n^2\epsilon^2}}\right)^{1-q/2}\right\}\ge1-O\left((\log p)^{-1/2}p^{-\eta+2}+n^{-\varepsilon}+p^{-5/2}\right).$$
Remark 3.
In Theorem 1, the condition $\eta\ge2$ is necessary. Otherwise, $O\left((\log p)^{-1/2}p^{-\eta+2}+n^{-\varepsilon}\right)$ may be larger than 1 for some $\varepsilon>0$, which would make results (i)–(ii) trivial. According to the results in Theorem 1, we find that when the level of sparsity (i.e., $s_{n,p}$) and the dimension $p$ of the random vector increase and the sample size $n$ decreases, the errors between $\Sigma\in\mathcal{U}_q(s_{n,p},M)$ and the output $\tilde\Sigma_+^\tau$ of Algorithm 1 become larger. Moreover, if $\epsilon$ and $\delta$ become smaller, which means stronger privacy protection and noise disturbance, the errors become larger. This theoretical analysis is consistent with the results of the simulation experiments in Section 4.1.
Note that the error bound under the spectral norm over the parameter space $\mathcal{G}_0(s_{n,p})$ (see (1)) in Wang and Xu [10] is $O\left(s_{n,p}\left(\sqrt{\frac{\log p}{n}}+\sqrt{\frac{\log\delta^{-1}\log p}{n^2\epsilon^2}}+\sqrt{\frac{\log^2\delta^{-1}}{n^2\epsilon^4}}\right)\right)$ while guaranteeing DP. Theorem 1(i), together with (2), states that our error bound over $\mathcal{G}_0(s_{n,p},M)$ is $O\left(s_{n,p}\left(\sqrt{\frac{\log p}{n}}+\sqrt{\frac{\log\delta^{-1}\log p}{n^2\epsilon^2}}\right)\right)$, which is smaller than that of Wang and Xu once the space $\mathcal{G}_0(s_{n,p})$ is equipped with the extra condition. Moreover, the distribution assumption in our work is weaker than that of Wang and Xu, and we also consider the case with $0<q<1$.
Remark 4.
In the non-private case, result (i) of Theorem 1 reduces to
$$\inf_{P\in\mathcal{P}(K,\tau_0,s_{n,p},M)}\mathbb{P}_{X|P}\left\{\|\tilde\Sigma_+^\tau-\Sigma\|_2\le C_1s_{n,p}\left(\frac{\log p}{n}\right)^{\frac{1-q}2}\right\}\ge1-O\left((\log p)^{-1/2}p^{-\eta+2}+n^{-\varepsilon}+p^{-5/2}\right),$$
which is consistent with Theorem 1(ii) in [4], i.e., we extend the work of [4] from a non-private mechanism to an $(\epsilon,\delta)$-differentially private mechanism.
Remark 5.
From the proving process of Theorem 1(i), we find that
$$\|\tilde\Sigma^\tau-\Sigma\|_1\le C_1s_{n,p}\left(\sqrt{\frac{\log p}{n}}+\sqrt{\frac{\log\delta^{-1}\log p}{n^2\epsilon^2}}\right)^{1-q}$$
holds under event E defined by (A1).
On the other hand, the Riesz–Thorin interpolation theorem [19] states that the $l_\omega$ operator norm of a symmetric matrix $A$ satisfies $\|A\|_\omega\le\|A\|_1$. Hence, we obtain
$$\|\tilde\Sigma^\tau-\Sigma\|_\omega\le C_1s_{n,p}\left(\sqrt{\frac{\log p}{n}}+\sqrt{\frac{\log\delta^{-1}\log p}{n^2\epsilon^2}}\right)^{1-q}$$
under event E. This, along with Lemma A1, implies that
$$\inf_{P\in\mathcal{P}(K,\tau_0,s_{n,p},M)}\mathbb{P}_{X|P}\left\{\|\tilde\Sigma^\tau-\Sigma\|_\omega\le C_1s_{n,p}\left(\sqrt{\frac{\log p}{n}}+\sqrt{\frac{\log\delta^{-1}\log p}{n^2\epsilon^2}}\right)^{1-q}\right\}\ge1-O\left((\log p)^{-1/2}p^{-\eta+2}+n^{-\varepsilon}+p^{-5/2}\right),$$
i.e., we can provide the upper error bound under the $l_\omega$ operator norm of the matrix.

3.2. Locally Differentially Private Algorithm

Differential privacy in the local model. In the locally differentially private (LDP) model, there exist a data universe $\mathcal{D}$, $n$ users who each hold private data $x\in\mathcal{D}$, and a server. In each round, the server sends a message; each user selects a differentially private algorithm, runs it on their data in order to perturb it, and then sends the output back to the server.
Definition 4
([20]). A randomized algorithm M is ( ϵ , δ ) -locally differentially private if
$$\mathbb{P}(M(x)\in S)\le e^{\epsilon}\,\mathbb{P}(M(x')\in S)+\delta$$
holds for all pairs $x,x'\in\mathcal{D}$, where $S$ is any measurable event in the output space of $M$.
In fact, the idea of Algorithm 2 is that each sample $X_k$ perturbs its own matrix $X_kX_k^T$ locally, so that the noise disturbance is aggregated within the algorithm.
Algorithm 2 Modified LDP-Thresholding
Input: $X_1,\dots,X_n\overset{\text{i.i.d.}}{\sim}\mathcal{P}(K,\tau_0,s_{n,p},M)$, $\epsilon,\delta\in(0,1)$, $\varepsilon,K,M>0$ and $\eta\ge2$
1:
for each $k\in\{1,\dots,n\}$ do
2:
Compute $X_kX_k^T+R_k$,
where $R_k=(r^k_{ij})_{1\le i,j\le p}$ is a symmetric matrix and, for $i\le j$, $r^k_{ij}\overset{\text{i.i.d.}}{\sim}N(0,\sigma_2^2)$ with $\sigma_2^2=2\log(1.25/\delta)/\epsilon^2$.
3:
end for
4:
Compute
$$\check\Sigma=(\check\sigma_{ij})_{p\times p}=\frac1n\sum_{k=1}^n\left(X_kX_k^T+R_k\right).$$
5:
Define the thresholding estimator $\check\Sigma^\tau=(\check\sigma_{ij}^\tau)_{1\le i,j\le p}$ with
$$\check\sigma_{ij}^\tau=\check\sigma_{ij}\cdot I\left\{|\check\sigma_{ij}|>\frac{4\sqrt2}{3}\eta K^{\frac2{4+\varepsilon}}M\sqrt{\frac{\log p}{n}}+4\sigma_2(\log p)^{1/2}\right\}.$$
6:
Compute the eigen-decomposition $\check\Sigma^\tau=\sum_{i=1}^p\check\lambda_iv_iv_i^T$ and $\check\lambda_i^+=\max\{\check\lambda_i,0\}$, then let $\check\Sigma_+^\tau=\sum_{i=1}^p\check\lambda_i^+v_iv_i^T$.
7:
return $\check\Sigma_+^\tau$.
Remark 6.
In fact, Step 2 in Algorithm 2 can be viewed as a special case of Algorithm 1 in [21], which is $(\epsilon,\delta)$-locally differentially private. Then, using the post-processing property, Algorithm 2 is $(\epsilon,\delta)$-locally differentially private. Here, we omit the verification that Step 2 preserves local differential privacy.
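For comparison, a corresponding sketch of Algorithm 2 (again illustrative and not the authors' code): each user's outer product $X_kX_k^T$ is perturbed before the server averages, and the threshold is supplied by the caller.

```python
import numpy as np

def ldp_thresholding(X, eps, delta, threshold, rng=None):
    """Sketch of Modified LDP-Thresholding. Each row of X is perturbed separately."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    sigma2 = np.sqrt(2.0 * np.log(1.25 / delta)) / eps   # sigma_2^2 = 2 log(1.25/delta)/eps^2

    # Each user adds a symmetric Gaussian noise matrix to its own contribution X_k X_k^T.
    acc = np.zeros((p, p))
    for k in range(n):
        noise = rng.normal(0.0, sigma2, size=(p, p))
        noise = np.triu(noise) + np.triu(noise, 1).T
        acc += np.outer(X[k], X[k]) + noise
    sigma_check = acc / n

    # Thresholding and eigenvalue truncation, as in Algorithm 1.
    sigma_tau = sigma_check * (np.abs(sigma_check) > threshold)
    eigval, eigvec = np.linalg.eigh(sigma_tau)
    return (eigvec * np.maximum(eigval, 0.0)) @ eigvec.T
```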
Theorem 2.
For any $\eta\ge2$ and some $\varepsilon>0$, there exist positive constants $C_3:=C_3(q,\eta,K,M,\varepsilon)$ and $C_4:=C_4(q,\eta,K,M,\varepsilon)$ such that the output $\check\Sigma_+^\tau$ of Algorithm 2 satisfies
$$(i)\ \inf_{P\in\mathcal{P}(K,\tau_0,s_{n,p},M)}\mathbb{P}_{X|P}\left\{\|\check\Sigma_+^\tau-\Sigma\|_2\le C_3s_{n,p}\left(\frac{\log\delta^{-1}\log p}{n\epsilon^2}\right)^{\frac{1-q}2}\right\}\ge1-O\left((\log p)^{-1/2}p^{-\eta+2}+n^{-\varepsilon}+p^{-5/2}\right);$$
$$(ii)\ \inf_{P\in\mathcal{P}(K,\tau_0,s_{n,p},M)}\mathbb{P}_{X|P}\left\{\frac1{\sqrt p}\|\check\Sigma_+^\tau-\Sigma\|_F\le C_4s_{n,p}\left(\frac{\log\delta^{-1}\log p}{n\epsilon^2}\right)^{\frac12-\frac q4}\right\}\ge1-O\left((\log p)^{-1/2}p^{-\eta+2}+n^{-\varepsilon}+p^{-5/2}\right).$$
Remark 7.
The reason for setting $\eta\ge2$ and $\varepsilon>0$ is the same as in Theorem 1. Moreover, the error bounds increase with the parameters $s_{n,p}$ and $p$ and decrease as $n$, $\epsilon$ and $\delta$ increase; these results are consistent with the simulation studies in Section 4.1.
It is easy to see that the error bound under the spectral norm over $\mathcal{G}_0(s_{n,p})$ in Wang and Xu [10] is $O\left(s_{n,p}\sqrt{\frac{\log\delta^{-1}\log p}{n\epsilon^2}}\right)$ while maintaining LDP. Moreover, our Theorem 2(i) and (2) imply that, over $\mathcal{G}_0(s_{n,p},M)$, our error bound under the spectral norm is of the same order as that of Wang and Xu in theory. However, according to the numerical studies in Section 4, the output of our Algorithm 2 outperforms that of Wang and Xu. In addition, our distributional assumption is weaker than Wang and Xu's condition, and the parameter space with $0<q<1$ is considered in our work.
Remark 8.
According to the proving process of Theorem 2, we observe that the error bounds under the spectral and Frobenius norms can be $O\left(s_{n,p}\left(\sqrt{\frac{\log p}{n}}+\sqrt{\frac{\log\delta^{-1}\log p}{n\epsilon^2}}\right)^{1-q}\right)$ and $O\left(s_{n,p}\left(\sqrt{\frac{\log p}{n}}+\sqrt{\frac{\log\delta^{-1}\log p}{n\epsilon^2}}\right)^{1-q/2}\right)$, respectively. Comparing these results with the conclusions of Theorem 1, it is easy to see that the upper bound of Theorem 2 is larger than that of Theorem 1. This is reasonable since Algorithm 2 enforces a stronger privacy protection mechanism, so it has to pay a larger price.

4. Numerical Experiments

4.1. Simulation Studies

In this subsection, we investigate the numerical performance of the outputs $\tilde\Sigma_+^\tau$ and $\check\Sigma_+^\tau$ of Algorithms 1 and 2, respectively. Moreover, we also compare these two estimators with the outputs $\tilde\Sigma_1^+$ and $\tilde\Sigma_2^+$ of the DP-thresholding and LDP-thresholding algorithms proposed by Wang and Xu [10], respectively.
Data generation
The following models for the covariance matrix $\Sigma=(\sigma_{ij})_{p\times p}$ are considered:
Model 1. $\sigma_{ij}=0.6^{|i-j|}$;
Model 2. $\sigma_{ii}=1$, $\sigma_{i,i+1}=\sigma_{i+1,i}=0.6$, $\sigma_{i,i+2}=\sigma_{i+2,i}=0.3$ and $\sigma_{ij}=0$ for $|i-j|\ge3$.
Then, we generate samples $X_1,\dots,X_n$ in two different ways:
(i) $X_k\overset{\text{i.i.d.}}{\sim}N(0,\Sigma_{p\times p})$;
(ii) $X_k$ are drawn independently from a multivariate t-distribution $T_p(\alpha;0,\Sigma_{p\times p})$ with $\alpha=5$ degrees of freedom.
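A sketch of this data-generating step is given below (function names are ours); the multivariate t samples are drawn through the usual normal/chi-square representation.

```python
import numpy as np

def model_cov(p, model):
    """Covariance matrices of Models 1 and 2."""
    idx = np.arange(p)
    if model == 1:
        return 0.6 ** np.abs(idx[:, None] - idx[None, :])
    sigma = np.eye(p)
    sigma += 0.6 * (np.abs(idx[:, None] - idx[None, :]) == 1)
    sigma += 0.3 * (np.abs(idx[:, None] - idx[None, :]) == 2)
    return sigma

def draw_samples(n, p, model, dist="normal", df=5, rng=None):
    """Generate n samples from N(0, Sigma) or from a multivariate t_p(df; 0, Sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = model_cov(p, model)
    z = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    if dist == "normal":
        return z
    # multivariate t: normal vector divided by sqrt(chi^2_df / df)
    w = rng.chisquare(df, size=n) / df
    return z / np.sqrt(w)[:, None]
```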
Experimental settings
(i) Set p = { 50 , 100 , 200 } , n = { 200 , 300 } , ε = 1 / 2 , ϵ = 0.5 and δ = 1 / 400 ;
(ii) Set p = 100 , n = 200 , ε = 1 / 2 , δ = 1 / 400 and ϵ = { 0.1 , 0.4 , 0.6 , 0.7 } ;
(iii) Set p = 100 , n = 200 , ε = 1 / 2 , ϵ = 0.5 and δ = { 1 / 50 , 1 / 100 , 1 / 200 , 1 / 500 } .
In each setting, we choose the tuning parameter η in the threshold by 10-fold cross-validation, as proposed by Cai and Liu [4]. Moreover, we measure the errors by the spectral and Frobenius norms, respectively. We run each experiment 50 times and take the average errors as the final result (with standard errors in parentheses).
Experimental results
From Table 1 and Table 2, we find that the output $\tilde\Sigma_+^\tau$ of Algorithm 1 (differentially private algorithm) performs better than the output $\tilde\Sigma_1^+$ of the DP-thresholding algorithm proposed by Wang and Xu [10]. The same holds for our Algorithm 2 (locally differentially private algorithm) compared with the LDP-thresholding algorithm of Wang and Xu. Moreover, we observe that increasing the dimension $p$ leads to larger errors, while increasing the sample size $n$ makes the errors smaller. It can also be seen that the errors under Model 1 are larger than those under Model 2, since the covariance matrix in Model 1 is denser.
Table 3 and Table 4 demonstrate that if the privacy parameter $\epsilon$ increases (resulting in weaker privacy protection), the errors become smaller. Moreover, Table 5 and Table 6 show that when the privacy parameter $\delta$ decreases, the errors become larger. These phenomena are also consistent with our theoretical results. In addition, the numerical results are in line with the theoretical findings of Remark 8, i.e., the estimation error of Algorithm 1 is smaller than that of Algorithm 2.

4.2. Real Data Application

To demonstrate the performance of the outputs $\tilde\Sigma_+^\tau$ and $\check\Sigma_+^\tau$ of Algorithms 1 and 2, and compare them with the outputs $\tilde\Sigma_1^+$ and $\tilde\Sigma_2^+$ of the DP-thresholding and LDP-thresholding algorithms proposed by Wang and Xu [10], respectively, we use the quadratic discriminant analysis (QDA) presented by Liang et al. [22]. We apply these estimators to the human gut microbiome dataset collected by Wu et al. [23], which contains 98 healthy individuals at the University of Pennsylvania. The bacterial community in the dataset was categorized into 87 genera that appeared in at least one sample. We select $p=40$ bacterial genera that appeared in at least four samples. Then, according to the body mass index (BMI), we divide the dataset with 40 bacterial genera into a lean group (BMI $<25$, $n=63$) and an obese group (BMI $\ge25$, $n=35$).
We randomly select 13 lean subjects and 7 obese subjects via the stratified sampling approach in order to constitute the testing set (roughly $1/5$ of the subjects in each group); the remaining subjects form the training set. Moreover, a two-sample t test is performed between the two groups for each bacterial genus in the training set. In our analysis, $p=25$ and $40$ are considered. The data are assumed to be normally distributed as $N(\mu_k,\Sigma)$, and the two groups are assumed to have the same covariance matrix $\Sigma$ but different means $\mu_k$, with $k=1$ for the obese group and $k=2$ for the lean group. Note that the parameters $\delta,\epsilon$ of the additive noise in Algorithms 1 and 2 are set as $1/196$ and $0.5$, respectively. In addition, we set $\theta=\frac{4\sqrt2}{3}\eta K^{\frac2{4+\varepsilon}}M$ in Algorithms 1 and 2 as the tuning parameter, which is selected by cross-validation conducted on the real data.
To compare these estimators, the outputs $\tilde\Sigma_+^\tau$, $\check\Sigma_+^\tau$, $\tilde\Sigma_1^+$ and $\tilde\Sigma_2^+$ are used to replace $\hat\Sigma$ in the following QDA score functions:
$$\delta_k(y)=-\frac12\log(\det(\hat\Sigma))-\frac12(y-\hat\mu_k)^T\hat\Sigma^{-1}(y-\hat\mu_k)+\log(\hat\pi_k),$$
where $\hat\pi_k=n_k/n$ is the proportion of group-$k$ subjects in the training set and $\hat\mu_k=\frac1{n_k}\sum_{i\in k\text{th group}}y_i$ is the within-group average vector in the training set. The classification rule is
$$\hat k(y)=\arg\max_k\delta_k(y),\quad k=1,2.$$
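A sketch of this plug-in rule is given below; `sigma_hat` stands for whichever estimator ($\tilde\Sigma_+^\tau$, $\check\Sigma_+^\tau$, $\tilde\Sigma_1^+$ or $\tilde\Sigma_2^+$) is being evaluated and is assumed to be nonsingular.

```python
import numpy as np

def discriminant_scores(y, sigma_hat, means, priors):
    """delta_k(y) = -0.5 log det(S) - 0.5 (y - mu_k)' S^{-1} (y - mu_k) + log pi_k."""
    sign, logdet = np.linalg.slogdet(sigma_hat)
    inv = np.linalg.inv(sigma_hat)
    scores = []
    for mu_k, pi_k in zip(means, priors):
        d = y - mu_k
        scores.append(-0.5 * logdet - 0.5 * d @ inv @ d + np.log(pi_k))
    return np.array(scores)

def classify(y, sigma_hat, means, priors):
    # k_hat(y) = argmax_k delta_k(y); returns the 0-based index of the chosen group.
    return int(np.argmax(discriminant_scores(y, sigma_hat, means, priors)))
```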
We use the testing set to evaluate the estimation performance through the average number of misclassifications. Our result is based on 20 replications of the above procedure. Table 7 presents the average number of misclassifications for every output of the corresponding algorithms. It can be seen that the output Σ ˜ + τ of our Algorithm 1 performs better than the output Σ ˜ 1 + of the DP-thresholding algorithm proposed by Wang and Xu [10]. Similarly, the output Σ ˇ + τ of Algorithm 2 is more effective than the output Σ ˜ 2 + of the LDP-thresholding algorithm [10].

5. Discussion

In this paper, we study the problem of estimating the sparse covariance matrix under lower-bounded moment assumption while guaranteeing DP and LDP, and measure the error bounds by spectral and Frobenius norms, respectively. Furthermore, we conduct numerical experiments to support our theoretical analysis.
However, we do not derive a lower bound on the estimation error. This should be achievable, since many well-known techniques can be utilized, such as Fano's lemma [24], Assouad's lemma [25] and Le Cam's method [26]. Moreover, a noteworthy open problem is to consider estimation in terms of expectation under the lower-bounded moment assumption while preserving DP and LDP. We leave these aspects as future work.

Author Contributions

H.L. conceived the study and wrote the original draft. J.W. conducted the theoretical analysis and edited the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This paper is supported by the National Natural Science Foundation of China (No. 12171016).

Data Availability Statement

The data presented in this study are available within this paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In order to prove Theorems 1 and 2, we first introduce the following lemma:
Lemma A1.
Define event E as
$$E=\left\{|\tilde\sigma_{ij}-\sigma_{ij}|\le\frac34\left[\xi\sqrt{\frac{\log p}{n}}+4\sigma_1(\log p)^{1/2}\right],\ 1\le i,j\le p\right\}$$
with $\xi:=\frac{4\sqrt2}{3}\eta K^{\frac2{4+\varepsilon}}M$, where $K,\gamma$ and $M$ are given in Conditions (i)–(ii). Then, for any $\eta\ge2$ and some $\varepsilon>0$,
$$\mathbb{P}(E)\ge1-O\left((\log p)^{-1/2}p^{-\eta+2}+n^{-\varepsilon}+p^{-5/2}\right).$$
Proof. 
Note that (3) and (4) in Algorithm 1 show that
$$\tilde\sigma_{ij}=\hat\sigma_{ij}+n_{ij};$$
thus, we know
$$\mathbb{P}(E^c)=\mathbb{P}\left\{|\hat\sigma_{ij}+n_{ij}-\sigma_{ij}|\ge\sqrt2\eta K^{\frac2{4+\varepsilon}}M\sqrt{\frac{\log p}{n}}+3\sigma_1\sqrt{\log p},\ 1\le i,j\le p\right\}\le\mathbb{P}\left\{|\hat\sigma_{ij}-\sigma_{ij}|\ge\sqrt2\eta K^{\frac2{4+\varepsilon}}M\sqrt{\frac{\log p}{n}},\ 1\le i,j\le p\right\}+\mathbb{P}\left\{|n_{ij}|\ge3\sigma_1\sqrt{\log p},\ 1\le i,j\le p\right\}:=R_1+R_2.$$
Since $n_{ij}\sim N(0,\sigma_1^2)$, it holds that
$$\mathbb{P}\left\{|n_{ij}|\ge3\sigma_1\sqrt{\log p},\ 1\le i,j\le p\right\}\le2p^{-5/2}$$
due to $\mathbb{P}\{|X|>t\}\le2e^{-\frac{t^2}{2\sigma^2}}$ for $X\sim N(0,\sigma^2)$ in [27].
According to Condition (i), we have $\mathbb{E}|X_i|^{4+\varepsilon}\le K\sigma_{ii}^{2+\varepsilon/2}$. Therefore,
$$\mathbb{D}(X_iX_j)\le\mathbb{E}|X_iX_j|^2\le\sqrt{\mathbb{E}|X_i|^4\,\mathbb{E}|X_j|^4}\le K^{\frac4{4+\varepsilon}}\sigma_{ii}\sigma_{jj}.$$
By applying $\Sigma\in\mathcal{U}_q(s_{n,p},M)$ in Condition (ii), it follows that
$$\mathbb{D}(X_iX_j)\le K^{\frac4{4+\varepsilon}}M^2.$$
On the other hand, Condition (i) implies that
$$\mathbb{D}(X_iX_j)\ge\tau_0\sigma_{ii}\sigma_{jj}.$$
Define event $E_1$ as
$$E_1=\left\{\hat\theta_{ij}\le2\mathbb{D}(X_iX_j),\ 1\le i,j\le p\right\}$$
and
$$\mathbb{P}(E_1^c)\le\mathbb{P}\left\{|\hat\theta_{ij}-\mathbb{D}(X_iX_j)|\ge\mathbb{D}(X_iX_j),\ 1\le i,j\le p\right\}\le\mathbb{P}\left\{|\hat\theta_{ij}-\mathbb{D}(X_iX_j)|\ge\tau_0\sigma_{ii}\sigma_{jj},\ 1\le i,j\le p\right\}$$
holds thanks to (A5). Using Lemma 2(ii) in [4], we know that for some $\varepsilon>0$,
$$\mathbb{P}(E_1^c)\le O(n^{-\varepsilon}).$$
Hence, we obtain that for any $\eta\ge2$ and some $\varepsilon>0$,
$$R_1\le\mathbb{P}\left\{|\hat\sigma_{ij}-\sigma_{ij}|\ge\sqrt2\eta\sqrt{(\mathbb{D}(X_iX_j)\log p)/n},\ 1\le i,j\le p\right\}\le\mathbb{P}\left(\left\{|\hat\sigma_{ij}-\sigma_{ij}|\ge\sqrt2\eta\sqrt{(\mathbb{D}(X_iX_j)\log p)/n},\ 1\le i,j\le p\right\}\cap E_1\right)+\mathbb{P}(E_1^c)\le O\left((\log p)^{-1/2}p^{-\eta+2}+n^{-\varepsilon}\right),$$
where the first inequality holds because of (A4), and the last inequality follows from Lemma 1 and (A6). □
Proof of Theorem 1.
(i) Considering that Σ is a positive definite matrix, we have
$$\|\tilde\Sigma_+^\tau-\Sigma\|_2\le\|\tilde\Sigma_+^\tau-\tilde\Sigma^\tau\|_2+\|\tilde\Sigma^\tau-\Sigma\|_2\le\max_{\{i:\tilde\lambda_i\le0\}}|\tilde\lambda_i|+\|\tilde\Sigma^\tau-\Sigma\|_2\le\max_{\{i:\tilde\lambda_i\le0\}}|\tilde\lambda_i-\lambda_i(\Sigma)|+\|\tilde\Sigma^\tau-\Sigma\|_2\le2\|\tilde\Sigma^\tau-\Sigma\|_2.$$
Hence, it suffices to show that there exists a constant $C_1:=C_1(q,\eta,K,M,\varepsilon)$ such that
$$\|\tilde\Sigma^\tau-\Sigma\|_2\le C_1s_{n,p}\left(\sqrt{\frac{\log p}{n}}+\sqrt{\frac{\log\delta^{-1}\log p}{n^2\epsilon^2}}\right)^{1-q}$$
under event E.
Since $\tilde\Sigma^\tau-\Sigma$ is symmetric, this, along with the Gersgorin theorem, yields
$$\|\tilde\Sigma^\tau-\Sigma\|_2\le\|\tilde\Sigma^\tau-\Sigma\|_1\le\|\tilde\Sigma^\tau-\Sigma^\tau\|_1+\|\Sigma^\tau-\Sigma\|_1:=R_1+R_2,$$
where $\Sigma^\tau=(\sigma_{ij}^\tau)_{p\times p}$ with $\sigma_{ij}^\tau=\sigma_{ij}\cdot I\{|\sigma_{ij}|>\lambda\}$ and
$$\lambda=\frac{4\sqrt2}{3}\eta K^{\frac2{4+\varepsilon}}M\sqrt{\frac{\log p}{n}}+4\sigma_1(\log p)^{1/2}.$$
We first estimate $R_2$. Clearly, it follows from the definition of $\sigma_{ij}^\tau$ that
$$R_2=\max_j\sum_{i=1}^p\left[|\sigma_{ij}^\tau-\sigma_{ij}|I(|\sigma_{ij}|\le\lambda)+|\sigma_{ij}^\tau-\sigma_{ij}|I(|\sigma_{ij}|>\lambda)\right]=\max_j\sum_{i=1}^p|\sigma_{ij}|I(|\sigma_{ij}|\le\lambda)\le\max_j\sum_{i=1}^p|\sigma_{ij}|^q\lambda^{1-q}.$$
The remaining work sets out to estimate $R_1$. It is easy to see that
$$R_1\le\max_j\sum_{i=1}^p|\tilde\sigma_{ij}-\sigma_{ij}|I(|\tilde\sigma_{ij}|>\lambda,|\sigma_{ij}|>\lambda)+\max_j\sum_{i=1}^p|\sigma_{ij}|I(|\tilde\sigma_{ij}|\le\lambda,|\sigma_{ij}|>\lambda)+\max_j\sum_{i=1}^p|\tilde\sigma_{ij}|I(|\tilde\sigma_{ij}|>\lambda,|\sigma_{ij}|\le\lambda):=R_{11}+R_{12}+R_{13}.$$
When event E occurs, we obtain
$$R_{11}\le\frac34\max_j\sum_{i=1}^p\lambda I(|\tilde\sigma_{ij}|>\lambda,|\sigma_{ij}|>\lambda)\le\frac34\max_j\sum_{i=1}^p\lambda^{1-q}|\sigma_{ij}|^q$$
and
$$R_{12}\le\max_j\sum_{i=1}^p|\tilde\sigma_{ij}-\sigma_{ij}|I(|\sigma_{ij}|>\lambda)+\max_j\sum_{i=1}^p|\tilde\sigma_{ij}|I(|\tilde\sigma_{ij}|\le\lambda,|\sigma_{ij}|>\lambda)\le\frac74\max_j\sum_{i=1}^p\lambda I(|\sigma_{ij}|>\lambda)\le\frac74\max_j\sum_{i=1}^p\lambda^{1-q}|\sigma_{ij}|^q.$$
For $R_{13}$, it holds that
$$R_{13}\le\max_j\sum_{i=1}^p|\sigma_{ij}|I(|\sigma_{ij}|\le\lambda)+\max_j\sum_{i=1}^p|\tilde\sigma_{ij}-\sigma_{ij}|I(|\tilde\sigma_{ij}|>\lambda,|\sigma_{ij}|\le\lambda)\le\max_j\sum_{i=1}^p\lambda^{1-q}|\sigma_{ij}|^q+R_{131}.$$
Through the condition $\delta\in(0,1)$ in Algorithm 1, we know that a constant $c_1>0$ exists such that $\log(1.25/\delta)\le c_1\log\delta^{-1}$ and $\sigma_1\le2\sqrt{c_1\log\delta^{-1}}/(n\epsilon)$. This, along with the definition of λ in (A9), leads to
$$\lambda\le\frac{4\sqrt2}{3}\eta K^{\frac2{4+\varepsilon}}M\sqrt{\frac{\log p}{n}}+8\sqrt{\frac{c_1\log\delta^{-1}\log p}{n^2\epsilon^2}}.$$
On the other hand, we obtain
$$R_{131}\le\frac34\max_j\sum_{i=1}^p\lambda I\left(|\tilde\sigma_{ij}|>\lambda,\ \frac14\lambda\le|\sigma_{ij}|<\lambda\right)+\frac34\max_j\sum_{i=1}^p\lambda I\left(|\tilde\sigma_{ij}|>\lambda,\ |\sigma_{ij}|<\frac14\lambda\right)\le\frac34\max_j\sum_{i=1}^p\lambda I\left(\frac14\lambda\le|\sigma_{ij}|<\lambda\right)+\frac34\max_j\sum_{i=1}^p\lambda I\left(|\tilde\sigma_{ij}-\sigma_{ij}|>\frac34\lambda\right):=T_1+T_2$$
under event E.
Obviously, $T_1$ in (A16) is controlled by $4^q\max_j\sum_{i=1}^p\lambda^{1-q}|\sigma_{ij}|^q$. For $T_2$, (A15) and the assumption in Condition (i) imply that λ is a bounded quantity. In addition, $I\left(|\tilde\sigma_{ij}-\sigma_{ij}|>\frac34\lambda\right)=0$ if event E occurs. Thus, $T_2$ in (A16) is 0 under event E. Therefore,
$$R_{131}\le4^q\max_j\sum_{i=1}^p\lambda^{1-q}|\sigma_{ij}|^q.$$
This, along with (A14), reveals that
$$R_{13}\le(4^q+1)\max_j\sum_{i=1}^p\lambda^{1-q}|\sigma_{ij}|^q$$
under event E. Then, we have
$$R_1\le\left(4^q+\frac72\right)\max_j\sum_{i=1}^p\lambda^{1-q}|\sigma_{ij}|^q,$$
thanks to (A11)–(A13) and (A17).
Furthermore, substituting (A18) and (A10) into (A8) indicates that
$$\|\tilde\Sigma^\tau-\Sigma\|_2\le\left(4^q+\frac92\right)\max_j\sum_{i=1}^p\lambda^{1-q}|\sigma_{ij}|^q.$$
Hence, we derive
$$\|\tilde\Sigma^\tau-\Sigma\|_2\le\left(4^q+\frac92\right)\left(\frac{4\sqrt2}{3}\eta K^{\frac2{4+\varepsilon}}M\sqrt{\frac{\log p}{n}}+8\sqrt{\frac{c_1\log\delta^{-1}\log p}{n^2\epsilon^2}}\right)^{1-q}\max_j\sum_{i=1}^p|\sigma_{ij}|^q\le C_1s_{n,p}\left(\sqrt{\frac{\log p}{n}}+\sqrt{\frac{\log\delta^{-1}\log p}{n^2\epsilon^2}}\right)^{1-q}$$
due to (A15) and $\Sigma\in\mathcal{U}_q(s_{n,p},M)$, which reaches the conclusion of (A7).
(ii) Similar to the discussions in (i), we derive that
$$\|\tilde\Sigma_+^\tau-\Sigma\|_F^2\le2\left(\|\tilde\Sigma_+^\tau-\tilde\Sigma^\tau\|_F^2+\|\tilde\Sigma^\tau-\Sigma\|_F^2\right)\le2\left(\sum_{i:\tilde\lambda_i\le0}\tilde\lambda_i^2+\|\tilde\Sigma^\tau-\Sigma\|_F^2\right)\le2\left(\sum_{i=1}^p[\tilde\lambda_i-\lambda_i(\Sigma)]^2+\|\tilde\Sigma^\tau-\Sigma\|_F^2\right)\le4\|\tilde\Sigma^\tau-\Sigma\|_F^2.$$
Therefore, the remaining work is to show that a constant $C_5:=C_5(q,\eta,K,M,\varepsilon)$ exists such that
$$\frac1p\|\tilde\Sigma^\tau-\Sigma\|_F^2\le C_5s_{n,p}\left(\sqrt{\frac{\log p}{n}}+\sqrt{\frac{\log\delta^{-1}\log p}{n^2\epsilon^2}}\right)^{2-q}$$
under event E.
Obviously, it holds that
$$\frac1p\|\tilde\Sigma^\tau-\Sigma\|_F^2=\frac1p\sum_{j=1}^p\sum_{i=1}^p|\tilde\sigma_{ij}^\tau-\sigma_{ij}|^2\le\max_j\sum_{i=1}^p|\tilde\sigma_{ij}^\tau-\sigma_{ij}|^2\le2\left(\max_j\sum_{i=1}^p|\tilde\sigma_{ij}^\tau-\sigma_{ij}^\tau|^2+\max_j\sum_{i=1}^p|\sigma_{ij}^\tau-\sigma_{ij}|^2\right).$$
Using similar techniques to those applied for estimating $R_1$ and $R_2$ in (A11) and (A10), we know that
$$\frac1p\|\tilde\Sigma^\tau-\Sigma\|_F^2\le C_q\max_j\sum_{i=1}^p\lambda^{2-q}|\sigma_{ij}|^q$$
under event E, where $C_q$ denotes a constant related to q. Thus, we have
$$\frac1p\|\tilde\Sigma^\tau-\Sigma\|_F^2\le C_q\left(\frac{4\sqrt2}{3}\eta K^{\frac2{4+\varepsilon}}M\sqrt{\frac{\log p}{n}}+8\sqrt{\frac{c_1\log\delta^{-1}\log p}{n^2\epsilon^2}}\right)^{2-q}\max_j\sum_{i=1}^p|\sigma_{ij}|^q\le C_5s_{n,p}\left(\sqrt{\frac{\log p}{n}}+\sqrt{\frac{\log\delta^{-1}\log p}{n^2\epsilon^2}}\right)^{2-q}$$
due to (A15) and $\Sigma\in\mathcal{U}_q(s_{n,p},M)$. This is the expected conclusion of (A19). □
Proof of Theorem 2.
To show conclusions (i)–(ii) of Theorem 2, we define the event
$$E_2=\left\{|\check\sigma_{ij}-\sigma_{ij}|\le\frac34\left[\xi\sqrt{\frac{\log p}{n}}+4\sigma_2(\log p)^{1/2}\right],\ 1\le i,j\le p\right\}$$
with $\xi:=\frac{4\sqrt2}{3}\eta K^{\frac2{4+\varepsilon}}M$.
Based on Lemma 1 and $\mathbb{P}\{|X|>t\}\le2e^{-\frac{t^2}{2\sigma^2}}$ for $X\sim N(0,\sigma^2)$ in [27], and using a similar proving process to that of Lemma A1, we know that for any $\eta\ge2$ and some $\varepsilon>0$,
$$\mathbb{P}(E_2)\ge1-O\left((\log p)^{-1/2}p^{-\eta+2}+n^{-\varepsilon}+p^{-5/2}\right).$$
By similar discussions to those used to prove Theorem 1, we deduce that constants $C_6:=C_6(q,\eta,K,M,\varepsilon)$ and $C_7:=C_7(q,\eta,K,M,\varepsilon)$ exist such that
$$\|\check\Sigma_+^\tau-\Sigma\|_2\le C_6s_{n,p}\left(\sqrt{\frac{\log p}{n}}+\sqrt{\frac{\log\delta^{-1}\log p}{n\epsilon^2}}\right)^{1-q}$$
and
$$\frac1{\sqrt p}\|\check\Sigma_+^\tau-\Sigma\|_F\le C_7s_{n,p}\left(\sqrt{\frac{\log p}{n}}+\sqrt{\frac{\log\delta^{-1}\log p}{n\epsilon^2}}\right)^{1-q/2}$$
under event $E_2$. Furthermore, there exists $c_2>0$ such that $(\log p)/n\le c_2(\log\delta^{-1}\log p)/(n\epsilon^2)$. Hence, the above two inequalities reduce to
$$\|\check\Sigma_+^\tau-\Sigma\|_2\le C_3s_{n,p}\left(\frac{\log\delta^{-1}\log p}{n\epsilon^2}\right)^{\frac{1-q}2}$$
and
$$\frac1{\sqrt p}\|\check\Sigma_+^\tau-\Sigma\|_F\le C_4s_{n,p}\left(\frac{\log\delta^{-1}\log p}{n\epsilon^2}\right)^{\frac12-\frac q4}$$
under event $E_2$. This, along with (A20), enables us to reach the desired conclusions of Theorem 2. □

References

  1. Sifaou, H.; Kammoun, A.; Alouini, M.-S. High-dimensional linear discriminant analysis classifier for spiked covariance model. J. Mach. Learn. Res. 2020, 21, 4508–4531. [Google Scholar]
  2. Chaudhuri, S.; Drton, M.; Richardson, T. Estimation of a covariance matrix with zeros. Biometrika 2007, 94, 199–216. [Google Scholar] [CrossRef]
  3. Xu, K.; Hero, A. Dynamic stochastic blockmodels for time-evolving social networks. IEEE J. Sel. Topics Signal Process. 2014, 8, 552–562. [Google Scholar] [CrossRef]
  4. Cai, T.; Liu, W. Adaptive thresholding for sparse covariance matrix estimation. J. Amer. Statist. Assoc. 2011, 106, 672–684. [Google Scholar] [CrossRef]
  5. Belomestny, D.; Mathias, T.; Tsybakov, A. Sparse covariance matrix estimation in high-dimensional deconvolution. Bernoulli 2019, 25, 1901–1938. [Google Scholar] [CrossRef]
  6. Tang, J.; Korolova, A.; Bai, X.; Wang, X.; Wang, X. Privacy loss in Apple’s implementation of differential privacy on MacOS 10.12. arXiv 2017, arXiv:1709.02753. [Google Scholar]
  7. Ding, B.; Kulkarni, J.; Yekhanin, S. Collecting Telemetry Data Privately. In Proceedings of the Thirty-First Annual Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 3571–3580. [Google Scholar]
  8. Erlingsson, U.; Pihur, V.; Korolova, A. Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the Twenty-First ACM Conference on Computer and Communications Security, Scottsdale, AZ, USA, 3–7 November 2014; pp. 1054–1067. [Google Scholar]
  9. Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Theory of Cryptography Conference, New York, NY, USA, 4–7 March 2006; pp. 265–284. [Google Scholar]
  10. Wang, D.; Xu, J. Differentially private high dimensional sparse covariance matrix estimation. Theoret. Comput. Sci. 2021, 865, 119–130. [Google Scholar] [CrossRef]
  11. Cai, T.; Wang, Y.; Zhang, L. The cost of privacy: Optimal rates of convergence for parameter estimation with differential privacy. Ann. Statist. 2021, 49, 2825–2850. [Google Scholar] [CrossRef]
  12. Jiang, W.; Xie, C.; Zhang, Z. Wishart mechanism for differentially private principal components analysis. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 1730–1736. [Google Scholar]
  13. Amin, K.; Dick, T.; Kulesza, A.; Munoz, A.; Vassilvitskii, S. Differentially private covariance estimation. In Proceedings of the Thirty-Third Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 14213–14222. [Google Scholar]
  14. Kamath, G.; Li, J.; Singhal, V.; Ullman, J. Privately learning high-dimensional distributions. In Proceedings of the Thirty-Second Annual Conference on Learning Theory, Phoenix, AZ, USA, 4–8 March 2019; pp. 1853–1902. [Google Scholar]
  15. Vershynin, R. Introduction to the Non-Asymptotic Analysis of Random Matrices; Cambridge University Press: Cambridge, UK, 2012; pp. 210–268. [Google Scholar]
  16. Dwork, C.; Kenthapadi, K.; McSherry, F.; Mironov, I.; Naor, M. Our data, ourselves: Privacy via distributed noise generation. In Proceedings of the Twenty-Fourth Annual International Conference on Theory and Applications of Cryptographic Techniques, St. Petersburg, Russia, 28 May–1 June 2006; pp. 486–503. [Google Scholar]
  17. Su, W.; Guo, X.; Zhang, H. Differentially private precision matrix estimation. Acta Math. Sin. 2020, 36, 1107–1124. [Google Scholar] [CrossRef]
  18. Dwork, C.; Roth, A. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 2014, 9, 211–407. [Google Scholar] [CrossRef]
  19. Thorin, G. Convexity theorems generalizing those of M. Riesz and Hadamard with some applications. Comm. Sem. Math. Univ. Lund 1948, 9, 1–58. [Google Scholar]
  20. Kasiviswanathan, S.; Lee, H.; Nissim, K.; Raskhodnikova, S.; Smith, A. What can we learn privately? SIAM J. Comput. 2011, 40, 793–826. [Google Scholar] [CrossRef]
  21. Dwork, C.; Talwar, K.; Thakurta, A.; Zhang, L. Analyze gauss: Optimal bounds for privacy-preserving principal component analysis. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, New York, NY, USA, 31 May–3 June 2014; pp. 11–20. [Google Scholar]
  22. Liang, W.; Wu, Y.; Chen, H. Sparse covariance matrix estimation for ultrahigh dimensional data. Stat 2022, 11, e479. [Google Scholar] [CrossRef]
  23. Wu, G.; Chen, J.; Hoffmann, C.; Bittinger, K.; Chen, Y.Y.; Keilbaugh, S.A.; Bewtra, M.; Knights, D.; Walters, W.A.; Knight, R.; et al. Linking Long-Term Dietary Patterns with Gut Microbial Enterotypes. Science 2011, 335, 105–108. [Google Scholar] [CrossRef] [PubMed]
  24. Tsybakov, A. Introduction to Nonparametric Estimation; Springer: New York, NY, USA, 2009. [Google Scholar]
  25. Assouad, P. Deux remarques sur l’estimation. C. R. Acad. Sci. Paris Sér. 1983, 296, 1021–1024. [Google Scholar]
  26. Le Cam, L. Asymptotic Methods in Statistical Decision Theory; Springer: New York, NY, USA, 1986. [Google Scholar]
  27. Vershynin, R. High-Dimensional Probability: An Introduction with Applications in Data Science; Cambridge University Press: Cambridge, UK, 2018; pp. 672–684. [Google Scholar]
Table 1. Comparison of different outputs of the algorithms with normal distributions.
(p, n) | Spectral norm: $\tilde\Sigma_+^\tau$ / $\tilde\Sigma_1^+$ / $\check\Sigma_+^\tau$ / $\tilde\Sigma_2^+$ | Frobenius norm: $\tilde\Sigma_+^\tau$ / $\tilde\Sigma_1^+$ / $\check\Sigma_+^\tau$ / $\tilde\Sigma_2^+$
Model 1
(50,200) | 1.92 (0.12) / 2.50 (0.14) / 4.31 (0.23) / 4.95 (0.48) | 4.41 (0.17) / 5.90 (0.31) / 8.15 (0.36) / 9.77 (0.40)
(50,300) | 1.52 (0.13) / 1.89 (0.15) / 3.70 (0.29) / 4.18 (0.25) | 3.74 (0.16) / 4.65 (0.13) / 6.58 (0.30) / 7.65 (0.44)
(100,200) | 2.13 (0.13) / 2.79 (0.08) / 5.44 (0.29) / 7.67 (0.43) | 6.83 (0.17) / 9.34 (0.23) / 10.71 (0.25) / 14.11 (0.39)
(100,300) | 1.76 (0.08) / 2.18 (0.11) / 4.73 (0.17) / 6.73 (0.43) | 5.86 (0.17) / 7.13 (0.26) / 8.81 (0.37) / 11.07 (0.28)
(200,300) | 1.89 (0.07) / 2.56 (0.14) / 6.08 (0.25) / 8.81 (0.41) | 8.73 (0.10) / 10.10 (0.23) / 11.68 (0.27) / 15.28 (0.19)
Model 2
(50,200) | 1.01 (0.16) / 1.56 (0.15) / 3.46 (0.29) / 4.11 (0.29) | 3.32 (0.21) / 4.90 (0.28) / 6.42 (0.29) / 8.95 (0.46)
(50,300) | 0.74 (0.16) / 0.92 (0.15) / 3.13 (0.27) / 3.51 (0.24) | 2.87 (0.06) / 3.23 (0.13) / 5.20 (0.26) / 7.16 (0.37)
(100,200) | 1.28 (0.16) / 1.78 (0.06) / 4.19 (0.28) / 5.04 (0.23) | 4.99 (0.15) / 8.07 (0.31) / 8.03 (0.33) / 13.35 (0.30)
(100,300) | 0.82 (0.13) / 1.43 (0.09) / 3.60 (0.26) / 4.30 (0.18) | 4.29 (0.07) / 5.30 (0.25) / 5.75 (0.41) / 9.53 (0.39)
(200,300) | 0.93 (0.12) / 1.63 (0.11) / 4.00 (0.17) / 5.35 (0.17) | 6.28 (0.09) / 8.93 (0.34) / 8.94 (0.31) / 13.84 (0.24)
Table 2. Comparison of different outputs of the algorithms with t-distributions.
(p, n) | Spectral norm: $\tilde\Sigma_+^\tau$ / $\tilde\Sigma_1^+$ / $\check\Sigma_+^\tau$ / $\tilde\Sigma_2^+$ | Frobenius norm: $\tilde\Sigma_+^\tau$ / $\tilde\Sigma_1^+$ / $\check\Sigma_+^\tau$ / $\tilde\Sigma_2^+$
Model 1
(50,200) | 4.48 (0.10) / 5.31 (0.10) / 8.64 (0.26) / 10.45 (0.13) | 9.44 (0.21) / 11.41 (0.20) / 13.75 (0.26) / 15.44 (0.42)
(50,300) | 3.69 (0.20) / 4.63 (0.08) / 7.79 (0.48) / 9.36 (0.37) | 7.95 (0.28) / 9.80 (0.14) / 12.74 (0.46) / 13.63 (0.33)
(100,200) | 4.81 (0.08) / 5.56 (0.04) / 9.98 (0.35) / 13.75 (0.12) | 14.10 (0.25) / 16.78 (0.09) / 20.32 (0.28) / 22.97 (0.28)
(100,300) | 4.35 (0.09) / 5.08 (0.10) / 8.73 (0.35) / 11.69 (0.44) | 12.53 (0.27) / 14.84 (0.21) / 18.72 (0.48) / 19.42 (0.32)
(200,300) | 4.59 (0.04) / 5.42 (0.03) / 10.62 (0.30) / 14.61 (0.31) | 18.91 (0.21) / 20.59 (0.24) / 23.09 (0.20) / 24.10 (0.14)
Model 2
(50,200) | 2.81 (0.27) / 4.23 (0.44) / 7.34 (0.48) / 9.42 (0.40) | 6.06 (0.33) / 7.29 (0.28) / 10.85 (0.44) / 11.78 (0.27)
(50,300) | 2.27 (0.21) / 3.35 (0.42) / 6.19 (0.49) / 7.47 (0.43) | 4.86 (0.26) / 5.88 (0.30) / 9.53 (0.35) / 10.77 (0.42)
(100,200) | 3.91 (0.36) / 4.61 (0.39) / 8.46 (0.27) / 10.95 (0.30) | 9.68 (0.25) / 13.17 (0.40) / 14.58 (0.21) / 16.69 (0.32)
(100,300) | 2.94 (0.17) / 3.73 (0.26) / 6.69 (0.39) / 8.52 (0.46) | 7.63 (0.29) / 10.83 (0.31) / 11.95 (0.37) / 13.65 (0.20)
(200,300) | 3.56 (0.27) / 4.46 (0.34) / 9.36 (0.24) / 11.48 (0.22) | 12.25 (0.36) / 15.81 (0.25) / 15.46 (0.32) / 18.42 (0.22)
Table 3. Comparison of different privacy levels ϵ with normal distributions under Model 2.
ϵ | Spectral norm: $\tilde\Sigma_+^\tau$ / $\tilde\Sigma_1^+$ / $\check\Sigma_+^\tau$ / $\tilde\Sigma_2^+$ | Frobenius norm: $\tilde\Sigma_+^\tau$ / $\tilde\Sigma_1^+$ / $\check\Sigma_+^\tau$ / $\tilde\Sigma_2^+$
Model 1
0.1 | 2.60 (0.06) / 3.04 (0.02) / 7.49 (0.23) / 9.84 (0.22) | 7.69 (0.21) / 11.14 (0.13) / 12.74 (0.26) / 17.54 (0.35)
0.4 | 1.51 (0.15) / 2.12 (0.06) / 5.25 (0.33) / 6.88 (0.38) | 5.95 (0.29) / 9.11 (0.17) / 9.82 (0.46) / 14.72 (0.32)
0.6 | 1.08 (0.14) / 1.70 (0.07) / 3.31 (0.25) / 4.07 (0.17) | 4.66 (0.14) / 7.64 (0.31) / 7.67 (0.09) / 12.67 (0.15)
0.7 | 1.03 (0.16) / 1.65 (0.10) / 3.03 (0.14) / 3.34 (0.16) | 4.45 (0.11) / 7.02 (0.33) / 7.16 (0.04) / 11.66 (0.06)
Table 4. Comparison of different privacy levels ϵ with t-distributions under Model 2.
ϵ | Spectral norm: $\tilde\Sigma_+^\tau$ / $\tilde\Sigma_1^+$ / $\check\Sigma_+^\tau$ / $\tilde\Sigma_2^+$ | Frobenius norm: $\tilde\Sigma_+^\tau$ / $\tilde\Sigma_1^+$ / $\check\Sigma_+^\tau$ / $\tilde\Sigma_2^+$
Model 1
0.1 | 6.98 (0.28) / 8.29 (0.28) / 15.03 (0.16) / 17.59 (0.17) | 12.99 (0.21) / 16.58 (0.47) / 19.70 (0.35) / 22.18 (0.24)
0.4 | 4.71 (0.39) / 5.72 (0.36) / 10.37 (0.42) / 12.88 (0.31) | 10.78 (0.43) / 14.34 (0.45) / 16.14 (0.18) / 18.16 (0.14)
0.6 | 3.02 (0.30) / 3.82 (0.37) / 6.90 (0.39) / 9.15 (0.21) | 8.89 (0.40) / 12.39 (0.42) / 13.25 (0.47) / 15.38 (0.22)
0.7 | 1.84 (0.41) / 2.55 (0.31) / 4.54 (0.19) / 6.28 (0.24) | 7.38 (0.41) / 10.65 (0.39) / 11.52 (0.30) / 13.22 (0.44)
Table 5. Comparison of different privacy levels δ with normal distributions under Model 1.
δ | Spectral norm: $\tilde\Sigma_+^\tau$ / $\tilde\Sigma_1^+$ / $\check\Sigma_+^\tau$ / $\tilde\Sigma_2^+$ | Frobenius norm: $\tilde\Sigma_+^\tau$ / $\tilde\Sigma_1^+$ / $\check\Sigma_+^\tau$ / $\tilde\Sigma_2^+$
Model 1
1/50 | 1.85 (0.14) / 2.47 (0.11) / 4.08 (0.21) / 4.92 (0.22) | 6.37 (0.16) / 8.67 (0.24) / 6.33 (0.31) / 8.62 (0.34)
1/100 | 1.99 (0.14) / 2.58 (0.09) / 4.49 (0.19) / 5.51 (0.27) | 6.47 (0.47) / 8.89 (0.36) / 7.66 (0.26) / 11.14 (0.15)
1/200 | 2.08 (0.11) / 2.62 (0.09) / 4.92 (0.27) / 6.43 (0.45) | 6.68 (0.21) / 9.12 (0.29) / 9.17 (0.18) / 13.09 (0.17)
1/500 | 2.32 (0.14) / 2.96 (0.08) / 5.63 (0.31) / 7.97 (0.25) | 7.04 (0.17) / 9.66 (0.22) / 11.15 (0.17) / 15.68 (0.21)
Table 6. Comparison of different privacy levels δ with t-distributions under Model 1.
δ | Spectral norm: $\tilde\Sigma_+^\tau$ / $\tilde\Sigma_1^+$ / $\check\Sigma_+^\tau$ / $\tilde\Sigma_2^+$ | Frobenius norm: $\tilde\Sigma_+^\tau$ / $\tilde\Sigma_1^+$ / $\check\Sigma_+^\tau$ / $\tilde\Sigma_2^+$
Model 1
1/50 | 4.40 (0.07) / 5.29 (0.06) / 6.95 (0.35) / 10.31 (0.28) | 13.42 (0.19) / 16.22 (0.21) / 17.94 (0.40) / 19.17 (0.36)
1/100 | 4.64 (0.06) / 5.38 (0.05) / 8.01 (0.29) / 11.24 (0.14) | 13.69 (0.20) / 16.39 (0.13) / 18.70 (0.37) / 20.56 (0.39)
1/200 | 5.73 (0.08) / 5.45 (0.03) / 8.91 (0.35) / 13.00 (0.21) | 13.94 (0.18) / 16.55 (0.13) / 19.10 (0.18) / 21.96 (0.28)
1/500 | 5.05 (0.04) / 5.78 (0.02) / 10.51 (0.39) / 14.47 (0.26) | 14.33 (0.17) / 16.93 (0.07) / 20.66 (0.38) / 23.37 (0.17)
Table 7. Comparison of different outputs of algorithms.
p | Average number of misclassifications: $\tilde\Sigma_+^\tau$ / $\tilde\Sigma_1^+$ / $\check\Sigma_+^\tau$ / $\tilde\Sigma_2^+$
25 | 2.25 / 5.70 / 4.30 / 6.45
40 | 3.20 / 4.95 / 5.15 / 6.90