Covariate-Adjusted Precision Matrix Estimation Under Lower Polynomial Moment Assumption

Shuwei Hu

doi:10.3390/math13213562

Abstract

Multiple regression analysis has a wide range of applications, and the error structure of the regression model $Y = \Gamma X + Z$ has also attracted much attention. This paper studies estimation of the large-scale precision matrix of the error vector $Z$ when $Z$ has only lower-order polynomial moments. We establish upper bounds, holding with high probability, for the proposed estimator under several matrix norms, and show that it achieves the same optimal convergence rate as under a Gaussian assumption on the data. Simulation experiments further demonstrate the advantages of the proposed method.

Keywords:

high-dimensional data analysis; non-Gaussian errors; lower polynomial moment; precision matrix estimation

MSC:

62H12; 15A09; 62H10

1. Introduction

Multivariate regression methods are increasingly important in numerous fields, including chemometrics [1], neuroscience [2], genomics [3], and finance [4]. In genomics research, multivariate models connect gene expression $Y$ to genetic variants $X$ and infer gene regulatory networks via Gaussian graphical models (GGMs), a process that relies heavily on precision matrices. As emphasized by Liu and Yu [5], the error structure (closely linked to the precision matrix) of the multivariate linear regression model $Y = \Gamma X + Z$ is critical for reliable inference but is often overlooked in high-dimensional settings. Compared with the covariance matrix, the precision matrix of the random error $Z$ is more directly interpretable (e.g., it encodes conditional dependence relationships in gene networks) and more relevant to practical applications, and it has therefore attracted considerable research attention.

Cai et al. [6] proposed the CLIME (Constrained $\ell_1$-Minimization for Inverse Matrix Estimation) method for precision matrices and derived high-probability error bounds under exponential-type and polynomial-type moment conditions, respectively. Cai et al. [7] further developed the ACLIME (Adaptive Constrained $\ell_1$-Minimization for Inverse Matrix Estimation) method and established the optimal convergence rate for precision matrix estimation under a Gaussian assumption on the data. By replacing the sample covariance matrix with a pilot estimator, Avella-Medina et al. [8] studied optimal precision matrix estimation under a finite $(2 + \varepsilon)$-th moment condition. This robust estimation framework is designed for independent random vectors without covariates, so it cannot address the coupling between regression effects and error structures.

Although these methods have made notable progress in precision matrix estimation, they are designed for a single random vector and do not account for regression relationships between multiple variables. Cai et al. [9] proposed a two-stage constrained $\ell_1$ minimization approach to estimate the coefficient matrix and the precision matrix of a sub-Gaussian error $Z$, which requires the data to have finite moments of all orders. Chen et al. [10] used a scaled Lasso and a pairwise regression strategy to estimate these two matrices; their analysis assumes Gaussian data and provides no convergence rate. Ten et al. [11] developed a bias-corrected method to estimate the covariance matrix of the error vector $Z$, but this method cannot infer conditional independence between error variables. Moreover, its robustness relies on a Gaussian framework, which makes it inadequate for in-depth analysis of error structures in non-Gaussian scenarios.

This paper adopts a two-stage constrained $\ell_1$ minimization method to estimate, with high-probability error bounds, the precision matrix of error vectors that possess only finite moments of order $4\gamma + 4 + \varepsilon$ ($\gamma, \varepsilon > 0$). Our theoretical results show that the proposed estimator achieves the same convergence rate as the method of Cai et al. [9], and our simulations show that it outperforms covariate-unadjusted methods.

Throughout this paper, we write $X \lesssim Y$ if $X \le CY$ for some positive constant $C$, and $X \gtrsim Y$ if $Y \lesssim X$. For a vector $\beta = (\beta_1, \dots, \beta_p)^T \in \mathbb{R}^p$, we define $\|\beta\|_1 = \sum_{i=1}^p |\beta_i|$ and $\|\beta\|_\infty = \max_i |\beta_i|$. For a matrix $A = (a_{ij}) \in \mathbb{R}^{p \times q}$, we define $\|A\|_{\max} = \max_{i,j} |a_{ij}|$, $\|A\|_{\ell_1} = \max_{1 \le j \le q} \sum_{i=1}^p |a_{ij}|$ (the maximum absolute column sum), $\|A\|_{\ell_\infty} = \max_{1 \le i \le p} \sum_{j=1}^q |a_{ij}|$ (the maximum absolute row sum), $\|A\|_1 = \sum_{i=1}^p \sum_{j=1}^q |a_{ij}|$, and $\|A\|_F = \big(\sum_{i=1}^p \sum_{j=1}^q |a_{ij}|^2\big)^{1/2}$; finally, $\|A\|_2$ denotes the spectral norm of $A$.
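The column-sum norm $\|\cdot\|_{\ell_1}$ and the row-sum norm $\|\cdot\|_{\ell_\infty}$ are easy to mix up. The short NumPy sketch below (the helper name `matrix_norms` is ours, for illustration only) computes each norm directly from its definition and checks the two induced norms against NumPy's `ord=1` and `ord=np.inf` conventions for 2-D arrays.

```python
import numpy as np

def matrix_norms(A):
    """Matrix norms from the notation paragraph, for a p x q array A."""
    A = np.asarray(A, dtype=float)
    absA = np.abs(A)
    return {
        "max": absA.max(),                  # ||A||_max: largest absolute entry
        "l1": absA.sum(axis=0).max(),       # ||A||_{l_1}: maximum absolute column sum
        "linf": absA.sum(axis=1).max(),     # ||A||_{l_inf}: maximum absolute row sum
        "entrywise1": absA.sum(),           # ||A||_1: sum of all absolute entries
        "F": np.sqrt((absA ** 2).sum()),    # ||A||_F: Frobenius norm
        "spectral": np.linalg.norm(A, 2),   # ||A||_2: largest singular value
    }

A = np.array([[1.0, -2.0, 0.0],
              [3.0, 0.5, -1.0]])
norms = matrix_norms(A)
# For 2-D arrays, NumPy's ord=1 / ord=inf are exactly the column-sum / row-sum norms.
assert np.isclose(norms["l1"], np.linalg.norm(A, 1))
assert np.isclose(norms["linf"], np.linalg.norm(A, np.inf))
print(norms["l1"], norms["linf"])  # 4.0 4.5
```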

The rest of this paper is organized as follows. Section 2 details the construction of the proposed estimator; Section 3 presents its theoretical properties; Section 4 reports simulation studies and an analysis of human gut microbiome data; and Section 5 concludes. All technical proofs are given in Appendix A.

2. Preliminaries

We motivate the setup with genetical genomics data. Let $Y = (Y_1, \dots, Y_p)^T$ be a $p$-dimensional random vector of gene expressions and $X = (X_1, \dots, X_q)^T$ a $q$-dimensional random vector of genetic markers. We consider the multiple regression model

$$Y = \Gamma X + Z, \tag{1}$$

where $\Gamma \in \mathbb{R}^{p \times q}$ is an unknown coefficient matrix. The random error vector $Z = (Z_1, \dots, Z_p)^T$ has mean $0$, covariance matrix $\Sigma = (\sigma_{ij}) = E ZZ^T$, and precision matrix $\Omega = \Sigma^{-1} = (\omega_{ij}) = (\omega_1, \dots, \omega_p)$. Assuming that $X$ and $Z$ are independent, we observe $n$ independent and identically distributed (i.i.d.) samples $(X_1, Y_1), \dots, (X_n, Y_n)$ generated from model (1). Accordingly, the corresponding error samples are $Z_k = Y_k - \Gamma X_k$, $k = 1, \dots, n$; they are not observed directly because $\Gamma$ is unknown.

In genomic studies, the coefficient matrix $\Gamma$ in model (1) is sparse, as each gene is typically regulated by only a small number of genetic regulators. Similarly, the precision matrix $\Omega$ is expected to be sparse, reflecting the fact that genetic interaction networks generally exhibit limited connectivity. Motivated by these sparsity properties and the need to establish rigorous convergence rates for the estimators, we introduce the following conditions.

Condition 1.

For some constants $\gamma, c > 0$, the dimensions satisfy $(p \vee q) \le c n^{\gamma}$, where $p \vee q = \max\{p, q\}$. Additionally, there exist constants $\varepsilon > 0$ and $K > 0$ such that

$$\max_{1 \le i \le q} E|X_i|^{4\gamma+4+\varepsilon} \le K, \quad \max_{1 \le i \le p} E|Z_i|^{4\gamma+4+\varepsilon} \le K, \quad \max_{1 \le j \le p} E|Z^T\omega_j|^{4\gamma+4+\varepsilon} \le K.$$

Condition 2.

The regression coefficient matrix $\Gamma = (\gamma_{ij})$ belongs to $V_{\delta_1}(s_1(q))$ for some $0 \le \delta_1 < 1$, where

$$V_{\delta_1}(s_1(q)) = \Big\{\Gamma \in \mathbb{R}^{p \times q} : \max_{1 \le i \le p} \sum_{j=1}^{q} |\gamma_{ij}|^{\delta_1} \le s_1(q)\Big\}.$$

Condition 3.

The precision matrix $\Omega = (\omega_{ij})_{p \times p}$ belongs to $U_{\delta_2}(s_2(p))$ for some $0 \le \delta_2 < 1$, where

$$U_{\delta_2}(s_2(p)) = \Big\{\Omega \succ 0 : \|\Omega\|_{\ell_\infty} \le M_p,\ \max_{1 \le i \le p} \sum_{j=1}^{p} |\omega_{ij}|^{\delta_2} \le s_2(p)\Big\}.$$
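The sparsity measures in Conditions 2 and 3 are row-wise $\ell_\delta$ sums. A minimal NumPy sketch of the quantity $\max_i \sum_j |m_{ij}|^{\delta}$ follows; the function name is hypothetical, and for $\delta = 0$ it assumes the usual convention $0^0 = 0$, so that the measure counts nonzero entries in the densest row (NumPy itself evaluates `0.0 ** 0` as `1.0`, hence the special case).

```python
import numpy as np

def lq_row_sparsity(M, delta):
    """max_i sum_j |m_ij|^delta, the quantity bounded by s_1(q) or s_2(p).

    For delta = 0 we use the convention 0^0 = 0, so the measure counts the
    nonzero entries in the densest row (exact row sparsity).
    """
    absM = np.abs(np.asarray(M, dtype=float))
    if delta == 0:
        row_measure = (absM > 0).sum(axis=1)    # number of nonzeros per row
    else:
        row_measure = (absM ** delta).sum(axis=1)
    return row_measure.max()

Gamma = np.array([[0.5, 0.0, 0.0],
                  [0.0, -0.25, 0.04]])
print(lq_row_sparsity(Gamma, 0))     # 2
print(lq_row_sparsity(Gamma, 0.5))   # row 1 dominates: sqrt(0.5)
```

Smaller $\delta$ penalizes many small entries more heavily; $\delta \to 0$ recovers exact sparsity, which is the usual way such "weak $\ell_\delta$" classes interpolate between exact and approximate sparsity.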

Condition 4.

There exists some $N_q > 0$ such that $\|\Sigma_X^{-1}\|_{\ell_\infty} \le N_q$, where $\Sigma_X$ is the covariance matrix of $X$.

Remark 1.

Condition 1 imposes a polynomial moment constraint on $X$, $Z$, and $Z^T\omega_j$. This condition is weaker than the sub-Gaussian assumption adopted by Cai et al. [9], since sub-Gaussian random variables have finite moments of all orders. The dimension growth constraint $(p \vee q) \le c n^{\gamma}$ means that $p$ and $q$ grow at most polynomially in the sample size $n$, with $\gamma$ governing the maximum allowable growth rate of the dimensions relative to $n$.
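As a concrete illustration (ours, not part of the analysis above): a univariate Student-t variable $T_\nu$ with $\nu$ degrees of freedom satisfies $E|T_\nu|^r < \infty$ if and only if $r < \nu$, and every coordinate and every nondegenerate linear combination of a multivariate t vector with $\nu$ degrees of freedom is a scaled univariate t with the same $\nu$. For such data, the moment part of Condition 1 therefore reduces to a restriction on $\gamma$:

$$
E|T_\nu|^{4\gamma+4+\varepsilon} < \infty
\iff 4\gamma + 4 + \varepsilon < \nu
\iff \gamma < \frac{\nu - 4 - \varepsilon}{4}.
$$

For $\nu = 10$, the value used in Section 4.1, this permits any $\gamma < (6 - \varepsilon)/4$, i.e., dimensions growing at most like $n^{\gamma}$ with $\gamma < 3/2$.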

Conditions 2 and 3 formalize the sparsity assumptions on $\Gamma$ and $\Omega$, respectively, where $s_1(q)$ and $s_2(p)$ are the sparsity measures. Under the dimension growth constraint in Condition 1, $\log(pq) = O(\log n) = o(n)$. This ensures that the sparsity measures $s_1(q)$, $s_2(p)$ and the convergence rates of the proposed estimators are all compatible with polynomial growth of the dimensions (see Theorems 2 and 3). Similar parameter spaces have been adopted by Bickel and Levina [12], Cai et al. [6], and Avella-Medina et al. [8]. However, our framework imposes no eigenvalue restrictions.

Note that $\Sigma_X^{-1}$ is the precision matrix of $X$, so the constraint in Condition 4 is analogous to the bounded-norm assumption on $\Omega$ in Condition 3.

We now describe the construction of the proposed estimators.

Using the i.i.d. samples $(X_1, Y_1), \dots, (X_n, Y_n)$, we define the sample means $\bar{X} = n^{-1}\sum_{k=1}^n X_k$, $\bar{Y} = n^{-1}\sum_{k=1}^n Y_k$, and $\bar{Z} = n^{-1}\sum_{k=1}^n Z_k$. From model (1), it follows that

$$Y_k - \bar{Y} = \Gamma(X_k - \bar{X}) + Z_k - \bar{Z}.$$

By analogy with the sample covariance matrix, we define the sample matrices

$$\hat{\Sigma}_{YX} := \frac{1}{n}\sum_{k=1}^n (Y_k - \bar{Y})(X_k - \bar{X})^T, \quad \hat{\Sigma}_{XX} := \frac{1}{n}\sum_{k=1}^n (X_k - \bar{X})(X_k - \bar{X})^T, \quad \hat{\Sigma}_{ZX} := \frac{1}{n}\sum_{k=1}^n (Z_k - \bar{Z})(X_k - \bar{X})^T.$$

It is straightforward to verify that

$$\hat{\Sigma}_{YX} - \Gamma\hat{\Sigma}_{XX} = \hat{\Sigma}_{ZX}.$$
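The identity is purely algebraic: multiplying the centered model by $(X_k - \bar X)^T$ and averaging over $k$ gives it for every sample. A quick numerical check, with arbitrary Gaussian data used only for illustration (samples stored as rows, helper name `cross_cov` hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 4, 3
X = rng.standard_normal((n, q))          # row k is X_k^T
Z = rng.standard_normal((n, p))          # row k is Z_k^T
Gamma = rng.standard_normal((p, q))
Y = X @ Gamma.T + Z                      # Y_k = Gamma X_k + Z_k, stacked as rows

def cross_cov(A, B):
    """(1/n) sum_k (A_k - mean A)(B_k - mean B)^T for row-stacked samples."""
    Ac = A - A.mean(axis=0)
    Bc = B - B.mean(axis=0)
    return Ac.T @ Bc / A.shape[0]

S_YX, S_XX, S_ZX = cross_cov(Y, X), cross_cov(X, X), cross_cov(Z, X)
# Holds exactly, up to floating-point rounding, for any sample.
assert np.allclose(S_YX - Gamma @ S_XX, S_ZX)
```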

To estimate $\Gamma$ and $\Omega$, we first construct an estimator of $\Gamma$ via an optimization approach. Let $\hat{\Gamma} = (\hat{\gamma}_1, \dots, \hat{\gamma}_p)^T$ and $\hat{\Sigma}_{YX} = (\hat{\Sigma}_{YX,1}, \dots, \hat{\Sigma}_{YX,p})^T$, so that $\hat{\gamma}_j^T$ and $\hat{\Sigma}_{YX,j}^T$ are the $j$-th rows of $\hat{\Gamma}$ and $\hat{\Sigma}_{YX}$, respectively. For each $j \in \{1, \dots, p\}$,

$$\hat{\gamma}_j := \arg\min_{\beta_j \in \mathbb{R}^q} \Big\{\|\beta_j\|_1 : \|\hat{\Sigma}_{YX,j}^T - \beta_j^T\hat{\Sigma}_{XX}\|_\infty \le \lambda_{n,p}\Big\}, \tag{2}$$

where $\lambda_{n,p} = C_1\sqrt{\log(pq)/n}$ is a tuning parameter and $C_1 > 0$ is a constant. Since both the objective and the constraint decouple across the rows of $\Gamma$, these row-wise problems are equivalent to the matrix-level problem

$$\hat{\Gamma} \in \arg\min_{\Gamma \in \mathbb{R}^{p \times q}} \Big\{\|\Gamma\|_1 : \|\hat{\Sigma}_{YX} - \Gamma\hat{\Sigma}_{XX}\|_{\max} \le \lambda_{n,p}\Big\}.$$
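Each row problem in (2) is a linear program. The sketch below uses the standard reformulation $\beta = u - v$ with $u, v \ge 0$ and SciPy's `linprog` (HiGHS solver); it is an illustration under these assumptions, not the author's implementation. It relies on $\hat\Sigma_{XX}$ being symmetric, so that $\beta^T\hat\Sigma_{XX} = (\hat\Sigma_{XX}\beta)^T$. The helper names, the toy data, and the constant $C_1 = 0.5$ are hypothetical choices.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min_linf(S, s, lam):
    """min ||b||_1  s.t.  ||s - S b||_inf <= lam, for a symmetric matrix S.

    Standard LP reformulation: b = u - v with u, v >= 0.
    """
    d = S.shape[0]
    c = np.ones(2 * d)                         # sum(u) + sum(v) equals ||b||_1 at the optimum
    SS = np.hstack([S, -S])                    # S b = S u - S v
    A_ub = np.vstack([SS, -SS])                # S b <= s + lam  and  -S b <= lam - s
    b_ub = np.concatenate([s + lam, lam - s])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:d] - res.x[d:]

def estimate_gamma(X, Y, lam):
    """Row-wise estimator (2); rows of X (n x q) and Y (n x p) are samples."""
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    S_XX = Xc.T @ Xc / n                       # q x q
    S_YX = Yc.T @ Xc / n                       # p x q
    return np.vstack([l1_min_linf(S_XX, S_YX[j], lam) for j in range(Y.shape[1])])

# Illustrative data (not from the paper): sparse Gamma, Gaussian noise.
rng = np.random.default_rng(1)
n, p, q = 200, 3, 5
X = rng.standard_normal((n, q))
Gamma = np.zeros((p, q))
Gamma[0, 0], Gamma[2, 3] = 1.0, -0.8
Y = X @ Gamma.T + 0.5 * rng.standard_normal((n, p))
lam = 0.5 * np.sqrt(np.log(p * q) / n)         # C_1 = 0.5 is an arbitrary choice here
Gamma_hat = estimate_gamma(X, Y, lam)
print(np.round(Gamma_hat, 2))
```

Solving the $p$ rows independently mirrors the decoupling argument above and makes the computation embarrassingly parallel.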

Next, we construct an estimator of $\Omega$. Substituting $\hat{\Gamma}$ into model (1), let

$$\hat{\Sigma}_{YY} = \frac{1}{n}\sum_{k=1}^n (Y_k - \hat{\Gamma}X_k)(Y_k - \hat{\Gamma}X_k)^T.$$

We estimate $\Omega$ by the CLIME method of Cai et al. [6]. Let $\hat{\Omega}^1 := (\hat{\omega}_1^1, \dots, \hat{\omega}_p^1)$, where

$$\hat{\omega}_j^1 := \arg\min_{\beta_j \in \mathbb{R}^p} \Big\{\|\beta_j\|_1 : \|\hat{\Sigma}_{YY}\beta_j - e_j\|_\infty \le \tau_{n,p}\Big\}, \quad j \in \{1, \dots, p\}, \tag{3}$$

$e_j$ is the $j$-th standard basis vector in $\mathbb{R}^p$, and $\tau_{n,p} = C_2\sqrt{\log(pq)/n}$ is a tuning parameter with a constant $C_2 > 0$. We then symmetrize $\hat{\Omega}^1$ to obtain the final estimator $\hat{\Omega} = (\hat{\omega}_{ij})$, defined by

$$\hat{\omega}_{ij} := \hat{\omega}_{ij}^1\,\mathbb{1}\big(|\hat{\omega}_{ij}^1| \le |\hat{\omega}_{ji}^1|\big) + \hat{\omega}_{ji}^1\,\mathbb{1}\big(|\hat{\omega}_{ij}^1| > |\hat{\omega}_{ji}^1|\big),$$

where $\mathbb{1}(\cdot)$ denotes the indicator function; that is, of the two entries $\hat{\omega}_{ij}^1$ and $\hat{\omega}_{ji}^1$, the one with the smaller magnitude is kept.
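Problem (3) has the same linear-programming structure as (2), with $\hat\Sigma_{YY}$ in place of $\hat\Sigma_{XX}$ and $e_j$ in place of a row of $\hat\Sigma_{YX}$. The self-contained sketch below solves the $p$ column problems with SciPy's `linprog` and then applies the smaller-magnitude symmetrization rule. The function names, the toy tridiagonal $\Omega$, and $C_2 = 0.5$ are hypothetical; passing the true $\Gamma = 0$ as $\hat\Gamma$ isolates the CLIME step and is for illustration only.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min_linf(S, s, tau):
    """min ||b||_1  s.t.  ||S b - s||_inf <= tau, S symmetric (LP with b = u - v)."""
    d = S.shape[0]
    SS = np.hstack([S, -S])
    res = linprog(np.ones(2 * d), A_ub=np.vstack([SS, -SS]),
                  b_ub=np.concatenate([s + tau, tau - s]),
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:d] - res.x[d:]

def clime_symmetrized(S, tau):
    """Columns solve (3) with s = e_j; then keep the smaller-magnitude entry of each pair."""
    p = S.shape[0]
    E = np.eye(p)
    Omega1 = np.column_stack([l1_min_linf(S, E[:, j], tau) for j in range(p)])
    keep = np.abs(Omega1) <= np.abs(Omega1.T)
    return np.where(keep, Omega1, Omega1.T)

def estimate_omega(X, Y, Gamma_hat, tau):
    """Residual matrix S_YY as defined in the text (uncentered), followed by CLIME."""
    R = Y - X @ Gamma_hat.T                    # rows are Y_k - Gamma_hat X_k
    S_YY = R.T @ R / X.shape[0]
    return clime_symmetrized(S_YY, tau)

# Illustrative data (not from the paper): tridiagonal Omega and Gamma = 0.
rng = np.random.default_rng(2)
n, p, q = 400, 4, 2
Omega = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
Z = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Omega), size=n)
X = rng.standard_normal((n, q))
Y = Z.copy()                                   # Gamma = 0 in this toy example
tau = 0.5 * np.sqrt(np.log(p * q) / n)
Omega_hat = estimate_omega(X, Y, np.zeros((p, q)), tau)
print(np.round(Omega_hat, 2))
```

Keeping the smaller-magnitude entry is conservative: it never introduces an edge that only one of the two column problems selected.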

Remark 2.

We obtain the proposed estimators in two steps, corresponding to Equations (2) and (3). When $q = 0$ (no covariates), estimation of $\Gamma$ is unnecessary, and the precision matrix estimator reduces to (3), i.e., CLIME (Cai et al. [6]). When $p = 1$ (a single response), several methods have been developed to estimate $\Gamma$, including $\ell_1$-penalized least squares (the Lasso, Tibshirani [13]) and the Dantzig selector (Candes and Tao [14]). In this paper, we consider the high-dimensional setting where both $p$ and $q$ may be large but satisfy $(p \vee q) \le n^{\gamma}$ for some constant $\gamma > 0$.

3. Main Result

In this section, we present our main results.

Theorem 1.

Under Condition 1, for any fixed $\delta > 0$, there exists a constant $C = C(\gamma, \delta, \varepsilon, K) > 0$ such that

$$P\Big\{\|\hat{\Sigma}_{YX} - \Gamma\hat{\Sigma}_{XX}\|_{\max} \le C\sqrt{\frac{\log(pq)}{n}}\Big\} = 1 - O\big((pq)^{-\delta} + n^{-\varepsilon/4}\big), \tag{4}$$

$$P\Big\{\|\hat{\Sigma}_{XX} - \Sigma_X\|_{\max} \le C\sqrt{\frac{\log(pq)}{n}}\Big\} = 1 - O\big((pq)^{-\delta} + n^{-\varepsilon/8}\big), \tag{5}$$

and

$$P\Big\{\|\hat{\Sigma}_{ZZ}\Omega - I_p\|_{\max} \le C\sqrt{\frac{\log(pq)}{n}}\Big\} = 1 - O\big((pq)^{-\delta} + n^{-\varepsilon/4}\big). \tag{6}$$

Remark 3.

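By (4), the true $\Gamma$ satisfies the constraint in (2) with high probability once $C_1 \ge C$, which justifies the choice of $\lambda_{n,p}$ and provides the theoretical foundation for constructing $\hat{\Gamma}$. Furthermore, (5) and (6) show that $\hat{\Sigma}_{XX}$ serves as a pilot estimator of the covariance matrix $\Sigma_X$, while $\hat{\Sigma}_{ZZ}$ serves as a pilot estimator of $\Sigma$ for which the true precision matrix $\Omega$ satisfies a CLIME-type constraint with high probability; see Avella-Medina et al. [8].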
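A small Monte Carlo sketch can make (4) tangible. Assuming, for illustration only, independent coordinates with Student-t(10) marginals, the hypothetical helper `scaled_max_cross_cov` draws the statistic $\sqrt{n/\log(pq)}\,\|\hat\Sigma_{ZX}\|_{\max}$, which by the identity in Section 2 equals $\sqrt{n/\log(pq)}\,\|\hat\Sigma_{YX} - \Gamma\hat\Sigma_{XX}\|_{\max}$. Theorem 1 predicts that this statistic stays bounded as $n$ grows; the simulation is a sanity check, not a proof.

```python
import numpy as np

def scaled_max_cross_cov(n, p, q, df=10, reps=200, seed=0):
    """Monte Carlo draws of sqrt(n / log(pq)) * ||S_ZX||_max with t-distributed X, Z."""
    rng = np.random.default_rng(seed)
    stats = np.empty(reps)
    for r in range(reps):
        X = rng.standard_t(df, size=(n, q))   # independent heavy-tailed covariates
        Z = rng.standard_t(df, size=(n, p))   # independent heavy-tailed errors
        Xc, Zc = X - X.mean(axis=0), Z - Z.mean(axis=0)
        S_ZX = Zc.T @ Xc / n                  # equals S_YX - Gamma S_XX for any Gamma
        stats[r] = np.sqrt(n / np.log(p * q)) * np.abs(S_ZX).max()
    return stats

for n, p, q in [(100, 50, 30), (300, 200, 150)]:
    s = scaled_max_cross_cov(n, p, q, reps=50)
    print(n, p, q, "95% quantile:", np.quantile(s, 0.95).round(2))
```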

The following results establish error bounds for the coefficient matrix estimator $\hat{\Gamma}$ under different matrix norms.

Theorem 2.

Suppose Conditions 1, 2 and 4 hold, and let $\Gamma \in V_{\delta_1}(s_1(q))$ with

$$s_1(q) = o\Big(N_q^{\delta_1 - 1}\Big(\frac{n}{\log(pq)}\Big)^{\frac{1-\delta_1}{2}}\Big).$$

Then, for any fixed $\delta > 0$ and some constant $C = C(\gamma, \delta, \varepsilon, K) > 0$, the estimator $\hat{\Gamma}$ defined in (2) satisfies

$$P\Big\{\|\hat{\Gamma} - \Gamma\|_{\ell_\infty} \le C N_q^{1-\delta_1}s_1(q)\Big(\frac{\log(pq)}{n}\Big)^{\frac{1-\delta_1}{2}}\Big\} = 1 - O\big((pq)^{-\delta} + n^{-\varepsilon/4}\big), \tag{7}$$

$$P\Big\{\|\hat{\Gamma} - \Gamma\|_{\max} \le C N_q\sqrt{\frac{\log(pq)}{n}}\Big\} = 1 - O\big((pq)^{-\delta} + n^{-\varepsilon/4}\big), \tag{8}$$

and

$$P\Big\{\frac{1}{p}\|\hat{\Gamma} - \Gamma\|_F^2 \le C N_q^{2-\delta_1}s_1(q)\Big(\frac{\log(pq)}{n}\Big)^{1-\frac{\delta_1}{2}}\Big\} = 1 - O\big((pq)^{-\delta} + n^{-\varepsilon/4}\big). \tag{9}$$

Remark 4.

Since $N_q$ is bounded and $(p \vee q) \le n^{\gamma}$ implies $\sqrt{\log(pq)/n} \to 0$ as $n \to \infty$, the sparsity condition $s_1(q) = o\big(N_q^{\delta_1-1}(n/\log(pq))^{(1-\delta_1)/2}\big)$ in Theorem 2 ensures that the error bound in (7) tends to zero. Moreover, since $\sqrt{n/\log(pq)} \to \infty$ as $n \to \infty$, this sparsity requirement is mild.
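For example, under exact row sparsity ($\delta_1 = 0$ with the usual convention $0^0 = 0$, so that $s_1(q)$ bounds the number of nonzero entries in each row of $\Gamma$), the bound (7) and the sparsity condition of Theorem 2 specialize to

$$
\|\hat\Gamma - \Gamma\|_{\ell_\infty} \le C\,N_q\,s_1(q)\sqrt{\frac{\log(pq)}{n}},
\qquad
s_1(q) = o\Big(N_q^{-1}\sqrt{\frac{n}{\log(pq)}}\Big),
$$

and the right-hand side of the first inequality is $o(1)$ precisely when the second condition holds.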

Theorem 3.

Suppose Conditions 1–4 hold. Let $\Gamma \in V_{\delta_1}(s_1(q))$ and $\Omega \in U_{\delta_2}(s_2(p))$ with $s_2(p) \lesssim (1 + M_p)^{-1}N_q^{\delta_1 - 2}(n/\log(pq))^{(1-\delta_1)/2}$. Then, for any fixed $\delta > 0$ and some constant $C = C(\gamma, \delta, \varepsilon, K) > 0$, the estimator $\hat{\Omega}$ obtained from (3) followed by symmetrization satisfies

$$P\Big\{\|\hat{\Omega} - \Omega\|_{\max} \le C M_p\sqrt{\frac{\log(pq)}{n}}\Big\} = 1 - O\big((pq)^{-\delta} + n^{-\varepsilon/8}\big) \tag{10}$$

and

$$P\Big\{\|\hat{\Omega} - \Omega\|_2 \le C M_p^{1-\delta_2}s_2(p)\Big(\frac{\log(pq)}{n}\Big)^{\frac{1-\delta_2}{2}}\Big\} = 1 - O\big((pq)^{-\delta} + n^{-\varepsilon/8}\big). \tag{11}$$

Remark 5.

Since $M_p$ is a positive deterministic sequence that may be bounded or diverge slowly as $n$ and $p$ grow, the condition $s_2(p) \le C(1 + M_p)^{-1}N_q^{\delta_1 - 2}(n/\log(pq))^{(1-\delta_1)/2}$ implies the corresponding condition in Theorem 2. Moreover, the convergence rate in (11) matches the optimal rate established by Cai et al. [9] under sub-Gaussian errors, so the weaker moment condition comes at no cost in the rate.
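The common factor $\sqrt{\log(pq)/n}$ drives every bound in Theorems 1–3 as well as the tuning parameters $\lambda_{n,p}$ and $\tau_{n,p}$. The following arithmetic sketch evaluates it for the three simulation settings of Section 4.1; the constants $C$, $M_p$, $N_q$, $s_1(q)$ and $s_2(p)$ are deliberately left out, so the numbers indicate only relative scale.

```python
import math

# (p, q, n) for Models 1-3 in Section 4.1
models = {"Model 1": (50, 30, 100), "Model 2": (80, 60, 160), "Model 3": (200, 150, 300)}

def base_rate(p, q, n):
    """sqrt(log(pq) / n): the common factor in (4)-(11) and in lambda_{n,p}, tau_{n,p}."""
    return math.sqrt(math.log(p * q) / n)

for name, (p, q, n) in models.items():
    print(f"{name}: sqrt(log(pq)/n) = {base_rate(p, q, n):.3f}")
# Model 1: 0.270, Model 2: 0.230, Model 3: 0.185
```

Even though $p$ and $q$ grow across the three models, $n$ grows fast enough that the factor decreases, because the dimensions enter only through $\log(pq)$.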

4. Numerical Results

4.1. Simulation Analysis

In this section, we evaluate the performance of the proposed method through simulation studies and compare it with two existing methods: CLIME (Cai et al. [6]) and the graphical lasso (GLASSO, Friedman et al. [15]).

For the $p \times q$ coefficient matrix $\Gamma = (\gamma_{ij})$, the sparsity level is controlled by the parameter $s_1$. Specifically, each element $\gamma_{ij}$ is generated independently from $U(-1, 1) \times \mathrm{Ber}(1, s_1)$, where $U(-1, 1)$ denotes the uniform distribution on $[-1, 1]$, and $\mathrm{Ber}(1, s_1)$ denotes a Bernoulli random variable that takes the value 1 with probability $s_1$ and 0 with probability $1 - s_1$. This generation mechanism implies that $\gamma_{ij}$ is nonzero with probability $s_1$ (and zero otherwise), so a smaller $s_1$ indicates a higher sparsity level for $\Gamma$.

The sparsity of the precision matrix $\Omega = (\omega_{ij})$ is controlled by the parameter $s_2$. To ensure that $\Omega$ is positive definite, we first construct a matrix $B = (b_{ij})$ whose elements $b_{ij}$ are generated independently from $U(-1, 1) \times \mathrm{Ber}(1, s_2)$ (the same sparsity mechanism as for $\Gamma$). We then define $\Omega = B + \varepsilon I_p$, where $\varepsilon = \max(-\lambda_{\min}(B), 0) + 0.01$ and $\lambda_{\min}(B)$ denotes the smallest eigenvalue of $B$; this choice ensures that $\Omega$ is positive definite.

We simulate covariate vectors $X_k \overset{\text{i.i.d.}}{\sim} t(0, \Sigma, 10)$ and error vectors $Z_k \overset{\text{i.i.d.}}{\sim} t(0, \Sigma, 10)$ for $k = 1, \dots, n$, where $t(0, \Sigma, 10)$ stands for a multivariate Student-t distribution with 10 degrees of freedom and covariance matrix $\Sigma = \Omega^{-1}$. The response vectors $Y_k$ are then computed via the regression model $Y_k = \Gamma X_k + Z_k$.
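The data-generating process above can be sketched in NumPy as follows. Three points are our assumptions, not statements from the text: (i) $B$ is symmetrized by mirroring its upper triangle, since the eigenvalue shift requires real eigenvalues; (ii) $X_k$ is drawn with identity covariance, because $\Sigma$ is $p \times p$ while $X_k$ is $q$-dimensional; and (iii) the multivariate t is parameterized so that its covariance, rather than its scale matrix, equals the target (scale $= \Sigma(\nu-2)/\nu$). All function names are hypothetical.

```python
import numpy as np

def sparse_uniform(shape, s, rng):
    """Entries U(-1, 1) x Bernoulli(s): each entry is nonzero with probability s."""
    return rng.uniform(-1.0, 1.0, size=shape) * rng.binomial(1, s, size=shape)

def make_omega(p, s2, rng):
    B = sparse_uniform((p, p), s2, rng)
    B = np.triu(B) + np.triu(B, 1).T           # assumption: mirror the upper triangle so B is symmetric
    shift = max(-np.linalg.eigvalsh(B).min(), 0.0) + 0.01
    return B + shift * np.eye(p)               # Omega = B + shift * I_p

def multivariate_t(n, cov, df, rng):
    """n draws of a multivariate t with covariance `cov` (scale matrix cov * (df - 2) / df)."""
    d = cov.shape[0]
    g = rng.multivariate_normal(np.zeros(d), cov * (df - 2) / df, size=n)
    w = rng.chisquare(df, size=n) / df
    return g / np.sqrt(w)[:, None]

def simulate(p, q, n, s1, s2, df=10, seed=0):
    rng = np.random.default_rng(seed)
    Gamma = sparse_uniform((p, q), s1, rng)
    Omega = make_omega(p, s2, rng)
    Sigma = np.linalg.inv(Omega)
    Sigma = (Sigma + Sigma.T) / 2              # remove floating-point asymmetry
    X = multivariate_t(n, np.eye(q), df, rng)  # assumption: identity covariance for X
    Z = multivariate_t(n, Sigma, df, rng)
    Y = X @ Gamma.T + Z                        # Y_k = Gamma X_k + Z_k, samples as rows
    return X, Y, Gamma, Omega

X, Y, Gamma, Omega = simulate(50, 30, 100, 0.1, 0.1)   # Model 1 settings
```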

In the following experiments, we control the sparsity of $\Gamma$ and $\Omega$ using $s_1$ and $s_2$, respectively, and consider three models with distinct dimensionality and sparsity settings:

Model 1: $(p, q, n, s_1, s_2) = (50, 30, 100, 0.1, 0.1)$;
Model 2: $(p, q, n, s_1, s_2) = (80, 60, 160, 0.08, 0.08)$;
Model 3: $(p, q, n, s_1, s_2) = (200, 150, 300, 0.05, 0.05)$.

The three models correspond to low, medium, and high dimensions, respectively, with increasingly sparse $\Gamma$ and $\Omega$. These differences allow us to examine how the methods perform under varying degrees of dimensionality and sparsity.

The tuning parameters $\lambda_{n,p}$ and $\tau_{n,p}$ of our estimator are selected by five-fold cross-validation (CV). Specifically, we divide all the samples into five disjoint subgroups (folds) and let $T_v$ denote the index set of samples in the $v$-th fold ($v = 1, \dots, 5$).

Define the five-fold CV score as

$$\mathrm{CV}(\lambda_{n,p}, \tau_{n,p}) = \sum_{v=1}^{5}\Big[\log\det\big(\hat{\Omega}_{-v}(\lambda_{n,p}, \tau_{n,p})\big) - \operatorname{tr}\big(\hat{\Sigma}_{YY,v}\,\hat{\Omega}_{-v}(\lambda_{n,p}, \tau_{n,p})\big)\Big],$$

where

$$\hat{\Sigma}_{YY,v} = \frac{1}{n_v}\sum_{k \in T_v}\big(Y_k - \hat{\Gamma}_{-v}(\lambda_{n,p})X_k\big)\big(Y_k - \hat{\Gamma}_{-v}(\lambda_{n,p})X_k\big)^T$$

and $n_v$ is the number of samples in $T_v$. Here, $\hat{\Omega}_{-v}(\lambda_{n,p}, \tau_{n,p})$ and $\hat{\Gamma}_{-v}(\lambda_{n,p})$ are the estimates of $\Omega$ and $\Gamma$ computed from the samples indexed by $\{1, \dots, n\} \setminus T_v$.

We then determine the optimal tuning parameters as $(\lambda_{n,p}^*, \tau_{n,p}^*) = \arg\max \mathrm{CV}(\lambda_{n,p}, \tau_{n,p})$ and use this pair to compute the final estimates of $\Gamma$ and $\Omega$ from the full sample.
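The CV procedure can be sketched as a grid search. In the sketch below, `fit` is a hypothetical callable standing in for the two-stage estimator (it must return $(\hat\Gamma, \hat\Omega)$ from the training folds), and `toy_fit` is an illustrative stand-in only, not the proposed method. Rejecting pairs whose $\hat\Omega$ is not positive definite, for which $\log\det$ is undefined, is our assumption; the text does not say how such cases are handled.

```python
import numpy as np

def cv_score(X, Y, lam, tau, fit, n_folds=5, seed=0):
    """Sum over folds of log det(Omega_{-v}) - tr(S_{YY,v} Omega_{-v})."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    score = 0.0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        Gamma_hat, Omega_hat = fit(X[train_idx], Y[train_idx], lam, tau)
        if np.linalg.eigvalsh((Omega_hat + Omega_hat.T) / 2).min() <= 0:
            return -np.inf        # assumption: a non-positive-definite fit disqualifies (lam, tau)
        R = Y[test_idx] - X[test_idx] @ Gamma_hat.T   # held-out residuals
        S_v = R.T @ R / len(test_idx)                  # S_{YY,v}
        score += np.linalg.slogdet(Omega_hat)[1] - np.trace(S_v @ Omega_hat)
    return score

def select_tuning(X, Y, lams, taus, fit):
    """Grid search for the (lam, tau) pair maximizing the CV score."""
    grid = [(lam, tau) for lam in lams for tau in taus]
    scores = [cv_score(X, Y, lam, tau, fit) for lam, tau in grid]
    return grid[int(np.argmax(scores))]

def toy_fit(X, Y, lam, tau):
    """Illustrative stand-in only: least squares for Gamma, ridge-type inverse for Omega."""
    Gamma = np.linalg.lstsq(X, Y, rcond=None)[0].T
    R = Y - X @ Gamma.T
    return Gamma, np.linalg.inv(R.T @ R / len(Y) + tau * np.eye(Y.shape[1]))

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 3))
Y = X @ rng.standard_normal((3, 4)) + rng.standard_normal((100, 4))
best = select_tuning(X, Y, lams=[0.1], taus=[0.01, 0.1, 1.0], fit=toy_fit)
print(best)
```

The score is a held-out Gaussian log-likelihood (up to constants), so it is maximized rather than minimized, matching the $\arg\max$ above.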

The proposed method is compared with CLIME (Cai et al. [6]) and GLASSO (Friedman et al. [15]), neither of which accounts for covariate effects. For these methods, we select the tuning parameters with the same criterion, $\log\det(\Omega) - \operatorname{tr}(\hat{\Sigma}_{YY}\Omega)$, where $\hat{\Sigma}_{YY} = n^{-1}\sum_{k=1}^n (Y_k - \bar{Y})(Y_k - \bar{Y})^T$.

Based on 50 independent replications, we compute the average errors (standard errors) under three norms for CLIME, GLASSO and our method.

Table 1 reports the numerical results. In Model 1, the standard errors of the three methods are comparable, but the proposed method attains smaller average errors. As the dimensionality increases and the models become sparser (Models 2 and 3), the proposed method outperforms the competing methods under all three norms.

Table 1. Average errors (standard errors) of the three methods.

In summary, the results suggest that our method is particularly advantageous in medium- to high-dimensional, highly sparse settings, where covariate adjustment helps reduce estimation bias and improve stability. Its performance gains become more pronounced as the dimensionality increases, making it suitable for modern high-dimensional data analysis.

4.2. Application to Real Data

In this section, we apply our precision matrix estimator $\hat{\Omega}$ to the human gut microbiome dataset of Wu et al. (Science, 2011) [16], which has also been studied by Cao et al. [17], He et al. [18], Li et al. [19], and Zhang et al. [20]. Our focus is on identifying differences in the conditional associations between bacterial genera in lean and obese individuals. The dataset includes 98 healthy subjects, 63 of whom are classified as lean (BMI $< 25$) and 35 as obese (BMI $\ge 25$).

Data Preprocessing and Network Construction

To ensure reliable signals, we retained bacterial genera present in at least 20% of the samples within each group. This filtering step retained 30 bacterial genera, so the dimension is $p = 30$.
Zero counts in the filtered dataset were replaced by 0.5, and raw counts were normalized to relative abundances within each sample to account for varying sequencing depths.
For both groups, the tuning parameters $\lambda_{n,p}$ and $\tau_{n,p}$ were selected by five-fold cross-validation as in Section 4.1. To evaluate the stability of support recovery, we drew bootstrap samples of size 63 from the lean group and of size 35 from the obese group, repeated the analysis 100 times, and calculated the occurrence frequency of each edge. Edges with a frequency of at least 50% (appearing in at least 50 of the 100 resamples) were retained as “stable edges” for the final network construction.
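The preprocessing and bootstrap-stability steps can be sketched as follows. The helper names are hypothetical; keeping a genus only if it reaches 20% prevalence in every group is our reading of "within each group"; and `estimate_precision` stands in for the tuned estimator $\hat\Omega$. The synthetic counts and the correlation-threshold stand-in estimator are for illustration only and are not the Wu et al. data.

```python
import numpy as np

def preprocess(counts, groups, min_prevalence=0.2, pseudocount=0.5):
    """Prevalence filter, zero replacement, and conversion to relative abundances.

    counts: (n_samples, n_genera) raw counts; groups: length-n array of labels.
    Assumption: a genus is kept only if it meets the threshold in every group.
    """
    keep = np.ones(counts.shape[1], dtype=bool)
    for g in np.unique(groups):
        prevalence = (counts[groups == g] > 0).mean(axis=0)
        keep &= prevalence >= min_prevalence
    filtered = counts[:, keep].astype(float)
    filtered[filtered == 0] = pseudocount
    return filtered / filtered.sum(axis=1, keepdims=True), keep

def edge_frequencies(data, estimate_precision, n_boot=100, seed=0):
    """Fraction of bootstrap resamples (size = group size) in which each edge appears."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    hits = np.zeros((p, p))
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample subjects with replacement
        hits += np.abs(estimate_precision(data[idx])) > 1e-8
    freq = hits / n_boot
    np.fill_diagonal(freq, 0.0)                # only off-diagonal entries are edges
    return freq

def toy_est(d):
    """Illustrative stand-in for the tuned estimator: thresholded correlations."""
    C = np.corrcoef(d, rowvar=False)
    return np.where(np.abs(C) > 0.3, C, 0.0)

rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(40, 6))
counts[:, 5] = 0                               # a genus absent from every sample
groups = np.array([0] * 25 + [1] * 15)
rel, keep = preprocess(counts, groups)
freq = edge_frequencies(rel, toy_est, n_boot=20)
stable_edges = freq >= 0.5                     # the 50% stability rule
```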

Quantitative Characteristics of Microbial Networks

Table 2 summarizes key quantitative features of the inferred networks, including structural complexity, association patterns, and stability.

Table 2. Quantitative characteristics of gut microbial networks in lean and obese groups.

Network Visualization and Interpretation

Figure 1 and Figure 2 visualize the stable microbial networks for the lean and obese groups, respectively, to intuitively display differences in conditional associations between bacterial genera.

Figure 1. Conditional correlation network of gut microbiota in the lean group (BMI $< 25$). Nodes represent the 30 retained bacterial genera. Edges denote stable associations (bootstrap frequency $\ge 50\%$): green lines indicate positive correlations, red lines indicate negative correlations, and darker colors represent stronger associations.

Figure 2. Conditional correlation network of gut microbiota in the obese group (BMI $\ge 25$). Nodes represent the 30 retained bacterial genera. Edges denote stable associations (bootstrap frequency $\ge 50\%$): green lines indicate positive correlations, red lines indicate negative correlations, and darker colors represent stronger associations.

Biological Implications of Network Differences

Our analysis reveals meaningful differences between the gut microbial networks of lean and obese individuals, which align with and extend prior findings in gut microbiome research:

Predominant competitive conditional interactions: Both groups exhibit more negative than positive correlations between bacterial genera (lean group: 71.4% negative correlations; obese group: 60.0% negative correlations). This result is consistent with the findings of Cao et al. [17], Wang et al. [21], Zhang et al. [20] and Coyte et al. [22], and supports the notion that gut microbial interactions are primarily competitive.
Reduced network complexity in obesity: The obese group had fewer stable edges (five, compared to seven in the lean group) and a lower mean edge strength (0.25, compared to 0.32 in the lean group). These observations indicate weakened conditional associations between bacterial genera in obese individuals, suggesting a decline in gut microbial network complexity.
Network stability: The lean group’s network had a higher stability score (0.72, compared with 0.58 in the obese group), indicating more reproducible conditional associations and a robust gut microbial structure. In contrast, the lower stability of the obese group’s network suggests greater inter-individual variability in microbial interactions, a well-documented hallmark of obesity-related gut dysbiosis. This finding also aligns with prior reports of reduced modularity in obese gut microbial networks (Greenblum et al. [23]).

These results suggest that the proposed multivariate method can capture biologically meaningful microbial interactions, and they point to competitive balance and the preservation of core taxa as potentially important for gut health.

5. Conclusions

This paper proposes a two-stage procedure for covariate-adjusted precision matrix estimation in high dimensions. The error vector is only required to satisfy a lower-order polynomial moment condition, which is weaker than the sub-Gaussian conditions of existing methods. We establish non-asymptotic probabilistic upper bounds for the coefficient estimator $\hat{\Gamma}$ and the precision matrix estimator $\hat{\Omega}$ under multiple matrix norms. Despite the relaxed assumptions, both estimators achieve optimal convergence rates matching those obtained under stronger sub-Gaussian frameworks. Numerical simulations show that the method outperforms covariate-unadjusted approaches (Cai et al. [6], Friedman et al. [15]), and in a gut microbiome analysis it captures meaningful differences between the microbial networks of lean and obese individuals. Future work may extend this approach to other pilot estimators designed for low-moment settings (Avella-Medina et al. [8]).

Funding

This paper is supported by the National Natural Science Foundation of China (No. 12171016).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A

We first recall Bernstein's inequality, which we use to prove Lemma A1.

Theorem A1 (Massart (2007) [24]).

Assume that $Z_1, \dots, Z_n$ are i.i.d. random variables with $|Z_k| \le M$ ($k = 1, \dots, n$) and $\sum_{k=1}^n EZ_k^2 \le b_n^2$. Then, for $t > 0$,

$$P\Big\{\Big|\sum_{k=1}^n (Z_k - EZ_k)\Big| \ge b_n\sqrt{2t} + \frac{Mt}{3}\Big\} \le 2e^{-t/2}.$$
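A quick Monte Carlo check of Theorem A1 (an illustration, not part of the proof) uses $Z_k \sim U(-1, 1)$, so that $M = 1$ and $\sum_k EZ_k^2 = n/3 =: b_n^2$. The function name and the choices $n = 200$, $t = 4$ are arbitrary; the empirical two-sided tail frequency should sit well below $2e^{-t/2}$, since this form of the bound is deliberately loose.

```python
import numpy as np

def bernstein_check(n=200, t=4.0, reps=10000, seed=0):
    """Empirical P(|sum (Z_k - E Z_k)| >= b_n sqrt(2t) + M t / 3) versus 2 exp(-t/2)."""
    rng = np.random.default_rng(seed)
    Z = rng.uniform(-1.0, 1.0, size=(reps, n))   # |Z_k| <= M = 1 and E Z_k = 0
    M = 1.0
    b_n = np.sqrt(n / 3.0)                        # sum_k E Z_k^2 = n / 3 for U(-1, 1)
    threshold = b_n * np.sqrt(2 * t) + M * t / 3
    empirical = np.mean(np.abs(Z.sum(axis=1)) >= threshold)
    return empirical, 2 * np.exp(-t / 2)

emp, bound = bernstein_check()
print(f"empirical tail {emp:.4f} <= bound {bound:.4f}")
```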

Lemma A1.

Assume Condition 1 holds, and let $X_{ki}$ and $Z_{kj}$ denote the $i$-th coordinate of $X_k$ and the $j$-th coordinate of $Z_k$, respectively. Then, for any fixed $\delta > 0$, there exists a constant $C = C(\gamma, \delta, \varepsilon, K) > 0$ such that

$$P\Big\{\max_{1 \le i \le q}\Big|\sum_{k=1}^n X_{ki}\Big| \ge C\sqrt{n\log(pq)}\Big\} = O\big((pq)^{-\delta} + n^{-\varepsilon/4}\big), \tag{A1}$$

$$P\Big\{\max_{1 \le i \le q,\,1 \le j \le p}\Big|\sum_{k=1}^n X_{ki}Z_{kj}\Big| \ge C\sqrt{n\log(pq)}\Big\} = O\big((pq)^{-\delta} + n^{-\varepsilon/4}\big), \tag{A2}$$

$$P\Big\{\max_{1 \le i,j \le q}\Big|\sum_{k=1}^n (X_{ki}X_{kj} - EX_{ki}X_{kj})\Big| \ge C\sqrt{n\log(pq)}\Big\} = O\big((pq)^{-\delta} + n^{-\varepsilon/8}\big) \tag{A3}$$

and

$$P\Big\{\max_{1 \le i,j \le p}\Big|\sum_{k=1}^n (Z_{ki}Z_k^T\omega_j - EZ_{ki}Z_k^T\omega_j)\Big| \ge C\sqrt{n\log(pq)}\Big\} = O\big((pq)^{-\delta} + n^{-\varepsilon/8}\big). \tag{A4}$$

Proof.

We first prove (A1). Since $EX_{ki} = 0$, it suffices to show

$$I := P\Big\{\max_{1 \le i \le q}\Big|\sum_{k=1}^n (X_{ki} - EX_{ki})\Big| \ge C\sqrt{n\log(pq)}\Big\} = O\big((pq)^{-\delta} + n^{-\varepsilon/4}\big). \tag{A5}$$

Truncating at level $n^{1/4}$ and using the triangle inequality and the union bound, we obtain

$$
\begin{aligned}
I &\le P\Big\{\max_{1 \le i \le q}\Big|\sum_{k=1}^n \big(X_{ki}\mathbb{1}(|X_{ki}| > n^{1/4}) - EX_{ki}\mathbb{1}(|X_{ki}| > n^{1/4})\big)\Big| \ge \frac{C}{2}\sqrt{n\log(pq)}\Big\}\\
&\quad + P\Big\{\max_{1 \le i \le q}\Big|\sum_{k=1}^n \big(X_{ki}\mathbb{1}(|X_{ki}| \le n^{1/4}) - EX_{ki}\mathbb{1}(|X_{ki}| \le n^{1/4})\big)\Big| \ge \frac{C}{2}\sqrt{n\log(pq)}\Big\} =: I_1 + I_2.
\end{aligned}
\tag{A6}
$$

To estimate $I_1$, note that $E|X_{ki}|^{4\gamma+4+\varepsilon} \lesssim 1$. Since $E|X_{ki}|\mathbb{1}(|X_{ki}| > n^{1/4}) \le E|X_{ki}|^3 n^{-1/2}$,

$$\sum_{k=1}^n\big|EX_{ki}\mathbb{1}(|X_{ki}| > n^{1/4})\big| \le n\max_i E|X_{ki}|^3 n^{-1/2} \lesssim \sqrt{n} \le \sqrt{n\log(pq)}.$$

Hence, for $C$ sufficiently large,

$$
\begin{aligned}
I_1 &\le P\Big\{\max_{1 \le i \le q}\Big|\sum_{k=1}^n X_{ki}\mathbb{1}(|X_{ki}| > n^{1/4})\Big| > \frac{C}{4}\sqrt{n\log(pq)}\Big\} \le P\Big\{\bigcup_{i=1}^q\bigcup_{k=1}^n\{|X_{ki}| > n^{1/4}\}\Big\}\\
&\le qn\max_{k,i}P\big\{|X_{ki}|^{4\gamma+4+\varepsilon} > n^{(4\gamma+4+\varepsilon)/4}\big\} \lesssim n^{\gamma+1}n^{-(\gamma+1+\varepsilon/4)} = n^{-\varepsilon/4}
\end{aligned}
\tag{A7}
$$

due to Markov's inequality, $q \le cn^{\gamma}$, and $\max_i E|X_{ki}|^{4\gamma+4+\varepsilon} \lesssim 1$.

To estimate $I_2$, define $\xi_{ki} := X_{ki}\mathbb{1}(|X_{ki}| \le n^{1/4}) - EX_{ki}\mathbb{1}(|X_{ki}| \le n^{1/4})$. Then $E\xi_{ki} = 0$ ($k = 1, \dots, n$), $|\xi_{ki}| \le 2n^{1/4}$, and, by Lyapunov's inequality,

$$\sum_{k=1}^n E\xi_{ki}^2 \le \sum_{k=1}^n EX_{ki}^2 \le K^{2/(4\gamma+4+\varepsilon)}n.$$

Applying Theorem A1 with $t = 2(\delta + 1)\log(pq)$ gives

$$P\Big\{\Big|\sum_{k=1}^n \xi_{ki}\Big| \ge K^{1/(4\gamma+4+\varepsilon)}\sqrt{n}\sqrt{4(\delta+1)\log(pq)} + \frac{2n^{1/4}}{3}\cdot 2(\delta+1)\log(pq)\Big\} \le 2(pq)^{-(\delta+1)}.$$

Since $(p \vee q) \le cn^{\gamma}$, we have $\sqrt{\log(pq)} \le n^{1/4}$ for all sufficiently large $n$, and therefore $n^{1/4}\log(pq) \le \sqrt{n\log(pq)}$. Taking a union bound over $i = 1, \dots, q$, we obtain

$$I_2 \le \sum_{i=1}^q P\Big\{\Big|\sum_{k=1}^n \xi_{ki}\Big| \ge \frac{C}{2}\sqrt{n\log(pq)}\Big\} \le 2q(pq)^{-(\delta+1)} \le 2(pq)^{-\delta}$$

whenever $C \ge 4K^{1/(4\gamma+4+\varepsilon)}\sqrt{\delta+1} + 8(\delta+1)/3$. Combining this with (A6) and (A7) yields (A1).

Since $X$ and $Z$ are independent, $EX_{ki}Z_{kj} = EX_{ki}\cdot EZ_{kj} = 0$. By Condition 1, $E|X_{ki}Z_{kj}|^{4\gamma+4+\varepsilon} = E|X_{ki}|^{4\gamma+4+\varepsilon}\cdot E|Z_{kj}|^{4\gamma+4+\varepsilon} \lesssim 1$. Replacing $X_{ki}$ with $X_{ki}Z_{kj}$ and repeating the argument for (A1) yields (A2).

To prove (A3), let $\eta_k = X_{ki}X_{kj}$ and define

$$J := P\Big\{\max_{1 \le i,j \le q}\Big|\sum_{k=1}^n (\eta_k - E\eta_k)\Big| \ge C\sqrt{n\log(pq)}\Big\}.$$

Split $J$ into tail and truncated parts, $J \le J_1 + J_2$, where $J_1$ corresponds to $\eta_k\mathbb{1}\big(|\eta_k| > \sqrt{n/\log(pq)}\big)$ and $J_2$ to the truncated term $\eta_k\mathbb{1}\big(|\eta_k| \le \sqrt{n/\log(pq)}\big)$.

Note that $E|\eta_k|^2 \le \max_i E|X_{ki}|^4 \lesssim 1$ by the Cauchy–Schwarz inequality. Then,

$$\sum_{k=1}^n\Big|E\eta_k\mathbb{1}\Big(|\eta_k| > \sqrt{\tfrac{n}{\log(pq)}}\Big)\Big| \le n\max_k E|\eta_k|^2\sqrt{\frac{\log(pq)}{n}} \lesssim \sqrt{n\log(pq)}.$$

Moreover, for $C$ sufficiently large,

$$J_1 \le P\Big\{\max_{i,j}\Big|\sum_{k=1}^n \eta_k\mathbb{1}\Big(|\eta_k| > \sqrt{\tfrac{n}{\log(pq)}}\Big)\Big| > \frac{C}{4}\sqrt{n\log(pq)}\Big\}.$$

Combining this with $|\eta_k| \le (X_{ki}^2 + X_{kj}^2)/2$, $(p \vee q) \le cn^{\gamma}$, Condition 1, and Markov's inequality gives

$$J_1 \le qn\max_{k,i}P\Big\{|X_{ki}|^2 > \sqrt{\tfrac{n}{\log(pq)}}\Big\} \lesssim n^{\gamma+1}\Big(\frac{\log(pq)}{n}\Big)^{\gamma+1+\varepsilon/4} = \frac{(\log(pq))^{\gamma+1+\varepsilon/4}}{n^{\varepsilon/4}} \lesssim n^{-\varepsilon/8},$$

where the last step uses $\log(pq) = O(\log n)$.

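To estimate $J_2$, we apply Theorem A1 to the truncated variables $\eta_k\mathbb{1}\big(|\eta_k| \le \sqrt{n/\log(pq)}\big)$, whose centered versions are bounded by $2\sqrt{n/\log(pq)}$, together with a union bound over $(i, j)$; this gives $J_2 = O\big((pq)^{-\delta}\big)$. Hence $J = O\big((pq)^{-\delta} + n^{-\varepsilon/8}\big)$, and (A3) holds.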

Finally, we prove (A4). By the Cauchy–Schwarz inequality and Condition 1,

$$E|Z_{ki}Z_k^T\omega_j|^2 \le \big(E|Z_{ki}|^4\big)^{1/2}\big(E|Z_k^T\omega_j|^4\big)^{1/2} \lesssim 1.$$

Replacing $\eta_k = X_{ki}X_{kj}$ with $\eta_k = Z_{ki}Z_k^T\omega_j$ (and using $|\eta_k| \le (Z_{ki}^2 + (Z_k^T\omega_j)^2)/2$), and repeating the argument for (A3), yields (A4). This completes the proof. □

Lemma A2.

Assume Condition 1 holds. Then, for any fixed $\delta > 0$, there exists a constant $C = C(\gamma, \delta, \varepsilon, K) > 0$ such that

$$P\Big(\max_{1 \le j \le q}\Big|\frac{1}{n}\sum_{k=1}^n (X_{kj}^2 - EX_{1j}^2)\Big| \le C\sqrt{\frac{\log(pq)}{n}}\Big) = 1 - O\big((pq)^{-\delta} + n^{-\varepsilon/8}\big).$$

Consequently,

$$\max_{1 \le j \le q}\frac{1}{n}\sum_{k=1}^n X_{kj}^2 \le \max_{1 \le j \le q}EX_{1j}^2 + C\sqrt{\frac{\log(pq)}{n}}$$

with the same probability bound.

Proof.

Let

T_{n} : = n^{1 / 4}

and split

X_{k j}^{2}

into truncated and tail parts,

U_{k j} : = X_{k j}^{2} 1 {| X_{k j} | \leq T_{n}}, V_{k j} : = X_{k j}^{2} 1 {| X_{k j} | > T_{n}} .

Then,

\frac{1}{n} \sum_{k = 1}^{n} (X_{k j}^{2} - E X_{1 j}^{2}) = \frac{1}{n} \sum_{k = 1}^{n} (U_{k j} - E U_{1 j}) + \frac{1}{n} \sum_{k = 1}^{n} (V_{k j} - E V_{1 j}) .

Since

| U_{k j} - E U_{1 j} | \leq 2 T_{n}^{2} = 2 n^{1 / 2}

and

\sum_{k = 1}^{n} Var (U_{k j}) \leq n E X_{1 j}^{4} \leq K n

due to Condition 1, then Bernstein’s inequality (Theorem A1) yields, and for

t = 2 δ log (p q)

,

P (| \sum_{k = 1}^{n} (U_{k j} - E U_{1 j}) | \geq C_{1} (\sqrt{n t} + n^{1 / 2} t)) \leq 2 e^{- t} .

Divide by n and use

(p \lor q) \leq n^{γ}

so that

log (p q) = o (n)

; since

t ≍ log (p q)

and

n^{1 / 2} t / n = t / \sqrt{n}

, we obtain

P (| \frac{1}{n} \sum_{k = 1}^{n} (U_{k j} - E U_{1 j}) | \geq C_{2} \sqrt{\frac{log (p q)}{n}}) \leq 2 {(p q)}^{- δ} .

(A8)

A union bound over

j = 1, \dots, q

preserves the

{(p q)}^{- δ}

rate.

By Condition 1, for some

r = 4 γ + 4 + ε > 4

and constant

K_{1}

,

E {|X_{1 j}|}^{r} \leq K_{1} \Rightarrow E V_{1 j} = E [X_{1 j}^{2} 1 \{|X_{1 j}| > T_{n}\}] \leq \frac{E {|X_{1 j}|}^{r}}{T_{n}^{r - 2}} ≲ n^{- (r - 2) / 4} .

Similarly,

P (| X_{k j} | > T_{n}) ≲ n^{- r / 4}

so that

\sum_{k = 1}^{n} 1 \{| X_{k j} | > T_{n}\} = O_{P} (n^{1 - r / 4})

uniformly in j. Hence

max_{1 \leq j \leq q} | \frac{1}{n} \sum_{k = 1}^{n} (V_{k j} - E V_{1 j}) | = O_{P} (n^{- (r - 2) / 4}) = O_{P} (n^{- 1 / 2 - ε / 4}),

(A9)

which is negligible compared to

\sqrt{log (p q) / n}

since

log (p q) = O (log n)

under

(p \lor q) \leq n^{γ}

.

Combine (A8) and (A9) and apply a union bound over j to conclude the stated probability bound for the maximum over j. □

Proof of Theorem 1.

We first prove inequality (4). By the regression model

Y = Γ X + Z

, centering both sides by sample means gives

Y_{k} - \bar{Y} = Γ (X_{k} - \bar{X}) + Z_{k} - \bar{Z}

. Multiplying both sides by

{(X_{k} - \bar{X})}^{T}

and averaging over k yields

{\hat{Σ}}_{Y X} - Γ {\hat{Σ}}_{X X} = {\hat{Σ}}_{Z X},

where

{\hat{Σ}}_{Z X} = 1 / n \sum_{k = 1}^{n} (Z_{k} - \bar{Z}) {(X_{k} - \bar{X})}^{T}

. Thus, it suffices to show

I : = P {∥ \frac{1}{n} \sum_{k = 1}^{n} (Z_{k} - \bar{Z}) {(X_{k} - \bar{X})}^{T} ∥_{max} \geq λ_{n, p}} = O ({(p q)}^{- δ} + n^{- ε / 4}),

(A10)

where

λ_{n, p} = C_{1} \sqrt{log (p q) / n}

with the constant

C_{1}

to be specified.
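The identity that opens this proof can be checked on simulated data. In this minimal sketch the dimensions, seed, Gaussian covariates and Student-t errors are illustrative assumptions, and `cross_cov` is a hypothetical helper computing the centred sample cross-covariance.

```python
import numpy as np

def cross_cov(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """(1/n) sum_k (A_k - mean A)(B_k - mean B)^T for data matrices whose rows are A_k^T, B_k^T."""
    Ac = A - A.mean(axis=0)
    Bc = B - B.mean(axis=0)
    return Ac.T @ Bc / A.shape[0]

rng = np.random.default_rng(1)
n, p, q = 200, 4, 3
X = rng.standard_normal((n, q))            # rows are X_k^T
Z = rng.standard_t(df=6, size=(n, p))      # rows are Z_k^T (illustrative heavy-tailed errors)
Gamma = rng.standard_normal((p, q))
Y = X @ Gamma.T + Z                        # Y_k = Gamma X_k + Z_k

S_YX, S_XX, S_ZX = cross_cov(Y, X), cross_cov(X, X), cross_cov(Z, X)

# hat Sigma_YX - Gamma hat Sigma_XX = hat Sigma_ZX
assert np.allclose(S_YX - Gamma @ S_XX, S_ZX)
# Entrywise: centred cross-moment = raw cross-moment - product of sample means.
assert np.allclose(S_ZX, Z.T @ X / n - np.outer(Z.mean(axis=0), X.mean(axis=0)))
print("||hat Sigma_ZX||_max =", np.abs(S_ZX).max())
```

The second assertion is the decomposition used next to split I into a raw cross-moment term and a product-of-means term.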

Since

1 / n \sum_{k = 1}^{n} (Z_{k i} - {\bar{Z}}_{i}) (X_{k j} - {\bar{X}}_{j}) = 1 / n \sum_{k = 1}^{n} Z_{k i} X_{k j} - {\bar{Z}}_{i} {\bar{X}}_{j}

, then

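I \leq P {max_{i j} | \frac{1}{n} \sum_{k = 1}^{n} Z_{k i} X_{k j} | \geq \frac{1}{2} λ_{n, p}} + P {max_{i j} | {\bar{Z}}_{i} {\bar{X}}_{j} | \geq \frac{1}{2} λ_{n, p}} .

Recall that

λ_{n, p} = C_{1} \sqrt{log (p q) / n}

and that

(p \lor q) \leq n^{γ}

yields

log (p q) = o (n)

. Hence,

(λ_{n, p} / 2)^{1 / 2} \geq C_{1} \sqrt{log (p q) / n}

for all sufficiently large n. Since

| {\bar{Z}}_{i} {\bar{X}}_{j} | \geq λ_{n, p} / 2

forces

| {\bar{Z}}_{i} | \geq (λ_{n, p} / 2)^{1 / 2}

or

| {\bar{X}}_{j} | \geq (λ_{n, p} / 2)^{1 / 2}

, we obtain

P {max_{i j} | {\bar{Z}}_{i} {\bar{X}}_{j} | \geq \frac{1}{2} λ_{n, p}} \leq P {max_{i} | {\bar{Z}}_{i} | \geq C_{1} \sqrt{\frac{log (p q)}{n}}} + P {max_{j} | {\bar{X}}_{j} | \geq C_{1} \sqrt{\frac{log (p q)}{n}}} .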

By (A1) and (A2) in Lemma A1, we have

I = O ({(p q)}^{- δ} + n^{- ε / 4})

, and (4) holds.

To prove (5), note that

{∥ {\hat{Σ}}_{X X} - Σ_{X} ∥}_{max} = max_{i j} | \frac{1}{n} \sum_{k = 1}^{n} (X_{k i} X_{k j} - E X_{k i} X_{k j}) - {\bar{X}}_{i} {\bar{X}}_{j} |

. Hence,

\begin{matrix} P {{∥ {\hat{Σ}}_{X X} - Σ_{X} ∥}_{max} \geq λ_{n, p}} & \leq P {max_{i j} | \frac{1}{n} \sum_{k = 1}^{n} (X_{k i} X_{k j} - E X_{k i} X_{k j}) | \geq \frac{1}{2} λ_{n, p}} \\ & + P {max_{i j} | {\bar{X}}_{i} {\bar{X}}_{j} | \geq \frac{1}{2} λ_{n, p}} = O ({(p q)}^{- δ} + n^{- ε / 4}) \end{matrix}

due to Lemma A1.

Recall

E Z = 0

and

{\hat{Σ}}_{Z Z} = 1 / n \sum_{k = 1}^{n} Z_{k} Z_{k}^{T}

. Then,

{\hat{Σ}}_{Z Z} Ω - I_{p} = ({\hat{Σ}}_{Z Z} - Σ) Ω = \frac{1}{n} \sum_{k = 1}^{n} (Z_{k} Z_{k}^{T} Ω - E Z_{k} Z_{k}^{T} Ω)

and

P {{∥ {\hat{Σ}}_{Z Z} Ω - I_{p} ∥}_{max} \geq λ_{n, p}} \leq P {max_{i j} | \frac{1}{n} \sum_{k = 1}^{n} (Z_{k i} Z_{k}^{T} ω_{j} - E Z_{k i} Z_{k}^{T} ω_{j}) | \geq λ_{n, p}} .

Hence, (6) holds due to (A4) in Lemma A1. □
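The matrix identity behind (6) and its entrywise form can be checked numerically. In this sketch the AR(1)-type covariance Σ, the rescaled Student-t errors, the seed and the sizes are illustrative assumptions; only the identities being checked come from the proof.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, df = 400, 5, 8
# Hypothetical AR(1)-type error covariance Sigma and its precision matrix Omega.
rho = 0.4
idx = np.arange(p)
Sigma = rho ** np.abs(np.subtract.outer(idx, idx))
Omega = np.linalg.inv(Sigma)

# Mean-zero Student-t errors rescaled to unit variance, then given covariance Sigma.
L = np.linalg.cholesky(Sigma)
Z = (rng.standard_t(df, size=(n, p)) * np.sqrt((df - 2) / df)) @ L.T

S_ZZ = Z.T @ Z / n                      # uncentred, since E Z = 0
residual = S_ZZ @ Omega - np.eye(p)     # hat Sigma_ZZ Omega - I_p

# Same matrix written as (hat Sigma_ZZ - Sigma) Omega ...
assert np.allclose(residual, (S_ZZ - Sigma) @ Omega)
# ... and entrywise as (1/n) sum_k Z_ki Z_k^T omega_j minus its mean (which is 1 if i == j, else 0).
i, j = 1, 3
entry = np.mean(Z[:, i] * (Z @ Omega[:, j])) - (1.0 if i == j else 0.0)
assert np.isclose(residual[i, j], entry)
print("max-norm of hat Sigma_ZZ Omega - I_p:", np.abs(residual).max())
```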

Lemma A3

(Cai et al. (2011) [6]). Suppose that the matrix

A = (a_{1}, \dots, a_{n}) = {(a_{i j})}_{m \times n}

satisfies

max_{1 \leq i \leq m} \sum_{j = 1}^{n} {|a_{i j}|}^{δ} \leq s (p)

, and that

\hat{A} = ({\hat{a}}_{1}, \dots, {\hat{a}}_{n})

is any estimator of

A

with

{∥ {\hat{a}}_{j} ∥}_{1} \leq {∥ a_{j} ∥}_{1}

for all j. Then

{∥ \hat{A} - A ∥}_{l_{\infty}} ≲ s (p) {∥ \hat{A} - A ∥}_{max}^{1 - δ} .

Proof of Theorem 2.

Define two events

A_{1} : = {∥ {\hat{Σ}}_{Y X} - Γ {\hat{Σ}}_{X X} ∥_{max} \leq λ_{n, p}}

and

A_{2} : = {∥ {\hat{Σ}}_{X X} - Σ_{X} ∥_{max} \leq λ_{n, p}}

. Then, by (4) and (5), we have

P (A_{1} \cap A_{2}) = 1 - O ({(p q)}^{- δ} + n^{- ε / 8})

. It suffices to show that on

A_{1} \cap A_{2}

,

{∥ \hat{Γ} - Γ ∥}_{l_{\infty}} \leq C N_{q}^{1 - δ_{1}} s_{1} (q) {(\frac{log (p q)}{n})}^{\frac{1 - δ_{1}}{2}} .

(A11)

Since

{\hat{γ}}_{j}

is the solution of (2), we have

{∥ {\hat{γ}}_{j} ∥}_{1} \leq {∥ γ_{j} ∥}_{1}

and

{∥ {\hat{Σ}}_{Y X} - \hat{Γ} {\hat{Σ}}_{X X} ∥}_{max} \leq λ_{n, p}

. On the event

A_{1} \cap A_{2}

, this gives

{∥ (\hat{Γ} - Γ) {\hat{Σ}}_{X X} ∥}_{max} \leq 2 λ_{n, p}

and

\begin{matrix} {∥ (\hat{Γ} - Γ) Σ_{X} ∥}_{max} & \leq {∥ (\hat{Γ} - Γ) {\hat{Σ}}_{X X} ∥}_{max} + {∥ (\hat{Γ} - Γ) (Σ_{X} - {\hat{Σ}}_{X X}) ∥}_{max} \\ \leq 2 λ_{n, p} + {∥ \hat{Γ} - Γ ∥}_{l_{\infty}} λ_{n, p} . \end{matrix}

Combining this with Condition 4 gives

{∥ \hat{Γ} - Γ ∥}_{max} \leq {∥ (\hat{Γ} - Γ) Σ_{X} ∥}_{max} {∥ Σ_{X}^{- 1} ∥}_{l_{1}} \leq N_{q} λ_{n, p} (2 + {∥ \hat{Γ} - Γ ∥}_{l_{\infty}}) .

(A12)
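The proof switches between the max norm, the matrix l_1 norm (largest absolute column sum) and the matrix l_∞ norm (largest absolute row sum). The sketch below spot-checks, on random matrices, the norm inequalities used before (A12), in (A12), behind (9), and the spectral bound used later in Theorem 3. The helper names and the random test matrices are illustrative assumptions; this is a sanity check, not a proof.

```python
import numpy as np

def norm_max(A: np.ndarray) -> float:
    """||A||_max: largest absolute entry."""
    return float(np.abs(A).max())

def norm_l1(A: np.ndarray) -> float:
    """||A||_{l_1}: largest absolute column sum (same as np.linalg.norm(A, 1))."""
    return float(np.abs(A).sum(axis=0).max())

def norm_linf(A: np.ndarray) -> float:
    """||A||_{l_inf}: largest absolute row sum (same as np.linalg.norm(A, np.inf))."""
    return float(np.abs(A).sum(axis=1).max())

def leq(a: float, b: float, rtol: float = 1e-12) -> bool:
    """a <= b up to floating-point rounding."""
    return a <= b * (1 + rtol) + rtol

rng = np.random.default_rng(4)
for _ in range(200):
    p, q = (int(d) for d in rng.integers(1, 8, size=2))
    D = rng.standard_normal((p, q))    # plays the role of hat Gamma - Gamma (p x q)
    B = rng.standard_normal((q, q))    # plays the role of a q x q matrix such as Sigma_X
    # Display before (A12): ||D B||_max <= ||D||_{l_inf} ||B||_max
    assert leq(norm_max(D @ B), norm_linf(D) * norm_max(B))
    # (A12): ||D B||_max <= ||D||_max ||B||_{l_1}
    assert leq(norm_max(D @ B), norm_max(D) * norm_l1(B))
    # Behind (9): p^{-1} ||D||_F^2 <= ||D||_max ||D||_{l_inf}
    assert leq(np.linalg.norm(D, "fro") ** 2 / p, norm_max(D) * norm_linf(D))
    # Theorem 3: for symmetric S, ||S||_2 <= ||S||_{l_inf}
    S = B + B.T
    assert leq(np.linalg.norm(S, 2), norm_linf(S))
```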

Recall that Condition 2 holds and

{∥ {\hat{γ}}_{j} ∥}_{1} \leq {∥ γ_{j} ∥}_{1}

. Then, by Lemma A3, we have

{∥ \hat{Γ} - Γ ∥}_{l_{\infty}} \leq C s_{1} (q) {∥ \hat{Γ} - Γ ∥}_{max}^{1 - δ_{1}} ≲ s_{1} (q) N_{q}^{1 - δ_{1}} λ_{n, p}^{1 - δ_{1}} (1 + {∥ \hat{Γ} - Γ ∥}_{l_{\infty}}^{1 - δ_{1}}) .

If

{∥ \hat{Γ} - Γ ∥}_{l_{\infty}} \leq 1

, then we have

{∥ \hat{Γ} - Γ ∥}_{l_{\infty}} \leq C s_{1} (q) N_{q}^{1 - δ_{1}} λ_{n, p}^{1 - δ_{1}} .

If

{∥ \hat{Γ} - Γ ∥}_{l_{\infty}} \geq 1

, then by the assumed condition

s_{1} (q) = o (N_{q}^{δ_{1} - 1} {(n / log (p q))}^{(1 - δ_{1}) / 2})

, for large n,

{∥ \hat{Γ} - Γ ∥}_{l_{\infty}} \leq C s_{1} (q) N_{q}^{1 - δ_{1}} λ_{n, p}^{1 - δ_{1}} + \frac{1}{2} {∥ \hat{Γ} - Γ ∥}_{l_{\infty}} .

Thus, (A11) holds on

A_{1} \cap A_{2}

, which proves (7). Substituting (A11) into (A12) yields (8). Finally, (9) follows from (7), (8), and the inequality

p^{- 1} {∥ \hat{Γ} - Γ ∥}_{F}^{2} \leq {∥ \hat{Γ} - Γ ∥}_{max} {∥ \hat{Γ} - Γ ∥}_{l_{\infty}}

. The proof is done. □
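Two steps of this proof can be written out in full. Write x := ‖Γ̂ − Γ‖_{l_∞}, let a := C s_1(q) N_q^{1−δ_1} λ_{n,p}^{1−δ_1} (with C absorbing the constant hidden in ≲), and write Γ̂ − Γ = (d_{ij}), a p × q matrix since Y is p-dimensional and X is q-dimensional; the symbols x, a and d_{ij} are introduced here only for this expansion. The assumed condition on s_1(q) gives a = o(1), so a ≤ 1/2 for large n, and both cases of the argument give the same bound:

```latex
\begin{aligned}
x &\le a\,\bigl(1 + x^{1-δ_{1}}\bigr), \qquad a = o(1), \\
x \le 1 &\;\Longrightarrow\; x \le 2a, \qquad
x \ge 1 \;\Longrightarrow\; x \le a + a\,x \le a + \tfrac{1}{2}x \;\Longrightarrow\; x \le 2a .
\end{aligned}
```

The Frobenius inequality used for (9) follows by bounding one factor of each squared entry by the max norm:

```latex
\|\hat{Γ} - Γ\|_{F}^{2} = \sum_{i=1}^{p} \sum_{j=1}^{q} d_{ij}^{2}
  \le \sum_{i=1}^{p} \|\hat{Γ} - Γ\|_{max} \sum_{j=1}^{q} |d_{ij}|
  \le p\, \|\hat{Γ} - Γ\|_{max}\, \|\hat{Γ} - Γ\|_{l_{\infty}} .
```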

Proof of Theorem 3.

Recall (3). We define the event

A : = {{∥ {\hat{Σ}}_{Y Y} Ω - I_{p} ∥}_{max} \leq τ_{n, p}}

with

τ_{n, p} = C_{2} \sqrt{log (p q) / n}

. We first show that on A, the desired bounds for

\hat{Ω}

hold, and then verify

P (A) = 1 - O ({(p q)}^{- δ} + n^{- ε / 8})

.

By the definitions of

{\hat{ω}}_{j}

and

{\hat{ω}}_{j}^{1}

, on A we have

{∥ {\hat{ω}}_{j} ∥}_{1} \leq {∥ {\hat{ω}}_{j}^{1} ∥}_{1} \leq {∥ ω_{j} ∥}_{1} .

(A13)

Hence, by Lemma A3 and the inequality

{∥ \hat{Ω} - Ω ∥}_{2} \leq {∥ \hat{Ω} - Ω ∥}_{l_{\infty}}

, which holds because both matrices are symmetric, to establish (11) it suffices to prove that, on the event A,

{∥ \hat{Ω} - Ω ∥}_{max} ≲ M_{p} \sqrt{\frac{log (p q)}{n}} .

(A14)

Note that the symmetrized estimator

\hat{Ω}

satisfies

{∥ \hat{Ω} - Ω ∥}_{max} \leq {∥ {\hat{Ω}}^{1} - Ω ∥}_{max}

. Using

Ω = Ω {\hat{Σ}}_{Y Y} {\hat{Ω}}^{1} + Ω (I_{p} - {\hat{Σ}}_{Y Y} {\hat{Ω}}^{1})

, we have

{\hat{Ω}}^{1} - Ω = (I_{p} - Ω {\hat{Σ}}_{Y Y}) {\hat{Ω}}^{1} - Ω (I_{p} - {\hat{Σ}}_{Y Y} {\hat{Ω}}^{1}) .
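Expanding the right-hand side confirms this identity; the two cross terms cancel, so no property of the sample covariance or of the first-stage estimator is needed:

```latex
(I_{p} - Ω\hat{Σ}_{YY})\hat{Ω}^{1} - Ω(I_{p} - \hat{Σ}_{YY}\hat{Ω}^{1})
  = \hat{Ω}^{1} - Ω\hat{Σ}_{YY}\hat{Ω}^{1} - Ω + Ω\hat{Σ}_{YY}\hat{Ω}^{1}
  = \hat{Ω}^{1} - Ω .
```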

Combining this with the matrix norm inequality

{∥ AB ∥}_{max} \leq {∥ A ∥}_{max} {∥ B ∥}_{l_{1}}

yields

{∥ \hat{Ω} - Ω ∥}_{max} \leq {∥ {\hat{Ω}}^{1} - Ω ∥}_{max} \leq {∥ I_{p} - Ω {\hat{Σ}}_{Y Y} ∥}_{max} {∥ {\hat{Ω}}^{1} ∥}_{l_{1}} + {∥ {\hat{Σ}}_{Y Y} {\hat{Ω}}^{1} - I_{p} ∥}_{max} {∥ Ω ∥}_{l_{1}} .

On event A,

{∥ I_{p} - Ω {\hat{Σ}}_{Y Y} ∥}_{max} = {∥ I_{p} - {\hat{Σ}}_{Y Y} Ω ∥}_{max} \leq τ_{n, p}

, and

{∥ I_{p} - {\hat{Σ}}_{Y Y} {\hat{Ω}}^{1} ∥}_{max} \leq τ_{n, p}

from (3). Moreover, (A13) gives

{∥ {\hat{Ω}}^{1} ∥}_{l_{1}} \leq {∥ Ω ∥}_{l_{1}}

, and therefore

{∥ \hat{Ω} - Ω ∥}_{max} \leq 2 τ_{n, p} {∥ Ω ∥}_{l_{1}},

which yields the desired estimate (A14), given

Ω \in U_{δ_{2}} (s_{2} (p))

and

τ_{n, p} = C_{2} \sqrt{log (p q) / n}

.
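The symmetrization step can be sketched as follows. Its rule is defined in the main text, not in this section, so this sketch assumes the min-magnitude rule of CLIME (Cai et al. [6]): entry (i, j) keeps whichever of the two first-stage entries at (i, j) and (j, i) is smaller in absolute value. The tie-break toward the upper triangle, the helper name `clime_symmetrize`, and the toy matrices are illustrative assumptions added here. The checks mirror the first inequality in (A13) and the max-norm comparison stated above.

```python
import numpy as np

def clime_symmetrize(omega1: np.ndarray) -> np.ndarray:
    """Min-magnitude symmetrization (assumed CLIME-type rule): entry (i, j) is whichever of
    omega1[i, j], omega1[j, i] is smaller in absolute value. Ties are broken in favour of
    the upper-triangular entry so that the output is exactly symmetric."""
    a, b = np.abs(omega1), np.abs(omega1.T)
    upper = np.triu(np.ones(omega1.shape, dtype=bool))   # True where i <= j
    keep_own = (a < b) | ((a == b) & upper)
    return np.where(keep_own, omega1, omega1.T)

rng = np.random.default_rng(5)
p = 6
idx = np.arange(p)
Omega = 0.5 ** np.abs(np.subtract.outer(idx, idx))       # hypothetical symmetric target
Omega1 = Omega + 0.1 * rng.standard_normal((p, p))       # hypothetical non-symmetric first stage
Omega_hat = clime_symmetrize(Omega1)

assert np.array_equal(Omega_hat, Omega_hat.T)
# Every |entry| can only shrink, so every column l1 norm can only shrink (first part of (A13)).
assert np.all(np.abs(Omega_hat).sum(axis=0) <= np.abs(Omega1).sum(axis=0))
# Each entry error of Omega_hat is an entry error of Omega1, because Omega is symmetric.
assert np.abs(Omega_hat - Omega).max() <= np.abs(Omega1 - Omega).max()
```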

Now, one needs only to prove

P (A) = P {∥ {\hat{Σ}}_{Y Y} Ω - I_{p} ∥_{max} \leq τ_{n, p}} = 1 - O ({(p q)}^{- δ} + n^{- ε / 8})

.

Note that

{∥ {\hat{Σ}}_{Y Y} Ω - I_{p} ∥}_{max} \leq {∥ ({\hat{Σ}}_{Y Y} - {\hat{Σ}}_{Z Z}) Ω ∥}_{max} + {∥ I_{p} - {\hat{Σ}}_{Z Z} Ω ∥}_{max}

. By (6),

P {∥ {\hat{Σ}}_{Z Z} Ω - I_{p} ∥_{max} \leq τ_{n, p}} = 1 - O ({(p q)}^{- δ} + n^{- ε / 4})

, so it suffices to show

P {∥ {\hat{Σ}}_{Y Y} - {\hat{Σ}}_{Z Z} ∥_{max} \leq C M_{p}^{- 1} τ_{n, p}} = 1 - O ({(p q)}^{- δ} + n^{- ε / 8})

(A15)

due to

{∥ ({\hat{Σ}}_{Y Y} - {\hat{Σ}}_{Z Z}) Ω ∥}_{max} \leq {∥ {\hat{Σ}}_{Y Y} - {\hat{Σ}}_{Z Z} ∥}_{max} {∥ Ω ∥}_{l_{\infty}}

,

Ω \in U_{δ_{2}} (s_{2} (p))

and

{∥ Ω ∥}_{l_{\infty}} \leq M_{p}

.

Recall

E Z = 0

and

{\hat{Σ}}_{Z Z} = 1 / n \sum_{k = 1}^{n} Z_{k} Z_{k}^{T}

. Substituting

Y_{k} = Γ X_{k} + Z_{k}

into

{\hat{Σ}}_{Y Y} = 1 / n \sum_{k = 1}^{n} (Y_{k} - \hat{Γ} X_{k}) {(Y_{k} - \hat{Γ} X_{k})}^{T}

and denoting

Δ_{n} = (δ_{i j}) : = \hat{Γ} - Γ

, we have

{\hat{Σ}}_{Y Y} = \frac{1}{n} \sum_{k = 1}^{n} (Z_{k} - Δ_{n} X_{k}) {(Z_{k} - Δ_{n} X_{k})}^{T}

and

{∥ {\hat{Σ}}_{Y Y} - {\hat{Σ}}_{Z Z} ∥}_{max} \leq {∥ \frac{2}{n} \sum_{k = 1}^{n} Z_{k} X_{k}^{T} Δ_{n}^{T} ∥}_{max} + {∥ \frac{1}{n} \sum_{k = 1}^{n} Δ_{n} X_{k} X_{k}^{T} Δ_{n}^{T} ∥}_{max} .
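The expansion of the residual covariance and the bound later used for (A16) can be checked numerically. In this sketch Γ̂ is a hypothetical small perturbation of Γ, not the estimator defined by (2), and the dimensions, seed and distributions are illustrative assumptions.

```python
import numpy as np

def max_norm(A: np.ndarray) -> float:
    """Largest absolute entry of A."""
    return float(np.abs(A).max())

rng = np.random.default_rng(3)
n, p, q = 250, 4, 6
X = rng.standard_normal((n, q))                         # rows are X_k^T
Z = rng.standard_t(df=7, size=(n, p))                   # rows are Z_k^T (illustrative errors)
Gamma = rng.standard_normal((p, q))
Gamma_hat = Gamma + 0.05 * rng.standard_normal((p, q))  # hypothetical estimate of Gamma
Y = X @ Gamma.T + Z                                     # Y_k = Gamma X_k + Z_k
Delta = Gamma_hat - Gamma                               # Delta_n

R = Y - X @ Gamma_hat.T          # rows: (Y_k - Gamma_hat X_k)^T = (Z_k - Delta_n X_k)^T
S_YY = R.T @ R / n               # hat Sigma_YY as defined in this proof
S_ZZ = Z.T @ Z / n               # hat Sigma_ZZ
M = Z.T @ X @ Delta.T / n        # (1/n) sum_k Z_k X_k^T Delta_n^T
Q = Delta @ (X.T @ X / n) @ Delta.T   # (1/n) sum_k Delta_n X_k X_k^T Delta_n^T

# Exact expansion behind the bound on ||hat Sigma_YY - hat Sigma_ZZ||_max.
assert np.allclose(S_YY - S_ZZ, -(M + M.T) + Q)
assert max_norm(S_YY - S_ZZ) <= 2 * max_norm(M) + max_norm(Q) + 1e-9
# Bound used for (A16): ||M||_max <= ||Delta_n||_{l_inf} * max_ij |(1/n) sum_k Z_ki X_kj|.
row_sum = np.abs(Delta).sum(axis=1).max()
assert max_norm(M) <= row_sum * max_norm(Z.T @ X / n) + 1e-9
```

The factor 2 in front of the cross term comes from ‖M + Mᵀ‖_max ≤ 2‖M‖_max.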

Hence, we only need to prove

P {∥ \frac{1}{n} \sum_{k = 1}^{n} Z_{k} X_{k}^{T} Δ_{n}^{T} ∥_{max} \leq C M_{p}^{- 1} τ_{n, p}} = 1 - O ({(p q)}^{- δ} + n^{- ε / 8})

(A16)

and

P {∥ \frac{1}{n} \sum_{k = 1}^{n} Δ_{n} X_{k} X_{k}^{T} Δ_{n}^{T} ∥_{max} \leq C M_{p}^{- 1} τ_{n, p}} = 1 - O ({(p q)}^{- δ} + n^{- ε / 8}) .

(A17)

To estimate (A16), by

s_{1} (p) \leq C {(1 + M_{p})}^{- 1} N_{q}^{δ_{1} - 2} {(n / log (p q))}^{(1 - δ_{1}) / 2}

and (7), we have

{∥ \frac{1}{n} \sum_{k = 1}^{n} Z_{k} X_{k}^{T} Δ_{n}^{T} ∥}_{max} \leq {∥ \hat{Γ} - Γ ∥}_{l_{\infty}} max_{i j} | \frac{1}{n} \sum_{k = 1}^{n} Z_{k i} X_{k j} | \leq C M_{p}^{- 1} τ_{n, p}

with probability at least

1 - O ({(p q)}^{- δ} + n^{- ε / 4})

. Thus, (A16) holds. It remains to prove (A17). To this end, note that

max_{i l} | \frac{1}{n} \sum_{k = 1}^{n} (\sum_{j = 1}^{q} δ_{i j} X_{k j}) (\sum_{j = 1}^{q} δ_{l j} X_{k j}) | = max_{i} | \frac{1}{n} \sum_{k = 1}^{n} {(\sum_{j = 1}^{q} δ_{i j} X_{k j})}^{2} | \leq max_{i} | \sum_{j = 1}^{q} δ_{i j}^{2} \frac{1}{n} \sum_{k = 1}^{n} X_{k j}^{2} | .

(A18)

By Lemma A2, on an event with probability at least

1 - O ({(p q)}^{- δ} + n^{- ε / 8})

,

max_{1 \leq j \leq q} | \frac{1}{n} \sum_{k = 1}^{n} X_{k j}^{2} | \leq max_{1 \leq j \leq q} E X_{1 j}^{2} + C \sqrt{\frac{log (p q)}{n}} \leq C^{'},

where

C^{'}

depends only on the moment bound in Condition 1. Substituting this into (A18) yields

max_{i} | \sum_{j = 1}^{q} δ_{i j}^{2} \frac{1}{n} \sum_{k = 1}^{n} X_{k j}^{2} | \leq {∥ \hat{Γ} - Γ ∥}_{max} {∥ \hat{Γ} - Γ ∥}_{l_{\infty}} max_{j} | \frac{1}{n} \sum_{k = 1}^{n} X_{k j}^{2} | \leq C M_{p}^{- 1} τ_{n, p},

using the bounds of Theorem 2 on

{∥ \hat{Γ} - Γ ∥}_{max}

and

{∥ \hat{Γ} - Γ ∥}_{l_{\infty}}

together with the assumed condition on

s_{1} (p)

. This establishes (A17), and the proof is done. □

References

Wold, S.; Sjostrom, M.; Eriksson, L. PLS-regression: A basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109–130.
Harrison, L.; Penny, W.; Friston, K. Multivariate autoregressive modeling of fMRI time series. NeuroImage 2003, 19, 1477–1491.
Meng, C.; Kuster, B.; Culhane, A.C.; Gholami, A.M. A multivariate approach to the integration of multi-omics datasets. BMC Bioinform. 2014, 15, 162.
Lee, C.F.; Lee, A.C.; Lee, J. Handbook of Quantitative Finance and Risk Management; Springer: Berlin/Heidelberg, Germany, 2010.
Liu, R.; Yu, G. Estimation of the Error Structure in Multivariate Response Linear Regression Models. WIREs Comput. Stat. 2025, 17, e70021.
Cai, T.; Liu, W.; Luo, X. A Constrained l1 Minimization Approach to Sparse Precision Matrix Estimation. J. Am. Stat. Assoc. 2011, 106, 594–607.
Cai, T.T.; Liu, W.; Zhou, H.H. Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation. Ann. Stat. 2016, 44, 455–488.
Avella-Medina, M.; Battey, H.S.; Fan, J.; Li, Q. Robust estimation of high-dimensional covariance and precision matrices. Biometrika 2018, 105, 271–284.
Cai, T.T.; Li, H.; Liu, W.; Xie, J. Covariate-adjusted precision matrix estimation with an application in genetical genomics. Biometrika 2012, 100, 139–156.
Chen, M.; Ren, Z.; Zhao, H.; Zhou, H. Asymptotically Normal and Efficient Estimation of Covariate-Adjusted Gaussian Graphical Model. J. Am. Stat. Assoc. 2016, 111, 394–406.
Tan, K.; Romon, G.; Bellec, P.C. Noise covariance estimation in multi-task high-dimensional linear models. Bernoulli 2024, 30, 1695–1722.
Bickel, P.J.; Levina, E. Covariance regularization by thresholding. Ann. Stat. 2008, 36, 2577–2604.
Tibshirani, R. Regression Shrinkage and Selection via The Lasso: A Retrospective. J. R. Stat. Soc. Ser. B Stat. Methodol. 2011, 73, 273–282.
Candes, E.; Tao, T. The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Stat. 2007, 35, 2313–2351.
Friedman, J.; Hastie, T.; Tibshirani, R. Sparse Inverse Covariance Estimation with the Graphical Lasso. Biostatistics 2008, 9, 432–441.
Wu, G.D.; Chen, J.; Hoffmann, C.; Bittinger, K.; Chen, Y.Y.; Keilbaugh, S.A.; Bewtra, M.; Knights, D.; Walters, W.A.; Knight, R.; et al. Linking Long-Term Dietary Patterns with Gut Microbial Enterotypes. Science 2011, 334, 105–108.
Cao, Y.; Lin, W.; Li, H. Large Covariance Estimation for Compositional Data Via Composition-Adjusted Thresholding. J. Am. Stat. Assoc. 2019, 114, 759–772.
He, Y.; Liu, P.; Zhang, X.; Zhou, W. Robust covariance estimation for high-dimensional compositional data with application to microbial communities analysis. Stat. Med. 2021, 40, 3499–3515.
Li, D.; Srinivasan, A.; Chen, Q.; Xue, L. Robust Covariance Matrix Estimation for High-Dimensional Compositional Data with Application to Sales Data Analysis. J. Bus. Econ. Stat. 2023, 41, 1090–1100.
Zhang, S.; Wang, H.; Lin, W. CARE: Large Precision Matrix Estimation for Compositional Data. J. Am. Stat. Assoc. 2025, 120, 305–317.
Wang, J.; Liang, W.; Li, L.; Wu, Y.; Ma, X. A new robust covariance matrix estimation for high-dimensional microbiome data. Aust. N. Z. J. Stat. 2024, 66, 281–295.
Coyte, K.Z.; Schluter, J.; Foster, K.R. The ecology of the microbiome: Networks, competition, and stability. Science 2015, 350, 663–666.
Greenblum, S.; Turnbaugh, P.J.; Borenstein, E. Metagenomic systems biology of the human gut microbiome reveals topological shifts associated with obesity and inflammatory bowel disease. Proc. Natl. Acad. Sci. USA 2012, 109, 594–599.
Massart, P. Concentration Inequalities and Model Selection; Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 2007; Volume 1896, pp. xiv+337.

Figure 1. Conditional correlation network of gut microbiota in the lean group (BMI < 25). Nodes represent 30 retained bacterial genera. Edges denote stable associations (bootstrap frequency ≥ 50%): green lines indicate positive correlations, red lines indicate negative correlations, and darker colors represent stronger associations.


Figure 2. Conditional correlation network of gut microbiota in the obese group (BMI ≥ 25). Nodes represent 30 retained bacterial genera. Edges denote stable associations (bootstrap frequency ≥ 50%): green lines indicate positive correlations, red lines indicate negative correlations, and darker colors represent stronger associations.


Table 1. Average errors (standard errors) under three methods.

(p, q, n, $s_{1}$, $s_{2}$)	Method	Spectral Norm	Frobenius Norm	Matrix $l_{1}$ Norm
Model 1	CLIME	2.68 (0.03)	2.58 (0.06)	1.98 (0.59)
	GLASSO	2.75 (0.03)	2.61 (0.05)	2.51 (0.16)
	Our Method	2.12 (0.09)	1.78 (0.20)	1.54 (0.15)
Model 2	CLIME	2.78 (0.04)	2.68 (0.03)	2.49 (0.04)
	GLASSO	2.94 (0.02)	2.90 (0.02)	2.59 (0.04)
	Our Method	2.75 (0.03)	2.61 (0.05)	2.51 (0.16)
Model 3	CLIME	2.78 (0.04)	2.68 (0.03)	2.49 (0.04)
	GLASSO	2.94 (0.02)	2.90 (0.02)	2.59 (0.04)
	Our Method	2.75 (0.03)	2.61 (0.05)	2.51 (0.16)

Table 2. Quantitative characteristics of gut microbial networks in lean and obese groups.

Metric	Lean Group	Obese Group
Number of retained genera (nodes)	30	30
Number of stable edges	7	5
Positive correlations (proportion)	2 (28.6%)	2 (40.0%)
Negative correlations (proportion)	5 (71.4%)	3 (60.0%)
Network stability score ¹	0.72	0.58

¹ Stability score refers to the average bootstrap frequency of all stable edges, with a range of [0, 1] and higher values indicating more reliable networks.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
