Efficient Estimation and Response Variable Selection in Sparse Partial Envelope Model

Yu Wu; Jing Zhang

doi:10.3390/math12233758

¹ Business School, Nanjing University, Nanjing 210093, China

² College of Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Mathematics 2024, 12(23), 3758; https://doi.org/10.3390/math12233758

This article belongs to the Section D1: Probability and Statistics


Abstract

In this paper, we propose a sparse partial envelope model that performs response variable selection efficiently under the partial envelope model. We study its theoretical properties, including consistency, the oracle property and the asymptotic distribution of the sparse partial envelope estimator, in both the large-sample and the high-dimensional settings. Numerical experiments show that the sparse partial envelope estimator performs response variable selection very well in both settings. Moreover, simulation studies and a real data analysis suggest that the sparse partial envelope estimator performs considerably better than the standard estimator, the oracle partial envelope estimator, the active partial envelope estimator and the sparse envelope estimator, in both the large-sample and the high-dimensional settings.

Keywords:

response variable selection; dimension reduction; partial envelope model; Grassmann manifold; oracle property

MSC:

62B05; 62H12; 62R07

1. Introduction

Consider the multivariate linear regression model with stochastic predictor vector $X \in \mathbb{R}^{p}$ and multivariate response vector $Y \in \mathbb{R}^{r}$,

$$
Y = \alpha + \beta X + \varepsilon, \tag{1}
$$

where $\alpha \in \mathbb{R}^{r}$ is the unknown intercept, $\beta \in \mathbb{R}^{r \times p}$ is the unknown regression coefficient matrix, and the error vector $\varepsilon$ has mean zero and unknown covariance matrix $\Sigma > 0$ and is independent of $X$. The data consist of $n$ independent realizations $Y_i$ of $Y$, observed at the corresponding values $X_i$ of $X$ ($i = 1, \dots, n$). The multivariate linear regression model (1) is a cornerstone of multivariate statistics, and its main focus is the relationship between $X$ and $Y$ as described by the regression coefficient matrix $\beta \in \mathbb{R}^{r \times p}$. In this article, we are concerned not only with this relationship but also with response variable selection. Even a response whose regression coefficients are all zero can still improve the efficiency with which the nonzero coefficients are estimated. Because response variable selection is one of the main focuses of this article, we assume throughout that $n > p$.

Cook et al. [] introduced the response envelope model, which is based on the idea that some linear combinations of the response vector $Y$ may be immaterial to the estimation of $\beta$ while still contributing extraneous variation that makes the estimator of $\beta$ more variable. Envelope estimation accounts for this extraneous variation, which can make the estimator of $\beta$ much more efficient. Building on Cook et al. [], many authors have extended the envelope idea to more general settings and proposed new models that achieve further efficiency gains (Su and Cook [,,], Cook et al. [], Su et al. [], Cook and Zhang [,,], Khare et al. [], Li and Zhang [], Zhang and Li [], Pan et al. [], Zhu and Su []).

Su et al. [] proposed the sparse envelope model, which performs variable selection on the responses while retaining the efficiency gains of the envelope model, and they discussed response variable selection both in the standard multivariate linear regression and in the envelope setting. They also established consistency and the oracle property and derived the asymptotic distribution of the sparse envelope estimator. Zhu and Su [] proposed envelope-based sparse partial least squares (SPLS) by exploiting the connection between envelope models and partial least squares (PLS). They established the consistency, oracle property and asymptotic normality of the envelope-based SPLS estimator in both the large-sample and the high-dimensional scenarios. They further developed envelope-based SPLS estimators for generalized linear models and studied their theoretical properties, including consistency, the oracle property and the asymptotic distribution.

In this article, building on the work of Su et al. [], we propose a novel sparse partial envelope model that performs response variable selection and improves the efficiency of parameter estimation. Our main contributions are as follows. First, we study the theoretical properties of the sparse partial envelope estimator: we establish $\sqrt{n}$-consistency, asymptotic normality and the oracle property in the large-sample setting, and we derive the rate of convergence and selection consistency in the high-dimensional setting. Second, simulation studies show that the sparse partial envelope model performs response variable selection well both when $r$ is small and $n$ is large and when $n$ is small and $r$ is large. Furthermore, when $r$ is small and $n$ is large, the sparse partial envelope estimator is much more efficient than the standard estimator, the oracle partial envelope estimator, the active partial envelope estimator and the sparse envelope estimator. Third, a real data analysis shows that, when $n$ is small and $r$ is large, the sparse partial envelope model achieves a clear efficiency gain over the sparse envelope model. In short, the sparse partial envelope model is more flexible than the sparse envelope model.

The rest of the article is organized as follows. Section 2 reviews the envelope model and the partial envelope model. Section 3 introduces the sparse partial envelope model. Section 4 establishes the theoretical properties of the sparse partial envelope estimators. Simulation studies are reported in Section 5, and a real data analysis is given in Section 6. Section 7 concludes with some remarks. Proofs of the theorems and propositions are given in the Appendix.

2. Review of the Envelope Model and Partial Envelope Model

The envelope model (Cook et al. []) seeks the smallest subspace $\mathcal{S} \subseteq \mathbb{R}^{r}$ that satisfies the following two conditions:

$$
\text{(a)}\ Q_{\mathcal{S}} Y \mid X \sim Q_{\mathcal{S}} Y, \qquad \text{(b)}\ Q_{\mathcal{S}} Y \perp\!\!\!\perp P_{\mathcal{S}} Y \mid X. \tag{2}
$$

Here '$\sim$' means identically distributed and '$\perp\!\!\!\perp$' means statistically independent; $P_{(\cdot)}$ denotes the orthogonal projection onto the indicated subspace and $Q_{(\cdot)} = I_r - P_{(\cdot)}$. Condition (a) implies that the distribution of $Q_{\mathcal{S}} Y$ does not depend on $X$, so $Q_{\mathcal{S}} Y$ carries no information about $\beta$. Condition (b) implies that $Q_{\mathcal{S}} Y$ is conditionally independent of $P_{\mathcal{S}} Y$ given $X$, so $Q_{\mathcal{S}} Y$ cannot carry information about $\beta$ indirectly through an association with $P_{\mathcal{S}} Y$. All of the immaterial information in $Y$ is therefore captured by finding the smallest subspace $\mathcal{S}$ that satisfies (2). Let $\mathcal{B} = \operatorname{span}(\beta)$.

Cook et al. [] showed that conditions (2) are equivalent to the following two conditions:

$$
\text{(3a)}\ \mathcal{B} \subseteq \mathcal{S}, \qquad \text{(3b)}\ \Sigma = P_{\mathcal{S}} \Sigma P_{\mathcal{S}} + Q_{\mathcal{S}} \Sigma Q_{\mathcal{S}}. \tag{3}
$$

Condition (3b) holds if and only if $P_{\mathcal{S}} Y$ and $Q_{\mathcal{S}} Y$ are uncorrelated given $X$, which is equivalent to saying that $\mathcal{S}$ is a reducing subspace of $\Sigma$. Conditions (3) show that all of the immaterial information can be isolated by taking $\mathcal{S}$ to be the intersection of all reducing subspaces of $\Sigma$ that contain $\mathcal{B}$. This subspace is called the $\Sigma$-envelope of $\mathcal{B}$ and is denoted by $\mathcal{E}_{\Sigma}(\mathcal{B})$, or $\mathcal{E}$ for short.
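Conditions (3a) and (3b) are easy to check numerically for a candidate subspace. The following NumPy sketch is only an illustration under assumed toy values: the dimensions, the matrices standing in for $\Omega$ and $\Omega_0$, and the helper names `projection` and `satisfies_envelope_conditions` are hypothetical. It builds $\Sigma$ from an orthogonal matrix whose first $u$ columns span the envelope, then tests (3a) as $Q_{\mathcal{S}}\beta = 0$ and (3b) directly.

```python
import numpy as np

def projection(basis):
    """Orthogonal projection onto span(basis); basis must have full column rank."""
    q, _ = np.linalg.qr(basis)
    return q @ q.T

def satisfies_envelope_conditions(basis, beta, Sigma, tol=1e-10):
    """Check (3a) span(beta) in S and (3b) Sigma = P Sigma P + Q Sigma Q, with S = span(basis)."""
    P = projection(basis)
    Q = np.eye(P.shape[0]) - P
    contains_B = np.allclose(Q @ beta, 0.0, atol=tol)                             # (3a)
    reduces_Sigma = np.allclose(Sigma, P @ Sigma @ P + Q @ Sigma @ Q, atol=tol)   # (3b)
    return contains_B and reduces_Sigma

# Toy example (hypothetical values): r = 5 responses, envelope dimension u = 2.
rng = np.random.default_rng(0)
r, u, p = 5, 2, 3
O, _ = np.linalg.qr(rng.standard_normal((r, r)))   # random orthogonal matrix
Gamma, Gamma0 = O[:, :u], O[:, u:]
Omega, Omega0 = np.diag([2.0, 0.5]), np.diag([9.0, 4.0, 1.0])
Sigma = Gamma @ Omega @ Gamma.T + Gamma0 @ Omega0 @ Gamma0.T
beta = Gamma @ rng.standard_normal((u, p))          # span(beta) lies in span(Gamma)

print(satisfies_envelope_conditions(Gamma, beta, Sigma))         # True
print(satisfies_envelope_conditions(O[:, [0, 2]], beta, Sigma))  # False: (3a) fails
```

The second candidate is still a reducing subspace of $\Sigma$ (it is spanned by eigenvectors), but it misses part of $\operatorname{span}(\beta)$, so it is not an envelope candidate.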

Let $u = \dim\{\mathcal{E}_{\Sigma}(\mathcal{B})\}$, and let $\Gamma \in \mathbb{R}^{r \times u}$ and $\Gamma_0 \in \mathbb{R}^{r \times (r-u)}$ be semi-orthogonal basis matrices for $\mathcal{E}_{\Sigma}(\mathcal{B})$ and $\mathcal{E}_{\Sigma}^{\perp}(\mathcal{B})$, respectively. Imposing conditions (3) on the standard model (1) gives the coordinate form of the envelope model:

$$
Y = \alpha + \Gamma \eta X + \varepsilon, \qquad \Sigma = \Gamma \Omega \Gamma^{T} + \Gamma_0 \Omega_0 \Gamma_0^{T}, \tag{4}
$$

where $\beta = \Gamma \eta$, $\eta \in \mathbb{R}^{u \times p}$ contains the coordinates of $\beta$ with respect to $\Gamma$, and $\Omega \in \mathbb{R}^{u \times u}$ and $\Omega_0 \in \mathbb{R}^{(r-u) \times (r-u)}$ are positive definite matrices. Su and Cook [] extended the envelope model to the partial envelope model, which focuses on the coefficients of the predictors of interest. They partition $X$ into two sets of predictors, $X_1 \in \mathbb{R}^{p_1}$ and $X_2 \in \mathbb{R}^{p_2}$, with $p_1 + p_2 = p$ and $p_1 < r$, and partition the columns of $\beta$ correspondingly into $\beta_1$ and $\beta_2$. Model (1) can then be written as $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$, where $\beta_1$ contains the coefficients of interest. Attention is restricted to the $\Sigma$-envelope of $\mathcal{B}_1 = \operatorname{span}(\beta_1)$, leaving $\beta_2$ as an unrestricted parameter. This yields the parametric structure $\mathcal{B}_1 \subseteq \mathcal{E}_{\Sigma}(\mathcal{B}_1)$ and $\Sigma = P_{\mathcal{E}_1} \Sigma P_{\mathcal{E}_1} + Q_{\mathcal{E}_1} \Sigma Q_{\mathcal{E}_1}$, where $P_{\mathcal{E}_1}$ denotes the projection onto $\mathcal{E}_{\Sigma}(\mathcal{B}_1)$, which is called the partial envelope for $\mathcal{B}_1$. This is the same as the envelope structure, except that the partial envelope is defined with respect to $\mathcal{B}_1$ instead of the larger space $\mathcal{B}$. To distinguish it from the partial envelope, $\mathcal{E}_{\Sigma}(\mathcal{B})$ is called the full envelope. Because $\mathcal{B}_1 \subseteq \mathcal{B}$, the partial envelope is contained in the full envelope: $\mathcal{E}_{\Sigma}(\mathcal{B}_1) \subseteq \mathcal{E}_{\Sigma}(\mathcal{B})$. Further descriptions of the envelope model and the partial envelope model can be found in Zhang et al. [], Zhang et al. [], and Zhang and Huang [].

Let $u_1 = \dim\{\mathcal{E}_{\Sigma}(\mathcal{B}_1)\}$, and let $\Gamma \in \mathbb{R}^{r \times u_1}$ be a semi-orthogonal matrix ($\Gamma^{T}\Gamma = I_{u_1}$) whose columns form a basis for $\mathcal{E}_{\Sigma}(\mathcal{B}_1)$. Let $(\Gamma, \Gamma_0) \in \mathbb{R}^{r \times r}$ be an orthogonal matrix, and let $\eta \in \mathbb{R}^{u_1 \times p_1}$ contain the coordinates of $\beta_1$ with respect to the basis matrix $\Gamma$. A coordinate version of the partial envelope model is then

$$
Y = \alpha + \Gamma \eta X_1 + \beta_2 X_2 + \varepsilon, \qquad \Sigma = \Gamma \Omega \Gamma^{T} + \Gamma_0 \Omega_0 \Gamma_0^{T}, \tag{5}
$$

where $\Omega \in \mathbb{R}^{u_1 \times u_1}$ and $\Omega_0 \in \mathbb{R}^{(r-u_1) \times (r-u_1)}$ are positive definite matrices that serve as the coordinates of $\Sigma_{\mathcal{E}_1}$ and $\Sigma_{\mathcal{E}_1^{\perp}}$ with respect to the basis matrices $\Gamma$ for $\mathcal{E}_{\Sigma}(\mathcal{B}_1)$ and $\Gamma_0$ for $\mathcal{E}_{\Sigma}^{\perp}(\mathcal{B}_1)$, respectively. For a more detailed description, let $R_{1 \mid 2}$ denote the population residuals from the multivariate linear regression of $X_1$ on $X_2$, where the predictor $X$ is centered. The linear model can then be reparameterized as $Y = \alpha + \beta_1 R_{1 \mid 2} + \beta_2^{*} X_2 + \varepsilon$, where $\beta_2^{*}$ is a linear combination of $\beta_1$ and $\beta_2$. Next, let $R_{Y \mid 2} = Y - \alpha - \beta_2^{*} X_2$ denote the population residuals from the regression of $Y$ on $X_2$ alone. The linear model that involves $\beta_1$ alone is $R_{Y \mid 2} = \beta_1 R_{1 \mid 2} + \varepsilon$. The partial envelope $\mathcal{E}_{\Sigma}(\mathcal{B}_1)$ is identical to the full envelope for $\mathcal{B}_1$ in the regression of $R_{Y \mid 2}$ on $R_{1 \mid 2}$. In other words, the partial envelope is characterized by conditions (2) applied to the regression of $R_{Y \mid 2}$ on $R_{1 \mid 2}$, that is, $Q_{\mathcal{S}} R_{Y \mid 2} \mid R_{1 \mid 2} \sim Q_{\mathcal{S}} R_{Y \mid 2}$ and $Q_{\mathcal{S}} R_{Y \mid 2} \perp\!\!\!\perp P_{\mathcal{S}} R_{Y \mid 2} \mid R_{1 \mid 2}$. Because the predictors are centered, the maximum likelihood estimator of $\alpha$ is simply $\hat{\alpha} = \bar{Y}$. The estimators of the remaining parameters require an estimator of $\mathcal{E}_{\Sigma}(\mathcal{B}_1)$. See Zhang et al. [] for a similar characterization.
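The residual regression behind the partial envelope can be checked on simulated data. In the sketch below (simulated data with hypothetical dimensions; sample covariances use the divisor $n$, which is an assumption), the coefficient matrix from regressing the sample version of $R_{Y \mid 2}$ on that of $R_{1 \mid 2}$ coincides with the $X_1$ block of the full least squares fit (the Frisch–Waugh–Lovell theorem), and the covariance of the residuals from that fit equals $\hat{\Sigma}_{R_{Y \mid 2}} - \hat{\Sigma}_{R_{Y \mid 2} R_{1 \mid 2}} \hat{\Sigma}_{R_{1 \mid 2}}^{-1} \hat{\Sigma}_{R_{1 \mid 2} R_{Y \mid 2}}$, the sample matrix $\hat{\Sigma}_{\mathrm{res}}$ used in (6).

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, p1, p2 = 200, 4, 2, 3
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2)) + 0.5 * X1[:, [0]]   # X1 and X2 are correlated
beta1 = rng.standard_normal((r, p1))
beta2 = rng.standard_normal((r, p2))
Y = 1.0 + X1 @ beta1.T + X2 @ beta2.T + rng.standard_normal((n, r))

def center(M):
    return M - M.mean(axis=0)

def ols_coef(Yc, Xc):
    """Coefficient matrix (responses x predictors) of a no-intercept LS fit on centred data."""
    return np.linalg.lstsq(Xc, Yc, rcond=None)[0].T

Y_c, X1_c, X2_c = center(Y), center(X1), center(X2)

# Full regression of Y on (X1, X2): the first p1 columns estimate beta1.
beta1_ols = ols_coef(Y_c, np.hstack([X1_c, X2_c]))[:, :p1]

# Sample residuals R_{1|2} and R_{Y|2} from regressions on X2 alone.
R1 = X1_c - X2_c @ ols_coef(X1_c, X2_c).T
RY = Y_c - X2_c @ ols_coef(Y_c, X2_c).T

# Regressing R_{Y|2} on R_{1|2} recovers the same estimate of beta1.
beta1_partial = ols_coef(RY, R1)

# Sigma_res via the covariance formula and via the fitted residuals.
S_Y, S_1, S_Y1 = RY.T @ RY / n, R1.T @ R1 / n, RY.T @ R1 / n
Sigma_res = S_Y - S_Y1 @ np.linalg.solve(S_1, S_Y1.T)
resid = RY - R1 @ beta1_partial.T

print(np.allclose(beta1_ols, beta1_partial))        # True
print(np.allclose(Sigma_res, resid.T @ resid / n))   # True
```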

The estimator of the partial envelope $\mathcal{E}_{\Sigma}(\mathcal{B}_1)$ is obtained by solving the optimization problem

$$
\hat{\mathcal{E}}_{\Sigma}(\mathcal{B}_1) = \underset{\operatorname{span}(\Gamma) \in \mathcal{G}(r, u_1)}{\arg\min} \left\{ \log\left| \Gamma^{T} \hat{\Sigma}_{\mathrm{res}} \Gamma \right| + \log\left| \Gamma^{T} \hat{\Sigma}_{R_{Y \mid 2}}^{-1} \Gamma \right| \right\}, \tag{6}
$$

where $\mathcal{G}(r, u_1)$ denotes the $r \times u_1$ Grassmann manifold, that is, the set of all $u_1$-dimensional subspaces of $\mathbb{R}^{r}$; $\hat{\Sigma}_{\mathrm{res}}$ denotes the sample covariance matrix of the residuals from the least squares regression of $R_{Y \mid 2}$ on $R_{1 \mid 2}$; and $\hat{\Sigma}_{R_{Y \mid 2}}$ denotes the sample covariance matrix of $R_{Y \mid 2}$. Because the optimization is carried out over $\mathcal{G}(r, u_1)$, (6) is a Grassmann manifold optimization problem, and its objective function is non-convex. Since estimating $\mathcal{E}_{\Sigma}(\mathcal{B}_1)$ involves manifold optimization, it can be slow in high-dimensional settings. To address these problems, Cook et al. [] converted the problem into a non-manifold optimization through a reparameterization of $\Gamma$. Let $\Gamma_1$ consist of the first $u_1$ rows of $\Gamma$ and $\Gamma_2$ of the remaining $r - u_1$ rows. The matrix $\Gamma_1$ is assumed to be nonsingular, which can always be arranged by reordering the responses. Then

$$
\Gamma = \begin{pmatrix} \Gamma_1 \\ \Gamma_2 \end{pmatrix} = \begin{pmatrix} I_{u_1} \\ A \end{pmatrix} \Gamma_1 \equiv G_A \Gamma_1,
$$

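where $A = \Gamma_2 \Gamma_1^{-1}$ and $G_A = \Gamma \Gamma_1^{-1}$. Note that $A \in \mathbb{R}^{(r-u_1) \times u_1}$ characterizes $\mathcal{E}_{\Sigma}(\mathcal{B}_1)$, since $A$ depends on $\Gamma$ only through $\operatorname{span}(\Gamma)$. Because $A$ is unconstrained, this parameterization converts (6) into the following unconstrained, non-manifold optimization problem:

$$
\hat{A} = \underset{A \in \mathbb{R}^{(r-u_1) \times u_1}}{\arg\min} \left\{ -2\log\left| G_A^{T} G_A \right| + \log\left| G_A^{T} \hat{\Sigma}_{\mathrm{res}} G_A \right| + \log\left| G_A^{T} \hat{\Sigma}_{R_{Y \mid 2}}^{-1} G_A \right| \right\}. \tag{7}
$$

The term $-2\log|G_A^{T} G_A|$ appears because $\Gamma = G_A \Gamma_1$ and $\Gamma^{T}\Gamma = I_{u_1}$ imply $\Gamma_1 \Gamma_1^{T} = (G_A^{T} G_A)^{-1}$, so each of the two log-determinants in (6) equals the corresponding term in (7) minus $\log|G_A^{T} G_A|$.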
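The reparameterization can be verified numerically: for any $A$, the objective in (7) equals the objective in (6) evaluated at an orthonormal basis of $\operatorname{span}(G_A)$, and the objective in (6) is unchanged when that basis is rotated, so it depends on $\Gamma$ only through its span. In this minimal sketch, the positive definite matrices `S_res` and `S_RY` are random stand-ins for $\hat{\Sigma}_{\mathrm{res}}$ and $\hat{\Sigma}_{R_{Y \mid 2}}$, and the function names are hypothetical.

```python
import numpy as np

def logdet(M):
    sign, val = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("matrix must be positive definite")
    return val

def obj_grassmann(Gamma, S_res, S_RY):
    """Objective in (6); Gamma is r x u1 with orthonormal columns."""
    return logdet(Gamma.T @ S_res @ Gamma) + logdet(Gamma.T @ np.linalg.inv(S_RY) @ Gamma)

def obj_A(A, S_res, S_RY):
    """Objective in (7), written in terms of the unconstrained matrix A."""
    u1 = A.shape[1]
    G = np.vstack([np.eye(u1), A])   # G_A = (I_{u1}; A)
    return (-2 * logdet(G.T @ G) + logdet(G.T @ S_res @ G)
            + logdet(G.T @ np.linalg.inv(S_RY) @ G))

rng = np.random.default_rng(2)
r, u1 = 6, 2
M1 = rng.standard_normal((r, r)); S_RY = M1 @ M1.T + r * np.eye(r)   # stand-in for Sigma_hat_{R_{Y|2}}
M2 = rng.standard_normal((r, r)); S_res = M2 @ M2.T + np.eye(r)      # stand-in for Sigma_hat_res
A = rng.standard_normal((r - u1, u1))

Gamma, _ = np.linalg.qr(np.vstack([np.eye(u1), A]))   # orthonormal basis of span(G_A)
O, _ = np.linalg.qr(rng.standard_normal((u1, u1)))     # rotation within the subspace

print(np.isclose(obj_A(A, S_res, S_RY), obj_grassmann(Gamma, S_res, S_RY)))               # True
print(np.isclose(obj_grassmann(Gamma @ O, S_res, S_RY), obj_grassmann(Gamma, S_res, S_RY)))  # True
```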

Once $\hat{A}$ is obtained from (7), $\hat{\mathcal{E}}_{\Sigma}(\mathcal{B}_1) = \operatorname{span}(G_{\hat{A}})$, and the partial envelope estimator of $\beta_1$ is $\hat{\beta}_{1,\mathrm{penv}} = P_{\hat{\mathcal{E}}_1} \hat{\beta}_{1,\mathrm{ols}}$, where $P_{\hat{\mathcal{E}}_1}$ is the projection onto $\hat{\mathcal{E}}_{\Sigma}(\mathcal{B}_1)$ and $\hat{\beta}_{1,\mathrm{ols}}$ is the ordinary least squares estimator of $\beta_1$. By the results of Su and Cook [], the partial envelope estimator $\hat{\beta}_{1,\mathrm{penv}}$ is at least as efficient as $\hat{\beta}_{1,\mathrm{ols}}$.
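Given a solution $\hat{A}$, the partial envelope estimator is simply a projection of the OLS estimator. The sketch below assumes that `A_hat` has already been produced by some solver for (7) and that `beta1_ols` is available; both are random placeholders here, and `penv_estimator` is a hypothetical helper name.

```python
import numpy as np

def penv_estimator(A_hat, beta1_ols):
    """Return P_{E_hat} beta1_ols and an orthonormal basis of E_hat = span(G_{A_hat})."""
    u1 = A_hat.shape[1]
    G = np.vstack([np.eye(u1), A_hat])
    Gamma_hat, _ = np.linalg.qr(G)      # orthonormal basis of span(G_A)
    P = Gamma_hat @ Gamma_hat.T         # projection onto E_hat
    return P @ beta1_ols, Gamma_hat

rng = np.random.default_rng(3)
r, u1, p1 = 5, 2, 1
A_hat = rng.standard_normal((r - u1, u1))    # placeholder for the output of (7)
beta1_ols = rng.standard_normal((r, p1))     # placeholder OLS estimate
beta1_penv, Gamma_hat = penv_estimator(A_hat, beta1_ols)

# beta1_penv lies in E_hat: its component orthogonal to span(Gamma_hat) vanishes.
resid = beta1_penv - Gamma_hat @ (Gamma_hat.T @ beta1_penv)
print(np.allclose(resid, 0))   # True
```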

3. Sparse Partial Envelope Model

We first define active and inactive responses. Following Su et al. [], a response variable is called inactive if its corresponding row of $\Gamma$ consists entirely of zeros, and active otherwise. Because $\beta_1 = \Gamma \eta$, the regression coefficients of the inactive responses are zero. However, an active response may also have zero regression coefficients. Su et al. [] (Proposition 1) studied the properties of the active responses in the sparse response envelope setting, and analogous results hold for the sparse partial envelope model: if all regression coefficients of an active response are zero, then that response must be related to a response that has nonzero regression coefficients. Consequently, an active response with zero regression coefficients still provides information for estimating the nonzero regression coefficients. This is a distinctive feature of response variable selection in the envelope framework.

Without loss of generality, we write $Y = (Y_A^{T}, Y_I^{T})^{T}$, where $Y_A \in \mathbb{R}^{q}$ contains the active responses and $Y_I \in \mathbb{R}^{r-q}$ contains the inactive responses, with $q \le r$. The subscripts $A$ and $I$ are attached to quantities associated with the active and inactive responses, respectively. For instance, $r_A$ and $r_I$ denote the numbers of active and inactive responses, so that $r_A = q$ and $r_A + r_I = r$. Then $\Gamma$, $\Gamma_0$ and $\beta_2$ in the partial envelope model (5) have the following sparse structure:

$$
\Gamma = \begin{pmatrix} \Gamma_A \\ 0 \end{pmatrix}, \qquad \Gamma_0 = \begin{pmatrix} \Gamma_{A,0} & 0 \\ 0 & I_{r-q} \end{pmatrix} R, \qquad \beta_2 = \begin{pmatrix} \beta_{2,A} \\ 0 \end{pmatrix}, \tag{8}
$$

where $\Gamma_A \in \mathbb{R}^{q \times u_1}$ is a semi-orthogonal matrix, $\Gamma_{A,0} \in \mathbb{R}^{q \times (q-u_1)}$ is its orthogonal completion, $R \in \mathbb{R}^{(r-u_1) \times (r-u_1)}$ is an orthogonal matrix, $\beta_{2,A} \in \mathbb{R}^{q \times (p-p_1)}$ contains the coefficients for the active responses, and the zero block of $\beta_2$ has dimension $(r-q) \times (p-p_1)$. Because $\Gamma^{T} Y = \Gamma_A^{T} Y_A$, the inactive responses do not enter the material part.
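The sparse structure (8) can be assembled explicitly. This sketch uses arbitrary toy dimensions ($r = 6$, $q = 4$, $u_1 = 2$) and random orthogonal factors, all hypothetical; it confirms that $(\Gamma, \Gamma_0)$ is orthogonal for any rotation $R$ and that $\Gamma^{T}Y$ involves only $Y_A$.

```python
import numpy as np

rng = np.random.default_rng(4)
r, q, u1 = 6, 4, 2   # r responses, q of them active, envelope dimension u1

# Orthogonal basis of R^q split into Gamma_A (q x u1) and its completion Gamma_A0.
O_A, _ = np.linalg.qr(rng.standard_normal((q, q)))
Gamma_A, Gamma_A0 = O_A[:, :u1], O_A[:, u1:]
R, _ = np.linalg.qr(rng.standard_normal((r - u1, r - u1)))   # arbitrary rotation R

Gamma = np.vstack([Gamma_A, np.zeros((r - q, u1))])
Gamma0_tilde = np.block([
    [Gamma_A0, np.zeros((q, r - q))],
    [np.zeros((r - q, q - u1)), np.eye(r - q)],
])
Gamma0 = Gamma0_tilde @ R

full = np.hstack([Gamma, Gamma0])
print(np.allclose(full.T @ full, np.eye(r)))         # True: (Gamma, Gamma0) is orthogonal

Y = rng.standard_normal(r)
print(np.allclose(Gamma.T @ Y, Gamma_A.T @ Y[:q]))   # True: only Y_A enters Gamma^T Y
```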

We call (5) the sparse partial envelope model if $\Gamma$, $\Gamma_0$ and $\beta_2$ have the sparse structure (8), and we call the resulting estimator of $\beta_1$ the sparse partial envelope estimator. Under the sparse partial envelope model, $\beta_1 = \Gamma \eta$ also has a sparse structure: its rows for the active responses form $\beta_{1,A} = \Gamma_A \eta \in \mathbb{R}^{q \times p_1}$, and its rows for the inactive responses form a zero block of dimension $(r-q) \times p_1$. The completion of $\Gamma$ has the general form $\Gamma_0 = \tilde{\Gamma}_0 R$, where $\tilde{\Gamma}_0 \in \mathbb{R}^{r \times (r-u_1)}$ is a completion with a block diagonal structure and $R$ represents a rotation of the orthogonal basis. The dimension of $\Gamma_A^{T} Y_A$ is at most that of $Y_A$, so $u_1 \le q$. When $u = q$, the active responses contain no immaterial information under the sparse envelope model, and $\Gamma_A$ can be taken as $I_q$; however, as long as $u_1 < u$, the active responses still contain immaterial information under the sparse partial envelope model. When $q = r$, there are no inactive responses and no row of $\Gamma$ is entirely zero, so the sparse partial envelope model reduces to the partial envelope model.

The parameterization in terms of $A$ preserves the sparse structure of $\Gamma$. Since $\Gamma = G_A \Gamma_1$ with $\Gamma_1$ nonsingular, a row of $\Gamma$ consists entirely of zeros if and only if the corresponding row of $A$ does (the first $u_1$ responses, whose rows form the nonsingular $\Gamma_1$, are necessarily active). Hence the inactive responses can be identified from the row sparsity of $A$. To make the estimator of $\beta_1$ sparse, we induce row-wise sparsity in $A$ by adding an adaptive group lasso penalty to the objective function in (7):

$$
\begin{aligned}
\hat{A} = \underset{A \in \mathbb{R}^{(r-u_1) \times u_1}}{\arg\min} \Big\{ & -2\log\left| G_A^{T} G_A \right| + \log\left| G_A^{T} \hat{\Sigma}_{\mathrm{res}} G_A \right| + \log\left| G_A^{T} \hat{\Sigma}_{R_{Y \mid 2}}^{-1} G_A \right| \\
& + \lambda \sum_{i=1}^{r-u_1} \omega_i \| a_i \|_2 \Big\},
\end{aligned} \tag{9}
$$

where $\|\cdot\|_2$ is the Euclidean norm of a vector, $a_i^{T}$ denotes the $i$th row of $A$, $\lambda$ is a tuning parameter, and the $\omega_i$ are adaptive weights. Following Zou [], we set $\omega_i = 1 / \|\hat{a}_i\|_2^{\gamma}$, where $\gamma$ is a tuning parameter and $\hat{a}_i$ is a $\sqrt{n}$-consistent estimator of $a_i$. The tuning parameter $\gamma$ can be selected from a small candidate set such as $\{0.5, 1, 2, 4, 8\}$ (Zou []).

If $r$ grows to infinity with $n$, we write $r_n$ for $r$. When $r_n > n$, $\hat{\Sigma}_{R_{Y \mid 2}}$ and $\hat{\Sigma}_{\mathrm{res}}$ are both singular. This is problematic because the objective function in (9) relies on $\hat{\Sigma}_{R_{Y \mid 2}}^{-1}$, and the algorithm used to solve (9) requires $\hat{\Sigma}_{\mathrm{res}}^{-1}$. These issues can be resolved by using sparse permutation invariant covariance estimation (SPICE; Rothman et al. []) to estimate $\Sigma_{R_{Y \mid 2}}^{-1}$ and $\Sigma^{-1}$. We use SPICE because it is the only one of these estimators whose consistency can be established without a sparsity structure in the target parameter, and in the sparse partial envelope model $\Sigma_{R_{Y \mid 2}}^{-1}$ and $\Sigma^{-1}$ need not contain any zero elements. We denote the SPICE estimators of $\Sigma_{R_{Y \mid 2}}^{-1}$ and $\Sigma^{-1}$ by $\hat{\Sigma}_{R_{Y \mid 2},\mathrm{spice}}^{-1}$ and $\hat{\Sigma}_{\mathrm{res},\mathrm{spice}}^{-1}$, and we obtain $\hat{\Sigma}_{R_{Y \mid 2},\mathrm{spice}}$ and $\hat{\Sigma}_{\mathrm{res},\mathrm{spice}}$ by inverting them. Substituting $\hat{\Sigma}_{R_{Y \mid 2},\mathrm{spice}}^{-1}$ and $\hat{\Sigma}_{\mathrm{res},\mathrm{spice}}$ for $\hat{\Sigma}_{R_{Y \mid 2}}^{-1}$ and $\hat{\Sigma}_{\mathrm{res}}$ in (9) gives the objective

$$
\begin{aligned}
\hat{A} = \underset{A \in \mathbb{R}^{(r_n-u_1) \times u_1}}{\arg\min} \Big\{ & -2\log\left| G_A^{T} G_A \right| + \log\left| G_A^{T} \hat{\Sigma}_{\mathrm{res},\mathrm{spice}} G_A \right| + \log\left| G_A^{T} \hat{\Sigma}_{R_{Y \mid 2},\mathrm{spice}}^{-1} G_A \right| \\
& + \lambda \sum_{i=1}^{r_n-u_1} \omega_i \| a_i \|_2 \Big\}.
\end{aligned} \tag{10}
$$
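The singularity that motivates (10) is easy to reproduce. The sketch below (hypothetical dimensions, simulated data) shows that when $r_n > n$ the centred sample covariance matrix has rank at most $n - 1$ and cannot be inverted. SPICE itself is not implemented here; the last lines add a small ridge only as a clearly labelled placeholder that yields a positive definite matrix, and this is not the SPICE estimator used in (10).

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 30, 50                        # more responses than observations
RY = rng.standard_normal((n, r))     # stand-in for the sample residuals R_{Y|2}
RY_c = RY - RY.mean(axis=0)          # centre each column
S_RY = RY_c.T @ RY_c / n             # r x r sample covariance matrix

rank = np.linalg.matrix_rank(S_RY)
print(rank, "<", r)                  # rank is at most n - 1, so S_RY is singular

# Placeholder only (NOT SPICE): a small ridge makes the matrix positive definite.
S_ridge = S_RY + 0.1 * np.eye(r)
print(np.linalg.eigvalsh(S_ridge).min() > 0)   # True
```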

The optimization problems (9) and (10) are similar to those discussed in Su et al. [] and Zhu and Su []. Once $\hat{A}$ is obtained from (9) or (10), $\hat{\Gamma}$ is formed from an orthonormal basis of $\operatorname{span}(G_{\hat{A}})$, and $\hat{\Gamma}_0$ is taken as its orthogonal completion. The sparse partial envelope estimators of $\beta_1$ and $\Sigma$ are

$$
\hat{\beta}_1 = P_{\hat{\Gamma}} \hat{\beta}_{1,\mathrm{ols}}, \qquad \hat{\Sigma} = P_{\hat{\Gamma}} \hat{\Sigma}_{\mathrm{res}} P_{\hat{\Gamma}} + Q_{\hat{\Gamma}} \hat{\Sigma}_{R_{Y \mid 2}} Q_{\hat{\Gamma}}. \tag{11}
$$

The estimators of the constituent parameters are $\hat{\eta} = \hat{\Gamma}^{T} \hat{\beta}_{1,\mathrm{ols}}$, $\hat{\Omega} = \hat{\Gamma}^{T} \hat{\Sigma}_{\mathrm{res}} \hat{\Gamma}$ and $\hat{\Omega}_0 = \hat{\Gamma}_0^{T} \hat{\Sigma}_{R_{Y \mid 2}} \hat{\Gamma}_0$. Apart from the special structures of $\hat{\Gamma}$ and $\hat{\Gamma}_0$ in (8), the sparse partial envelope estimators have the same form as the partial envelope estimators. Because the sparse partial envelope estimator is asymptotically equivalent to the maximum likelihood estimator of the oracle partial envelope model (see Section 4), likelihood-based procedures such as the Akaike information criterion, the Bayesian information criterion or likelihood ratio tests can be used to select $u_1$. In practice, we prefer cross-validation to the Bayesian information criterion and other likelihood-based procedures for selecting both $\lambda$ and $u_1$.
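Given $\hat{\Gamma}$, the estimators in (11) and the constituent estimators follow by matrix algebra. This sketch uses random placeholders for $\hat{\Gamma}$, $\hat{\beta}_{1,\mathrm{ols}}$, $\hat{\Sigma}_{\mathrm{res}}$ and $\hat{\Sigma}_{R_{Y \mid 2}}$ (the helper name `spenv_estimates` is hypothetical) and checks that $\hat{\beta}_1 = \hat{\Gamma}\hat{\eta}$ and $\hat{\Sigma} = \hat{\Gamma}\hat{\Omega}\hat{\Gamma}^{T} + \hat{\Gamma}_0\hat{\Omega}_0\hat{\Gamma}_0^{T}$, mirroring the coordinate form (5).

```python
import numpy as np

def spenv_estimates(Gamma_hat, beta1_ols, S_res, S_RY):
    """Estimators (11) and the constituent estimates for a semi-orthogonal Gamma_hat."""
    r, u1 = Gamma_hat.shape
    full, _ = np.linalg.qr(Gamma_hat, mode="complete")
    Gamma0_hat = full[:, u1:]                 # orthogonal completion of Gamma_hat
    P = Gamma_hat @ Gamma_hat.T
    Q = np.eye(r) - P
    beta1 = P @ beta1_ols
    Sigma = P @ S_res @ P + Q @ S_RY @ Q
    eta = Gamma_hat.T @ beta1_ols
    Omega = Gamma_hat.T @ S_res @ Gamma_hat
    Omega0 = Gamma0_hat.T @ S_RY @ Gamma0_hat
    return beta1, Sigma, eta, Omega, Omega0, Gamma0_hat

rng = np.random.default_rng(6)
r, u1, p1 = 5, 2, 2
Gamma_hat, _ = np.linalg.qr(rng.standard_normal((r, u1)))   # placeholder basis
beta1_ols = rng.standard_normal((r, p1))                    # placeholder OLS estimate
M = rng.standard_normal((r, r))
S_res = M @ M.T + np.eye(r)                                 # placeholder Sigma_hat_res
S_RY = S_res + 0.5 * np.eye(r)                              # placeholder Sigma_hat_{R_{Y|2}}

beta1, Sigma, eta, Omega, Omega0, Gamma0_hat = spenv_estimates(Gamma_hat, beta1_ols, S_res, S_RY)
print(np.allclose(beta1, Gamma_hat @ eta))   # True
print(np.allclose(Sigma, Gamma_hat @ Omega @ Gamma_hat.T
                  + Gamma0_hat @ Omega0 @ Gamma0_hat.T))   # True
```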

4. Theoretical Properties of the Sparse Partial Envelope Estimator

In this section, we investigate the theoretical properties of the sparse partial envelope estimator. Theorems 1–3 establish its consistency and oracle properties in the large-sample scenario, where $r$ is fixed and $n$ tends to infinity. Theorems 4 and 5 give the selection consistency and the rate of convergence when both $r_n$ and $n$ tend to infinity. Let $\xi_i = \lambda \omega_i$, and define

$$
\xi_{\max,n} = \lambda \max(\omega_1, \dots, \omega_{q-u_1}), \qquad \xi_{\min,n} = \lambda \min(\omega_{q-u_1+1}, \dots, \omega_{r-u_1}).
$$

Thus $\xi_{\max,n}$ is the largest penalty applied to a row of $A$ that corresponds to an active response, and $\xi_{\min,n}$ is the smallest penalty applied to a row that corresponds to an inactive response.

Theorem 1.

Suppose that the sparse partial envelope model given by (5) and (8) holds, that the errors $\varepsilon$ are independent and have finite fourth moments, and that $n^{1/2} \xi_{\max,n} \to 0$ as $n \to \infty$. Then there exists a local minimizer $\hat{A}$ of (9) such that $P_{\hat{\Gamma}}$ is a $\sqrt{n}$-consistent estimator of $P_{\Gamma}$ and $\hat{\beta}_1$ is a $\sqrt{n}$-consistent estimator of $\beta_1$.

Theorem 1 establishes the $\sqrt{n}$-consistency of the sparse partial envelope estimators of $P_{\Gamma}$ and $\beta_1$. Note that although the objective function for the sparse partial envelope estimator is derived from the normal likelihood, normality is not required to establish the $\sqrt{n}$-consistency of $\hat{\mathcal{E}}_{\Sigma}(\mathcal{B}_1)$ and $\hat{\beta}_1$.

Proof of Theorem 1.

We denote the objective function in (9) by $f_{\mathrm{obj}}(A)$. To prove Theorem 1, we show that for any small $\epsilon > 0$ there exists a sufficiently large constant $C$ such that

$$
\lim_{n \to \infty} \Pr\left\{ \inf_{\Delta \in \mathbb{R}^{(r-u_1) \times u_1},\ \|\Delta\|_F = C} f_{\mathrm{obj}}(A + n^{-1/2}\Delta) > f_{\mathrm{obj}}(A) \right\} > 1 - \epsilon. \tag{12}
$$

If (12) holds, then there exists a local minimizer $\hat{A}$ of $f_{\mathrm{obj}}$ such that $\|\hat{A} - A\|_F = O_p(n^{-1/2})$, so $\hat{A}$ is a $\sqrt{n}$-consistent estimator of $A$. Because $P_{\Gamma} = G_A (I_{u_1} + A^{T}A)^{-1} G_A^{T}$ is a smooth function of $A$ alone, $P_{\hat{\Gamma}}$ is a $\sqrt{n}$-consistent estimator of $P_{\Gamma}$. Finally, $\hat{\beta}_1 = P_{\hat{\Gamma}} \hat{\beta}_{1,\mathrm{ols}}$, $\hat{\beta}_{1,\mathrm{ols}}$ is a $\sqrt{n}$-consistent estimator of $\beta_1$, and $P_{\Gamma}\beta_1 = \beta_1$, so

$$
\hat{\beta}_1 - \beta_1 = (P_{\hat{\Gamma}} - P_{\Gamma}) \hat{\beta}_{1,\mathrm{ols}} + P_{\Gamma} (\hat{\beta}_{1,\mathrm{ols}} - \beta_1) = O_p(n^{-1/2});
$$

that is, $\hat{\beta}_1$ is a $\sqrt{n}$-consistent estimator of $\beta_1$.
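Two facts used in this part of the proof can be checked numerically: $P_{\Gamma} = G_A(I_{u_1} + A^{T}A)^{-1}G_A^{T}$, and the decomposition of $\hat{\beta}_1 - \beta_1$, which relies on $P_{\Gamma}\beta_1 = \beta_1$. In this sketch the perturbed quantities `A_hat` and `b_ols` are random stand-ins for the estimators, not outputs of (9), and `proj_from_A` is a hypothetical helper name.

```python
import numpy as np

def proj_from_A(A):
    """P_Gamma = G_A (I + A^T A)^{-1} G_A^T, a smooth function of A alone."""
    u1 = A.shape[1]
    G = np.vstack([np.eye(u1), A])
    return G @ np.linalg.solve(np.eye(u1) + A.T @ A, G.T)

rng = np.random.default_rng(7)
r, u1, p1 = 5, 2, 2
A = rng.standard_normal((r - u1, u1))
Gamma, _ = np.linalg.qr(np.vstack([np.eye(u1), A]))   # orthonormal basis of span(G_A)
P = proj_from_A(A)
print(np.allclose(P, Gamma @ Gamma.T))                 # True

beta1 = Gamma @ rng.standard_normal((u1, p1))              # beta1 lies in span(Gamma)
A_hat = A + 0.01 * rng.standard_normal(A.shape)            # stand-in for A_hat
b_ols = beta1 + 0.01 * rng.standard_normal(beta1.shape)    # stand-in for beta1_ols
P_hat = proj_from_A(A_hat)
lhs = P_hat @ b_ols - beta1
rhs = (P_hat - P) @ b_ols + P @ (b_ols - beta1)
print(np.allclose(lhs, rhs))   # True, because P @ beta1 equals beta1
```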

It therefore remains to prove (12). We compute $f_{\mathrm{obj}}(A + n^{-1/2}\Delta) - f_{\mathrm{obj}}(A)$ by Taylor expansion. Because $f_{\mathrm{obj}}$ is somewhat complex, we split it into four parts:

$$
\begin{aligned}
f_{\mathrm{obj}}(A) &= -2\log\left| G_A^{T} G_A \right| + \log\left| G_A^{T} \hat{\Sigma}_{\mathrm{res}} G_A \right| + \log\left| G_A^{T} \hat{\Sigma}_{R_{Y \mid 2}}^{-1} G_A \right| + \sum_{i=1}^{r-u_1} \lambda \omega_i \|a_i\|_2 \\
&\equiv f_1(A) + f_2(A) + f_3(A) + f_4(A).
\end{aligned}
$$

We first consider $f_1(A) = -2\log|G_A^{T} G_A|$ and expand $f_1(A + n^{-1/2}\Delta)$:

$$
f_1(A + n^{-1/2}\Delta) = f_1(A) + n^{-1/2}\, \overset{\to\Delta}{d f_1}(A) + \frac{1}{2} n^{-1}\, \overset{\to\Delta}{d^{2} f_1}(A) + o_p(n^{-1}),
$$

where $\overset{\to\Delta}{d f_1}(A)$ and $\overset{\to\Delta}{d^{2} f_1}(A)$ are the first and second directional derivatives of $f_1$ at $A$ in the direction $\Delta$ (Dattorro []). Since $G_A^{T} G_A = I_{u_1} + A^{T}A$, the first directional derivative is

$$
\overset{\to\Delta}{d f_1}(A) = \operatorname{tr}\left\{ \left[ \frac{d f_1(A)}{d A} \right]^{T} \Delta \right\} = -4 \operatorname{tr}\left\{ (I_{u_1} + A^{T}A)^{-1} A^{T} \Delta \right\}.
$$
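The closed form for the first directional derivative can be confirmed with a central finite difference. This is only a numerical sanity check at a random point $A$ and direction $\Delta$; the names `f1` and `df1` are hypothetical.

```python
import numpy as np

def f1(A):
    """f_1(A) = -2 log|G_A^T G_A| with G_A = (I; A)."""
    G = np.vstack([np.eye(A.shape[1]), A])
    return -2 * np.linalg.slogdet(G.T @ G)[1]

def df1(A, D):
    """Closed-form first directional derivative: -4 tr{(I + A^T A)^{-1} A^T D}."""
    K = np.linalg.inv(np.eye(A.shape[1]) + A.T @ A)
    return -4 * np.trace(K @ A.T @ D)

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 2))
D = rng.standard_normal((4, 2))      # direction Delta
h = 1e-5
fd = (f1(A + h * D) - f1(A - h * D)) / (2 * h)   # central finite difference
print(abs(fd - df1(A, D)) < 1e-6)               # True
```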

The second directional derivative is

$$
\begin{aligned}
\overset{\to\Delta}{d^{2} f_1}(A) &= \operatorname{tr}\left\{ \left[ \frac{d\, \overset{\to\Delta}{d f_1}(A)}{d A} \right]^{T} \Delta \right\} \\
&= -4 \operatorname{tr}\left( \left[ \frac{d}{d A} \operatorname{tr}\left\{ (I_{u_1} + A^{T}A)^{-1} A^{T} \Delta \right\} \right]^{T} \Delta \right) \\
&= -4 \operatorname{tr}\left[ \left\{ -A (I_{u_1} + A^{T}A)^{-1} (A^{T}\Delta + \Delta^{T}A)(I_{u_1} + A^{T}A)^{-1} + \Delta (I_{u_1} + A^{T}A)^{-1} \right\}^{T} \Delta \right] \\
&= 4 \operatorname{tr}\left\{ (I_{u_1} + A^{T}A)^{-1} (A^{T}\Delta + \Delta^{T}A)(I_{u_1} + A^{T}A)^{-1} A^{T}\Delta - (I_{u_1} + A^{T}A)^{-1} \Delta^{T}\Delta \right\}.
\end{aligned}
$$

Let $\Delta_{*} = \begin{pmatrix} 0 \\ \Delta \end{pmatrix} \in \mathbb{R}^{r \times u_1}$, where the zero block is $u_1 \times u_1$. Then

$$
\begin{aligned}
\overset{\to\Delta}{d^{2} f_1}(A) = & 4 \operatorname{tr}\big[ (I_{u_1} + A^{T}A)^{-1} A^{T}\Delta (I_{u_1} + A^{T}A)^{-1} A^{T}\Delta + (I_{u_1} + A^{T}A)^{-1} \Delta^{T} \{ A (I_{u_1} + A^{T}A)^{-1} A^{T} - I_{r-u_1} \} \Delta \big] \\
= & 4 \operatorname{tr}\big[ (I_{u_1} + A^{T}A)^{-1} A^{T}\Delta (I_{u_1} + A^{T}A)^{-1} A^{T}\Delta + (I_{u_1} + A^{T}A)^{-1} \Delta_*^{T} \{ G_A (G_A^{T}G_A)^{-1} G_A^{T} - I_r \} \Delta_* \big] \\
= & 4 \operatorname{tr}\{ (I_{u_1} + A^{T}A)^{-1} A^{T}\Delta (I_{u_1} + A^{T}A)^{-1} A^{T}\Delta + (I_{u_1} + A^{T}A)^{-1} \Delta_*^{T} (\Gamma\Gamma^{T} - I_r) \Delta_* \} \\
= & 4 \operatorname{tr}\{ (I_{u_1} + A^{T}A)^{-1} A^{T}\Delta (I_{u_1} + A^{T}A)^{-1} A^{T}\Delta - (I_{u_1} + A^{T}A)^{-1} \Delta_*^{T} \Gamma_0\Gamma_0^{T} \Delta_* \}.
\end{aligned}
$$
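The final expression for $\overset{\to\Delta}{d^{2} f_1}(A)$, including the step that replaces $A(I_{u_1} + A^{T}A)^{-1}A^{T} - I_{r-u_1}$ by $-\Gamma_0\Gamma_0^{T}$ through $\Delta_*$, can be checked against a second-order central difference. This is a numerical check at a random point with hypothetical helper names; $\Gamma_0$ is taken from a complete QR decomposition of $G_A$.

```python
import numpy as np

def f1(A):
    G = np.vstack([np.eye(A.shape[1]), A])
    return -2 * np.linalg.slogdet(G.T @ G)[1]

def d2f1(A, D):
    """4 tr{K A^T D K A^T D - K D*^T Gamma0 Gamma0^T D*}, with K = (I + A^T A)^{-1}."""
    u1 = A.shape[1]
    G = np.vstack([np.eye(u1), A])
    full, _ = np.linalg.qr(G, mode="complete")
    Gamma0 = full[:, u1:]                           # orthonormal basis of span(G_A)^perp
    D_star = np.vstack([np.zeros((u1, u1)), D])     # Delta_* = (0; Delta)
    K = np.linalg.inv(np.eye(u1) + A.T @ A)
    KAD = K @ A.T @ D
    return 4 * np.trace(KAD @ KAD - K @ D_star.T @ Gamma0 @ Gamma0.T @ D_star)

rng = np.random.default_rng(9)
A = rng.standard_normal((4, 2))
D = rng.standard_normal((4, 2))
h = 1e-4
fd2 = (f1(A + h * D) - 2 * f1(A) + f1(A - h * D)) / h**2   # second central difference
print(np.isclose(fd2, d2f1(A, D), rtol=1e-4, atol=1e-5))    # True
```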

Substituting $\overset{\to\Delta}{d f_1}(A)$ and $\overset{\to\Delta}{d^{2} f_1}(A)$ into the expansion for $f_1(A + n^{-1/2}\Delta)$ gives

$$
\begin{aligned}
f_1(A + n^{-1/2}\Delta) - f_1(A) = & -4 n^{-1/2} \operatorname{tr}\left\{ (I_{u_1} + A^{T}A)^{-1} A^{T}\Delta \right\} \\
& + 2 n^{-1} \operatorname{tr}\left\{ (I_{u_1} + A^{T}A)^{-1} A^{T}\Delta (I_{u_1} + A^{T}A)^{-1} A^{T}\Delta - (I_{u_1} + A^{T}A)^{-1} \Delta_{*}^{T} \Gamma_0 \Gamma_0^{T} \Delta_{*} \right\} + o_p(n^{-1}).
\end{aligned}
$$

Next, we expand $f_2(A) = \log|G_A^{T} \hat{\Sigma}_{\mathrm{res}} G_A|$. Its first directional derivative is

$$
\overset{\to\Delta}{d f_2}(A) = \operatorname{tr}\left\{ \left[ \frac{d f_2(A)}{d A} \right]^{T} \Delta \right\} = 2 \operatorname{tr}\left\{ (G_A^{T} \hat{\Sigma}_{\mathrm{res}} G_A)^{-1} G_A^{T} \hat{\Sigma}_{\mathrm{res}} \Delta_{*} \right\}.
$$
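The same finite-difference check applies to $f_2$. In the sketch below, a random positive definite matrix `S` stands in for $\hat{\Sigma}_{\mathrm{res}}$, and the helper names are hypothetical.

```python
import numpy as np

def f2(A, S):
    """f_2(A) = log|G_A^T S G_A| for a positive definite S."""
    G = np.vstack([np.eye(A.shape[1]), A])
    return np.linalg.slogdet(G.T @ S @ G)[1]

def df2(A, D, S):
    """Closed form: 2 tr{(G_A^T S G_A)^{-1} G_A^T S Delta_*}."""
    u1 = A.shape[1]
    G = np.vstack([np.eye(u1), A])
    D_star = np.vstack([np.zeros((u1, u1)), D])
    return 2 * np.trace(np.linalg.solve(G.T @ S @ G, G.T @ S @ D_star))

rng = np.random.default_rng(10)
r, u1 = 6, 2
M = rng.standard_normal((r, r))
S = M @ M.T + np.eye(r)                 # stand-in for Sigma_hat_res
A = rng.standard_normal((r - u1, u1))
D = rng.standard_normal((r - u1, u1))
h = 1e-5
fd = (f2(A + h * D, S) - f2(A - h * D, S)) / (2 * h)
print(abs(fd - df2(A, D, S)) < 1e-5)    # True
```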

Let $\Sigma_{R_{1 \mid 2}}$, $\Sigma_{R_{Y \mid 2}}$ and $\Sigma_{R_{Y \mid 2} R_{1 \mid 2}}$ denote the population variance matrix of $R_{1 \mid 2}$, the population variance matrix of $R_{Y \mid 2}$, and the population covariance matrix between $R_{Y \mid 2}$ and $R_{1 \mid 2}$, and let $\hat{\Sigma}_{R_{1 \mid 2}}$, $\hat{\Sigma}_{R_{Y \mid 2}}$ and $\hat{\Sigma}_{R_{Y \mid 2} R_{1 \mid 2}}$ be the corresponding sample versions. Then, using the results of Cook and Setodji [],

$$
\begin{aligned}
n^{1/2}(\hat{\Sigma}_{R_{Y \mid 2} R_{1 \mid 2}} - \Sigma_{R_{Y \mid 2} R_{1 \mid 2}}) &= n^{-1/2}\{(R_{Y \mid 2})_c^{T} R_{1 \mid 2} - n\Sigma_{R_{Y \mid 2} R_{1 \mid 2}}\} + O_p(n^{-1/2}), \\
n^{1/2}(\hat{\Sigma}_{R_{1 \mid 2}} - \Sigma_{R_{1 \mid 2}}) &= n^{-1/2}(R_{1 \mid 2}^{T} R_{1 \mid 2} - n\Sigma_{R_{1 \mid 2}}) + O_p(n^{-1/2}), \\
n^{1/2}(\hat{\Sigma}_{R_{Y \mid 2}} - \Sigma_{R_{Y \mid 2}}) &= n^{-1/2}\{(R_{Y \mid 2})_c^{T} (R_{Y \mid 2})_c - n\Sigma_{R_{Y \mid 2}}\} + O_p(n^{-1/2}),
\end{aligned}
$$

where $(R_{Y \mid 2})_c \in \mathbb{R}^{n \times r}$ is the centred data matrix of $R_{Y \mid 2}$, whose $i$th row is $\{(R_{Y \mid 2})_i - \bar{R}_{Y \mid 2}\}^{T}$. Because $\hat{\Sigma}_{\mathrm{res}} = \hat{\Sigma}_{R_{Y \mid 2}} - \hat{\Sigma}_{R_{Y \mid 2} R_{1 \mid 2}} \hat{\Sigma}_{R_{1 \mid 2}}^{-1} \hat{\Sigma}_{R_{1 \mid 2} R_{Y \mid 2}}$, $\Sigma = \Sigma_{R_{Y \mid 2}} - \Sigma_{R_{Y \mid 2} R_{1 \mid 2}} \Sigma_{R_{1 \mid 2}}^{-1} \Sigma_{R_{1 \mid 2} R_{Y \mid 2}}$ and $\hat{\Sigma}_{R_{1 \mid 2}}^{-1} - \Sigma_{R_{1 \mid 2}}^{-1} = -\Sigma_{R_{1 \mid 2}}^{-1} (\hat{\Sigma}_{R_{1 \mid 2}} - \Sigma_{R_{1 \mid 2}}) \Sigma_{R_{1 \mid 2}}^{-1} + O_p(n^{-1})$, we have

$$
\begin{aligned}
\hat{\Sigma}_{\mathrm{res}} = & (\hat{\Sigma}_{R_{Y \mid 2}} - \Sigma_{R_{Y \mid 2}} + \Sigma_{R_{Y \mid 2}}) - (\hat{\Sigma}_{R_{Y \mid 2} R_{1 \mid 2}} - \Sigma_{R_{Y \mid 2} R_{1 \mid 2}} + \Sigma_{R_{Y \mid 2} R_{1 \mid 2}}) \\
& \times (\hat{\Sigma}_{R_{1 \mid 2}}^{-1} - \Sigma_{R_{1 \mid 2}}^{-1} + \Sigma_{R_{1 \mid 2}}^{-1})(\hat{\Sigma}_{R_{1 \mid 2} R_{Y \mid 2}} - \Sigma_{R_{1 \mid 2} R_{Y \mid 2}} + \Sigma_{R_{1 \mid 2} R_{Y \mid 2}}) \\
= & \Sigma + n^{-1/2} \Big[ -n^{-1/2}\{(R_{Y \mid 2})_c^{T} R_{1 \mid 2} - n\Sigma_{R_{Y \mid 2} R_{1 \mid 2}}\} \Sigma_{R_{1 \mid 2}}^{-1} \Sigma_{R_{1 \mid 2} R_{Y \mid 2}} \\
& + n^{-1/2} \Sigma_{R_{Y \mid 2} R_{1 \mid 2}} \Sigma_{R_{1 \mid 2}}^{-1} (R_{1 \mid 2}^{T} R_{1 \mid 2} - n\Sigma_{R_{1 \mid 2}}) \Sigma_{R_{1 \mid 2}}^{-1} \Sigma_{R_{1 \mid 2} R_{Y \mid 2}} \\
& - n^{-1/2} \Sigma_{R_{Y \mid 2} R_{1 \mid 2}} \Sigma_{R_{1 \mid 2}}^{-1} \{R_{1 \mid 2}^{T} (R_{Y \mid 2})_c - n\Sigma_{R_{1 \mid 2} R_{Y \mid 2}}\} \\
& + n^{-1/2} \{(R_{Y \mid 2})_c^{T} (R_{Y \mid 2})_c - n\Sigma_{R_{Y \mid 2}}\} \Big] + O_p(n^{-1}) \\
\equiv & \Sigma + n^{-1/2}(T_{1n} + T_{2n} + T_{3n} + T_{4n}) + O_p(n^{-1}),
\end{aligned}
$$

where, by the central limit theorem, each element of $T_{1n}$, $T_{2n}$, $T_{3n}$ and $T_{4n}$ converges in distribution to a normal random variable with mean 0. Since

$$
\begin{aligned}
(G_A^{T} \hat{\Sigma}_{\mathrm{res}} G_A)^{-1} = & (G_A^{T} \Sigma G_A)^{-1} - (G_A^{T} \Sigma G_A)^{-1} (G_A^{T} \hat{\Sigma}_{\mathrm{res}} G_A - G_A^{T} \Sigma G_A)(G_A^{T} \Sigma G_A)^{-1} + O_p(n^{-1}) \\
= & (G_A^{T} \Sigma G_A)^{-1} - n^{-1/2} (G_A^{T} \Sigma G_A)^{-1} G_A^{T} (T_{1n} + T_{2n} + T_{3n} + T_{4n}) G_A (G_A^{T} \Sigma G_A)^{-1} + O_p(n^{-1}),
\end{aligned}
$$

we can expand $\overset{\to\Delta}{d f_2}(A)$ as

$$
\begin{aligned}
& 2 \operatorname{tr}\{(G_A^{T}\hat{\Sigma}_{\mathrm{res}}G_A)^{-1}G_A^{T}\hat{\Sigma}_{\mathrm{res}}\Delta_*\} \\
= & 2 \operatorname{tr}\{(G_A^{T}\Sigma G_A)^{-1}G_A^{T}\Sigma\Delta_*\} + 2n^{-1/2} \operatorname{tr}\{(G_A^{T}\Sigma G_A)^{-1}G_A^{T}(T_{1n} + T_{2n} + T_{3n} + T_{4n})\Delta_* \\
& - (G_A^{T}\Sigma G_A)^{-1}G_A^{T}(T_{1n} + T_{2n} + T_{3n} + T_{4n})G_A(G_A^{T}\Sigma G_A)^{-1}G_A^{T}\Sigma\Delta_*\} + O_p(n^{-1}) \\
= & 2 \operatorname{tr}\{(I_{u_1} + A^{T}A)^{-1}G_A^{T}\Delta_*\} + 2n^{-1/2} \operatorname{tr}[(G_A^{T}\Sigma G_A)^{-1}G_A^{T}(T_{1n} + T_{2n} + T_{3n} + T_{4n})\{I_r - G_A(I_{u_1} + A^{T}A)^{-1}G_A^{T}\}\Delta_*] + O_p(n^{-1}) \\
= & 2 \operatorname{tr}\{(I_{u_1} + A^{T}A)^{-1}A^{T}\Delta\} + 2n^{-1/2} \operatorname{tr}\{(G_A^{T}\Sigma G_A)^{-1}G_A^{T}(T_{3n} + T_{4n})\Gamma_0\Gamma_0^{T}\Delta_*\} + O_p(n^{-1}) \\
= & 2 \operatorname{tr}\{(I_{u_1} + A^{T}A)^{-1}A^{T}\Delta\} + 2n^{-1/2} \operatorname{tr}\{\Gamma_1\Omega^{-1}\Gamma^{T}(T_{3n} + T_{4n})\Gamma_0\Gamma_0^{T}\Delta_*\} + O_p(n^{-1}).
\end{aligned}
$$

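The second equality holds because $\Gamma = G_A \Gamma_1$, so that

$$
\Gamma_1^{T} G_A^{T} G_A \Gamma_1 = I_{u_1} \ \Rightarrow\ \Gamma_1^{T}(I_{u_1} + A^{T}A)\Gamma_1 = I_{u_1} \ \Rightarrow\ I_{u_1} + A^{T}A = (\Gamma_1^{T})^{-1}\Gamma_1^{-1} \ \Rightarrow\ (I_{u_1} + A^{T}A)^{-1} = \Gamma_1\Gamma_1^{T},
$$

and, using $\Gamma^{T}\Sigma = \Omega\Gamma^{T}$,

$$
(G_A^{T}\Sigma G_A)^{-1} G_A^{T}\Sigma = \{(\Gamma\Gamma_1^{-1})^{T}\Sigma\Gamma\Gamma_1^{-1}\}^{-1}(\Gamma\Gamma_1^{-1})^{T}\Sigma = \Gamma_1\Omega^{-1}\Omega\Gamma^{T} = \Gamma_1\Gamma^{T} = \Gamma_1\Gamma_1^{T}G_A^{T} = (I_{u_1} + A^{T}A)^{-1}G_A^{T}.
$$

The third equality uses $G_A^{T}\Delta_* = A^{T}\Delta$ and $I_r - G_A(I_{u_1} + A^{T}A)^{-1}G_A^{T} = I_r - \Gamma\Gamma^{T} = \Gamma_0\Gamma_0^{T}$, together with $T_{1n}\Gamma_0 = T_{2n}\Gamma_0 = 0$: both $T_{1n}$ and $T_{2n}$ end with $\Sigma_{R_{1 \mid 2} R_{Y \mid 2}} = \Sigma_{R_{1 \mid 2}}\beta_1^{T}$, and $\beta_1^{T}\Gamma_0 = \eta^{T}\Gamma^{T}\Gamma_0 = 0$. The last equality uses $(G_A^{T}\Sigma G_A)^{-1}G_A^{T} = \Gamma_1\Omega^{-1}\Gamma_1^{T}(\Gamma_1^{T})^{-1}\Gamma^{T} = \Gamma_1\Omega^{-1}\Gamma^{T}$.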
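The identities used in the second equality hold for any $\Sigma$ with the partial envelope structure in (5). The sketch builds such a $\Sigma$ from hypothetical $\Omega$ and $\Omega_0$, takes $\Gamma$ from a QR decomposition of $G_A$ (so that $\Gamma = G_A\Gamma_1$ with $\Gamma_1$ the first $u_1$ rows of $\Gamma$), and checks both identities numerically.

```python
import numpy as np

rng = np.random.default_rng(11)
r, u1 = 6, 2
A = rng.standard_normal((r - u1, u1))
G = np.vstack([np.eye(u1), A])                       # G_A
full, _ = np.linalg.qr(G, mode="complete")
Gamma, Gamma0 = full[:, :u1], full[:, u1:]           # Gamma = G_A Gamma_1
Gamma1 = Gamma[:u1, :]                               # first u1 rows, nonsingular

# Sigma with the partial envelope structure (hypothetical Omega and Omega0).
Omega = np.array([[2.0, 0.3], [0.3, 1.0]])
Omega0 = np.diag(np.arange(1.0, r - u1 + 1.0))
Sigma = Gamma @ Omega @ Gamma.T + Gamma0 @ Omega0 @ Gamma0.T

K = np.linalg.inv(np.eye(u1) + A.T @ A)
print(np.allclose(Gamma, G @ Gamma1))                # Gamma = G_A Gamma_1
print(np.allclose(K, Gamma1 @ Gamma1.T))             # (I + A^T A)^{-1} = Gamma_1 Gamma_1^T
lhs = np.linalg.solve(G.T @ Sigma @ G, G.T @ Sigma)  # (G^T Sigma G)^{-1} G^T Sigma
print(np.allclose(lhs, K @ G.T))                     # equals (I + A^T A)^{-1} G_A^T
```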

By the Cauchy–Schwarz inequality for the matrix trace (Magnus and Neudecker []),

\begin{matrix} |tr \{Γ_{1} Ω^{- 1} Γ^{T} (T_{3 n} + T_{4 n}) Γ_{0} Γ_{0}^{T} Δ_{*}\}| & \leq {∥Δ_{*}∥}_{F} {∥Γ_{1} Ω^{- 1} Γ^{T} (T_{3 n} + T_{4 n}) Γ_{0} Γ_{0}^{T}∥}_{F} \\ = {∥ Δ ∥}_{F} {∥Γ_{1} Ω^{- 1} Γ^{T} (T_{3 n} + T_{4 n}) Γ_{0}∥}_{F} . \end{matrix}
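The Cauchy–Schwarz step can be illustrated numerically. In this sketch (NumPy assumed), the random matrices `X` and `D` are hypothetical stand-ins for $Γ_{1} Ω^{- 1} Γ^{T} (T_{3 n} + T_{4 n})$ and $Δ_{*}$; the second check confirms that dropping the factor $Γ_{0}^{T}$ does not change the Frobenius norm, because $Γ_{0}$ has orthonormal columns.

```python
import numpy as np

rng = np.random.default_rng(1)
r, u1 = 6, 2

# Hypothetical stand-ins with the dimensions used in the proof.
X = rng.standard_normal((u1, r))   # plays Gamma_1 Omega^{-1} Gamma^T (T_3n + T_4n)
D = rng.standard_normal((r, u1))   # plays Delta_*

# Semi-orthogonal Gamma_0 (r x (r - u1)) from a thin QR factorization.
Gamma_0, _ = np.linalg.qr(rng.standard_normal((r, r - u1)))

B = X @ Gamma_0 @ Gamma_0.T
lhs = abs(np.trace(B @ D))
bound = np.linalg.norm(D, "fro") * np.linalg.norm(B, "fro")

# Right-multiplying by Gamma_0^T (orthonormal rows) preserves the Frobenius norm.
same_norm = np.isclose(np.linalg.norm(B, "fro"), np.linalg.norm(X @ Gamma_0, "fro"))
print(lhs <= bound, same_norm)
```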

The second directional derivative of $f_{2}$ is

\begin{matrix} \overset{\to Δ}{{d f}_{2}^{2}} (A) = & 2 tr ({[\frac{d}{d A} tr \{{(G_{A}^{T} {\hat{Σ}}_{res} G_{A})}^{- 1} G_{A}^{T} {\hat{Σ}}_{res} Δ_{*}\}]}^{T} Δ) \\ = & 2 tr \{{(G_{A}^{T} {\hat{Σ}}_{res} G_{A})}^{- 1} Δ_{*}^{T} {\hat{Σ}}_{res} Δ_{*} \\ - {(G_{A}^{T} {\hat{Σ}}_{res} G_{A})}^{- 1} (G_{A}^{T} {\hat{Σ}}_{res} Δ_{*} + Δ_{*}^{T} {\hat{Σ}}_{res} G_{A}) {(G_{A}^{T} {\hat{Σ}}_{res} G_{A})}^{- 1} G_{A}^{T} {\hat{Σ}}_{res} Δ_{*}\} \\ = & 2 tr \{{(G_{A}^{T} Σ G_{A})}^{- 1} Δ_{*}^{T} Σ Δ_{*} \\ - {(G_{A}^{T} Σ G_{A})}^{- 1} (G_{A}^{T} Σ Δ_{*} + Δ_{*}^{T} Σ G_{A}) {(G_{A}^{T} Σ G_{A})}^{- 1} G_{A}^{T} Σ Δ_{*}\} \\ + O_{p} (n^{- 1 / 2}) \\ = & 2 tr [- {(I_{u_{1}} + A^{T} A)}^{- 1} G_{A}^{T} Δ_{*} {(I_{u_{1}} + A^{T} A)}^{- 1} G_{A}^{T} Δ_{*} \\ + {(G_{A}^{T} Σ G_{A})}^{- 1} Δ_{*}^{T} Σ \{I_{r} - G_{A} {(G_{A}^{T} Σ G_{A})}^{- 1} G_{A}^{T} Σ\} Δ_{*}] + O_{p} (n^{- 1 / 2}) \\ = & 2 tr [- {(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ {(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ \\ + {\{{(Γ_{1}^{- 1})}^{T} Ω Γ_{1}^{- 1}\}}^{- 1} Δ_{*}^{T} Σ \{I_{r} - G_{A} {(I_{u_{1}} + A^{T} A)}^{- 1} G_{A}^{T}\} Δ_{*}] + O_{p} (n^{- 1 / 2}) \\ = & 2 tr \{- {(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ {(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ + Γ_{1} Ω^{- 1} Γ_{1}^{T} Δ_{*}^{T} Σ Γ_{0} Γ_{0}^{T} Δ_{*}\} \\ + O_{p} (n^{- 1 / 2}) \\ = & 2 tr \{- {(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ {(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ + Ω^{- 1} Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Ω_{0} Γ_{0}^{T} Δ_{*} Γ_{1}\} \\ + O_{p} (n^{- 1 / 2}) . \end{matrix}

Substituting $\overset{\to Δ}{{d f}_{2}} (A)$ and $\overset{\to Δ}{{d f}_{2}^{2}} (A)$ into the Taylor expansion of $f_{2} (A + n^{- 1 / 2} Δ)$, we obtain

\begin{matrix} f_{2} (A + n^{- 1 / 2} Δ) - f_{2} (A) \\ = & 2 n^{- 1 / 2} tr \{{(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ\} + 2 n^{- 1} tr \{Γ_{1} Ω^{- 1} Γ^{T} (T_{3 n} + T_{4 n}) Γ_{0} Γ_{0}^{T} Δ_{*}\} \\ + n^{- 1} tr \{- {(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ {(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ + Ω^{- 1} Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Ω_{0} Γ_{0}^{T} Δ_{*} Γ_{1}\} \\ + o_{p} (n^{- 1}) \\ \geq & 2 n^{- 1 / 2} tr \{{(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ\} - 2 n^{- 1} {∥ Δ ∥}_{F} {∥Γ_{1} Ω^{- 1} Γ^{T} (T_{3 n} + T_{4 n}) Γ_{0}∥}_{F} \\ + n^{- 1} tr \{- {(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ {(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ + Ω^{- 1} Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Ω_{0} Γ_{0}^{T} Δ_{*} Γ_{1}\} \\ + o_{p} (n^{- 1}) . \end{matrix}

Notice that $f_{3}$ and $f_{2}$ have the same structure, except that ${\hat{Σ}}_{res}$ is replaced by ${\hat{Σ}}_{R_{Y ∣ 2}}^{- 1}$. Let $T_{5 n} = - n^{- 1 / 2} Σ_{R_{Y ∣ 2}}^{- 1} ({(R_{Y ∣ 2})}_{c}^{T} {(R_{Y ∣ 2})}_{c} - n Σ_{R_{Y ∣ 2}}) Σ_{R_{Y ∣ 2}}^{- 1}$. By the central limit theorem, $T_{5 n}$ converges in distribution to a normally distributed random matrix with mean zero. Applying the same expansion to $f_{3}$, we obtain

\begin{matrix} f_{3} (A + n^{- 1 / 2} Δ) - f_{3} (A) \\ = & 2 n^{- 1 / 2} tr \{{(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ\} + 2 n^{- 1} tr \{Γ_{1} (Ω + η Σ_{R_{1 ∣ 2}} η^{T}) Γ^{T} T_{5 n} Γ_{0} Γ_{0}^{T} Δ_{*}\} \\ + n^{- 1} tr \{- {(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ {(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ \\ + (Ω + η Σ_{R_{1 ∣ 2}} η^{T}) Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Ω_{0}^{- 1} Γ_{0}^{T} Δ_{*} Γ_{1}\} \\ + o_{p} (n^{- 1}) \\ \geq & 2 n^{- 1 / 2} tr \{{(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ\} - 2 n^{- 1} {∥ Δ ∥}_{F} {∥Γ_{1} (Ω + η Σ_{R_{1 ∣ 2}} η^{T}) Γ^{T} T_{5 n} Γ_{0}∥}_{F} \\ + n^{- 1} tr \{- {(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ {(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} Δ \\ + (Ω + η Σ_{R_{1 ∣ 2}} η^{T}) Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Ω_{0}^{- 1} Γ_{0}^{T} Δ_{*} Γ_{1}\} \\ + o_{p} (n^{- 1}) . \end{matrix}

Now we expand $f_{4} (A) = \sum_{i = 1}^{r - u_{1}} λ ω_{i} {∥ a_{i} ∥}_{2}$. Let $δ_{i}^{T}$ be the $i$th row of $Δ$. Then

\begin{matrix} f_{4} (A + n^{- 1 / 2} Δ) - f_{4} (A) & \geq \sum_{i = 1}^{q - u_{1}} (λ ω_{i} ∥ a_{i} + n^{- 1 / 2} δ_{i} ∥_{2} - λ ω_{i} {∥ a_{i} ∥}_{2}) \\ & \geq - \frac{1}{2} (q - u_{1}) n^{- 1 / 2} ξ_{\max, n} max_{i} (∥ a_{i} ∥_{2}^{- 1} {∥ δ_{i} ∥}_{2}) \{1 + o_{p} (1)\} \\ & = - \frac{1}{2} n^{- 1} (q - u_{1}) n^{1 / 2} ξ_{\max, n} max_{i} (∥ a_{i} ∥_{2}^{- 1} {∥ δ_{i} ∥}_{2}) \{1 + o_{p} (1)\} . \end{matrix}

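The first inequality holds because $a_{i} = 0$ for $i > q - u_{1}$, so each omitted term equals $λ ω_{i} n^{- 1 / 2} {∥ δ_{i} ∥}_{2} \geq 0$; the second inequality follows from a Taylor expansion at $a_{i}$. When $n^{1 / 2} ξ_{\max, n} \to 0$ as $n \to \infty$, the lower bound above is $o_{p} (n^{- 1})$, so $n \{f_{4} (A + n^{- 1 / 2} Δ) - f_{4} (A)\}$ is bounded below by a term of order $o_{p} (1)$. Combining the results for $f_{1}$, $f_{2}$, $f_{3}$ and $f_{4}$, we have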

\begin{matrix} f_{obj} (A + n^{- 1 / 2} Δ) - f_{obj} (A) \\ \geq & - 2 n^{- 1} {∥ Δ ∥}_{F} {∥Γ_{1} Ω^{- 1} Γ^{T} (T_{3 n} + T_{4 n}) Γ_{0}∥}_{F} - 2 n^{- 1} {∥ Δ ∥}_{F} {∥Γ_{1} (Ω + η Σ_{R_{1 ∣ 2}} η^{T}) Γ^{T} T_{5 n} Γ_{0}∥}_{F} \\ + n^{- 1} tr \{Ω^{- 1} Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Ω_{0} Γ_{0}^{T} Δ_{*} Γ_{1} + (Ω + η Σ_{R_{1 ∣ 2}} η^{T}) Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Ω_{0}^{- 1} Γ_{0}^{T} Δ_{*} Γ_{1} \\ - 2 {(I_{u_{1}} + A^{T} A)}^{- 1} Δ_{*}^{T} Γ_{0} Γ_{0}^{T} Δ_{*}\} - \frac{1}{2} n^{- 1} (q - u_{1}) n^{1 / 2} ξ_{\max, n} max_{i} (∥ a_{i} ∥_{2}^{- 1} {∥ δ_{i} ∥}_{2}) \\ + o_{p} (n^{- 1}) . \end{matrix}

Let $m_{1}$ be the smallest eigenvalue of the matrix $M$ defined in the next display; $M$ also appears in (5.7) of Cook et al. []. By Shapiro [], $M$ is positive definite, so $m_{1} > 0$. Then we have

\begin{matrix} tr \{Ω^{- 1} Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Ω_{0} Γ_{0}^{T} Δ_{*} Γ_{1} + (Ω + η Σ_{R_{1 ∣ 2}} η^{T}) Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Ω_{0}^{- 1} Γ_{0}^{T} Δ_{*} Γ_{1} \\ - 2 {(I_{u_{1}} + A^{T} A)}^{- 1} Δ_{*}^{T} Γ_{0} Γ_{0}^{T} Δ_{*}\} \\ = & tr \{Ω^{- 1} Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Ω_{0} Γ_{0}^{T} Δ_{*} Γ_{1} + (Ω + η Σ_{R_{1 ∣ 2}} η^{T}) Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Ω_{0}^{- 1} Γ_{0}^{T} Δ_{*} Γ_{1} \\ - 2 Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Γ_{0}^{T} Δ_{*} Γ_{1}\} \\ = & vec {(Γ_{0}^{T} Δ_{*} Γ_{1})}^{T} (Ω \otimes Ω_{0}^{- 1} + Ω^{- 1} \otimes Ω_{0} - 2 I_{u_{1}} \otimes I_{r - u_{1}} \\ + η Σ_{R_{1 ∣ 2}} η^{T} \otimes Ω_{0}^{- 1}) vec (Γ_{0}^{T} Δ_{*} Γ_{1}) \\ \equiv & vec {(Γ_{0}^{T} Δ_{*} Γ_{1})}^{T} M vec (Γ_{0}^{T} Δ_{*} Γ_{1}) \\ \geq & m_{1} {∥Γ_{0}^{T} Δ_{*} Γ_{1}∥}_{F}^{2} \\ = & m_{1} tr (Γ_{0}^{T} Δ_{*} Γ_{1} Γ_{1}^{T} Δ_{*}^{T} Γ_{0}) \\ = & m_{1} tr \{Γ_{0}^{T} Δ_{*} {(I_{u_{1}} + A^{T} A)}^{- 1} Δ_{*}^{T} Γ_{0}\} \\ = & m_{1} tr \{Δ_{*} {(I_{u_{1}} + A^{T} A)}^{- 1} Δ_{*}^{T} (I_{r} - Γ Γ^{T})\} \\ = & m_{1} tr [{(I_{u_{1}} + A^{T} A)}^{- 1} Δ_{*}^{T} \{I_{r} - G_{A} {(I_{u_{1}} + A^{T} A)}^{- 1} G_{A}^{T}\} Δ_{*}] \\ = & m_{1} tr [{(I_{u_{1}} + A^{T} A)}^{- 1} Δ^{T} \{I_{r - u_{1}} - A {(I_{u_{1}} + A^{T} A)}^{- 1} A^{T}\} Δ] \\ = & m_{1} tr \{{(I_{u_{1}} + A^{T} A)}^{- 1} Δ^{T} {(I_{r - u_{1}} + A A^{T})}^{- 1} Δ\} \\ = & m_{1} vec {(Δ)}^{T} \{{(I_{u_{1}} + A^{T} A)}^{- 1} \otimes {(I_{r - u_{1}} + A A^{T})}^{- 1}\} vec (Δ) \\ \geq & m_{1} m_{2}^{2} {∥ Δ ∥}_{F}^{2}, \end{matrix}

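where $m_{2}$ is the smallest eigenvalue of ${(I_{u_{1}} + A^{T} A)}^{- 1}$. It is also the smallest eigenvalue of $I_{r - u_{1}} - A {(I_{u_{1}} + A^{T} A)}^{- 1} A^{T} = {(I_{r - u_{1}} + A A^{T})}^{- 1}$, because $A^{T} A$ and $A A^{T}$ have the same largest eigenvalue. Hence the second-order term in the expansion of $f_{obj} (A + n^{- 1 / 2} Δ) - f_{obj} (A)$ is at least $n^{- 1} m_{1} m_{2}^{2} {∥ Δ ∥}_{F}^{2}$, whereas the remaining terms are of order $O_{p} (n^{- 1}) {∥ Δ ∥}_{F}$ or smaller. Therefore, when ${∥ Δ ∥}_{F} = C$ for a sufficiently large constant $C$, the terms of order ${∥ Δ ∥}_{F}^{2}$ dominate those of order ${∥ Δ ∥}_{F}$, and conclusion (12) follows. □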
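The matrix steps at the end of this proof can be checked numerically. The sketch below (NumPy; random `A` and `Delta` are illustrative assumptions) verifies that $I_{r - u_{1}} - A (I_{u_{1}} + A^{T} A)^{- 1} A^{T} = (I_{r - u_{1}} + A A^{T})^{- 1}$, that $tr \{P Δ^{T} Q Δ\} = vec (Δ)^{T} (P \otimes Q) vec (Δ)$ for the column-stacking $vec$, and that this quadratic form is at least $m_{2}^{2} {∥ Δ ∥}_{F}^{2}$.

```python
import numpy as np

rng = np.random.default_rng(2)
r, u1 = 7, 3
A = rng.standard_normal((r - u1, u1))      # illustrative A
Delta = rng.standard_normal((r - u1, u1))  # illustrative direction Delta

P = np.linalg.inv(np.eye(u1) + A.T @ A)    # (I_{u1} + A^T A)^{-1}, u1 x u1
Qm = np.eye(r - u1) - A @ P @ A.T          # I_{r-u1} - A P A^T, (r-u1) x (r-u1)

# Woodbury: I - A (I + A^T A)^{-1} A^T = (I + A A^T)^{-1}.
woodbury_ok = np.allclose(Qm, np.linalg.inv(np.eye(r - u1) + A @ A.T))

# tr(P Delta^T Qm Delta) = vec(Delta)^T (P kron Qm) vec(Delta), vec stacks columns.
vec = lambda M: M.reshape(-1, order="F")
quad = np.trace(P @ Delta.T @ Qm @ Delta)
quad_vec = vec(Delta) @ np.kron(P, Qm) @ vec(Delta)

# Both inverses share the smallest eigenvalue m2 = 1 / (1 + largest eig of A^T A).
m2 = np.linalg.eigvalsh(P).min()
bound_ok = quad >= m2 ** 2 * np.linalg.norm(Delta, "fro") ** 2 - 1e-12
print(woodbury_ok, np.isclose(quad, quad_vec), bound_ok)
```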

Theorem 2.

Suppose that the conditions in Theorem 1 hold, and further suppose that $n^{1 / 2} ξ_{\min, n} \to \infty$. Then $pr ({\hat{a}}_{i} = 0) \to 1$ for $i = q - u_{1} + 1, \dots, r - u_{1}$.

Theorem 2 establishes the selection consistency of the sparse partial envelope estimator: the sparse partial envelope model identifies the inactive responses with probability tending to 1.

Proof of Theorem 2.

We prove Theorem 2 by contradiction. Suppose that $∥ {\hat{a}}_{i} ∥_{2} > 0$ for some $i \in \{q - u_{1} + 1, \dots, r - u_{1}\}$. Let $e_{i}$ denote the $(u_{1} + i)$th column of $I_{r}$, so that $e_{i}^{T} G_{A} = a_{i}^{T}$. Because $\hat{A}$ is a local minimizer and the penalty is differentiable at ${\hat{a}}_{i} \neq 0$, the first derivative of $f_{obj}$ with respect to $a_{i}^{T}$ must vanish at $a_{i} = {\hat{a}}_{i}$; that is,

\begin{matrix} \frac{d f_{obj} (A)}{d a_{i}^{T}} |_{a_{i} = {\hat{a}}_{i}} = & - 4 e_{i}^{T} {\hat{G}}_{A} {(I_{u_{1}} + {\hat{A}}^{T} \hat{A})}^{- 1} + 2 e_{i}^{T} {\hat{Σ}}_{res} {\hat{G}}_{A} {({\hat{G}}_{A}^{T} {\hat{Σ}}_{res} {\hat{G}}_{A})}^{- 1} \\ + 2 e_{i}^{T} {\hat{Σ}}_{R_{Y ∣ 2}}^{- 1} {\hat{G}}_{A} {({\hat{G}}_{A}^{T} {\hat{Σ}}_{R_{Y ∣ 2}}^{- 1} {\hat{G}}_{A})}^{- 1} + \frac{λ ω_{i} {\hat{a}}_{i}^{T}}{{∥{\hat{a}}_{i}∥}_{2}} = 0 . \end{matrix}

(13)

Since ${\hat{Σ}}_{res}$, ${\hat{Σ}}_{R_{Y ∣ 2}}$ and $\hat{A}$ are $\sqrt{n}$-consistent estimators of $Σ$, $Σ_{R_{Y ∣ 2}}$ and $A$, and since $Σ = Γ Ω Γ^{T} + Γ_{0} Ω_{0} Γ_{0}^{T}$ and $Σ_{R_{Y ∣ 2}} = Γ (Ω + η Σ_{R_{1 ∣ 2}} η^{T}) Γ^{T} + Γ_{0} Ω_{0} Γ_{0}^{T}$, we have

\begin{matrix} - 4 e_{i}^{T} {\hat{G}}_{A} {(I_{u_{1}} + {\hat{A}}^{T} \hat{A})}^{- 1} + 2 e_{i}^{T} {\hat{Σ}}_{res} {\hat{G}}_{A} {({\hat{G}}_{A}^{T} {\hat{Σ}}_{res} {\hat{G}}_{A})}^{- 1} + 2 e_{i}^{T} {\hat{Σ}}_{R_{Y ∣ 2}}^{- 1} {\hat{G}}_{A} {({\hat{G}}_{A}^{T} {\hat{Σ}}_{R_{Y ∣ 2}}^{- 1} {\hat{G}}_{A})}^{- 1} \\ = & - 4 e_{i}^{T} G_{A} {(I_{u_{1}} + A^{T} A)}^{- 1} + 2 e_{i}^{T} Σ G_{A} {(G_{A}^{T} Σ G_{A})}^{- 1} \\ + 2 e_{i}^{T} Σ_{R_{Y ∣ 2}}^{- 1} G_{A} {(G_{A}^{T} Σ_{R_{Y ∣ 2}}^{- 1} G_{A})}^{- 1} + O_{p} (n^{- 1 / 2}) \\ = & - 4 a_{i}^{T} {(I_{u_{1}} + A^{T} A)}^{- 1} + 2 e_{i}^{T} G_{A} {(I_{u_{1}} + A^{T} A)}^{- 1} + 2 e_{i}^{T} G_{A} {(I_{u_{1}} + A^{T} A)}^{- 1} + O_{p} (n^{- 1 / 2}) \\ = & - 4 a_{i}^{T} {(I_{u_{1}} + A^{T} A)}^{- 1} + 2 a_{i}^{T} {(I_{u_{1}} + A^{T} A)}^{- 1} + 2 a_{i}^{T} {(I_{u_{1}} + A^{T} A)}^{- 1} + O_{p} (n^{- 1 / 2}) \\ = & O_{p} (n^{- 1 / 2}) . \end{matrix}

Then,

\begin{matrix} n^{1 / 2} \{- 4 e_{i}^{T} {\hat{G}}_{A} {(I_{u_{1}} + {\hat{A}}^{T} \hat{A})}^{- 1} + 2 e_{i}^{T} {\hat{Σ}}_{res} {\hat{G}}_{A} {({\hat{G}}_{A}^{T} {\hat{Σ}}_{res} {\hat{G}}_{A})}^{- 1} \\ + 2 e_{i}^{T} {\hat{Σ}}_{R_{Y ∣ 2}}^{- 1} {\hat{G}}_{A} {({\hat{G}}_{A}^{T} {\hat{Σ}}_{R_{Y ∣ 2}}^{- 1} {\hat{G}}_{A})}^{- 1}\} \\ = & O_{p} (1) . \end{matrix}

(14)

On the other hand, let $v$ be the entry of ${\hat{a}}_{i}$ with the largest absolute value, where $| \cdot |$ denotes the absolute value. Since ${∥ {\hat{a}}_{i} ∥}_{2}^{2} \leq u_{1} v^{2}$, we have $| v | / ∥ {\hat{a}}_{i} ∥_{2} \geq 1 / \sqrt{u_{1}}$. Because $n^{1 / 2} ξ_{\min, n} \to \infty$, at least one entry of $n^{1 / 2} λ ω_{i} {\hat{a}}_{i}^{T} / {∥ {\hat{a}}_{i} ∥}_{2}$ tends to infinity. By (13), this vector equals minus the left-hand side of (14), which is $O_{p} (1)$; this is a contradiction. Hence, for $i = q - u_{1} + 1, \dots, r - u_{1}$, we have $pr ({\hat{a}}_{i} = 0) \to 1$. □
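For any nonzero $a \in R^{u_{1}}$ whose largest absolute entry is $| v |$, we have $1 / \sqrt{u_{1}} \leq | v | / ∥ a ∥_{2} \leq 1$; this elementary bound is what forces an entry of $n^{1 / 2} λ ω_{i} {\hat{a}}_{i}^{T} / {∥ {\hat{a}}_{i} ∥}_{2}$ to diverge. A minimal NumPy sketch with random rows (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
u1 = 4
rows = rng.standard_normal((1000, u1))     # random nonzero stand-ins for a_i
# Ratio |v| / ||a||_2, where v is the entry of largest absolute value.
ratios = np.abs(rows).max(axis=1) / np.linalg.norm(rows, axis=1)
print(ratios.min() >= 1 / np.sqrt(u1), ratios.max() <= 1.0)

# The lower bound is attained when all entries have equal magnitude.
a_eq = np.full(u1, 2.5)
print(np.isclose(np.abs(a_eq).max() / np.linalg.norm(a_eq), 1 / np.sqrt(u1)))
```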

Next, we define the oracle partial envelope estimator and study its properties. Under the partial envelope model, the inactive responses carry information on $β_{1, A}$ through their covariance with the active responses. Accordingly, the oracle partial envelope model is defined as

(\begin{matrix} Y_{A} \\ Y_{I} \end{matrix}) = α + Γ η X_{1} + (\begin{matrix} β_{2, A} \\ 0 \end{matrix}) X_{2} + ε, Σ = Γ Ω Γ^{T} + Γ_{0} Ω_{0} Γ_{0}^{T}, Γ = (\begin{matrix} Γ_{A} \\ 0 \end{matrix}) .

(15)

The oracle partial envelope model (15) is analogous to the sparse partial envelope models (5) and (8); the only difference is that in (15) we know $q$ and which rows of $Γ$ and $β_{2}$ are identically zero. A subscript ‘O’ indicates an estimator under the oracle partial envelope model. Let ${\hat{Σ}}_{{(R_{Y ∣ 2})}_{A} ∣ R_{1 ∣ 2}} \in R^{q \times q}$ be the sample covariance matrix of the residuals from the regression of ${(R_{Y ∣ 2})}_{A}$ on $R_{1 ∣ 2}$, and let ${({\hat{Σ}}_{R_{Y ∣ 2}}^{- 1})}_{A} \in R^{q \times q}$ be the upper left $q \times q$ block of ${\hat{Σ}}_{R_{Y ∣ 2}}^{- 1}$. Let ${\tilde{Ω}}_{0} = {\tilde{Γ}}_{0}^{T} Σ {\tilde{Γ}}_{0}$ denote the coordinates of $Ω_{0}$ with respect to the basis ${\tilde{Γ}}_{0}$, partitioned as

{\tilde{Ω}}_{0} = (\begin{matrix} {\tilde{Ω}}_{0, A} & {\tilde{Ω}}_{0, A I} \\ {\tilde{Ω}}_{0, A I}^{T} & {\tilde{Ω}}_{0, I} \end{matrix}), {\tilde{Ω}}_{0, A} \in R^{(q - u_{1}) \times (q - u_{1})}, {\tilde{Ω}}_{0, I} \in R^{(r - q) \times (r - q)} .

Proposition 1 gives the maximum likelihood estimator ${\hat{β}}_{1, A, O}$ and its asymptotic distribution. Let ${\tilde{Ω}}_{0, A ∣ I} = {\tilde{Ω}}_{0, A} - {\tilde{Ω}}_{0, A I} {\tilde{Ω}}_{0, I}^{- 1} {\tilde{Ω}}_{0, A I}^{T}$, and let ‘$\overset{d}{⟶}$’ denote convergence in distribution.

Proposition 1.

Suppose that the oracle partial envelope model (15) holds and the errors are normally distributed. Then the maximum likelihood estimator of $β_{1, A}$ under the oracle partial envelope model is ${\hat{β}}_{1, A, O} = P_{{\hat{Γ}}_{A, O}} {\hat{β}}_{1, A, ols}$, where

span ({\hat{Γ}}_{A, O}) = \underset{span (G) \in G (q, u_{1})}{arg min} log | G^{T} {\hat{Σ}}_{{(R_{Y ∣ 2})}_{A} ∣ R_{1 ∣ 2}} G | + log | G^{T} {({\hat{Σ}}_{R_{Y ∣ 2}}^{- 1})}_{A} G | .

Furthermore,

\sqrt{n} \{vec ({\hat{β}}_{1, A, O}) - vec (β_{1, A})\} \overset{d}{⟶} N (0, V_{O}),

where $V_{O} = Σ_{R_{1 ∣ 2}}^{- 1} \otimes Γ_{A} Ω Γ_{A}^{T} + (η^{T} \otimes Γ_{A, 0}) T^{- 1} (η \otimes Γ_{A, 0}^{T})$ and $T = η Σ_{R_{1 ∣ 2}} η^{T} \otimes {\tilde{Ω}}_{0, A ∣ I}^{- 1} + Ω \otimes {\tilde{Ω}}_{0, A ∣ I}^{- 1} + Ω^{- 1} \otimes {\tilde{Ω}}_{0, A} - 2 I_{u_{1}} \otimes I_{q - u_{1}}$.

Proposition 1 shows that ${(R_{Y ∣ 2})}_{I}$ enters the objective function for $span ({\hat{Γ}}_{A, O})$ through ${({\hat{Σ}}_{R_{Y ∣ 2}}^{- 1})}_{A}$, and hence influences ${\hat{β}}_{1, A, O}$. By contrast, the active partial envelope model, which involves only the active responses, is defined as follows:

\begin{matrix} Y_{A} = α_{A} + Γ_{A} η X_{1} + β_{2, A} X_{2} + ε_{A}, Σ_{A} = Γ_{A} Ω Γ_{A}^{T} + Γ_{A, 0} {\tilde{Ω}}_{0, A} Γ_{A, 0}^{T} . \end{matrix}

(16)

Proposition 2.

Suppose that the conditions in Proposition 1 hold. Then the maximum likelihood estimator of $β_{1, A}$ under the active partial envelope model is ${\hat{β}}_{1, A, 2} = P_{{\hat{Γ}}_{A, 2}} {\hat{β}}_{1, A, ols}$, where

span ({\hat{Γ}}_{A, 2}) = \underset{span (G) \in G (q, u_{1})}{arg min} log | G^{T} {\hat{Σ}}_{{(R_{Y ∣ 2})}_{A} ∣ R_{1 ∣ 2}} G | + log | G^{T} {\hat{Σ}}_{{(R_{Y ∣ 2})}_{A}}^{- 1} G | .

Furthermore,

\sqrt{n} \{vec ({\hat{β}}_{1, A, 2}) - vec (β_{1, A})\} \overset{d}{⟶} N (0, V_{3}),

where $V_{3} = Σ_{R_{1 ∣ 2}}^{- 1} \otimes Γ_{A} Ω Γ_{A}^{T} + (η^{T} \otimes Γ_{A, 0}) T_{2}^{- 1} (η \otimes Γ_{A, 0}^{T})$ and $T_{2} = η Σ_{R_{1 ∣ 2}} η^{T} \otimes {\tilde{Ω}}_{0, A}^{- 1} + Ω \otimes {\tilde{Ω}}_{0, A}^{- 1} + Ω^{- 1} \otimes {\tilde{Ω}}_{0, A} - 2 I_{u_{1}} \otimes I_{q - u_{1}}$.

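Comparing Propositions 1 and 2: because ${\tilde{Ω}}_{0, A ∣ I}$ is a Schur complement in ${\tilde{Ω}}_{0}$, we have ${\tilde{Ω}}_{0, A ∣ I} ⩽ {\tilde{Ω}}_{0, A}$ and hence ${\tilde{Ω}}_{0, A ∣ I}^{- 1} ⩾ {\tilde{Ω}}_{0, A}^{- 1}$. Consequently $T - T_{2} = (η Σ_{R_{1 ∣ 2}} η^{T} + Ω) \otimes ({\tilde{Ω}}_{0, A ∣ I}^{- 1} - {\tilde{Ω}}_{0, A}^{- 1}) ⩾ 0$, so $T_{2}^{- 1} ⩾ T^{- 1}$ and $V_{3} \geq V_{O}$. Thus the oracle partial envelope model (15) is at least as efficient as the active partial envelope model (16) in estimating $β_{1, A}$; that is, including $Y_{I}$ can improve efficiency even in the oracle partial envelope model.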
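The ordering $V_{3} \geq V_{O}$ can be illustrated numerically. The sketch below assumes NumPy and draws random positive definite stand-ins for ${\tilde{Ω}}_{0}$, $Ω$ and $Σ_{R_{1 ∣ 2}}$, a random $η$ and a random orthonormal $(Γ_{A}, Γ_{A, 0})$ (none of these come from the paper); the helper `T_matrix` is hypothetical and simply evaluates the expressions for $T$ and $T_{2}$ in Propositions 1 and 2.

```python
import numpy as np

rng = np.random.default_rng(4)
q, r, u1, p1 = 5, 8, 2, 3            # illustrative dimensions (assumed)
k = q - u1

def random_spd(d):
    """Random symmetric positive definite d x d matrix."""
    B = rng.standard_normal((d, d))
    return B @ B.T + np.eye(d)

inv = np.linalg.inv
Omega, Sigma_R = random_spd(u1), random_spd(p1)   # Omega, Sigma_{R_{1|2}}
eta = rng.standard_normal((u1, p1))               # full row rank almost surely
Om0 = random_spd(r - u1)                          # stand-in for tilde Omega_0
Om0_A, Om0_AI, Om0_I = Om0[:k, :k], Om0[:k, k:], Om0[k:, k:]
Om0_AgI = Om0_A - Om0_AI @ np.linalg.solve(Om0_I, Om0_AI.T)  # Schur complement

# Orthonormal Gamma_A (q x u1) and its orthogonal complement Gamma_A0 (q x k).
Qf, _ = np.linalg.qr(rng.standard_normal((q, q)))
Gamma_A, Gamma_A0 = Qf[:, :u1], Qf[:, u1:]

def T_matrix(Om_left):
    """eta S eta^T (x) W + Omega (x) W + Omega^{-1} (x) Om0_A - 2I, with W = Om_left^{-1}."""
    W = inv(Om_left)
    return (np.kron(eta @ Sigma_R @ eta.T, W) + np.kron(Omega, W)
            + np.kron(inv(Omega), Om0_A) - 2 * np.eye(u1 * k))

T, T2 = T_matrix(Om0_AgI), T_matrix(Om0_A)

base = np.kron(inv(Sigma_R), Gamma_A @ Omega @ Gamma_A.T)
left = np.kron(eta.T, Gamma_A0)
V_O = base + left @ inv(T) @ left.T     # Proposition 1
V_3 = base + left @ inv(T2) @ left.T    # Proposition 2

def min_eig(S):
    return np.linalg.eigvalsh((S + S.T) / 2).min()

schur_ok = min_eig(inv(Om0_AgI) - inv(Om0_A)) >= -1e-8
T_ok = min_eig(T - T2) >= -1e-8
V_ok = min_eig(V_3 - V_O) >= -1e-8
print(schur_ok, T_ok, V_ok)
```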

Proof of Proposition 1 and Proposition 2.

Proposition 2 follows from the standard theory of the partial envelope model in Su and Cook [], applied to the active partial envelope model (16). We now prove Proposition 1. In this part of the proof, we do not use the normality of the errors; we only require that the errors have finite fourth moments. The maximum likelihood estimator of $β_{1, A}$ is derived in the same way as the maximum likelihood estimator of $β_{1}$ under the partial envelope model in Su and Cook []. To obtain the asymptotic variance, we use Proposition 4.1 in Shapiro [], and we first match our notation with Shapiro's by checking the assumptions of that proposition. As in the proof of Theorem 2 in Su and Cook [], we can verify that when the errors have finite fourth moments, Shapiro's $x$ is asymptotically normally distributed. We use Shapiro's $ξ$ to denote our ${\{vec {(β_{1, A})}^{T}, vech {(Σ)}^{T}\}}^{T}$. Let $l$ be the log-likelihood function of the oracle partial envelope model (15), and let $l_{\max}$ be its maximum value. The minimum discrepancy function is defined as $f_{MDF} = l_{\max} - l$. Since $f_{MDF}$ is derived from the normal likelihood function, it satisfies the four conditions in Section 3 of Shapiro []. We use Shapiro's $θ$ to denote our ${\{vec {(η)}^{T}, vec {(Γ_{A})}^{T}, vech {(Ω)}^{T}, vech {(Ω_{0})}^{T}\}}^{T}$. The function $g$ then connects $ξ$ and $θ$ through $ξ = g (θ)$, and $g$ is twice differentiable. Hence all the conditions of Proposition 4.1 in Shapiro [] are satisfied. Let ${\hat{Σ}}_{O}$ be the estimator of $Σ$ under the oracle partial envelope model (15). Then $n^{1 / 2} [{\{vec {({\hat{β}}_{1, A, O})}^{T}, vech {({\hat{Σ}}_{O})}^{T}\}}^{T} - {\{vec {(β_{1, A})}^{T}, vech {(Σ)}^{T}\}}^{T}]$ is asymptotically normally distributed with mean zero and some covariance matrix.

We now use the normality of the errors to obtain a closed-form expression for the asymptotic variance of $vec ({\hat{β}}_{1, A, O})$. By Proposition 4.1 in Shapiro [], the asymptotic variance has the form $H {(H^{T} J H)}^{†} H^{T}$, where ‘†’ denotes the Moore–Penrose inverse, $J$ is the Fisher information, and $H$ is the Jacobian matrix

H = (\begin{matrix} I_{p_{1}} \otimes Γ_{A} & η^{T} \otimes I_{q} & 0 & 0 \\ 0 & 2 C_{r} (I_{r} \otimes Γ Ω - Γ_{0} Ω_{0} Γ_{0}^{T} \otimes Γ) L & C_{r} (Γ \otimes Γ) E_{u_{1}} & C_{r} (Γ_{0} \otimes Γ_{0}) E_{r - u_{1}} \end{matrix}),

where $L = {(K_{q u_{1}}^{T}, 0)}^{T} \in R^{r u_{1} \times q u_{1}}$ and $K_{q u_{1}} \in R^{q u_{1} \times q u_{1}}$ is a commutation matrix (Magnus and Neudecker []). After some algebra, we obtain the closed form of the asymptotic variance of $vec ({\hat{β}}_{1, A, O})$:

\begin{matrix} n^{1 / 2} \{vec ({\hat{β}}_{1, A, O}) - vec (β_{1, A})\} \to N (0, V_{O}) \end{matrix}

in distribution, where

V_{O} = Σ_{R_{1 ∣ 2}}^{- 1} \otimes Γ_{A} Ω Γ_{A}^{T} + (η^{T} \otimes Γ_{A, 0}) T^{- 1} (η \otimes Γ_{A, 0}^{T})

, and

T = η Σ_{R_{1 ∣ 2}} η^{T} \otimes {\tilde{Ω}}_{0, A ∣ I}^{- 1} + Ω \otimes {\tilde{Ω}}_{0, A ∣ I}^{- 1} + Ω^{- 1} \otimes {\tilde{Ω}}_{0, A} - 2 I_{u_{1}} \otimes I_{q - u_{1}}

. Note that we omit $α$, $Σ_{X}$ and $β_{2}$ from $J$ and $H$. This does not affect the results, since these parameters do not enter the parameterization of $β_{1}$ and $Σ$, and their maximum likelihood estimators are asymptotically independent of those of $β_{1}$ and $Σ$. □
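The Jacobian $H$ above uses the commutation matrix $K_{q u_{1}}$, the permutation matrix with $K_{m n} vec (X) = vec (X^{T})$ for every $m \times n$ matrix $X$. A minimal construction (NumPy; `commutation_matrix` is a hypothetical helper name, not from the paper):

```python
import numpy as np

def commutation_matrix(m, n):
    """K_{mn}: the mn x mn permutation matrix with K_{mn} vec(X) = vec(X^T)
    for every m x n matrix X (vec stacks columns)."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # X[i, j] sits at position j*m + i in vec(X)
            # and at position i*n + j in vec(X^T).
            K[i * n + j, j * m + i] = 1.0
    return K

vec = lambda M: M.reshape(-1, order="F")
X = np.arange(6.0).reshape(2, 3)
K = commutation_matrix(2, 3)
print(np.array_equal(K @ vec(X), vec(X.T)))
```

Since $K_{m n}$ is a permutation matrix, it is orthogonal and $K_{m n}^{T} = K_{n m}$.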

Next, we continue to discuss the theoretical properties of the sparse partial envelope estimator.

Theorem 3.

Suppose that the conditions in Theorem 2 hold. Then, as $n \to \infty$, $\sqrt{n} \{vec ({\hat{β}}_{1, A}) - vec (β_{1, A})\}$ is asymptotically normally distributed with mean zero and asymptotic variance equal to that of ${\hat{β}}_{1, A, O}$. If, further, the errors are normally distributed, the asymptotic variance $V$ has the closed form $V = Σ_{R_{1 ∣ 2}}^{- 1} \otimes Γ_{A} Ω Γ_{A}^{T} + (η^{T} \otimes Γ_{A, 0}) T^{- 1} (η \otimes Γ_{A, 0}^{T})$, where $T = η Σ_{R_{1 ∣ 2}} η^{T} \otimes {\tilde{Ω}}_{0, A ∣ I}^{- 1} + Ω \otimes {\tilde{Ω}}_{0, A ∣ I}^{- 1} + Ω^{- 1} \otimes {\tilde{Ω}}_{0, A} - 2 I_{u_{1}} \otimes I_{q - u_{1}}$.

Theorem 3 shows that the sparse partial envelope estimator is asymptotically normal, with the same asymptotic distribution as the oracle partial envelope estimator. Together with Theorem 2, this shows that the sparse partial envelope estimator enjoys the oracle property: the sparse partial envelope model selects the inactive responses with probability tending to 1 and estimates the coefficients of the active responses as efficiently as the oracle partial envelope model.

Proof of Theorem 3.

Let ${\hat{A}}_{A}$ denote the nonzero rows of the sparse partial envelope estimator $\hat{A}$, and let ${\hat{A}}_{O}$ denote the nonzero rows of the oracle partial envelope estimator. Suppose that we can prove ${\hat{A}}_{A} = {\hat{A}}_{O} + O_{p} (a_{n})$ for a sequence $a_{n} = o (n^{- 1 / 2})$. Then $P_{\hat{Γ}} = P_{{\hat{Γ}}_{O}} + O_{p} (a_{n})$, where $P_{Γ} = G_{A} {(G_{A}^{T} G_{A})}^{- 1} G_{A}^{T}$. Hence,

{\hat{β}}_{1} - {\hat{β}}_{1, O} = (P_{\hat{Γ}} - P_{{\hat{Γ}}_{O}}) {\hat{β}}_{1, ols} = (P_{\hat{Γ}} - P_{{\hat{Γ}}_{O}}) ({\hat{β}}_{1, ols} - β_{1}) + (P_{\hat{Γ}} - P_{{\hat{Γ}}_{O}}) β_{1} = O_{p} (a_{n}) o_{p} (1) + O_{p} (a_{n}) = O_{p} (a_{n}) .

Therefore $n^{1 / 2} ({\hat{β}}_{1} - {\hat{β}}_{1, O}) \to 0$ in probability, and by Slutsky's theorem, $n^{1 / 2} ({\hat{β}}_{1} - β_{1})$ has the same asymptotic distribution as $n^{1 / 2} ({\hat{β}}_{1, O} - β_{1})$. It thus suffices to prove ${\hat{A}}_{A} = {\hat{A}}_{O} + O_{p} (a_{n})$ for some $a_{n} = o (n^{- 1 / 2})$. Since $n^{1 / 2} ξ_{\max, n} \to 0$, we have $ξ_{\max, n} = o (n^{- 1 / 2})$, and we can take $a_{n} = {(n^{- 1 / 2} ξ_{\max, n})}^{1 / 2}$; indeed, $n^{1 / 2} a_{n} = {(n^{1 / 2} ξ_{\max, n})}^{1 / 2} \to 0$ and $ξ_{\max, n} / a_{n} = {(n^{1 / 2} ξ_{\max, n})}^{1 / 2} \to 0$.
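The choice $a_{n} = {(n^{- 1 / 2} ξ_{\max, n})}^{1 / 2}$ can be illustrated with an assumed penalty scale $ξ_{\max, n} = n^{- 3 / 4}$, which satisfies $n^{1 / 2} ξ_{\max, n} \to 0$; this is purely illustrative and not a tuning recommendation. The three ratios below all decrease to zero, matching $n^{1 / 2} ξ_{\max, n} \to 0$, $a_{n} = o (n^{- 1 / 2})$ and $ξ_{\max, n} = o (a_{n})$.

```python
import numpy as np

n = np.logspace(2, 12, 6)               # sample sizes 1e2, 1e4, ..., 1e12
xi_max = n ** -0.75                     # assumed penalty scale (illustrative)
a_n = np.sqrt(n ** -0.5 * xi_max)       # a_n = (n^{-1/2} xi_max)^{1/2} = n^{-5/8}

ratio1 = np.sqrt(n) * xi_max            # n^{1/2} xi_max      = n^{-1/4}
ratio2 = a_n / n ** -0.5                # a_n / n^{-1/2}      = n^{-1/8}
ratio3 = xi_max / a_n                   # xi_max / a_n        = n^{-1/8}
print(ratio1, ratio2, ratio3, sep="\n")
```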

Let $B$ be a $(q - u_{1}) \times u_{1}$ matrix, and let

G_{B} = (\begin{matrix} I_{u_{1}} \\ B \end{matrix}) \in R^{q \times u_{1}} .

The objective function for estimating $B$ is

\begin{matrix} f_{obj, A} (B) = & - 2 log | G_{B}^{T} G_{B} | + log | G_{B}^{T} {\hat{Σ}}_{{(R_{Y ∣ 2})}_{A} ∣ R_{1 ∣ 2}} G_{B} | \\ + log | G_{B}^{T} {({\hat{Σ}}_{R_{Y ∣ 2}}^{- 1})}_{A} G_{B} | + \sum_{i = 1}^{q - u_{1}} λ ω_{i} {∥ b_{i} ∥}_{2}, \end{matrix}

where $b_{i}$ is the $i$th row of $B$. Because the sparse partial envelope estimator is selection consistent, with probability tending to 1 we have ${\hat{A}}_{A} = arg {min}_{B \in R^{(q - u_{1}) \times u_{1}}} f_{obj, A} (B)$. To prove ${\hat{A}}_{A} = {\hat{A}}_{O} + O_{p} (a_{n})$, it suffices to show that for any small $ϵ > 0$ there exists a sufficiently large constant $C$ such that

lim_{n \to \infty} pr \{inf_{Δ \in R^{(q - u_{1}) \times u_{1}}, {∥ Δ ∥}_{F} = C} f_{obj, A} ({\hat{A}}_{O} + a_{n} Δ) > f_{obj, A} ({\hat{A}}_{O})\} > 1 - ϵ .

(17)

If (17) holds, then ${\hat{A}}_{A} = {\hat{A}}_{O} + O_{p} (a_{n})$ with $a_{n} = o (n^{- 1 / 2})$. We now prove (17). Following the proof of Theorem 1, we expand $f_{obj, A} ({\hat{A}}_{O} + a_{n} Δ)$ and compute $f_{obj, A} ({\hat{A}}_{O} + a_{n} Δ) - f_{obj, A} ({\hat{A}}_{O})$. The objective function $f_{obj, A} (B)$ can be partitioned into four parts: $f_{obj, A} (B) \equiv f_{1, A} (B) + f_{2, A} (B) + f_{3, A} (B) + f_{4, A} (B)$. The first directional derivatives of $f_{1, A} (B)$, $f_{2, A} (B)$ and $f_{3, A} (B)$ at ${\hat{A}}_{O}$ are

\begin{matrix} \overset{\to Δ}{{d f}_{1, A}} ({\hat{A}}_{O}) = tr \{\frac{d}{d B} f_{1, A} {(B)}^{T} |_{B = {\hat{A}}_{O}} Δ\}, \\ \overset{\to Δ}{{d f}_{2, A}} ({\hat{A}}_{O}) = tr \{\frac{d}{d B} f_{2, A} {(B)}^{T} |_{B = {\hat{A}}_{O}} Δ\}, \\ \overset{\to Δ}{{d f}_{3, A}} ({\hat{A}}_{O}) = tr \{\frac{d}{d B} f_{3, A} {(B)}^{T} |_{B = {\hat{A}}_{O}} Δ\} . \end{matrix}

Because ${\hat{A}}_{O}$ minimizes $f_{1, A} (B) + f_{2, A} (B) + f_{3, A} (B)$,

\begin{matrix} \frac{d}{d B} f_{1, A} (B) \big|_{B = {\hat{A}}_{O}} + \frac{d}{d B} f_{2, A} (B) \big|_{B = {\hat{A}}_{O}} + \frac{d}{d B} f_{3, A} (B) \big|_{B = {\hat{A}}_{O}} = 0 . \end{matrix}

Hence $\overset{\to Δ}{{d f}_{1, A}} ({\hat{A}}_{O}) + \overset{\to Δ}{{d f}_{2, A}} ({\hat{A}}_{O}) + \overset{\to Δ}{{d f}_{3, A}} ({\hat{A}}_{O}) = 0$.

The computations of the second directional derivatives of $f_{1, A} (B)$, $f_{2, A} (B)$ and $f_{3, A} (B)$ at ${\hat{A}}_{O}$, and the expansion of $f_{4, A} (B)$, parallel those in the proof of Theorem 1. Combining all these terms, we have

\begin{matrix} f_{obj, A} ({\hat{A}}_{O} + a_{n} Δ) - f_{obj, A} ({\hat{A}}_{O}) \\ \geq & a_{n}^{2} tr \{Ω^{- 1} Γ_{1}^{T} Δ_{* A}^{T} Γ_{A, 0} {\tilde{Ω}}_{0, A} Γ_{A, 0}^{T} Δ_{* A} Γ_{1} \\ + (Ω + η Σ_{R_{1 ∣ 2}} η^{T}) Γ_{1}^{T} Δ_{* A}^{T} Γ_{A, 0} {\tilde{Ω}}_{0, A ∣ I}^{- 1} Γ_{A, 0}^{T} Δ_{* A} Γ_{1} \\ - 2 {(I_{u_{1}} + A_{A}^{T} A_{A})}^{- 1} Δ_{* A}^{T} Γ_{A, 0} Γ_{A, 0}^{T} Δ_{* A}\} \\ - \frac{1}{2} a_{n} (q - u_{1}) ξ_{\max, n} max_{i} (∥ a_{i} ∥_{2}^{- 1} {∥ δ_{i} ∥}_{2}) + o_{p} (a_{n}^{2}), \end{matrix}

where $A_{A} \in R^{(q - u_{1}) \times u_{1}}$ consists of the nonzero rows of $A$ and $Δ_{* A} = {(0_{u_{1} \times u_{1}}, Δ^{T})}^{T} \in R^{q \times u_{1}}$. By the definition of $a_{n}$, we have $ξ_{\max, n} = o (a_{n})$, so the second term is dominated by the first. Hence (17) follows once the trace in the first term is bounded below by a positive multiple of ${∥ Δ ∥}_{F}^{2}$. Indeed, we have

\begin{matrix} tr \{Ω^{- 1} Γ_{1}^{T} Δ_{* A}^{T} Γ_{A, 0} {\tilde{Ω}}_{0, A} Γ_{A, 0}^{T} Δ_{* A} Γ_{1} + (Ω + η Σ_{R_{1 ∣ 2}} η^{T}) Γ_{1}^{T} Δ_{* A}^{T} Γ_{A, 0} {\tilde{Ω}}_{0, A ∣ I}^{- 1} Γ_{A, 0}^{T} Δ_{* A} Γ_{1} \\ - 2 {(I_{u_{1}} + A_{A}^{T} A_{A})}^{- 1} Δ_{* A}^{T} Γ_{A, 0} Γ_{A, 0}^{T} Δ_{* A}\} \\ = & vec {(Γ_{A, 0}^{T} Δ_{* A} Γ_{1})}^{T} \{Ω^{- 1} \otimes {\tilde{Ω}}_{0, A} \\ + (Ω + η Σ_{R_{1 ∣ 2}} η^{T}) \otimes {\tilde{Ω}}_{0, A ∣ I}^{- 1} - 2 I_{u_{1}} \otimes I_{q - u_{1}}\} vec (Γ_{A, 0}^{T} Δ_{* A} Γ_{1}) \\ \geq & vec {(Γ_{A, 0}^{T} Δ_{* A} Γ_{1})}^{T} \{Ω^{- 1} \otimes {\tilde{Ω}}_{0, A} \\ + (Ω + η Σ_{R_{1 ∣ 2}} η^{T}) \otimes {\tilde{Ω}}_{0, A}^{- 1} - 2 I_{u_{1}} \otimes I_{q - u_{1}}\} vec (Γ_{A, 0}^{T} Δ_{* A} Γ_{1}) \\ \geq & m_{3} {∥ Γ_{A, 0}^{T} Δ_{* A} Γ_{1} ∥}_{F}^{2} \\ \geq & m_{3} m_{2}^{2} {∥ Δ ∥}_{F}^{2}, \end{matrix}

where $m_{2}$ is the smallest eigenvalue of ${(I_{u_{1}} + A_{A}^{T} A_{A})}^{- 1}$ and $m_{3}$ is the smallest eigenvalue of $Ω^{- 1} \otimes {\tilde{Ω}}_{0, A} + (Ω + η Σ_{R_{1 ∣ 2}} η^{T}) \otimes {\tilde{Ω}}_{0, A}^{- 1} - 2 I_{u_{1}} \otimes I_{q - u_{1}}$. The last inequality is derived in the same way as the corresponding inequality at the end of the proof of Theorem 1. □

Now we discuss the convergence rate and selection consistency of the sparse partial envelope estimator when $r_{n}$ tends to infinity with $n$. We make the same assumptions as Su et al. []:

(A1) There exist positive constants $\bar{l}$ and $\underline{l}$ such that $γ_{\max} (Σ) ⩽ \bar{l}$ and $γ_{\min} (Σ) ⩾ \underline{l}$, where $γ_{\max} (Σ)$ and $γ_{\min} (Σ)$ are the largest and smallest eigenvalues of $Σ$.
(A2) The samples of $ε$ are independent and identically distributed from a sub-Gaussian distribution, that is, $E \{exp (τ_{1}^{T} ε)\} ⩽ exp (c_{1} τ_{1}^{T} Σ τ_{1})$ for some constant $c_{1} > 0$ and every $τ_{1} \in R^{r_{n}}$. Samples of $X$ are independent and identically distributed, and $X - μ_{X}$ follows a sub-Gaussian distribution, that is, $E [exp \{τ_{2}^{T} (X - μ_{X})\}] ⩽ exp (c_{2} τ_{2}^{T} Σ_{X} τ_{2})$ for some constant $c_{2} > 0$ and every $τ_{2} \in R^{p}$. Let $s_{1}$ and $s_{2}$ denote the number of nonzero off-diagonal elements in the lower triangle of $Σ^{- 1}$ and $Σ_{R_{Y ∣ 2}}^{- 1}$, respectively, and let $s = \max \{s_{1}, s_{2}\}$. We use ${∥ \cdot ∥}_{F}$ to denote the Frobenius norm of a matrix.

Theorem 4.

Suppose that the sparse partial envelope models (5) and (8) hold and that (A1) and (A2) are satisfied. If $ξ_{\max, n} = o (\sqrt{(r_{n} + s) log (r_{n}) / n})$, then, as $n \to \infty$, there exists a local minimizer $\hat{A}$ of (10) such that ${∥ \hat{A} - A ∥}_{F} = O_{p} (\sqrt{(r_{n} + s) log (r_{n}) / n})$, and the sparse partial envelope estimator ${\hat{β}}_{1}$ converges at the same rate: ${∥ {\hat{β}}_{1} - β_{1} ∥}_{F} = O_{p} (\sqrt{(r_{n} + s) log (r_{n}) / n})$.

Theorem 4 shows that the convergence rate of the sparse partial envelope estimator is limited by the convergence rates of ${\hat{Σ}}_{R_{Y ∣ 2}, spice}^{- 1}$ and ${\hat{Σ}}_{res, spice}^{- 1}$. With an inverse covariance matrix estimator that converges faster, the convergence rate of the sparse partial envelope estimator would improve accordingly. Assumptions (A1) and (A2) are needed for the consistency of ${\hat{Σ}}_{R_{Y ∣ 2}, spice}^{- 1}$ and ${\hat{Σ}}_{res, spice}^{- 1}$.

Proof of Theorem 4.

We first show that

\begin{matrix} ∥ {\hat{Σ}}_{res, spice}^{- 1} - Σ^{- 1} ∥_{F} & = O_{p} [{\{(r_{n} + s_{1}) log (r_{n}) / n\}}^{1 / 2}], \end{matrix}

(18)

\begin{matrix} ∥ {\hat{Σ}}_{R_{Y ∣ 2}, spice}^{- 1} - Σ_{R_{Y ∣ 2}}^{- 1} ∥_{F} & = O_{p} [{\{(r_{n} + s_{2}) log (r_{n}) / n\}}^{1 / 2}], \end{matrix}

(19)

Throughout this proof, ${∥ \cdot ∥}_{\max}$ denotes the max norm of a matrix, which is the maximum of the absolute values of all elements in the matrix. To establish (18) and (19), it suffices to show that there exist constants $C_{R_{Y ∣ 2}} > 0$ and $C_{res} > 0$ such that, with probability tending to 1,

\begin{matrix} max_{i, j} | {\hat{Σ}}_{R_{Y ∣ 2}, i j} - Σ_{R_{Y ∣ 2}, i j} | & \leq C_{R_{Y ∣ 2}} {\{log (r_{n}) / n\}}^{1 / 2}, \\ max_{i, j} | {\hat{Σ}}_{res, i j} - Σ_{i j} | & \leq C_{res} {\{log (r_{n}) / n\}}^{1 / 2} . \end{matrix}

(20)

Let $Q$ be an $m$-dimensional random vector with mean $μ_{Q}$ and covariance matrix $Σ_{Q}$ such that $Q - μ_{Q}$ follows a sub-Gaussian distribution. Let $Q_{1}, \dots, Q_{n}$ be $n$ independent and identically distributed samples of $Q$, let $\bar{Q} = n^{- 1} \sum_{i = 1}^{n} Q_{i}$, and

\begin{matrix} {\hat{Σ}}_{Q} = & \frac{1}{n} \sum_{k = 1}^{n} (Q_{k} - \bar{Q}) {(Q_{k} - \bar{Q})}^{T} \\ = & \frac{1}{n} \sum_{k = 1}^{n} (Q_{k} - μ_{Q}) {(Q_{k} - μ_{Q})}^{T} - (\bar{Q} - μ_{Q}) {(\bar{Q} - μ_{Q})}^{T} . \end{matrix}
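The decomposition of ${\hat{Σ}}_{Q}$ above is an exact algebraic identity, which can be confirmed on simulated data. A short NumPy sketch (the mean vector and Gaussian draws are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 50, 4
mu_Q = np.array([1.0, -2.0, 0.5, 3.0])
Qs = mu_Q + rng.standard_normal((n, m))   # rows are Q_1, ..., Q_n

Q_bar = Qs.mean(axis=0)                   # sample mean (1/n) sum_i Q_i
Sigma_hat = (Qs - Q_bar).T @ (Qs - Q_bar) / n
decomp = (Qs - mu_Q).T @ (Qs - mu_Q) / n - np.outer(Q_bar - mu_Q, Q_bar - mu_Q)
print(np.allclose(Sigma_hat, decomp))
```

The same matrix is returned by `np.cov(Qs.T, bias=True)`, which also uses the divisor $n$.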

From Lemma 1 in Ravikumar et al. [], there exist positive constants $C_{i}$ such that

\begin{matrix} pr (| {\hat{Σ}}_{Q, i j} - Σ_{Q, i j} | > δ) \leq & pr [|{\{\frac{1}{n} \sum_{k = 1}^{n} (Q_{k} - μ_{Q}) {(Q_{k} - μ_{Q})}^{T}\}}_{i j} - Σ_{Q, i j}| > \frac{δ}{2}] \\ + pr [|{\{(\bar{Q} - μ_{Q}) {(\bar{Q} - μ_{Q})}^{T}\}}_{i j}| > \frac{δ}{2}] \\ \leq & C_{1} exp (- C_{2} n δ^{2}) + C_{3} exp (- C_{4} n δ^{2}), \end{matrix}

where $δ \in (0, b_{1})$ for some constant $b_{1} > 0$, and $| \cdot |$ denotes the absolute value. Let $δ = C_{5} {\{log (m) / n\}}^{1 / 2}$ for some $C_{5} > 0$. By the union bound over the $m^{2}$ pairs $(i, j)$, there exists a positive constant $C_{6}$ such that

\begin{matrix} max_{i, j} | {\hat{Σ}}_{Q, i j} - Σ_{Q, i j} | \leq C_{6} {\{log (m) / n\}}^{1 / 2}, \end{matrix}

with probability tending to 1 as $n \to \infty$.

Let $Q = {(R_{1 ∣ 2}^{T}, ε^{T})}^{T} \in R^{p_{1} + r_{n}}$, with mean ${(μ_{R_{1 ∣ 2}}^{T}, 0^{T})}^{T}$ and a block diagonal covariance matrix whose diagonal blocks are $Σ_{R_{1 ∣ 2}}$ and $Σ$. Then there exists a constant $C_{0}$ such that, with probability tending to 1, ${max}_{i, j} | {\hat{Σ}}_{Q, i j} - Σ_{Q, i j} | \leq C_{0} {\{log (r_{n} + p_{1}) / n\}}^{1 / 2}$. For fixed $p_{1}$, there exists a constant $C_{0}^{*}$ such that ${max}_{i, j} | {\hat{Σ}}_{Q, i j} - Σ_{Q, i j} | \leq C_{0}^{*} {\{log (r_{n}) / n\}}^{1 / 2}$ with probability tending to 1. Hence

\begin{matrix} max_{i, j} | {\hat{Σ}}_{R_{1 ∣ 2}, i j} - Σ_{R_{1 ∣ 2}, i j} | & \leq C_{0}^{*} {\{log (r_{n}) / n\}}^{1 / 2}, \\ max_{i, j} | {\hat{Σ}}_{ε, i j} - Σ_{i j} | & \leq C_{0}^{*} {\{log (r_{n}) / n\}}^{1 / 2}, \\ max_{i, j} | {\hat{Σ}}_{ε R_{1 ∣ 2}, i j} | & \leq C_{0}^{*} {\{log (r_{n}) / n\}}^{1 / 2} . \end{matrix}

Because ${\hat{Σ}}_{R_{Y ∣ 2}} = β_{1} {\hat{Σ}}_{R_{1 ∣ 2}} β_{1}^{T} + {\hat{Σ}}_{ε} + β_{1} {\hat{Σ}}_{R_{1 ∣ 2} ε} + {\hat{Σ}}_{ε R_{1 ∣ 2}} β_{1}^{T}$, for some $C_{R_{Y ∣ 2}} > 0$ we have

\begin{matrix} max_{i, j} | {\hat{Σ}}_{R_{Y ∣ 2}, i j} - Σ_{R_{Y ∣ 2}, i j} | \leq C_{R_{Y ∣ 2}} {\{log (r_{n}) / n\}}^{1 / 2} . \end{matrix}

Since ${\hat{Σ}}_{res} = {\hat{Σ}}_{ε} - {\hat{Σ}}_{ε R_{1 ∣ 2}} {\hat{Σ}}_{R_{1 ∣ 2}}^{- 1} {\hat{Σ}}_{R_{1 ∣ 2} ε}$ and $Σ_{ε R_{1 ∣ 2}} = 0$ (so that $Σ = Σ - Σ_{ε R_{1 ∣ 2}} Σ_{R_{1 ∣ 2}}^{- 1} Σ_{R_{1 ∣ 2} ε}$), we have

\begin{matrix} {\hat{Σ}}_{res} - Σ = & {\hat{Σ}}_{ε} - Σ - \{({\hat{Σ}}_{ε R_{1 ∣ 2}} - Σ_{ε R_{1 ∣ 2}}) Σ_{R_{1 ∣ 2}}^{- 1} Σ_{R_{1 ∣ 2} ε} + Σ_{ε R_{1 ∣ 2}} ({\hat{Σ}}_{R_{1 ∣ 2}}^{- 1} - Σ_{R_{1 ∣ 2}}^{- 1}) Σ_{R_{1 ∣ 2} ε} \\ + Σ_{ε R_{1 ∣ 2}} Σ_{R_{1 ∣ 2}}^{- 1} ({\hat{Σ}}_{R_{1 ∣ 2} ε} - Σ_{R_{1 ∣ 2} ε}) + ({\hat{Σ}}_{ε R_{1 ∣ 2}} - Σ_{ε R_{1 ∣ 2}}) ({\hat{Σ}}_{R_{1 ∣ 2}}^{- 1} - Σ_{R_{1 ∣ 2}}^{- 1}) Σ_{R_{1 ∣ 2} ε} \\ + Σ_{ε R_{1 ∣ 2}} ({\hat{Σ}}_{R_{1 ∣ 2}}^{- 1} - Σ_{R_{1 ∣ 2}}^{- 1}) ({\hat{Σ}}_{R_{1 ∣ 2} ε} - Σ_{R_{1 ∣ 2} ε}) \\ + ({\hat{Σ}}_{ε R_{1 ∣ 2}} - Σ_{ε R_{1 ∣ 2}}) Σ_{R_{1 ∣ 2}}^{- 1} ({\hat{Σ}}_{R_{1 ∣ 2} ε} - Σ_{R_{1 ∣ 2} ε}) \\ + ({\hat{Σ}}_{ε R_{1 ∣ 2}} - Σ_{ε R_{1 ∣ 2}}) ({\hat{Σ}}_{R_{1 ∣ 2}}^{- 1} - Σ_{R_{1 ∣ 2}}^{- 1}) ({\hat{Σ}}_{R_{1 ∣ 2} ε} - Σ_{R_{1 ∣ 2} ε})\} . \end{matrix}
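The Schur-complement form of ${\hat{Σ}}_{res}$ used above can be checked on simulated data. In this sketch (NumPy; the sizes, `beta1` and the Gaussian draws are illustrative assumptions), the residual covariance from an OLS regression of centered $R_{Y ∣ 2}$ on centered $R_{1 ∣ 2}$ is compared with ${\hat{Σ}}_{ε} - {\hat{Σ}}_{ε R_{1 ∣ 2}} {\hat{Σ}}_{R_{1 ∣ 2}}^{- 1} {\hat{Σ}}_{R_{1 ∣ 2} ε}$, all computed with divisor $n$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p1, r = 200, 2, 5
R1 = rng.standard_normal((n, p1))             # stand-in for R_{1|2}
beta1 = rng.standard_normal((r, p1))          # illustrative beta_1
E = rng.standard_normal((n, r))               # stand-in for the errors
RY = R1 @ beta1.T + E                         # R_{Y|2} = beta_1 R_{1|2} + eps

center = lambda M: M - M.mean(axis=0)
R1c, RYc, Ec = center(R1), center(RY), center(E)

# Residual covariance from the OLS regression of R_{Y|2} on R_{1|2}.
coef, *_ = np.linalg.lstsq(R1c, RYc, rcond=None)
resid = RYc - R1c @ coef
Sigma_res_hat = resid.T @ resid / n

# Schur-complement form with sample (co)variances, divisor n.
S_e = Ec.T @ Ec / n
S_eR = Ec.T @ R1c / n
S_R = R1c.T @ R1c / n
schur = S_e - S_eR @ np.linalg.solve(S_R, S_eR.T)
print(np.allclose(Sigma_res_hat, schur))
```

The identity holds because the residuals equal the centered errors projected onto the orthogonal complement of the column space of the centered $R_{1 ∣ 2}$.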

If

M \in R^{d_{1} \times d_{2}}, N \in R^{d_{2} \times d_{3}}

, then

{∥ M N ∥}_{\max} \leq d_{2} {∥ M ∥}_{\max} {∥ N ∥}_{\max}

, where

{∥ \cdot ∥}_{\max}

is the matrix max norm. Exploiting this fact, for some

C_{res} > 0

, we have

\begin{matrix} max_{i, j} | {\hat{Σ}}_{res, i j} - Σ_{i j} | \leq C_{res} {\{log (r_{n}) / n\}}^{1 / 2}, \end{matrix}

and hence, (18) and (19) hold.
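The max-norm inequality used above is elementary: each entry of MN is a sum of d_2 products, and each product is bounded in absolute value by ||M||_max ||N||_max. A small plain-Python check is given below; the matrices are arbitrary illustrative values, and the all-ones case shows that the bound can be attained.

```python
def max_norm(m):
    # Matrix max norm: the largest absolute entry.
    return max(abs(v) for row in m for v in row)

def matmul(m, n):
    # Product of a d1 x d2 and a d2 x d3 matrix, both stored as lists of rows.
    d2 = len(n)
    return [[sum(m[i][k] * n[k][j] for k in range(d2)) for j in range(len(n[0]))]
            for i in range(len(m))]

def max_norm_bound(m, n):
    # Returns (||MN||_max, d2 * ||M||_max * ||N||_max).
    d2 = len(n)
    return max_norm(matmul(m, n)), d2 * max_norm(m) * max_norm(n)

m = [[1.0, -2.0, 3.0], [0.5, 1.0, -1.0]]
n = [[2.0, 0.0], [-1.0, 1.0], [1.0, 3.0]]
print(max_norm_bound(m, n))  # (7.0, 27.0)

ones = [[1.0] * 3 for _ in range(2)]
print(max_norm_bound(ones, [[1.0] * 2 for _ in range(3)]))  # (3.0, 3.0): equality
```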

Let

a_{n} = {\{(r_{n} + s) log (r_{n}) / n\}}^{1 / 2}

. Denote the objective function in (10) by

f_{obj, 2}

. Theorem 4 holds if for any small

ϵ > 0

, there exists a sufficiently large constant C such that

lim_{n \to \infty} pr \{inf_{Δ \in R^{(q - u_{1}) \times u_{1}}, {∥ Δ ∥}_{F} = C} f_{obj, 2} (A + a_{n} Δ) > f_{obj, 2} (A)\} > 1 - ϵ .

(21)

Let

Δ_{*} = {(0_{u_{1} \times u_{1}}, Δ^{T})}^{T} \in R^{q \times u_{1}}

. Following the computations in the proof of Theorem 1, we calculate the Taylor expansion of

f_{obj, 2} (A + a_{n} Δ)

at A and obtain

\begin{matrix} f_{obj, 2} (A + a_{n} Δ) - f_{obj, 2} (A) \\ \geq & 2 a_{n} tr [{(G_{A}^{T} Σ G_{A})}^{- 1} G_{A}^{T} ({\hat{Σ}}_{res, spice} - Σ) Δ_{*} \\ & + \{{(G_{A}^{T} {\hat{Σ}}_{res, spice} G_{A})}^{- 1} - {(G_{A}^{T} Σ G_{A})}^{- 1}\} G_{A}^{T} Σ Δ_{*} \\ & + \{{(G_{A}^{T} {\hat{Σ}}_{res, spice} G_{A})}^{- 1} - {(G_{A}^{T} Σ G_{A})}^{- 1}\} G_{A}^{T} ({\hat{Σ}}_{res, spice} - Σ) Δ_{*} \\ & + {(G_{A}^{T} Σ_{R_{Y ∣ 2}}^{- 1} G_{A})}^{- 1} G_{A}^{T} ({\hat{Σ}}_{R_{Y ∣ 2}, spice}^{- 1} - Σ_{R_{Y ∣ 2}}^{- 1}) Δ_{*} \\ & + \{{(G_{A}^{T} {\hat{Σ}}_{R_{Y ∣ 2}, spice}^{- 1} G_{A})}^{- 1} - {(G_{A}^{T} Σ_{R_{Y ∣ 2}}^{- 1} G_{A})}^{- 1}\} G_{A}^{T} Σ_{R_{Y ∣ 2}}^{- 1} Δ_{*} \\ & + \{{(G_{A}^{T} {\hat{Σ}}_{R_{Y ∣ 2}, spice}^{- 1} G_{A})}^{- 1} - {(G_{A}^{T} Σ_{R_{Y ∣ 2}}^{- 1} G_{A})}^{- 1}\} G_{A}^{T} ({\hat{Σ}}_{R_{Y ∣ 2}, spice}^{- 1} - Σ_{R_{Y ∣ 2}}^{- 1}) Δ_{*}] \\ & + a_{n}^{2} tr \{Ω^{- 1} Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Ω_{0} Γ_{0}^{T} Δ_{*} Γ_{1} + (Ω + η Σ_{R_{1 ∣ 2}} η^{T}) Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Ω_{0}^{- 1} Γ_{0}^{T} Δ_{*} Γ_{1} \\ & - 2 {(I_{u_{1}} + A^{T} A)}^{- 1} Δ_{*}^{T} Γ_{0} Γ_{0}^{T} Δ_{*}\} - \frac{1}{2} a_{n} (q - u_{1}) ξ_{\max, n} max_{i = 1, \dots, q - u_{1}} (∥ a_{i} ∥_{2}^{- 1} {∥ δ_{i} ∥}_{2}) + o_{p} (a_{n}^{2}) . \end{matrix}

Notice that

\begin{matrix} {\hat{Σ}}_{res, spice} - Σ = - Σ ({\hat{Σ}}_{res, spice}^{- 1} - Σ^{- 1}) Σ + o_{p} ({\hat{Σ}}_{res, spice}^{- 1} - Σ^{- 1}) . \end{matrix}
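The expansion above is the first-order term of matrix inversion: if the inverse of an estimate equals Σ^{-1} + tD, then the estimate equals Σ - tΣDΣ + O(t^2). The NumPy sketch below checks this numerically for an arbitrary symmetric positive definite Σ and symmetric D (both illustrative values): the remainder after removing the first-order term shrinks about 100-fold each time t shrinks 10-fold, which is what the o_p term expresses.

```python
import numpy as np

def expansion_remainder(sigma, d, t):
    # "Estimate" whose inverse equals Sigma^{-1} + t * D.
    sigma_hat = np.linalg.inv(np.linalg.inv(sigma) + t * d)
    # First-order term: -Sigma (Sigma_hat^{-1} - Sigma^{-1}) Sigma.
    first_order = -sigma @ (t * d) @ sigma
    # Frobenius norm of the part not explained by the first-order term.
    return float(np.linalg.norm(sigma_hat - sigma - first_order, "fro"))

sigma = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]])
d = np.array([[1.0, 0.3, 0.0], [0.3, -0.5, 0.2], [0.0, 0.2, 0.8]])
for t in (1e-1, 1e-2, 1e-3):
    print(t, expansion_remainder(sigma, d, t))
```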

Let

∥ \cdot ∥

be the spectral norm of a matrix. For two matrices

M \in R^{d_{1} \times d_{2}}

and

N \in R^{d_{2} \times d_{3}}

,

{∥ M N ∥}_{F} \leq ∥ M ∥ {∥ N ∥}_{F}

. So,

\begin{matrix} {∥Σ ({\hat{Σ}}_{res, spice}^{- 1} - Σ^{- 1}) Σ∥}_{F} \leq {∥ Σ ∥}^{2} ∥ {\hat{Σ}}_{res, spice}^{- 1} - Σ^{- 1} ∥_{F} \leq {\bar{k}}^{2} {∥ {\hat{Σ}}_{res, spice}^{- 1} - Σ^{- 1} ∥}_{F}, \end{matrix}

and

{∥ {\hat{Σ}}_{res, spice} - Σ ∥}_{F} = O_{p} [{\{(r_{n} + s) log (r_{n}) / n\}}^{1 / 2}]

. Then,

\begin{matrix} tr [{(G_{A}^{T} Σ G_{A})}^{- 1} G_{A}^{T} ({\hat{Σ}}_{res, spice} - Σ) Δ_{*}] \\ \geq & - {\bar{k}}^{2} {∥ Δ ∥}_{F} ∥ {\hat{Σ}}_{res, spice}^{- 1} - Σ^{- 1} ∥_{F} ∥{(G_{A}^{T} Σ G_{A})}^{- 1}∥ {∥ G_{A} ∥}_{F} . \end{matrix}
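The inequality ||MN||_F ≤ ||M|| ||N||_F follows column by column, since ||Mn_j||_2 ≤ ||M|| ||n_j||_2 for each column n_j of N; applying it on both sides (and transposing for the right factor) with ||Σ|| ≤ k̄ gives the bound used above. A quick NumPy check on random matrices is sketched below; the sizes and seed are arbitrary, and the case M = 2I shows the bound is tight.

```python
import numpy as np

def frobenius_spectral(m, n):
    # Returns (||MN||_F, ||M|| * ||N||_F), where ||M|| is the spectral norm.
    lhs = np.linalg.norm(m @ n, "fro")
    rhs = np.linalg.norm(m, 2) * np.linalg.norm(n, "fro")
    return float(lhs), float(rhs)

rng = np.random.default_rng(0)
for _ in range(5):
    lhs, rhs = frobenius_spectral(rng.normal(size=(4, 3)), rng.normal(size=(3, 6)))
    print(f"{lhs:.3f} <= {rhs:.3f}")
```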

Now, we have

\begin{matrix} {(G_{A}^{T} {\hat{Σ}}_{res, spice} G_{A})}^{- 1} - {(G_{A}^{T} Σ G_{A})}^{- 1} \\ = & - {(G_{A}^{T} Σ G_{A})}^{- 1} (G_{A}^{T} {\hat{Σ}}_{res, spice} G_{A} - G_{A}^{T} Σ G_{A}) {(G_{A}^{T} Σ G_{A})}^{- 1} \\ + o_{p} (G_{A}^{T} {\hat{Σ}}_{res, spice} G_{A} - G_{A}^{T} Σ G_{A}) \\ = & - {(G_{A}^{T} Σ G_{A})}^{- 1} G_{A}^{T} ({\hat{Σ}}_{res, spice} - Σ) G_{A} {(G_{A}^{T} Σ G_{A})}^{- 1} \\ + o_{p} [{\{(r_{n} + s) log (r_{n}) / n\}}^{1 / 2}] \\ = & - {(G_{A}^{T} Σ G_{A})}^{- 1} G_{A}^{T} Σ ({\hat{Σ}}_{res, spice}^{- 1} - Σ^{- 1}) Σ G_{A} {(G_{A}^{T} Σ G_{A})}^{- 1} \\ + o_{p} [{\{(r_{n} + s) log (r_{n}) / n\}}^{1 / 2}] . \end{matrix}

So,

\begin{matrix} tr [\{{(G_{A}^{T} {\hat{Σ}}_{res, spice} G_{A})}^{- 1} - {(G_{A}^{T} Σ G_{A})}^{- 1}\} G_{A}^{T} Σ Δ_{*}] \\ \geq & - u_{1}^{1 / 2} {\bar{k}}^{2} {∥ Δ ∥}_{F} ∥ {\hat{Σ}}_{res, spice}^{- 1} - Σ^{- 1} ∥_{F} ∥{(G_{A}^{T} Σ G_{A})}^{- 1}∥ {∥ G_{A} ∥}_{F} . \end{matrix}

Applying these inequalities to the first-order terms in the expansion of

f_{obj, 2} (A + a_{n} Δ) - f_{obj, 2} (A)

, we obtain

\begin{matrix} f_{obj, 2} (A + a_{n} Δ) - f_{obj, 2} (A) \\ \geq & 2 M_{1} a_{n} {∥ Δ ∥}_{F} ∥ {\hat{Σ}}_{res, spice}^{- 1} - Σ^{- 1} ∥_{F} + 2 M_{2} a_{n} {∥ Δ ∥}_{F} {∥ {\hat{Σ}}_{R_{Y ∣ 2}, spice}^{- 1} - Σ_{R_{Y ∣ 2}}^{- 1} ∥}_{F} \\ + a_{n}^{2} tr \{Ω^{- 1} Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Ω_{0} Γ_{0}^{T} Δ_{*} Γ_{1} + (Ω + η Σ_{R_{1 ∣ 2}} η^{T}) Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Ω_{0}^{- 1} Γ_{0}^{T} Δ_{*} Γ_{1} \\ - 2 {(I_{u_{1}} + A^{T} A)}^{- 1} Δ_{*}^{T} Γ_{0} Γ_{0}^{T} Δ_{*}\} \\ - \frac{1}{2} a_{n} (q - u_{1}) ξ_{\max, n} max_{i = 1, \dots, q - u_{1}} (∥ a_{i} ∥_{2}^{- 1} {∥ δ_{i} ∥}_{2}) + o_{p} (a_{n}^{2}), \end{matrix}

where

M_{1} = - 2 u_{1}^{1 / 2} {\bar{k}}^{2} ∥{(G_{A}^{T} Σ G_{A})}^{- 1}∥ {∥ G_{A} ∥}_{F}

,

M_{2} = - 2 ∥{(G_{A}^{T} Σ_{R_{Y ∣ 2}}^{- 1} G_{A})}^{- 1}∥ {∥ G_{A} ∥}_{F}

and

ξ_{\max, n} = o [{\{(r_{n} + s) log (r_{n}) / n\}}^{1 / 2}] = o_{p} (a_{n})

. As shown at the end of the proof of Theorem 1, there exists a positive constant

m_{1}

such that

\begin{matrix} tr \{Ω^{- 1} Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Ω_{0} Γ_{0}^{T} Δ_{*} Γ_{1} + (Ω + η Σ_{R_{1 ∣ 2}} η^{T}) Γ_{1}^{T} Δ_{*}^{T} Γ_{0} Ω_{0}^{- 1} Γ_{0}^{T} Δ_{*} Γ_{1} \\ - 2 {(I_{u_{1}} + A^{T} A)}^{- 1} Δ_{*}^{T} Γ_{0} Γ_{0}^{T} Δ_{*}\} \\ \geq & m_{1} {∥ Δ ∥}_{F}^{2} . \end{matrix}

Then, for sufficiently large C, the second-order term in

{∥ Δ ∥}_{F}

dominates the first-order terms, so that

f_{obj, 2} (A + a_{n} Δ) - f_{obj, 2} (A) > 0

with probability tending to 1. Therefore, (21) holds, and

{∥ \hat{A} - A ∥}_{F} = O_{p} [{\{(r_{n} + s) log (r_{n}) / n\}}^{1 / 2}]

. Since

P_{Γ} = G_{A} {(I_{u_{1}} + A^{T} A)}^{- 1} G_{A}^{T}

is a smooth (continuously differentiable) function of A, we also have

∥ P_{\hat{Γ}} - P_{Γ} ∥_{F} = O_{p} [{\{(r_{n} + s) log (r_{n}) / n\}}^{1 / 2}]

.
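Why smoothness of A ↦ P_Γ transfers the rate: if, as in the sparse envelope literature, G_A = (I_{u_1}, A^T)^T (stated here as an assumption; G_A is defined earlier in the paper), then G_A^T G_A = I_{u_1} + A^T A and P_Γ is the orthogonal projection onto the column space of G_A. The NumPy sketch below, with illustrative sizes and a hypothetical helper name, checks the projection properties and shows that the difference quotient ||P(A + hE) - P(A)||_F / h stays essentially constant as h shrinks, so P_Γ is locally Lipschitz in A.

```python
import numpy as np

def projection_from_a(a):
    # Assumed parametrization: G_A = (I_{u1}, A^T)^T with A of shape (q - u1, u1).
    u1 = a.shape[1]
    g = np.vstack([np.eye(u1), a])
    # P = G (I + A^T A)^{-1} G^T, using a linear solve instead of an explicit inverse.
    return g @ np.linalg.solve(np.eye(u1) + a.T @ a, g.T)

rng = np.random.default_rng(0)
a = rng.normal(size=(3, 2))        # q - u1 = 3, u1 = 2 (illustrative sizes)
p = projection_from_a(a)
e = rng.normal(size=a.shape)       # direction of the perturbation
for h in (1e-2, 1e-3, 1e-4):
    # The difference quotient settles to a constant: P is locally Lipschitz in A.
    print(h, np.linalg.norm(projection_from_a(a + h * e) - p, "fro") / h)
```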

Because

\begin{matrix} {\hat{β}}_{1, ols} - β_{1} = & {\hat{Σ}}_{ε R_{1 ∣ 2}} {\hat{Σ}}_{R_{1 ∣ 2}}^{- 1} - Σ_{ε R_{1 ∣ 2}} Σ_{R_{1 ∣ 2}}^{- 1} \\ = & ({\hat{Σ}}_{ε R_{1 ∣ 2}} - Σ_{ε R_{1 ∣ 2}}) Σ_{R_{1 ∣ 2}}^{- 1} + Σ_{ε R_{1 ∣ 2}} ({\hat{Σ}}_{R_{1 ∣ 2}}^{- 1} - Σ_{R_{1 ∣ 2}}^{- 1}) \\ + o_{p} [{\{(r_{n} + s) log (r_{n}) / n\}}^{1 / 2}], \end{matrix}

there exists a constant

C_{ols}

such that

\begin{matrix} max_{i, j} | {\hat{β}}_{1, ols, i j} - β_{1, i j} | \leq C_{ols} {\{log (r_{n}) / n\}}^{1 / 2} . \end{matrix}

Moreover,

∥ {\hat{β}}_{1, ols} - β_{1} ∥_{F} \leq {(p_{1} r_{n})}^{1 / 2} {∥ {\hat{β}}_{1, ols} - β_{1} ∥}_{\max}

, and

\begin{matrix} ∥ {\hat{β}}_{1} - β_{1} ∥_{F} \leq & {∥(P_{\hat{Γ}} - P_{Γ}) {\hat{β}}_{1, ols}∥}_{F} + {∥P_{Γ} ({\hat{β}}_{1, ols} - β_{1})∥}_{F} \\ \leq & {∥(P_{\hat{Γ}} - P_{Γ}) {\hat{β}}_{1, ols}∥}_{F} + {∥{\hat{β}}_{1, ols} - β_{1}∥}_{F} . \end{matrix}

Hence, the sparse partial envelope estimator

{\hat{β}}_{1}

converges to

β_{1}

with rate

{\{(r_{n} + s) log (r_{n}) / n\}}^{1 / 2}

. □
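The final bound combines three elementary facts: β̂_1 - β_1 = (P_Γ̂ - P_Γ)β̂_{1,ols} + P_Γ(β̂_{1,ols} - β_1) because β_1 = P_Γ β_1; an orthogonal projection does not increase the Frobenius norm; and ||M||_F ≤ (d_1 d_2)^{1/2} ||M||_max for a d_1 × d_2 matrix. The NumPy sketch below checks all three on synthetic matrices, assuming β̂_1 = P_Γ̂ β̂_{1,ols} as implied by the inequality above. The perturbation sizes and helper names are hypothetical and serve only the illustration.

```python
import numpy as np

def proj(basis):
    # Orthogonal projection onto the column space of `basis` (full column rank).
    return basis @ np.linalg.pinv(basis)

def error_terms(r=6, p1=2, u1=2, seed=0):
    rng = np.random.default_rng(seed)
    gamma = np.linalg.qr(rng.normal(size=(r, u1)))[0]
    p_true = proj(gamma)
    beta1 = p_true @ rng.normal(size=(r, p1))               # beta_1 = P_Gamma beta_1
    beta_ols = beta1 + 0.05 * rng.normal(size=(r, p1))      # hypothetical unrestricted estimate
    p_hat = proj(gamma + 0.01 * rng.normal(size=(r, u1)))   # hypothetical estimated projection
    beta_hat = p_hat @ beta_ols                              # projected (envelope-type) estimate
    term1 = (p_hat - p_true) @ beta_ols
    term2 = p_true @ (beta_ols - beta1)
    return beta_hat - beta1, term1, term2, beta_ols - beta1

diff, t1, t2, ols_err = error_terms()
print(np.linalg.norm(diff, "fro"), np.linalg.norm(ols_err, "fro"))
```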

Theorem 5.

Assume that the conditions in Theorem 4 hold,

\sqrt{(r_{n} + s) log (r_{n}) / n} \to 0

as

n \to \infty

and

\sqrt{(r_{n} + s) log (r_{n}) / n} = o (ξ_{\min, n})

. Then,

pr ({\hat{a}}_{i} \neq 0, i = 1, \dots, q - u_{1}) \to 1

as

n \to \infty

, and

pr ({\hat{a}}_{i} = 0, i = q - u_{1} + 1, \dots, r_{n} - u_{1}) \to 1

as

n \to \infty

.

Theorem 5 establishes the selection consistency of the sparse partial envelope estimator: even when

r_{n}

grows with n, it correctly identifies the active and inactive responses with probability tending to 1.

Proof of Theorem 5.

Let

\begin{matrix} ρ = min_{i = 1, \dots, q - u_{1}} {∥ a_{i} ∥}_{2} > 0, \end{matrix}

so that

ρ

is the smallest norm among the nonzero rows of A. By the proof of Theorem 4,

\begin{matrix} {∥ \hat{A} - A ∥}_{F} = O_{p} [{\{(r_{n} + s) log (r_{n}) / n\}}^{1 / 2}] and {\{(r_{n} + s) log (r_{n}) / n\}}^{1 / 2} \to 0, \end{matrix}

so

{∥ \hat{A} - A ∥}_{F} < ρ / 2

with probability tending to 1. Because the Euclidean norm of any row of a matrix is at most its Frobenius norm, this implies

∥ {\hat{a}}_{i} - a_{i} ∥_{2} < ρ / 2

for

i = 1, \dots, r_{n} - u_{1}

. For

i = 1, \dots, q - u_{1}

,

∥ {\hat{a}}_{i} ∥_{2} > {∥ a_{i} ∥}_{2} - ρ / 2 > 0

. Hence, the sparse partial envelope estimator identifies the nonzero rows with probability tending to 1.

For

a_{i}, i = q - u_{1} + 1, \dots, r_{n} - u_{1}

, assuming

{\hat{a}}_{i} \neq 0

, we take the derivative of

f_{obj, 2}

with respect to

a_{i}

and evaluate at

{\hat{a}}_{i}

, and we have

\begin{matrix} - 4 e_{i}^{T} {\hat{G}}_{A} {(I_{u_{1}} + {\hat{A}}^{T} \hat{A})}^{- 1} + 2 e_{i}^{T} {\hat{Σ}}_{res, spice} {\hat{G}}_{A} {({\hat{G}}_{A}^{T} {\hat{Σ}}_{res, spice} {\hat{G}}_{A})}^{- 1} \\ + 2 e_{i}^{T} {\hat{Σ}}_{R_{Y ∣ 2}, spice}^{- 1} {\hat{G}}_{A} {({\hat{G}}_{A}^{T} {\hat{Σ}}_{R_{Y ∣ 2}, spice}^{- 1} {\hat{G}}_{A})}^{- 1} + λ ω_{i} \frac{{\hat{a}}_{i}^{T}}{∥ {\hat{a}}_{i} ∥_{2}} = 0 . \end{matrix}

Since

- 4 e_{i}^{T} G_{A} {(I_{u_{1}} + A^{T} A)}^{- 1} + 2 e_{i}^{T} Σ G_{A} {(G_{A}^{T} Σ G_{A})}^{- 1} + 2 e_{i}^{T} Σ_{R_{Y ∣ 2}}^{- 1} G_{A} {(G_{A}^{T} Σ_{R_{Y ∣ 2}}^{- 1} G_{A})}^{- 1} = 0

, we have

\begin{matrix} ∥- 4 e_{i}^{T} {\hat{G}}_{A} {(I_{u_{1}} + {\hat{A}}^{T} \hat{A})}^{- 1} + 2 e_{i}^{T} {\hat{Σ}}_{res, spice} {\hat{G}}_{A} {({\hat{G}}_{A}^{T} {\hat{Σ}}_{res, spice} {\hat{G}}_{A})}^{- 1} \\ {+ 2 e_{i}^{T} {\hat{Σ}}_{R_{Y ∣ 2}, spice}^{- 1} {\hat{G}}_{A} {({\hat{G}}_{A}^{T} {\hat{Σ}}_{R_{Y ∣ 2}, spice}^{- 1} {\hat{G}}_{A})}^{- 1}∥}_{F} = O_{p} [{\{(r_{n} + s) log (r_{n}) / n\}}^{1 / 2}] . \end{matrix}

But

\begin{matrix} {∥λ ω_{i} \frac{{\hat{a}}_{i}^{T}}{∥ {\hat{a}}_{i} ∥_{2}}∥}_{F} = λ ω_{i} = ξ_{i} \geq ξ_{\min, n} . \end{matrix}

Because

{\{(r_{n} + s) log (r_{n}) / n\}}^{1 / 2} = o (ξ_{\min, n})

, this yields a contradiction with probability tending to 1. Hence, we have

pr ({\hat{a}}_{i} = 0) \to 1

for

i = q - u_{1} + 1, \dots, r_{n} - u_{1}

as

n \to \infty

. □
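The contradiction argument is the usual Karush-Kuhn-Tucker reasoning for a row-wise (group) penalty: a nonzero row must balance a smooth gradient against a penalty subgradient whose norm is exactly λω_i = ξ_i. A simplified illustration replaces f_{obj,2} by the toy loss (1/2)||a - z||_2^2 (an assumption for this example only, not the paper's objective). The minimizer is then group soft-thresholding: the row is set to zero exactly when ||z||_2 ≤ ξ_i, so a row whose gradient signal is of smaller order than ξ_{min,n} is shrunk to zero.

```python
import math

def group_soft_threshold(z, threshold):
    # Minimizer of 0.5 * ||a - z||_2^2 + threshold * ||a||_2 over the row vector a.
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= threshold:
        # KKT: a = 0 is optimal when the gradient norm ||z||_2 does not exceed the penalty weight.
        return [0.0] * len(z)
    # Otherwise the nonzero solution satisfies z - a = threshold * a / ||a||_2.
    scale = 1.0 - threshold / norm
    return [scale * v for v in z]

print(group_soft_threshold([0.3, 0.4], 1.0))  # [0.0, 0.0] since ||z||_2 = 0.5 <= 1
print(group_soft_threshold([3.0, 4.0], 1.0))  # approximately [2.4, 3.2]
```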

5. Simulation Study

In this section, we report the numerical performance of the sparse partial envelope estimator in parameter estimation and response variable selection, under both a large-sample setting and a high-dimensional setting. In the first situation, we set

r = 20

,

q = 5

,

p = 3

,

p_{1} = 2

,

u = 3

and

u_{1} = 2

. The matrix

(Γ_{A}, Γ_{A, 0})

was obtained by orthogonalizing a

q \times q

matrix of independent uniform

(0, 1)

variates. Then, we employed the structure in (8) to construct

Γ

,

Γ_{0}

and

β_{2}

. The elements in

η

were independent

N (0, 4)

random variables. The errors were simulated independently from

N (0, Σ)

distributions, and the error covariance matrix followed the structure

Σ = Γ Ω Γ^{T} + Γ_{0} Ω_{0} Γ_{0}^{T}

, where

Ω = I_{u_{1}}

and

Ω_{0}

was a block diagonal matrix with the upper left block being

25 I_{q - u_{1}}

and the lower right block being

0.09 I_{r - q}

. The coefficient

β_{2, A}

was a matrix of independent normal variates. The predictors

X_{1}

were generated from a multivariate normal distribution with mean 0 and covariance matrix

4 I_{p_{1}}

, and the elements in

X_{2}

were generated from independent

N (0, 1)

random variables. The sample sizes were 50, 150, 350, 750, 1350 and 2150, and we generated 200 replications for each sample size. For each sample size, we computed the standard deviation of each element in

{\hat{β}}_{1}

over the replications; we call these the actual standard deviations of the elements in

{\hat{β}}_{1}

. The bootstrap standard errors, computed as the standard deviations over 200 bootstrap samples, serve as estimates of the actual standard deviations. Because of the excessively long computation time, the results for the sparse envelope model were based on 20 replications.
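A minimal NumPy sketch of the covariance construction in the first situation follows. Structure (8) is not restated in this section, so the code assumes the sparse layout Γ = (Γ_A^T, 0)^T and Γ_0 = diag(Γ_{A,0}, I_{r-q}) with the q active responses listed first, and it assumes β_1 = Γη as in the partial envelope model. The function name and arguments are illustrative, not the authors' code.

```python
import numpy as np

def first_situation(r=20, q=5, p1=2, u1=2, seed=0):
    rng = np.random.default_rng(seed)
    # Orthogonalize a q x q matrix of independent Uniform(0, 1) variates.
    basis, _ = np.linalg.qr(rng.uniform(0.0, 1.0, size=(q, q)))
    gamma_a, gamma_a0 = basis[:, :u1], basis[:, u1:]
    # Assumed sparse layout: inactive responses (rows q, ..., r - 1) do not load on Gamma.
    gamma = np.zeros((r, u1))
    gamma[:q] = gamma_a
    gamma0 = np.zeros((r, r - u1))
    gamma0[:q, :q - u1] = gamma_a0
    gamma0[q:, q - u1:] = np.eye(r - q)
    # Omega = I_{u1}; Omega_0 = diag(25 I_{q - u1}, 0.09 I_{r - q}).
    omega = np.eye(u1)
    omega0 = np.diag(np.concatenate([np.full(q - u1, 25.0), np.full(r - q, 0.09)]))
    sigma = gamma @ omega @ gamma.T + gamma0 @ omega0 @ gamma0.T
    # eta has independent N(0, 4) entries (standard deviation 2); beta_1 = Gamma eta (assumed).
    eta = rng.normal(0.0, 2.0, size=(u1, p1))
    beta1 = gamma @ eta
    return gamma, gamma0, sigma, beta1
```

Errors could then be drawn with `np.random.default_rng().multivariate_normal(np.zeros(20), sigma, size=n)`; the eigenvalues of Σ are 1 (twice), 25 (three times) and 0.09 (fifteen times) by construction.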

For each replication, we fitted the standard model (1), the oracle partial envelope model (15), the active partial envelope model (16) and the sparse partial envelope models (5) and (8). We then obtained their estimators of

{\hat{β}}_{1}

and computed the estimation standard deviation for each element in

{\hat{β}}_{1}

. The results for a randomly selected element in

{\hat{β}}_{1}

are summarized in Figure 1. For better visibility, only the asymptotic standard deviation of the standard model is displayed. The left panel of Figure 1 shows that, in decreasing order of actual standard deviation, the three models are the active partial envelope model, the oracle partial envelope model and the sparse partial envelope model. The right panel of Figure 1 shows that, in decreasing order of asymptotic standard deviation, the four models are the standard model, the active partial envelope model, the oracle partial envelope model and the sparse partial envelope model. Overall, the sparse partial envelope estimator is more efficient than the standard estimator, the oracle partial envelope estimator and the active partial envelope estimator. The ratios of the asymptotic standard deviations of the standard estimator, the oracle partial envelope estimator and the active partial envelope estimator to that of the sparse partial envelope estimator are 11.68, 1.27 and 6.22, respectively. The difference between the sparse partial envelope estimator and the oracle partial envelope estimator becomes quite small as the sample size n increases, which is consistent with the oracle property stated in Theorem 3.

Figure 1. Comparison of the standard deviations for four model estimators. The − line marks the asymptotic standard deviation of the standard model. The lines ‘- △’ and ‘- -’ with + mark the actual and asymptotic standard deviations of the oracle partial envelope model. The lines ‘- □’ and ‘- -’ with × mark the actual and asymptotic standard deviations of the active partial envelope model. The lines ‘- ◯’ and ‘- ·’ with * mark the actual and asymptotic standard deviations of the sparse partial envelope model.

We also evaluated the variable selection performance of the sparse partial envelope estimator in terms of the true positive rate (TPR) and true negative rate (TNR) over 200 replications. The TPR is computed as

π_{1} / q

, where

π_{1}

is the number of active responses correctly selected, and the TNR is computed as

π_{2} / (r - q)

, where

π_{2}

is the number of inactive responses correctly excluded. Table 1 reports the variable selection results for the first situation under the large-sample setting. In this case, the sparse partial envelope estimator selects responses better than the sparse envelope estimator. The proposed procedure correctly identifies all active responses, and the TNR tends to 1 as n increases, which is consistent with the selection consistency stated in Theorem 2.
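The TPR and TNR for a single replication can be computed as below; averaging over the replications gives the kind of entries reported in the tables. The 0-based indexing with the active responses listed first, and the function name, are illustrative assumptions.

```python
def selection_rates(selected, active, r):
    """TPR = pi_1 / q and TNR = pi_2 / (r - q) for one replication.

    selected: indices of responses the estimator keeps as active.
    active:   indices of the truly active responses (q of them).
    r:        total number of responses, indexed 0, ..., r - 1.
    """
    selected, active = set(selected), set(active)
    inactive = set(range(r)) - active
    pi_1 = len(selected & active)     # active responses correctly selected
    pi_2 = len(inactive - selected)   # inactive responses correctly excluded
    return pi_1 / len(active), pi_2 / len(inactive)

# First situation: r = 20 responses, the first q = 5 active; response 7 is a false positive.
print(selection_rates(selected=[0, 1, 2, 3, 4, 7], active=range(5), r=20))  # TPR 1.0, TNR 14/15
```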

Table 1. Comparison of selection performances of the sparse partial envelope estimator and the sparse envelope estimator in the first situation.

In the second situation, except for letting

u = q = 5

, the other parameters were the same as those in the first situation. When

u = q

, there is no immaterial information in the active responses for the sparse envelope model, and

Γ_{A} = I_{q}

, but as long as

u_{1} < u

, there is still immaterial information in the active responses for the sparse partial envelope model. In this sense, the sparse partial envelope model is more flexible than the sparse envelope model. Figure 2 plots the standard deviations of a chosen element in

{\hat{β}}_{1}

and

\hat{β}

when

u = q

. The left panel of Figure 2 shows that, in decreasing order of actual standard deviation, the three models are the active partial envelope model, the oracle partial envelope model and the sparse partial envelope model. The right panel of Figure 2 shows that, in decreasing order of asymptotic standard deviation, the four models are the standard model, the active partial envelope model, the oracle partial envelope model and the sparse partial envelope model. In summary, the sparse partial envelope estimator is more efficient than the standard estimator, the oracle partial envelope estimator and the active partial envelope estimator. The ratios of the asymptotic standard deviations of the standard estimator, the oracle partial envelope estimator, the active partial envelope estimator and the sparse envelope estimator to that of the sparse partial envelope estimator are 11.68, 1.27, 6.22 and 1.35, respectively. As the sample size n increases, the difference between the sparse partial envelope estimator and the oracle partial envelope estimator also gradually diminishes.

Figure 2. Comparison of the standard deviations for five model estimators. The − line marks the asymptotic standard deviation of the standard model. The lines ‘- △’ and ‘- -’ with + mark the actual and asymptotic standard deviations of the oracle partial envelope model. The lines ‘- □’ and ‘- -’ with × mark the actual and asymptotic standard deviations of the active partial envelope model. The lines ‘- ◊’ and ‘⋯’ with ✩ mark the actual and asymptotic standard deviations of the sparse envelope model. The lines ‘- ◯’ and ‘- ·’ with * mark the actual and asymptotic standard deviations of the sparse partial envelope model.

Table 2 reports the variable selection results in the second situation. The variable selection performance of the sparse partial envelope estimator is the same as that of the sparse envelope estimator. Table 3 compares the computing time in minutes of the standard model, the oracle partial envelope model, the active partial envelope model, the sparse partial envelope model and the sparse envelope model. The sparse partial envelope model and the sparse envelope model are markedly slower than the standard model, the oracle partial envelope model and the active partial envelope model, and the sparse envelope model is the slowest of the five.

Table 2. Comparison of selection performances of the sparse partial envelope estimator and the sparse envelope estimator in the second situation.

Table 3. Comparison of computing time in minutes for the five models in the second situation.

In the third situation, we set

r = 100

,

q = 26

,

p = 50

,

p_{1} = 30

, and

u = 22

and varied

u_{1}

from 2 to 20. The matrix

(Γ_{A}, Γ_{A, 0})

was obtained by orthogonalizing a

q \times q

matrix of independent standard normal variates. Accordingly, we used the structure in (8) to build

Γ

,

Γ_{0}

and

β_{2}

. The elements in

η

were independent normal variates with mean 0 and variance 0.16, and the error covariance matrix had the structure

Σ = Γ Ω Γ^{T} + Γ_{0} Ω_{0} Γ_{0}^{T}

with

Ω = I_{u_{1}}

and

Ω_{0} = 25 I_{r - u_{1}}

. The predictors

X_{1}

were generated from a multivariate normal distribution with mean 0 and covariance matrix

I_{p_{1}}

, and the predictors

X_{2}

were generated from a multivariate normal distribution with mean 0 and covariance matrix

I_{p - p_{1}}

. The standard deviation of a randomly chosen element in

{\hat{β}}_{1}

is displayed under different

u_{1}

in Figure 3. When

u_{1}

is small, the immaterial part is larger, so a more substantial efficiency gain from the sparse partial envelope estimator is expected. Figure 3 also shows that the standard deviation of the sparse partial envelope estimator is substantially smaller than that of the standard estimator, so the sparse partial envelope estimator is much more efficient than the standard estimator.

Figure 3. Comparison of the standard deviations for the sparse partial envelope estimator (dashed) and the standard estimator (solid).

In the fourth situation, we had

n < r

and set

r = 1000

,

q = 10

,

p = 20

,

p_{1} = 15

,

u = 3

and

u_{1} = 2

. The first

q / 2

rows in

Γ_{A}

were

{((1 / q)^{1 / 2}, 0)}^{T}

and the remaining

q / 2

rows in

Γ_{A}

were

{(0, (1 / q)^{1 / 2})}^{T}

. Then, we employed the structure in (8) to construct

Γ

,

Γ_{0}

and

β_{2}

. The elements in

η

were independent

N (0, 36)

random variables, and the error covariance matrix followed the structure

Σ = Γ Ω Γ^{T} + Γ_{0} Ω_{0} Γ_{0}^{T}

, where

Ω = 0.25 I_{u_{1}}

and

Ω_{0}

was a block diagonal matrix with the upper left block being

16 I_{q - u_{1}}

and lower right block being

9 I_{r - q}

. The coefficient

β_{2, A}

was a matrix of independent normal variates. The predictors

X_{1}

and

X_{2}

were generated from independent

N (0, 1)

random variables. The sample sizes were 50, 100, 200, 400, 600, and 800, respectively, and we generated 200 replications for each sample size. Table 4 shows the variable selection performance of the sparse partial envelope estimator in the fourth situation under the high-dimensional setting. Figure 4 plots the average of

∥ {\hat{β}}_{1} - β_{1} ∥_{F}

over 200 replications against the sample size n, and Figure 5 shows the convergence rate of

∥ {\hat{β}}_{1} - β_{1} ∥_{F}

established in Theorem 4. The left panel of Figure 4 compares

∥ {\hat{β}}_{1} - β_{1} ∥_{F}

among the standard estimator, the sparse partial envelope actual estimator and the sparse partial envelope bootstrap estimator, and the right panel focuses on the comparison between the sparse partial envelope actual and bootstrap estimators. Figure 5 compares

{[n / \{(r_{n} + s) log (r_{n})\}]}^{1 / 2} {∥ {\hat{β}}_{1} - β_{1} ∥}_{F}

among the standard estimator, the sparse partial envelope actual estimator and the sparse partial envelope bootstrap estimator. The bootstrap estimate of

∥ {\hat{β}}_{1} - β_{1} ∥_{F}

is computed as an average over 200 bootstrap samples: for each bootstrap sample, we computed the sparse partial envelope estimator

{\hat{β}}_{1, boot}

and calculated

∥ {\hat{β}}_{1, boot} - {\hat{β}}_{1} ∥_{F}

. Figures 4 and 5 both show that

∥ {\hat{β}}_{1, boot} - {\hat{β}}_{1} ∥_{F}

is a good approximation to

∥ {\hat{β}}_{1} - β_{1} ∥_{F}

and also display that

∥ {\hat{β}}_{1} - β_{1} ∥_{F}

is much smaller than

∥ {\hat{β}}_{1, ols} - β_{1} ∥_{F}

.
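The bootstrap approximation of ||β̂_1 - β_1||_F described above can be sketched as follows. The sparse partial envelope fitting algorithm is not reproduced here, so ordinary least squares stands in for the estimator (any function mapping (Y, X) to an r × p coefficient matrix can be passed instead). The nonparametric resampling of rows, the function names and the toy data are assumptions for illustration; the section does not state the resampling scheme.

```python
import numpy as np

def ols(y, x):
    # Stand-in estimator: least-squares coefficients (r x p) on centred data.
    x_c, y_c = x - x.mean(axis=0), y - y.mean(axis=0)
    coef, *_ = np.linalg.lstsq(x_c, y_c, rcond=None)
    return coef.T

def bootstrap_error(y, x, estimator=ols, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    n = y.shape[0]
    beta_hat = estimator(y, x)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample observations with replacement
        boots.append(estimator(y[idx], x[idx]))
    boots = np.stack(boots)
    # Elementwise bootstrap standard errors.
    se = boots.std(axis=0, ddof=1)
    # Average ||beta_boot - beta_hat||_F, used as a proxy for ||beta_hat - beta||_F.
    err = float(np.mean([np.linalg.norm(b - beta_hat, "fro") for b in boots]))
    return beta_hat, se, err
```

For the scaled comparison in Figure 5, the returned `err` would simply be multiplied by {n / ((r_n + s) log r_n)}^{1/2}.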

Table 4. The selection performance of the sparse partial envelope estimator in the fourth situation under the high-dimensional setting.

Figure 4. Comparison of the standard estimator (dashed with □), the sparse partial envelope actual estimator (dashed with ◯) and bootstrap estimator (dashed with ▵).

Figure 5. Comparison of the standard estimator (dashed with □), the sparse partial envelope actual estimator (dashed with ◯) and bootstrap estimator (dashed with ▵).

In the fifth situation, we still had

n < r

and set

r = 1000

,

q = 10

,

p = 5

,

p_{1} = 3

,

u = 2

and

u_{1} = 1

. The first

q / 2

rows in

Γ_{A}

were

{(1 / q)}^{1 / 2}

and the remaining

q / 2

rows in

Γ_{A}

were

{(2 / q)}^{1 / 2}

. The other parameters were the same as those in the fourth situation. The sample sizes were 50, 100, 200, 400, 600, and 800, respectively, and we generated 200 replications for each sample size. Table 5 shows the variable selection performance of the sparse partial envelope estimator in the fifth situation under the high-dimensional setting.

Table 5. The selection performance of the sparse partial envelope estimator in the fifth situation under the high-dimensional setting.

6. Real Data Analysis

This section presents a real data example that illustrates the sparse partial envelope model in the case of

n < r

. This example comes from full-scale experiments at a paper factory in Norway. Paper quality is typically described by several variables and is therefore highly multivariate. Furthermore, paper production is a very complex process that depends on many variables, some of which can be controlled and others cannot. The experiment was carried out at the paper plant Saugbruksforeningen, Norway. The data consist of 30 observations (rows) and 41 variables (columns), so

n = 30

. See Aldrin (1996) for a more detailed description.

The goal of the experiment is to study the influence of 9 predictor variables on paper quality, measured by 32 response variables. The response variables, in columns 1 to 32, describe various qualities of the paper, so

r = 32

. Predictor variables correspond to columns 33 to 41. The first three predictor variables are

x_{1}

in column 33,

x_{2}

in column 34 and

x_{3}

in column 35, and are varied systematically in the experiment. The next three predictor variables, in columns 36 to 38, are the squares

x_{1}^{2}

,

x_{2}^{2}

and

x_{3}^{2}

. The last three predictor variables, in columns 39 to 41, are the products

x_{1} * x_{2}

,

x_{1} * x_{3}

and

x_{2} * x_{3}

. Hence,

p = 9

. The predictors

x_{1}

,

x_{2}

,

x_{3}

,

x_{1} * x_{2}

,

x_{1} * x_{3}

and

x_{2} * x_{3}

are treated as the main predictors, so

p_{1} = 6

. The sparse partial envelope model was fitted to the data, and cross-validation suggested

u_{1} = 1

. The model identified all the response variables as active responses. We estimated

∥ {\hat{β}}_{1, ols} - β_{1} ∥_{F}

and

∥ {\hat{β}}_{1} - β_{1} ∥_{F}

in the sparse partial envelope model by the average of 200 bootstrap samples and also estimated

{∥ \hat{β} - β ∥}_{F}

in the sparse envelope model by the average of 200 bootstrap samples. The ratio of the estimated

∥ {\hat{β}}_{1, ols} - β_{1} ∥_{F}

to

∥ {\hat{β}}_{1} - β_{1} ∥_{F}

is 2.34, and the ratio of the estimated

{∥ \hat{β} - β ∥}_{F}

in the sparse envelope model to

∥ {\hat{β}}_{1} - β_{1} ∥_{F}

in the sparse partial envelope model is 1.73. Both ratios indicate a clear efficiency gain from the sparse partial envelope model.

To assess prediction performance, we randomly split the data into two halves of equal size, using one half as the training set and the other half as the test set. The prediction error is computed as

\begin{matrix} \text{Prediction error} = \sqrt{\frac{1}{n_{\text{test}}} \sum_{i \in \text{test set}} {(Y_{i} - {\hat{Y}}_{i, predict})}^{T} (Y_{i} - {\hat{Y}}_{i, predict})} . \end{matrix}

(22)

Then, the prediction error is averaged over 100 random splits, and the average prediction error is reported. The sparse envelope model has a prediction error of 11.8162, and the sparse partial envelope model has a prediction error of 2.8131, which is a 76% reduction.
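The split-and-average procedure for (22) can be sketched as follows. Ordinary least squares with an intercept is used as a hypothetical stand-in for the fitted model, since the envelope estimators are not reproduced here; the function names are illustrative. As a quick arithmetic check of the reported figures, (11.8162 - 2.8131) / 11.8162 ≈ 0.762, that is, a reduction of about 76%.

```python
import numpy as np

def prediction_error(y_test, y_pred):
    # Equation (22): square root of the mean squared Euclidean norm of the residual vectors.
    resid = y_test - y_pred
    return float(np.sqrt(np.mean(np.sum(resid ** 2, axis=1))))

def ols_fit_predict(x_train, y_train, x_test):
    # Hypothetical stand-in for the fitted model: OLS with an intercept.
    design = np.column_stack([np.ones(len(x_train)), x_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return np.column_stack([np.ones(len(x_test)), x_test]) @ coef

def average_split_error(x, y, fit_predict=ols_fit_predict, n_splits=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        train, test = perm[: n // 2], perm[n // 2:]   # two halves of (nearly) equal size
        y_pred = fit_predict(x[train], y[train], x[test])
        errors.append(prediction_error(y[test], y_pred))
    return float(np.mean(errors))
```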

7. Discussion

In this article, we propose a sparse partial envelope model under the multivariate linear regression model, which performs response variable selection and improves parameter estimation efficiency. We establish its theoretical properties, including consistency, an oracle property and the asymptotic distribution of the sparse partial envelope estimator, in both the large-sample and the high-dimensional scenarios. Simulation studies and a real data analysis support the theory. In this paper, the predictor and response variables in the sparse partial envelope model are both vector-valued; an interesting topic for further study is extending the sparse partial envelope model to matrix-valued or tensor-valued data.

Author Contributions

Conceptualization, Y.W.; methodology, Y.W.; software, J.Z.; validation, Y.W. and J.Z.; formal analysis, Y.W.; investigation, Y.W. and J.Z.; resources, Y.W.; data curation, J.Z.; writing—original draft preparation, Y.W. and J.Z.; writing—review and editing, Y.W. and J.Z.; visualization, J.Z.; supervision, Y.W.; project administration, J.Z.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Natural Science Research Start-up Foundation of Recruiting Talents of the Nanjing University of Posts and Telecommunications (Grant No. NY223064).

Data Availability Statement

The dataset is publicly available and can be downloaded from Aldrin (1996).

Acknowledgments

We thank the editor, associate editor, and reviewers for their constructive comments that have improved this paper substantially.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Cook, R.D.; Li, B.; Chiaromonte, F. Envelope models for parsimonious and efficient multivariate linear regression. Stat. Sin. 2010, 20, 927–1010.
Su, Z.; Cook, R.D. Partial envelopes for efficient estimation in multivariate linear regression. Biometrika 2011, 98, 133–146.
Su, Z.; Cook, R.D. Inner envelopes: Efficient estimation in multivariate linear regression. Biometrika 2012, 99, 687–702.
Su, Z.; Cook, R.D. Estimation of multivariate means with heteroscedastic errors using envelope models. Stat. Sin. 2013, 23, 213–230.
Cook, R.D.; Helland, I.S.; Su, Z. Envelopes and partial least squares regression. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2013, 75, 851–877.
Su, Z.; Zhu, G.; Chen, X.; Yang, Y. Sparse envelope model: Efficient estimation and response variable selection in multivariate linear regression. Biometrika 2016, 103, 579–593.
Cook, R.D.; Zhang, X. Foundations for envelope models and methods. J. Am. Stat. Assoc. 2015, 110, 599–611.
Cook, R.D.; Zhang, X. Algorithms for envelope estimation. J. Comput. Graph. Stat. 2016, 25, 284–300.
Cook, R.D.; Zhang, X. Fast envelope algorithms. Stat. Sin. 2018, 28, 1179–1197.
Khare, K.; Pal, S.; Su, Z. A bayesian approach for envelope models. Ann. Stat. 2017, 45, 196–222.
Li, L.; Zhang, X. Parsimonious tensor response regression. J. Am. Stat. Assoc. 2017, 112, 1131–1146.
Zhang, X.; Li, L. Tensor envelope partial least-squares regression. Technometrics 2017, 59, 426–436.
Pan, Y.; Mai, Q.; Zhang, X. Covariate-adjusted tensor classification in high dimensions. J. Am. Stat. Assoc. 2019, 114, 1305–1319.
Zhu, G.; Su, Z. Envelope-based sparse partial least squares. Ann. Stat. 2020, 48, 161–182.
Zhang, J.; Huang, Z.; Xiong, Y. Efficient estimation of reduced-rank partial envelope model in multivariate linear regression. Random Matrices Theory Appl. 2021, 10, 2150024.
Zhang, J.; Huang, Z.; Jiang, Z. Groupwise partial envelope model: Efficient estimation in multivariate linear regression. Commun. Stat.-Simul. Comput. 2023, 52, 2924–2940.
Zhang, J.; Huang, Z. Scale invariant and efficient estimation for groupwise scaled envelope model. J. Korean Stat. Soc. 2024, 53, 1027–1048.
Zhang, J.; Huang, Z.; Zhu, L.X. Scaled partial envelope model in multivariate linear regression. Stat. Sin. 2023, 33, 663–683.
Cook, R.D.; Forzani, L.; Su, Z. A note on fast envelope estimation. J. Multivar. Anal. 2016, 150, 42–54.
Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429.
Rothman, A.J.; Bickel, P.J.; Levina, E.; Zhu, J. Sparse permutation invariant covariance estimation. Electron. J. Stat. 2008, 2, 494–515.
Dattorro, J. Convex Optimization & Euclidean Distance Geometry; Meboo Publishing: Palo Alto, CA, USA, 2016.
Cook, R.D.; Setodji, C.M. A model-free test for reduced rank in multivariate regression. J. Am. Stat. Assoc. 2003, 98, 340–351.
Magnus, J.R.; Neudecker, H. Matrix Differential Calculus with Applications in Statistics and Econometrics; John Wiley & Sons: Hoboken, NJ, USA, 2019.
Shapiro, A. Asymptotic theory of overparameterized structural models. J. Am. Stat. Assoc. 1986, 81, 142–149.
Magnus, J.R.; Neudecker, H. The commutation matrix: Some properties and applications. Ann. Stat. 1979, 7, 381–394.
Ravikumar, P.; Wainwright, M.J.; Raskutti, G.; Yu, B. High-dimensional covariance estimation by minimizing ℓ₁-penalized log-determinant divergence. Electron. J. Stat. 2011, 5, 935–980.
Aldrin, M. Moderate projection pursuit regression for multivariate response data. Comput. Stat. Data Anal. 1996, 21, 501–531.

Figure 1. Comparison of the standard deviations for four model estimators. The ‘−’ line marks the asymptotic standard deviation of the standard model. The lines ‘-△’ and ‘- -’ with + mark the actual and asymptotic standard deviations of the oracle partial envelope model. The lines ‘-□’ and ‘- -’ with × mark the actual and asymptotic standard deviations of the active partial envelope model. The lines ‘-◯’ and ‘-·’ with * mark the actual and asymptotic standard deviations of the sparse partial envelope model.

Figure 2. Comparison of the standard deviations for five model estimators. The ‘−’ line marks the asymptotic standard deviation of the standard model. The lines ‘-△’ and ‘- -’ with + mark the actual and asymptotic standard deviations of the oracle partial envelope model. The lines ‘-□’ and ‘- -’ with × mark the actual and asymptotic standard deviations of the active partial envelope model. The lines ‘-◊’ and ‘⋯’ with ✩ mark the actual and asymptotic standard deviations of the sparse envelope model. The lines ‘-◯’ and ‘-·’ with * mark the actual and asymptotic standard deviations of the sparse partial envelope model.
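The "actual" standard deviations in Figures 1 and 2 are presumably Monte Carlo standard deviations of each estimator across simulated replications, plotted against the value implied by its asymptotic variance. The sketch below illustrates that comparison for the standard estimator only, under assumptions made purely for illustration: one predictor, one response coordinate (the standard estimator fits each response separately), Gaussian data and arbitrary parameter values. In this case the asymptotic variance of √n(β̂ − β) is σ_e²/σ_X², the scalar form of Σ_X⁻¹ ⊗ Σ. The envelope estimators are not reproduced, and the function names `ols_slope` and `actual_vs_asymptotic_sd` are hypothetical.

```python
import math
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on x, fitted with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def actual_vs_asymptotic_sd(n, beta=1.0, sigma_x=1.0, sigma_e=2.0, reps=2000, seed=0):
    """Return (Monte Carlo SD, asymptotic SD) of the standard estimator of one
    coefficient at sample size n, assuming X ~ N(0, sigma_x^2) and
    Gaussian errors with SD sigma_e (illustrative assumptions only)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # Simulate one data set and record the fitted slope.
        x = [rng.gauss(0.0, sigma_x) for _ in range(n)]
        y = [beta * xi + rng.gauss(0.0, sigma_e) for xi in x]
        estimates.append(ols_slope(x, y))
    actual = statistics.stdev(estimates)
    # Asymptotic SD: sqrt(avar / n) with avar = sigma_e^2 / sigma_x^2.
    asymptotic = sigma_e / (sigma_x * math.sqrt(n))
    return actual, asymptotic
```

For n = 100 and the defaults, the asymptotic SD is 2/(1·√100) = 0.2, and the Monte Carlo SD should land close to it; under these assumptions the exact finite-sample variance of the slope is σ_e²/(σ_X²(n − 3)), so small samples sit slightly above the asymptotic curve.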

Figure 3. Comparison of the standard deviations for the sparse partial envelope estimator (dashed) and the standard estimator (solid).

Figure 4. Comparison of the standard estimator (dashed with □), the actual sparse partial envelope estimator (dashed with ◯) and the bootstrap estimator (dashed with ▵).

Figure 5. Comparison of the standard estimator (dashed with □), the actual sparse partial envelope estimator (dashed with ◯) and the bootstrap estimator (dashed with ▵).
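Figures 4 and 5 include a bootstrap estimator of the standard deviation. The resampling scheme is not described in this section, so the sketch below assumes a case (pairs) bootstrap and, as a stand-in for the sparse partial envelope fit, applies it to the least-squares slope of one response on one predictor. Any estimator taking `(x, y)` lists could be passed instead; the names `ols_slope` and `bootstrap_sd` are hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on x, fitted with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def bootstrap_sd(x, y, estimator, B=500, seed=0):
    """Case-resampling bootstrap estimate of the SD of estimator(x, y).

    Each replicate draws n (x_i, y_i) pairs with replacement and refits.
    With continuous x, a degenerate resample (all x equal) is practically
    impossible, so no guard against a zero denominator is included.
    """
    rng = random.Random(seed)
    n = len(x)
    replicates = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        replicates.append(estimator([x[i] for i in idx], [y[i] for i in idx]))
    return statistics.stdev(replicates)
```

With n = 200 Gaussian observations, slope 1 and error SD 2, the bootstrap SD should be near the sampling SD of the slope, roughly 2/√200 ≈ 0.14; resampling pairs rather than residuals keeps the sketch valid when the predictor is random, as in the model considered here.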

Table 1. Comparison of selection performances of the sparse partial envelope estimator and the sparse envelope estimator in the first situation.

n	TPR (sparse partial envelope)	TNR (sparse partial envelope)	TPR (sparse envelope)	TNR (sparse envelope)
50	100%	100%	100%	6.67%
80	100%	100%	100%	93.33%
100	100%	100%	100%	53.33%
150	100%	100%	100%	73.33%
200	100%	100%	100%	100%
250	100%	100%	100%	93.33%
300	100%	100%	100%	100%
350	100%	100%	100%	100%
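TPR and TNR in Tables 1, 2, 4 and 5 are read here as the usual true positive and true negative rates of response variable selection: the fraction of truly active responses that are selected, and the fraction of truly inactive responses that are excluded. The sketch below computes both for a single fit, assuming the selected and true active sets are given as response indices; the function name `selection_rates` is hypothetical, and how the rates are aggregated over replications in the tables is not specified here.

```python
def selection_rates(selected, active, r):
    """Return (TPR, TNR) for a selected subset of r responses indexed 0..r-1.

    selected: indices of responses the estimator keeps.
    active:   indices of the truly active responses.
    """
    selected, active = set(selected), set(active)
    inactive = set(range(r)) - active
    # True positives: active responses that were selected.
    tp = len(selected & active)
    # True negatives: inactive responses that were left out.
    tn = len(inactive - selected)
    # Convention for the edge cases: a rate over an empty set is reported as 1.0.
    tpr = tp / len(active) if active else 1.0
    tnr = tn / len(inactive) if inactive else 1.0
    return tpr, tnr
```

For example, with r = 10, active responses {0, 1, 2} and selected responses {0, 1, 2, 5}, the TPR is 1 and the TNR is 6/7, because one of the seven inactive responses was wrongly kept.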

Table 2. Comparison of selection performances of the sparse partial envelope estimator and the sparse envelope estimator in the second situation.

n	TPR (sparse partial envelope)	TNR (sparse partial envelope)	TPR (sparse envelope)	TNR (sparse envelope)
50	100%	100%	100%	100%
80	100%	100%	100%	100%
100	100%	100%	100%	100%
150	100%	100%	100%	100%
200	100%	100%	100%	100%
250	100%	100%	100%	100%
300	100%	100%	100%	100%
350	100%	100%	100%	100%

Table 3. Comparison of computing time in minutes for the five models in the second situation.

Model type	Computing time (min)
Standard model	0.53
Oracle partial envelope	1.17
Active partial envelope	0.62
Sparse partial envelope	359.40
Sparse envelope	3673.44

Table 4. The selection performance of the sparse partial envelope estimator in the fourth situation under the high-dimensional setting.

n	TPR	TNR
50	100%	8.38%
100	100%	33.03%
200	100%	91.01%
400	100%	99.19%
600	100%	99.80%
800	100%	100%

Table 5. The selection performance of the sparse partial envelope estimator in the fifth situation under the high-dimensional setting.

n	TPR	TNR
50	100%	2.22%
100	100%	1.82%
200	100%	10.51%
400	100%	83.94%
600	100%	96.87%
800	100%	99.40%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
