Article

Sure Independence Screening for Ultrahigh-Dimensional Additive Model with Multivariate Response

1 School of Statistics, Capital University of Economics and Business, Beijing 100070, China
2 Department of Biostatistics, School of Public Health, Peking University, Beijing 100191, China
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(10), 1558; https://doi.org/10.3390/math13101558
Submission received: 30 March 2025 / Revised: 2 May 2025 / Accepted: 6 May 2025 / Published: 9 May 2025
(This article belongs to the Special Issue Parametric and Nonparametric Statistics: From Theory to Applications)

Abstract

This paper investigates an ultrahigh-dimensional feature screening approach for additive models with multivariate responses. We propose a nonparametric screening procedure based on the random vector correlation between each predictor and the multivariate response, and we establish its sure screening and ranking consistency properties under regularity conditions. We also develop an iterative sure independence screening algorithm for convenient and efficient implementation. Extensive finite-sample simulations and a real data example demonstrate that the proposed procedure outperforms 58-100% of the existing candidates, depending on the scenario; on average, it outperforms 79% of the existing methods across all scenarios considered.

1. Introduction

Many scientific fields work with ultrahigh-dimensional data, where the data dimensionality far exceeds the sample size. Examples include microarray analysis, tumor classification, and biomedical imaging. Analyzing such data presents challenges in terms of computational expediency, statistical accuracy, and algorithmic stability, as pointed out in Fan and Lv [1], Fan et al. [2], Li et al. [3] and Liu et al. [4].
Several methods have been proposed to tackle the challenges associated with handling ultrahigh-dimensional data. Fan and Lv [1] introduced the concept of sure screening, namely the property that all the important variables survive the variable screening procedure with probability tending to one, and they proposed the sure independence screening (SIS) method based on the Pearson correlation for Gaussian predictors and responses in linear models. They also developed an iterative sure independence screening (ISIS) method to improve its effectiveness. Fan and Song [5] extended SIS and ISIS to generalized linear models by ranking the maximum marginal likelihood estimates. Zhu et al. [6] proposed a sure independent ranking and screening (SIRS) procedure for selecting active predictors in multi-index models. Fan et al. [7] extended correlation learning to marginal nonparametric learning and introduced a nonparametric independence screening (NIS) procedure for sparse additive models, utilizing B-spline expansions to approximate the nonparametric projections. They also developed an iterative nonparametric independence screening (INIS) procedure to improve the performance of NIS in terms of false positive rates. Yuan and Billor [8] proposed a functional version of SIS (functional SIS) by applying the SIS method to functional regression models with a scalar response and functional predictors. Cui et al. [9] introduced a modified SIS (EVSIS) to select important predictors in ultrahigh-dimensional linear models with measurement errors.
The methods mentioned above are model-based. On the other hand, there are model-free feature screening procedures that use measures of independence. For example, Li et al. [10] developed a sure independence screening procedure based on distance correlation (DC-SIS). Shao and Zhang [11] employed martingale difference correlation to construct a screening procedure (MDC-SIS). Cui et al. [12] proposed a screening method (MV-SIS) using a mean-variance index for ultrahigh-dimensional discriminant analysis. Chen and Lu [13] proposed a quantile-composited feature screening (QCS) procedure for ultrahigh-dimensional grouped data by transforming the continuous predictor to a Bernoulli variable. Liu et al. [14] introduced a screening method based on the proposed generalized Jaccard coefficient (GJAC-SIS) for ultrahigh-dimensional survival data. Sang and Dang [15] utilized the Gini distance correlation between a group of covariates and the response to establish a grouped screening method for discriminant analysis in ultrahigh-dimensional data. Wu and Cui [16] developed feature screening procedures on the basis of the Hellinger distance (HD-SIS and AHD-SIS), which are applicable to both discrete and continuous response variables. Zhong et al. [17] offered a groupwise feature screening procedure via the semi-distance correlation for ultrahigh-dimensional classification problems. Tian et al. [18] proposed a conditional mean independence screening procedure via variations in the conditional mean.
Out of the screening methods mentioned earlier, only DC-SIS can handle multivariate responses. However, there is a growing need for ultrahigh-dimensional feature selection with multivariate responses in applications such as pathway analysis. If we overlook the correlation among the components of the multivariate response and only apply a screening procedure to each component separately, we may miss some important predictors. As a result, Li et al. [3] developed a projection screening procedure to select variables for linear models with multivariate responses and ultrahigh-dimensional covariates. Using the generalized correlation between each predictor and the multivariate response, Liu et al. [4] proposed a screening method for ultrahigh-dimensional additive models (GCPS and IGCPS). Li et al. [19] generalized the martingale difference correlation (GMDC) to measure conditional mean independence and conditional quantile independence, and they utilized it as a marginal utility to develop high-dimensional variable screening (GMDC-SIS and quantile GMDC-SIS) procedures.
Due to the lack of a priori information about the model structure, a more flexible class of nonparametric models, such as the additive model, can be used to increase modeling flexibility. Some studies on additive models with multivariate responses are listed as follows. Liu et al. [20] proposed a novel approach for multivariate additive models by treating the coefficients in the spline expansions as a third-order tensor, and they combined dimension reduction with penalized variable selection to deal with ultrahigh dimensionality. They applied the proposed method to a breast cancer study and selected several important genes that have significant associations between the measured genome copy numbers and the levels of gene transcripts. Desai et al. [21] introduced a new method for the simultaneous selection and estimation of multivariate sparse additive models with correlated errors. The method was utilized to investigate protein-mRNA expression levels, and it helped to explore and establish nonlinear mRNA-protein associations.
Therefore, in this study, we focus on an additive model for sparse ultrahigh-dimensional data with multivariate responses. A nonparametric screening procedure is proposed to measure the relationship between the multivariate response and each predictor simultaneously in the additive model. In particular, we obtain a normalized B-spline basis for each predictor and then compute the random vector correlation between the multivariate response and this basis. The predictors are then ranked based on the random vector correlation. The proposed method has the sure screening and ranking consistency properties under some regularity conditions. Furthermore, to address the issue that a marginal method may select unimportant predictors that are highly correlated with important ones, an iterative sure independence screening method is also developed to improve finite-sample performance.
The remainder of this paper is organized as follows. In Section 2, we propose the sure independence screening based on the random vector correlation (RVC-SIS) procedure for sparse additive models with multivariate responses and study its theoretical properties. Section 3 presents an iterative sure independence screening (RVC-ISIS) procedure to enhance the performance of RVC-SIS. In Section 4, we conduct simulation studies to demonstrate the new procedure's finite-sample performance in comparison with some existing methods. Section 5 provides a real data application. Finally, we discuss the performance and limitations of the proposed method in Section 6 and conclude in Section 7. The technical proofs are presented in Appendix A.

2. Sure Independence Screening Using Random Vector Correlation

Suppose $\mathbf{Y} = (Y_1, \ldots, Y_q)^\top \in \mathbb{R}^q$ is from the following additive model:
$$\mathbf{Y} = \boldsymbol{\mu} + \sum_{j=1}^{p} \mathbf{f}_j(X_j) + \boldsymbol{\varepsilon},$$
where $\mathbf{X} = (X_1, \ldots, X_p)^\top \in \mathbb{R}^p$ is a predictor vector, $\mathbf{f}_j(\cdot) = (f_{1j}(\cdot), \ldots, f_{qj}(\cdot))^\top$ denotes the unknown vector of functions, $\boldsymbol{\varepsilon} = (\varepsilon_1, \ldots, \varepsilon_q)^\top$ is the random error with zero mean, and $\boldsymbol{\mu}$ is a $q$-vector intercept term. Since each $f_{lj}$ is a nonparametric smooth function, we use a cubic B-spline parameterization to approximate it. Specifically, let $\{B_{jk}, k = 1, \ldots, d_n\}$ be a normalized B-spline basis with $\|B_{jk}\|_{\infty} \leq 1$, where $d_n$ is the number of basis functions and $\|\cdot\|_{\infty}$ is the sup norm. Then, $f_{lj}(X_j)$ can be well approximated by
$$f_{nlj}(X_j) = \sum_{k=1}^{d_n} \beta_{ljk} B_{jk}(X_j) = \mathbf{B}_j^\top \boldsymbol{\beta}_{lj},$$
with $\boldsymbol{\beta}_{lj} = (\beta_{lj1}, \ldots, \beta_{ljd_n})^\top$ and $\mathbf{B}_j = (B_{j1}(X_j), \ldots, B_{jd_n}(X_j))^\top$, which motivates us to utilize the random vector correlation (RVC hereafter) defined by Escoufier [22] as a marginal utility screening index:
$$\omega_j = \frac{\operatorname{tr}(\Sigma_{\mathbf{Y}\mathbf{B}_j} \Sigma_{\mathbf{B}_j\mathbf{Y}})}{\sqrt{\operatorname{tr}(\Sigma_{\mathbf{Y}\mathbf{Y}}^2)\, \operatorname{tr}(\Sigma_{\mathbf{B}_j\mathbf{B}_j}^2)}} =: \frac{V(\mathbf{Y}, \mathbf{B}_j)}{\sqrt{V(\mathbf{Y}, \mathbf{Y})\, V(\mathbf{B}_j, \mathbf{B}_j)}},$$
where $\Sigma_{\mathbf{Y}\mathbf{B}_j}$ is the population covariance matrix between $\mathbf{Y}$ and $\mathbf{B}_j$, $\Sigma_{\mathbf{Y}\mathbf{Y}}^2$ is the squared covariance matrix of $\mathbf{Y}$, $\Sigma_{\mathbf{B}_j\mathbf{B}_j}^2$ is the squared covariance matrix of $\mathbf{B}_j$, and $\operatorname{tr}$ denotes the trace operator. Intuitively, if $\mathbf{Y}$ and $X_j$ are independent, then $\omega_j = 0$; otherwise, $\omega_j$ would be positive. This desired property allows us to use $\omega_j$ to propose a feature screening method for additive models with multivariate responses.
Suppose that $\mathbf{X}_i = (X_{i1}, \ldots, X_{ip})^\top$ and $\mathbf{Y}_i = (Y_{i1}, \ldots, Y_{iq})^\top$, $1 \leq i \leq n$, are samples from $\mathbf{X}$ and $\mathbf{Y}$, respectively. Then, the sample version of $\mathbf{B}_j$ is $\mathbf{B}_{ji} = (B_{j1}(X_{ij}), \ldots, B_{jd_n}(X_{ij}))^\top$. An unbiased estimator of $V(\mathbf{Y}, \mathbf{B}_j)$ is given by
$$\widehat{V}_n(\mathbf{Y}, \mathbf{B}_j) = \frac{1}{n(n-3)} \sum_{i \neq k} A_{ik} B_{ik},$$
where $A_{ik}$ and $B_{ik}$ are
$$A_{ik} = a_{ik} - \frac{1}{n-2}\sum_{l=1}^{n} a_{il} - \frac{1}{n-2}\sum_{s=1}^{n} a_{sk} + \frac{1}{(n-1)(n-2)}\sum_{s,l=1}^{n} a_{sl},$$
$$B_{ik} = b_{ik} - \frac{1}{n-2}\sum_{l=1}^{n} b_{il} - \frac{1}{n-2}\sum_{s=1}^{n} b_{sk} + \frac{1}{(n-1)(n-2)}\sum_{s,l=1}^{n} b_{sl},$$
with $a_{ik} = -\|\mathbf{B}_{ji} - \mathbf{B}_{jk}\|^2/2$ and $b_{ik} = -\|\mathbf{Y}_i - \mathbf{Y}_k\|^2/2$. The sample RVC between $\mathbf{Y}$ and $\mathbf{B}_j$ is defined by
$$\widehat{\omega}_j = \frac{\widehat{V}_n(\mathbf{Y}, \mathbf{B}_j)}{\sqrt{\widehat{V}_n(\mathbf{Y}, \mathbf{Y})\, \widehat{V}_n(\mathbf{B}_j, \mathbf{B}_j)}}, \quad j = 1, \ldots, p.$$
Thus, a sure independence screening procedure based on the random vector correlation can be carried out according to $\{\widehat{\omega}_j\}_{j=1}^{p}$.
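To make the procedure concrete, the following is a minimal R sketch of the sample RVC and the resulting RVC-SIS ranking; the function names (u_center, V_n, rvc, rvc_sis) and the default basis size dn = 4 are our illustrative choices, and bs() from the splines package stands in for the normalized cubic B-spline basis described above.

library(splines)

# U-centering of a kernel matrix, matching the definition of A_ik above:
# subtract the row and column sums scaled by 1/(n-2) and add back the
# grand sum scaled by 1/((n-1)(n-2)).
u_center <- function(a) {
  n <- nrow(a)
  a - outer(rowSums(a), rep(1, n)) / (n - 2) -
    outer(rep(1, n), colSums(a)) / (n - 2) +
    sum(a) / ((n - 1) * (n - 2))
}

# Unbiased estimator V_hat_n(U, W), with a_ik = -||u_i - u_k||^2 / 2.
V_n <- function(U, W) {
  n <- nrow(as.matrix(U))
  A <- u_center(-as.matrix(dist(U))^2 / 2)
  B <- u_center(-as.matrix(dist(W))^2 / 2)
  diag(A) <- 0; diag(B) <- 0   # the sum runs over i != k only
  sum(A * B) / (n * (n - 3))
}

# Sample RVC between the multivariate response Y (n x q matrix) and the
# cubic B-spline basis of a single predictor xj (length-n vector).
rvc <- function(Y, xj, dn = 4) {
  Bj <- bs(xj, df = dn, degree = 3)
  V_n(Y, Bj) / sqrt(V_n(Y, Y) * V_n(Bj, Bj))
}

# RVC-SIS: rank all p predictors by omega_hat and keep the top [n/log(n)].
rvc_sis <- function(Y, X, dn = 4) {
  omega <- apply(X, 2, function(xj) rvc(Y, xj, dn))
  head(order(omega, decreasing = TRUE), floor(nrow(X) / log(nrow(X))))
}

For instance, rvc_sis(Y, X) returns the indices of the [n/log(n)] top-ranked predictors.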
To study the sure screening property of the RVC-SIS procedure, we define the subset of the truly important predictors as
$$\mathcal{A} = \{ j : E\|\mathbf{f}_j(X_j)\|^2 > 0, \ 1 \leq j \leq p \}.$$
The following conditions are imposed to facilitate the proof of the properties:
(C1) The distribution of $\mathbf{X}$ is absolutely continuous, and its density is bounded away from zero and infinity on $[a, b]^p$, where $a$ and $b$ are finite real numbers.
(C2) The nonparametric components $\{f_{lj}, 1 \leq j \leq p, 1 \leq l \leq q\}$ belong to $\mathcal{F}$, the class of functions whose $r$th derivative $f^{(r)}$ exists and satisfies the Lipschitz condition with exponent $\alpha$: $|f^{(r)}(t) - f^{(r)}(t')| \leq K|t - t'|^{\alpha}$ for $t, t' \in [a, b]$ and some positive constant $K$. Here, $r$ is a non-negative integer and $\alpha \in (0, 1]$ such that $d = r + \alpha > 0.5$.
(C3) $\sum_{j \in \mathcal{A}} f_{lj}$ is bounded for $l = 1, \ldots, q$.
(C4) The random error vector $\boldsymbol{\varepsilon}$ satisfies $E\{\exp(t\|\boldsymbol{\varepsilon}\|^2)\} < \infty$ for all $0 < t \leq 2t_0$.
(C5) $\min_{j \in \mathcal{A}} \max_{1 \leq l \leq q} E[\{E(Y_l \mid X_j)\}^2] \geq c_1 d_n n^{-2\kappa}$ for some $0 < \kappa < d/(2d+1)$ and $c_1 > 0$.
(C6) There exist $\xi \in (0, 1)$ and $c > 0$ such that $d_n^{-2d-1} \leq c_1 (1-\xi) n^{-2\kappa} / C_1$ and $2c \leq c_1 D_1 \xi / \sqrt{D_2 V(\mathbf{Y}, \mathbf{Y})}$. Here, $C_1$, $D_1$, and $D_2$ are given in Fact 1 and Fact 3 of [7].
Based on Condition (C1), we can infer that the marginal density of $X_j$ is bounded away from zero and infinity on the interval $[a, b]$. The Lipschitz Condition (C2) requires the functions to be sufficiently smooth so that the convergence rate of their spline estimates can be determined. Condition (C4) controls the tail distribution of the random error vector to ensure the sure screening property. Condition (C5) requires that the signals of the important predictors not be too weak. Condition (C6) is imposed to guarantee the good approximation of $f_{nlj}(X_j)$ to $f_{lj}(X_j)$. Together with Conditions (C4) and (C6), Condition (C3) ensures uniform control in the probability inequality. These conditions are given in light of Li et al. [3], Liu et al. [4], Fan et al. [7], and the references therein.
Note that it will be shown that
$$\min_{j \in \mathcal{A}} \omega_j \geq 2c\, d_n^{1/2} n^{-2\kappa}$$
under the aforementioned conditions. We thus select the set of predictors
$$\widehat{\mathcal{A}} = \{ j : \widehat{\omega}_j \geq c\, d_n^{1/2} n^{-2\kappa}, \ j = 1, \ldots, p \}$$
from a theoretical perspective. In addition, there exists an efficient way to choose the subset $\widehat{\mathcal{A}}$ at a practical level. To be specific, we sort $\{\widehat{\omega}_j\}_{j=1}^{p}$ in descending order and pick out the $[n/\log(n)]$ largest to form $\widehat{\mathcal{A}}$, where $[a]$ denotes the integer part of $a$.
Theorem 1
(Sure Screening Property). Under Conditions (C1)-(C4), for $0 < \gamma < 1/2 - 2\kappa$ and any $c_2 > 0$, there exist some positive constants $c_3$ and $c_4$ such that
$$\Pr\Big( \max_{1 \leq j \leq p} |\widehat{\omega}_j - \omega_j| \geq c_2 d_n^{1/2} n^{-2\kappa} \Big) \leq O\big( p \big[ \exp\{ -c_3 n^{1-4\kappa-2\gamma} d_n^{-2} \} + n \exp( -c_4 n^{\gamma} ) \big] \big). \tag{1}$$
Along with Conditions (C5) and (C6), we have that
$$\Pr( \mathcal{A} \subseteq \widehat{\mathcal{A}} ) \geq 1 - O\big( s_n \big[ \exp\{ -c_3 n^{1-4\kappa-2\gamma} d_n^{-2} \} + n \exp( -c_4 n^{\gamma} ) \big] \big),$$
where $s_n = |\mathcal{A}|$ is the cardinality of $\mathcal{A}$.
Theorem 1 states that the proposed procedure is capable of handling the NP-dimensionality $\log p = o( n^{1-4\kappa-2\gamma} d_n^{-2} + n^{\gamma} )$. To balance the two terms on the right-hand side of (1), we set $n^{1-4\kappa-2\gamma} d_n^{-2} = n^{\gamma}$. As a result, (1) can be expressed as
$$\Pr\Big( \max_{1 \leq j \leq p} |\widehat{\omega}_j - \omega_j| \geq c_2 d_n^{1/2} n^{-2\kappa} \Big) \leq O\big( p \exp\{ -c_3 n^{(1-4\kappa)/3} d_n^{-2/3} \} \big).$$
This implies that we can handle the NP-dimensionality $\log p = o( n^{(1-4\kappa)/3} d_n^{-2/3} )$, indicating that the dimensionality depends on the minimum true signal strength and the number of basis functions.
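To see where this rate comes from, write $d_n = n^{\tau}$, a parameterization we introduce only to make the exponents explicit. Equating the exponents of the two terms gives
$$1 - 4\kappa - 2\gamma - 2\tau = \gamma \quad \Longleftrightarrow \quad \gamma = \frac{1 - 4\kappa - 2\tau}{3},$$
so that $n^{\gamma} = n^{(1-4\kappa)/3}\, n^{-2\tau/3} = n^{(1-4\kappa)/3} d_n^{-2/3}$, which is exactly the rate in the displayed bound.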
The proposed procedure enjoys the ranking consistency property.
Theorem 2
(Ranking Consistency Property). If Conditions (C1)-(C6) hold, then we have
$$\liminf_{n \to \infty} \Big\{ \min_{j \in \mathcal{A}} \widehat{\omega}_j - \max_{j \in \mathcal{A}^c} \widehat{\omega}_j \Big\} > 0 \quad \text{in probability}.$$
Compared to the sure screening property, Theorem 2 demonstrates a stronger theoretical result. It indicates, with overwhelming probability, that the truly active predictors tend to have larger magnitudes of ω ^ j than the inactive ones. In other words, the active predictors can be ranked at the top by our screening method.

3. Iterative Sure Independence Screening Using Random Vector Correlation

Three main challenges identified by Fan and Lv [1] may also arise with RVC-SIS. Firstly, RVC-SIS may prioritize unimportant predictors that are highly correlated with important predictors over other important predictors that are relatively weakly related to the response. Secondly, RVC-SIS may fail to select an important predictor that is marginally uncorrelated but jointly correlated with the response, leading to its exclusion from the estimated model. Finally, not all of the important predictors may be recovered by RVC-SIS due to the collinearity among predictors. To address these three challenges, we develop the RVC-ISIS method, summarized in Algorithm 1, which exploits the joint information of the covariates in variable selection rather than relying solely on marginal information.
Algorithm 1 RVC-ISIS Algorithm
Step 0: Start with an empty important set $\widehat{\mathcal{A}} = \emptyset$.
Step 1: Use RVC-SIS to select the $d_1$ predictors with the $d_1$ largest $\widehat{\omega}_j$ values. Add the indices of these selected predictors to $\widehat{\mathcal{A}}$.
Step 2: Calculate $\mathbf{X}_{new} = \{ \mathbf{I}_n - \mathbf{X}_{\widehat{\mathcal{A}}} ( \mathbf{X}_{\widehat{\mathcal{A}}}^\top \mathbf{X}_{\widehat{\mathcal{A}}} )^{-1} \mathbf{X}_{\widehat{\mathcal{A}}}^\top \} \mathbf{X}_{\widehat{\mathcal{A}}^c}$.
Step 3: Conduct the RVC-SIS procedure for $\mathbf{Y}$ and $\mathbf{X}_{new}$. Choose $d_2$ predictors in the same way as in Step 1 and update $\widehat{\mathcal{A}}$ by adding the indices of these $d_2$ selected predictors.
Step 4: Repeat Step 2 and Step 3 until the total number of selected predictors reaches a predetermined value, such as $|\widehat{\mathcal{A}}| = [n/\log(n)]$.
Step 5: Output the final indices of the selected predictors $\widehat{\mathcal{A}}$.
The key of RVC-ISIS is Step 3, which operates in two ways. Firstly, when an active variable is correlated with other active variables but marginally independent of the response, Step 3 makes it relevant to the response, so that it becomes marginally detectable. Secondly, when many irrelevant variables are highly correlated with the active variables, Step 3 uses the correlated active variables selected in Step 1 to make the remaining active variables easier to detect, rather than detecting the irrelevant variables.
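As an illustration, the following is a minimal R sketch of Algorithm 1, reusing rvc() from the sketch in Section 2; the defaults d1 = d2 = 5 are our own choices for illustration, not values prescribed above.

# A sketch of RVC-ISIS (Algorithm 1), assuming rvc() is defined as in Section 2.
rvc_isis <- function(Y, X, d1 = 5, d2 = 5, dn = 4) {
  n <- nrow(X); p <- ncol(X)
  target <- floor(n / log(n))
  omega <- apply(X, 2, function(xj) rvc(Y, xj, dn))
  A_hat <- head(order(omega, decreasing = TRUE), d1)   # Step 1
  while (length(A_hat) < target) {
    XA <- X[, A_hat, drop = FALSE]
    # Step 2: project the remaining columns onto the orthogonal complement
    # of the span of the already-selected predictors.
    P <- diag(n) - XA %*% solve(crossprod(XA), t(XA))
    rest <- setdiff(seq_len(p), A_hat)
    Xnew <- P %*% X[, rest, drop = FALSE]
    # Step 3: screen the projected predictors and add the d2 best ones.
    omega_new <- apply(Xnew, 2, function(xj) rvc(Y, xj, dn))
    add <- rest[head(order(omega_new, decreasing = TRUE),
                     min(d2, target - length(A_hat)))]
    A_hat <- c(A_hat, add)                              # Step 4: repeat
  }
  A_hat                                                 # Step 5: output
}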

4. Numerical Studies

In this section, we evaluate the finite-sample performance of our proposed screening procedure through a series of simulation studies and a real data example. We compare its performance with Naive-SIS1, Naive-ISIS1, Naive-SIS2, Naive-ISIS2, PS, and IPS in Li et al. [3]; GCPS and IGCPS in [4]; DC-SIS in Li et al. [10]; and DC-ISIS in Zhong and Zhu [23]. Our simulation studies are conducted using R version 4.1.2. We set the total number of predictors as $p = 2000$ and the sample size as $n = 400$. For our proposed method, we take $[n^{1/5}]$ as the number of internal knots. We adopt the criteria from Liu et al. [24] to assess the performance of the proposed method:
(i) $R_j$: the average rank of $X_j$ in the sorted list produced by the screening procedure, based on 1000 replications. The smaller $R_j$ is, the greater the probability of $X_j$ being selected.
(ii) $M$: the minimum model size needed to include all active predictors. In other words, $M$ represents the largest rank among the true predictors in a replication: $M = \max_{j \in \mathcal{A}} R_j$, where $\mathcal{A}$ is the true model. We report the 5%, 25%, 50%, 75%, and 95% quantiles of $M$ from 1000 repetitions.
(iii) $P_j$: the proportion of replications in which the important predictor $X_j$, $j \in \mathcal{A}$, is selected for a given model size $[n/\log(n)]$, over the 1000 replications.
(iv) $P_a$: the proportion of replications in which all active predictors are selected into the submodel of size $[n/\log(n)]$, over the 1000 simulations.
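As a small illustration of criteria (i)-(iv), the helper below computes them from per-replication ranks; the name screening_criteria and the layout of `ranks` as a replications-by-p matrix are our assumptions for illustration.

# ranks: (replications x p) matrix of per-replication ranks of each predictor;
# active: indices of the truly active predictors; n: sample size.
screening_criteria <- function(ranks, active, n) {
  size <- floor(n / log(n))
  R_active <- ranks[, active, drop = FALSE]
  Rj <- colMeans(R_active)                   # criterion (i)
  M  <- apply(R_active, 1, max)              # criterion (ii), per replication
  Pj <- colMeans(R_active <= size)           # criterion (iii)
  Pa <- mean(M <= size)                      # criterion (iv)
  list(Rj = Rj,
       M_quantiles = quantile(M, c(0.05, 0.25, 0.50, 0.75, 0.95)),
       Pj = Pj, Pa = Pa)
}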
Example 1.
Following Li et al. [3], we consider a linear model with multivariate responses as follows.
$$Y_1 = \beta_{11} X_1 + \beta_{12} X_2 + \cdots + \beta_{15} X_5 + \varepsilon_1, \quad \ldots, \quad Y_{q_n} = \beta_{q_n,1} X_1 + \beta_{q_n,2} X_2 + \cdots + \beta_{q_n,5} X_5 + \varepsilon_{q_n}.$$
The dimension of the response is $q_n = [n^{\alpha}]$ with $\alpha \sim U(0.2, 0.9)$. The set of truly active predictors is $\mathcal{M} = \{X_1, \ldots, X_5\}$. $\mathbf{X} = (X_1, \ldots, X_p)^\top$ is drawn from a multivariate normal distribution $N(\mathbf{0}_{p \times 1}, \Sigma)$, where $\Sigma$ is a $p \times p$ covariance matrix with elements $\sigma_{ij} = 0.8^{|i-j|}$. The nonzero coefficients are given by $\beta_{kj} = (-1)^{W_{kj}} (\log(n) + |Z_{kj}|)$, $k = 1, \ldots, q_n$, $j = 1, \ldots, 5$, where $W_{kj}$ is a binary random variable with $\Pr(W_{kj} = 1) = 0.6$ and $Z_{kj} \sim N(0, 1)$. The random error term $\varepsilon_k$ follows the standard normal distribution $N(0, 1)$.
Example 2.
We apply the screening procedures to the additive model when the components of the multivariate response are strongly correlated:
$$Y_1 = 5 f_1(X_1) + 8 f_2(X_2) + 4 f_3(X_3) + 6 f_4(X_4) + \varepsilon_1,$$
$$Y_2 = 4 f_1(X_1) + 6 f_2(X_2) + 5 f_3(X_3) + 8 f_4(X_4) + \varepsilon_2,$$
$$Y_3 = 6 f_1(X_1) + 4 f_2(X_2) + 3 f_3(X_3) + 5 f_4(X_4) + \varepsilon_3,$$
with
$$f_1(x) = \exp(\sin(\pi x)), \quad f_2(x) = \exp(x)(2x-1)^2, \quad f_3(x) = \frac{\sin(2\pi x)}{2 - \sin(2\pi x)},$$
and
$$f_4(x) = 0.1 \sin(2\pi x) + 0.2 \cos(2\pi x) + 0.3 \sin^2(2\pi x) + 0.4 \cos^3(2\pi x) + 0.5 \sin^3(2\pi x).$$
The predictor $\mathbf{X} = (X_1, \ldots, X_p)^\top$ is generated by
$$X_j = \frac{W_j + U}{2}, \quad j = 1, \ldots, p,$$
where $W_1, \ldots, W_p$ and $U$ are i.i.d. $U(0,1)$. The set of truly active predictors is $\mathcal{M} = \{X_1, \ldots, X_4\}$. The random error term $\varepsilon_k$ is from the standard normal distribution $N(0, 1)$.
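For concreteness, a short R sketch that generates one dataset from Example 2 (the function name gen_example2 and the default sizes are ours):

gen_example2 <- function(n = 400, p = 2000) {
  f1 <- function(x) exp(sin(pi * x))
  f2 <- function(x) exp(x) * (2 * x - 1)^2
  f3 <- function(x) sin(2 * pi * x) / (2 - sin(2 * pi * x))
  f4 <- function(x) 0.1 * sin(2 * pi * x) + 0.2 * cos(2 * pi * x) +
    0.3 * sin(2 * pi * x)^2 + 0.4 * cos(2 * pi * x)^3 +
    0.5 * sin(2 * pi * x)^3
  W <- matrix(runif(n * p), n, p)
  U <- runif(n)
  X <- (W + U) / 2   # X_j = (W_j + U)/2; U recycles down each column
  E <- matrix(rnorm(n * 3), n, 3)
  Y <- cbind(
    5 * f1(X[, 1]) + 8 * f2(X[, 2]) + 4 * f3(X[, 3]) + 6 * f4(X[, 4]),
    4 * f1(X[, 1]) + 6 * f2(X[, 2]) + 5 * f3(X[, 3]) + 8 * f4(X[, 4]),
    6 * f1(X[, 1]) + 4 * f2(X[, 2]) + 3 * f3(X[, 3]) + 5 * f4(X[, 4])
  ) + E
  list(X = X, Y = Y)
}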
Example 3.
The following model is taken from Fan and Lv [1] and Li et al. [3].
$$Y_1 = 5 X_1 + 5 X_2 + 5 X_3 - 15\sqrt{0.5}\, X_4 + \varepsilon_1,$$
$$Y_2 = 4 X_1 + 6 X_2 + 8 X_3 - 18\sqrt{0.5}\, X_4 + \varepsilon_2,$$
$$Y_3 = 5 X_1 + 4 X_2 + 6 X_3 - 15\sqrt{0.5}\, X_4 + \varepsilon_3.$$
$\mathbf{X} = (X_1, \ldots, X_p)^\top$ follows a multivariate normal distribution $N(\mathbf{0}_{p \times 1}, \Sigma)$. The covariance matrix $\Sigma$ satisfies $\sigma_{11} = \cdots = \sigma_{pp} = 1$ and has off-diagonal elements $\sigma_{ij} = 0.5$, $i \neq j$, except for $\sigma_{4j} = \sigma_{j4} = \sqrt{0.5}$, $j \neq 4$. The random error term $\varepsilon_k$ follows a standard normal distribution $N(0, 1)$. The set of truly active predictors is $\mathcal{M} = \{X_1, \ldots, X_4\}$.
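That $X_4$ is marginally uncorrelated with each response component can be verified directly: for $Y_1$, $\operatorname{cov}(Y_1, X_4) = (5 + 5 + 5)\sqrt{0.5} - 15\sqrt{0.5} = 0$, and likewise for $Y_2$ and $Y_3$. A short numerical check in R (ours; a small p suffices for the check):

# cov(Y_1, X_k) = (Sigma %*% beta)[k], since the error is independent of X.
p <- 10
Sigma <- matrix(0.5, p, p); diag(Sigma) <- 1
Sigma[4, -4] <- Sigma[-4, 4] <- sqrt(0.5)
beta1 <- c(5, 5, 5, -15 * sqrt(0.5), rep(0, p - 4))
drop(Sigma %*% beta1)[4]   # returns 0 (up to floating point)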
Example 4.
We adopt the simulation model in [1,3] as follows.
$$Y_1 = 5 X_1 + 5 X_2 + 5 X_3 - 15\sqrt{0.5}\, X_4 + 1.25 X_5 + \varepsilon_1,$$
$$Y_2 = 4 X_1 + 6 X_2 + 8 X_3 - 18\sqrt{0.5}\, X_4 + 1.5 X_5 + \varepsilon_2,$$
$$Y_3 = 5 X_1 + 4 X_2 + 6 X_3 - 15\sqrt{0.5}\, X_4 + 1.25 X_5 + \varepsilon_3.$$
The covariance matrix $\Sigma$ is the same as in Example 3, except that $\sigma_{5j} = \sigma_{j5} = 0$, $j \neq 5$, so that $X_5$ is uncorrelated with any $X_j$, $j \neq 5$. The random error term $\varepsilon_k$ is from the standard normal distribution $N(0, 1)$.
In Example 1, the model is linear, and the response is multivariate. All methods perform well under this scenario, as shown in Table 1, Table 2 and Table 3.
In Example 2, the model is nonlinear. Each component of the multivariate response involves the same nonlinear functions of the active predictors; only the corresponding coefficients differ. It is easy to check that, in this example, the components of the response are strongly correlated. Note that the smaller the rank of a predictor, the easier it is to be selected. Thus, we can conclude from Table 1 that our proposed methods, RVC-SIS and RVC-ISIS, outperform the remaining ones in terms of the ranks of the important predictors. Naive-SIS1, Naive-SIS2, and PS perform badly in selecting the important predictors X_1-X_4. Naive-ISIS1 and Naive-ISIS2 still fail to detect X_1 and X_2, while IPS still performs poorly in choosing X_1 due to its large rank. The ranks of the active variables X_2-X_4 under GCPS are too large for them to be chosen, while the rank of X_2 under IGCPS is small enough for it to be selected. Although DC-SIS behaves badly in selecting X_2 in this context, DC-ISIS dramatically improves its performance. From Table 2, we can see that DC-ISIS, RVC-SIS, and RVC-ISIS can pick up X_1-X_4 with an overwhelming probability. Naive-SIS1, Naive-SIS2, and PS select these predictors with little chance. Naive-ISIS1 and Naive-ISIS2 still fail to detect X_1 and X_2, while IPS cannot identify X_1. GCPS selects X_1 with overwhelming probability, while IGCPS chooses X_2-X_4 with a large probability. Table 3 demonstrates that only DC-ISIS, RVC-SIS, and RVC-ISIS include all important predictors X_1-X_4 with a small M, indicating that these four predictors have a higher priority of being selected by these screening methods and thus are ranked at the top.
In Example 3, it is evident that predictor X_4 is marginally uncorrelated with Y at the population level, even though it is important to Y. Table 1 shows that Naive-SIS1, Naive-SIS2, PS, DC-SIS, GCPS, and RVC-SIS consistently assign X_4 a large rank, while the corresponding iterative screening procedures recover X_4 with a small rank. This indicates that the iterative screening procedure can effectively tackle the second challenge mentioned at the beginning of Section 3. Table 2 demonstrates that Naive-SIS1, Naive-SIS2, PS, DC-SIS, GCPS, and RVC-SIS select X_4 with little probability, while their respective iterative screening procedures can pick it up with an overwhelming probability. It can be seen from Table 3 that the model size of Naive-SIS1, Naive-SIS2, PS, DC-SIS, GCPS, and RVC-SIS is consistently large owing to their inability to identify X_4.
In Example 4, it is worth noting that the important predictor X_4 is again uncorrelated with the response Y, and X_5 has a very small correlation with Y. In fact, the variable X_5 contributes to the response on a scale comparable to that of the noise ε_k. Similar conclusions can be drawn from these tables owing to the presence of X_4. Furthermore, the screening methods give X_5 a lower selection priority than X_1-X_3 due to the small contribution of X_5 to the response.

5. Application

In order to demonstrate the application of our proposed method, we use the yeast cell-cycle gene expression data from Spellman et al. [25] and the chromatin immunoprecipitation on chip (ChIP-chip) data from Lee et al. [26]. The yeast cell-cycle gene expression data consist of 542 genes from an α-factor-based experiment in which mRNA levels are measured at 18 time points, every 7 min for 119 min.
The chromatin immunoprecipitation on chip (ChIP-chip) data contain the binding information for 106 transcription factors (TFs), which are related to cell regulation. These two datasets are combined as the yeast cell-cycle dataset available in the R package spls. For further details about this dataset, please refer to Chun and Keles [27].
First, the data are randomly split into a training set of 300 observations and a test set of 242. We apply all screening methods to the training set to select the active predictors. To obtain the threshold for the screening procedures, we introduce the auxiliary variables proposed by Zhu et al. [6]. That is, we independently generate $d = 2000$ auxiliary variables $\mathbf{Z} \sim N_d(\mathbf{0}, \mathbf{I}_d)$ such that $\mathbf{Z}$ is independent of both $\mathbf{Y}$ and $\mathbf{X}$. Consider the $(p+d)$-dimensional vector $(\mathbf{X}^\top, \mathbf{Z}^\top)^\top$ as the predictors and $\mathbf{Y}$ as the response. Then, we calculate $\widehat{\omega}_j$ for $j = 1, \ldots, p+d$. Note that $\mathbf{Z}$ is truly inactive by construction, so we can define the threshold
$$C_d = \max_{j = p+1, \ldots, p+d} \widehat{\omega}_j$$
to select the important predictors. We repeat the random data splitting 100 times to reduce the impact of any particular split on the results and comparisons.
Note that there are 21 known and experimentally verified TFs related to the cell-cycle process, as stated in Wang et al. [28]. We then compare the number of confirmed TFs selected by each of the screening procedures mentioned earlier. The results are reported in Figure 1. From this figure, it can be concluded that RVC-ISIS performs best among all screening procedures, and RVC-SIS performs best among the non-iterative screening procedures.
Furthermore, we apply the screening procedures to the test set to assess their performance in terms of the mean squared error of prediction; the results can be found in Figure 2 and Table 4. It can be seen that the iterative screening procedures uniformly outperform their non-iterative counterparts.

6. Discussion

Through extensive simulations and a real data example, we see that the proposed procedure outperforms 58-100% of the existing candidates, namely Naive-SIS1, Naive-ISIS1, Naive-SIS2, Naive-ISIS2, PS, and IPS by Li et al. [3]; GCPS and IGCPS by Liu et al. [4]; DC-SIS by Li et al. [10]; and DC-ISIS by Zhong and Zhu [23]. On average, the proposed method outperforms 79% of the existing methods across both the linear and nonlinear scenarios. In particular, the proposed method has an enormous advantage in the nonlinear scenarios with strongly correlated multivariate responses, as in Example 2.
There are some limitations to the proposed screening method as well. First, as Muthyala et al. [29] pointed out, although sure independence screening methods have proven particularly effective in the natural sciences and clinical studies, their widespread adoption has been limited by performance inefficiencies and the challenges posed by R-based or FORTRAN-based implementations, especially in modern computing environments with massive data. Faster and more efficient implementations using new techniques, such as PyTorch-based implementations, are necessary for the proposed RVC-SIS and RVC-ISIS algorithms; this is critical for projects like chronic thromboembolic pulmonary hypertension studies [30] with ultrahigh-dimensional multimodal medical data, including radiomics features, genomics data, laboratory omics data, and so on.
Second, this paper only considered sure independence screening for additive models with continuous multivariate responses. However, in clinical practice, such as in pulmonary thromboembolism studies, the majority of clinical outcomes are of the binary or survival type. The proposed method and algorithm cannot be directly applied to scenarios with multivariate binary or multilevel categorical outcomes, or to scenarios with survival outcomes. However, it is relatively easy to adapt the proposed RVC-SIS and RVC-ISIS to other types of outcomes using routine methods from the related statistical fields.
Last but not least, we did not investigate the performance of the proposed method for skewed and heavy-tailed data [31]. Further exploring the interactions between (or among) predictors and detecting the corresponding ones is beyond the scope of this paper. These could be interesting topics for future research.

7. Conclusions

This paper investigated a nonparametric screening method for sparse, ultrahigh-dimensional additive models with multivariate responses. We proved that RVC-SIS exhibits the sure screening and ranking consistency properties under certain regularity conditions. In addition, we showed, through extensive simulation studies, that the proposed RVC-ISIS procedure effectively detects the truly important predictors even in the presence of complex correlations among the predictors. The proposed method has great potential for applications in developing nonlinear models with both ultrahigh-dimensional predictors and multivariate responses, especially in chronic thromboembolic pulmonary hypertension studies, where there exist multi-organ linkage responses, such as right-sided heart failure in the heart, chronic thromboembolic pulmonary hypertension in the lungs, and cerebral thrombosis in the brain.

Author Contributions

Methodology, Y.C. and B.L.; Software, Y.C.; Validation, B.L.; Formal analysis, Y.C. and B.L.; Investigation, B.L.; Resources, B.L.; Writing—original draft, Y.C. and B.L.; Writing—review & editing, B.L.; Funding acquisition, B.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Noncommunicable Chronic Diseases-National Science and Technology Major Project (No. 2023ZD0506600) and the Capital’s Funds for Health Improvement and Research (grant number: 2024-1G-4251).

Data Availability Statement

The original data presented in the study are publicly available in R package ‘spls’ (https://cran.r-project.org/web/packages/spls/index.html, accessed on 5 May 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Proof of Theorem 1. 
We first demonstrate the consistency of the numerator of $\widehat{\omega}_j$. It is easy to show that $\widehat{V}_n(\mathbf{Y}, \mathbf{B}_j)$ is a U-statistic of order four; that is,
$$\widehat{V}_n(\mathbf{Y}, \mathbf{B}_j) = \binom{n}{4}^{-1} \sum_{s<t<u<v} h(\mathbf{Z}_s, \mathbf{Z}_t, \mathbf{Z}_u, \mathbf{Z}_v),$$
where
$$h(\mathbf{Z}_1, \mathbf{Z}_2, \mathbf{Z}_3, \mathbf{Z}_4) = \frac{1}{12} \big\{ (2\mathbf{Y}_1^\top \mathbf{Y}_2 + 2\mathbf{Y}_3^\top \mathbf{Y}_4 - \mathbf{Y}_1^\top \mathbf{Y}_3 - \mathbf{Y}_1^\top \mathbf{Y}_4 - \mathbf{Y}_2^\top \mathbf{Y}_3 - \mathbf{Y}_2^\top \mathbf{Y}_4)(\mathbf{B}_{j1}^\top \mathbf{B}_{j2} + \mathbf{B}_{j3}^\top \mathbf{B}_{j4})$$
$$+ (2\mathbf{Y}_1^\top \mathbf{Y}_3 + 2\mathbf{Y}_2^\top \mathbf{Y}_4 - \mathbf{Y}_1^\top \mathbf{Y}_2 - \mathbf{Y}_1^\top \mathbf{Y}_4 - \mathbf{Y}_2^\top \mathbf{Y}_3 - \mathbf{Y}_3^\top \mathbf{Y}_4)(\mathbf{B}_{j1}^\top \mathbf{B}_{j3} + \mathbf{B}_{j2}^\top \mathbf{B}_{j4})$$
$$+ (2\mathbf{Y}_1^\top \mathbf{Y}_4 + 2\mathbf{Y}_2^\top \mathbf{Y}_3 - \mathbf{Y}_1^\top \mathbf{Y}_2 - \mathbf{Y}_1^\top \mathbf{Y}_3 - \mathbf{Y}_2^\top \mathbf{Y}_4 - \mathbf{Y}_3^\top \mathbf{Y}_4)(\mathbf{B}_{j1}^\top \mathbf{B}_{j4} + \mathbf{B}_{j2}^\top \mathbf{B}_{j3}) \big\}$$
with $\mathbf{Z}_i = (\mathbf{Y}_i^\top, \mathbf{B}_{ji}^\top)^\top$.
Rewrite the U-statistic $\widehat{V}_n(\mathbf{Y}, \mathbf{B}_j)$ as follows:
$$\widehat{V}_n(\mathbf{Y}, \mathbf{B}_j) = \binom{n}{4}^{-1} \sum_{s<t<u<v} h(\mathbf{Z}_s, \mathbf{Z}_t, \mathbf{Z}_u, \mathbf{Z}_v) \mathbf{1}\{ |h(\mathbf{Z}_s, \mathbf{Z}_t, \mathbf{Z}_u, \mathbf{Z}_v)| \leq M \} + \binom{n}{4}^{-1} \sum_{s<t<u<v} h(\mathbf{Z}_s, \mathbf{Z}_t, \mathbf{Z}_u, \mathbf{Z}_v) \mathbf{1}\{ |h(\mathbf{Z}_s, \mathbf{Z}_t, \mathbf{Z}_u, \mathbf{Z}_v)| > M \} =: \widehat{V}_{n1}(\mathbf{Y}, \mathbf{B}_j) + \widehat{V}_{n2}(\mathbf{Y}, \mathbf{B}_j),$$
where $M$ will be specified later. Accordingly, $V(\mathbf{Y}, \mathbf{B}_j)$ can be decomposed into two parts:
$$V(\mathbf{Y}, \mathbf{B}_j) = E\big[ h(\mathbf{Z}_s, \mathbf{Z}_t, \mathbf{Z}_u, \mathbf{Z}_v) \mathbf{1}\{ |h(\mathbf{Z}_s, \mathbf{Z}_t, \mathbf{Z}_u, \mathbf{Z}_v)| \leq M \} \big] + E\big[ h(\mathbf{Z}_s, \mathbf{Z}_t, \mathbf{Z}_u, \mathbf{Z}_v) \mathbf{1}\{ |h(\mathbf{Z}_s, \mathbf{Z}_t, \mathbf{Z}_u, \mathbf{Z}_v)| > M \} \big] =: V_1(\mathbf{Y}, \mathbf{B}_j) + V_2(\mathbf{Y}, \mathbf{B}_j).$$
Evidently, $\widehat{V}_{n1}(\mathbf{Y}, \mathbf{B}_j)$ and $\widehat{V}_{n2}(\mathbf{Y}, \mathbf{B}_j)$ are unbiased estimators of $V_1(\mathbf{Y}, \mathbf{B}_j)$ and $V_2(\mathbf{Y}, \mathbf{B}_j)$, respectively.
First, we demonstrate the consistency of $\widehat{V}_{n1}(\mathbf{Y}, \mathbf{B}_j)$. Note that $\widehat{V}_{n1}(\mathbf{Y}, \mathbf{B}_j)$ is a U-statistic of order four. We can therefore conclude, based on Theorem 5.6.1.A of Serfling [32], that $\Pr( \widehat{V}_{n1}(\mathbf{Y}, \mathbf{B}_j) - V_1(\mathbf{Y}, \mathbf{B}_j) \geq \varepsilon ) \leq \exp( -n\varepsilon^2/(8M^2) )$ for any given $\varepsilon > 0$. In light of the symmetry of U-statistics, it is straightforward to obtain
$$\Pr\big( | \widehat{V}_{n1}(\mathbf{Y}, \mathbf{B}_j) - V_1(\mathbf{Y}, \mathbf{B}_j) | \geq \varepsilon \big) \leq 2 \exp( -n\varepsilon^2/(8M^2) ). \tag{A1}$$
Next, we consider the consistency of $\widehat{V}_{n2}(\mathbf{Y}, \mathbf{B}_j)$. Notice that $\|\mathbf{B}_{jk}\|_{\infty} \leq 1$. Using the Cauchy-Schwarz inequality, we have $|h(\mathbf{Z}_1, \mathbf{Z}_2, \mathbf{Z}_3, \mathbf{Z}_4)| \leq d_n ( \|\mathbf{Y}_1\|^2 + \|\mathbf{Y}_2\|^2 + \|\mathbf{Y}_3\|^2 + \|\mathbf{Y}_4\|^2 )$. Together with Markov's inequality, it is not difficult to show that $\{ V_2(\mathbf{Y}, \mathbf{B}_j) \}^2 \leq E\{ h^2(\mathbf{Z}_1, \mathbf{Z}_2, \mathbf{Z}_3, \mathbf{Z}_4) \} [ E\{ \exp( t \|\mathbf{Y}_1\|^2 ) \} ]^4 / \exp( t M / d_n )$ for any $t > 0$. By choosing $M = C n^{\gamma} d_n$ for $0 < \gamma < 1/2 - 2\kappa$, we have $|V_2(\mathbf{Y}, \mathbf{B}_j)| \leq \varepsilon/2$ provided that $n$ is sufficiently large and Conditions (C3) and (C4) hold. Meanwhile, we can confirm that $\{ |\widehat{V}_{n2}(\mathbf{Y}, \mathbf{B}_j)| \geq \varepsilon/2 \} \subseteq \{ \|\mathbf{Y}_i\|^2 > M/(4d_n) \ \text{for some} \ 1 \leq i \leq n \}$. Otherwise, if $\|\mathbf{Y}_i\|^2 \leq M/(4d_n)$ held for all $1 \leq i \leq n$, then we would have $|h(\mathbf{Z}_s, \mathbf{Z}_t, \mathbf{Z}_u, \mathbf{Z}_v)| \leq M$ and therefore $|\widehat{V}_{n2}(\mathbf{Y}, \mathbf{B}_j)| = 0$, which contradicts the event $|\widehat{V}_{n2}(\mathbf{Y}, \mathbf{B}_j)| \geq \varepsilon/2$. Under Conditions (C3) and (C4), for any $t > 0$, there exists a positive constant $C$ such that $\Pr( \|\mathbf{Y}_i\|^2 \geq M/(4d_n) ) \leq C \exp( -t M/(4d_n) )$. Consequently, $\Pr( |\widehat{V}_{n2}(\mathbf{Y}, \mathbf{B}_j)| \geq \varepsilon/2 ) \leq C n \exp( -t M/(4d_n) )$, which implies that
$$\Pr\big( | \widehat{V}_{n2}(\mathbf{Y}, \mathbf{B}_j) - V_2(\mathbf{Y}, \mathbf{B}_j) | \geq \varepsilon \big) \leq C n \exp( -t M/(4d_n) ). \tag{A2}$$
Recall that $M = C n^{\gamma} d_n$. By combining (A1) and (A2), we have
$$\Pr\big( | \widehat{V}_n(\mathbf{Y}, \mathbf{B}_j) - V(\mathbf{Y}, \mathbf{B}_j) | \geq \varepsilon \big) = O\{ \exp( -c_3 \varepsilon^2 n^{1-2\gamma} d_n^{-2} ) + n \exp( -c_4 n^{\gamma} ) \} \tag{A3}$$
for some positive constants $c_3$ and $c_4$.
Similarly, we can determine the convergence rate of the denominator. Furthermore, recalling Fact 3 in Fan et al. [7], we can show that $V(\mathbf{B}_j, \mathbf{B}_j) = O(d_n^{-1})$. Hence, we obtain the convergence rate of $\widehat{\omega}_j$, which is similar to the form of (A3). To be precise, $\Pr( |\widehat{\omega}_j - \omega_j| \geq d_n^{1/2} \varepsilon ) = O\{ \exp( -c_3 \varepsilon^2 n^{1-2\gamma} d_n^{-2} ) + n \exp( -c_4 n^{\gamma} ) \}$. Letting $\varepsilon = c_2 n^{-2\kappa}$, where $c_2 > 0$ is arbitrary and $\kappa$ satisfies $0 < 2\kappa + \gamma < 1/2$, we have
$$\Pr\Big( \max_{1 \leq j \leq p} |\widehat{\omega}_j - \omega_j| \geq c_2 d_n^{1/2} n^{-2\kappa} \Big) \leq p \max_{1 \leq j \leq p} \Pr\big( |\widehat{\omega}_j - \omega_j| \geq c_2 d_n^{1/2} n^{-2\kappa} \big) \leq O\big( p \big[ \exp\{ -c_3 n^{1-4\kappa-2\gamma} d_n^{-2} \} + n \exp( -c_4 n^{\gamma} ) \big] \big).$$
To deal with the second part, we first show that $\min_{j \in \mathcal{A}} \omega_j \geq 2c\, d_n^{1/2} n^{-2\kappa}$. By invoking Condition (C5), for any $j \in \mathcal{A}$, there exists $l \in \{1, \ldots, q\}$ such that $E[\{E(Y_l \mid X_j)\}^2] \geq c_1 d_n n^{-2\kappa}$. Under Conditions (C1), (C2), and (C6), we can conclude from Lemma 1 and Fact 3 in Fan et al. [7] that $V(\mathbf{Y}, \mathbf{B}_j) \geq c_1 D_1 \xi n^{-2\kappa}$, which implies that $\omega_j \geq 2c\, d_n^{1/2} n^{-2\kappa}$ provided that $0 < c \leq c_1 D_1 \xi / \{ 2\sqrt{D_2 V(\mathbf{Y}, \mathbf{Y})} \}$. Let us introduce the event $\zeta_n = \{ \max_{j \in \mathcal{A}} |\widehat{\omega}_j - \omega_j| \leq c\, d_n^{1/2} n^{-2\kappa} \}$. It is worth noting that $\zeta_n \subseteq \{ \mathcal{A} \subseteq \widehat{\mathcal{A}} \}$. Therefore,
$$\Pr( \mathcal{A} \subseteq \widehat{\mathcal{A}} ) \geq \Pr( \zeta_n ) = 1 - \Pr( \zeta_n^c ) \geq 1 - O\big( s_n \big[ \exp\{ -c_3 n^{1-4\kappa-2\gamma} d_n^{-2} \} + n \exp( -c_4 n^{\gamma} ) \big] \big),$$
where the last inequality follows from (A3). □
Proof of Theorem 2. 
Recall the definition of $\mathcal{A}$; it is clear that $\omega_j > 0$ for $j \in \mathcal{A}$ and $\omega_j = 0$ for $j \in \mathcal{A}^c$. Under Conditions (C1), (C2), (C5), and (C6), we can show that $\min_{j \in \mathcal{A}} \omega_j \geq 2c\, d_n^{1/2} n^{-2\kappa}$. Then, it is straightforward to verify that
$$\Pr\Big( \min_{j \in \mathcal{A}} \widehat{\omega}_j \leq \max_{j \in \mathcal{A}^c} \widehat{\omega}_j \Big) \leq \Pr\Big\{ \Big( \max_{j \in \mathcal{A}^c} \widehat{\omega}_j - \max_{j \in \mathcal{A}^c} \omega_j \Big) - \Big( \min_{j \in \mathcal{A}} \widehat{\omega}_j - \min_{j \in \mathcal{A}} \omega_j \Big) \geq 2c\, d_n^{1/2} n^{-2\kappa} \Big\} \leq \Pr\Big\{ \max_{j \in \mathcal{A}^c} |\widehat{\omega}_j - \omega_j| + \max_{j \in \mathcal{A}} |\widehat{\omega}_j - \omega_j| \geq 2c\, d_n^{1/2} n^{-2\kappa} \Big\} \leq \Pr\Big\{ \max_{1 \leq j \leq p} |\widehat{\omega}_j - \omega_j| \geq c\, d_n^{1/2} n^{-2\kappa} \Big\}.$$
Using the result of Theorem 1 and Fatou's Lemma, we can derive
$$\Pr\Big\{ \liminf_{n \to \infty} \Big( \min_{j \in \mathcal{A}} \widehat{\omega}_j - \max_{j \in \mathcal{A}^c} \widehat{\omega}_j \Big) \leq 0 \Big\} \leq \lim_{n \to \infty} \Pr\Big( \min_{j \in \mathcal{A}} \widehat{\omega}_j \leq \max_{j \in \mathcal{A}^c} \widehat{\omega}_j \Big) = 0.$$
The proof is now completed. □

References

  1. Fan, J.; Lv, J. Sure independence screening for ultrahigh-dimensional feature space. J. R. Stat. Soc. Ser. (Stat. Methodol.) 2008, 70, 849–911. [Google Scholar] [CrossRef]
  2. Fan, J.; Samworth, R.; Wu, Y. Ultrahigh-dimensional feature selection: Beyond the linear model. J. Mach. Learn. Res. 2009, 10, 2013–2038. [Google Scholar] [PubMed]
  3. Li, X.; Cheng, G.; Wang, L.; Lai, P.; Song, F. Ultrahigh-dimensional feature screening via projection. Comput. Stat. Data Anal. 2017, 114, 88–104. [Google Scholar] [CrossRef]
  4. Liu, S.; Li, X.; Zhang, J. Ultrahigh-dimensional feature screening for additive model with multivariate response. J. Stat. Comput. Simul. 2020, 90, 775–799. [Google Scholar] [CrossRef]
  5. Fan, J.; Song, R. Sure independence screening in generalized linear models with NP-dimensionality. Ann. Stat. 2010, 38, 3567–3604. [Google Scholar] [CrossRef]
  6. Zhu, L.P.; Li, L.; Li, R.; Zhu, L.X. Model-free feature screening for ultrahigh-dimensional data. J. Am. Stat. Assoc. 2011, 106, 1464–1475. [Google Scholar] [CrossRef]
  7. Fan, J.; Feng, Y.; Song, R. Nonparametric independence screening in sparse ultra-high-dimensional additive models. J. Am. Stat. Assoc. 2011, 106, 544–557. [Google Scholar] [CrossRef]
  8. Yuan, Y.; Billor, N. Sure independent screening for functional regression model. Commun. Stat.-Simul. Comput. 2024, 1–20. [Google Scholar] [CrossRef]
  9. Cui, H.; Zou, F.; Ling, L. Feature screening and error variance estimation for ultrahigh-dimensional linear model with measurement errors. Commun. Math. Stat. 2025, 13, 139–171. [Google Scholar] [CrossRef]
  10. Li, R.; Zhong, W.; Zhu, L. Feature screening via distance correlation learning. J. Am. Stat. Assoc. 2012, 107, 1129–1139. [Google Scholar] [CrossRef]
  11. Shao, X.; Zhang, J. Martingale difference correlation and its use in high-dimensional variable screening. J. Am. Stat. Assoc. 2014, 109, 1302–1318. [Google Scholar] [CrossRef]
  12. Cui, H.; Li, R.; Zhong, W. Model-free feature screening for ultrahigh-dimensional discriminant analysis. J. Am. Stat. Assoc. 2015, 110, 630–641. [Google Scholar] [CrossRef]
  13. Chen, S.; Lu, J. Quantile-Composited feature screening for ultrahigh-dimensional data. Mathematics 2023, 11, 2398. [Google Scholar] [CrossRef]
  14. Liu, R.; Deng, G.; He, H. Generalized Jaccard feature screening for ultra-high dimensional survival data. AIMS Math. 2024, 9, 27607–27626. [Google Scholar] [CrossRef]
  15. Sang, Y.; Dang, X. Grouped feature screening for ultrahigh-dimensional classification via Gini distance correlation. J. Multivar. Anal. 2024, 204, 105360. [Google Scholar] [CrossRef]
  16. Wu, J.; Cui, H. Model-free feature screening based on Hellinger distance for ultrahigh-dimensional data. Stat. Pap. 2024, 65, 5903–5930. [Google Scholar] [CrossRef]
  17. Zhong, W.; Li, Z.; Guo, W.; Cui, H. Semi-distance correlation and its applications. J. Am. Stat. Assoc. 2024, 119, 2919–2933. [Google Scholar] [CrossRef]
  18. Tian, Z.; Lai, T.; Zhang, Z. Variation of conditional mean and its application in ultrahigh-dimensional feature screening. Commun. Stat.-Theory Methods 2025, 54, 352–382. [Google Scholar] [CrossRef]
  19. Li, L.; Ke, C.; Yin, X.; Yu, Z. Generalized martingale difference divergence: Detecting conditional mean independence with applications in variable screening. Comput. Stat. Data Anal. 2023, 180, 107618. [Google Scholar] [CrossRef]
  20. Liu, L.; Lian, H.; Huang, J. More efficient estimation of multivariate additive models based on tensor decomposition and penalization. J. Mach. Learn. Res. 2024, 25, 1–27. [Google Scholar]
  21. Desai, N.; Baladandayuthapani, V.; Shinohara, R.; Morris, J. Covariance Assisted Multivariate Penalized Additive Regression (CoMPAdRe). J. Comput. Graph. Stat. 2024, 1–10. [Google Scholar] [CrossRef]
  22. Escoufier, Y. Le traitement des variables vectorielles. Biometrics 1973, 29, 751–760. [Google Scholar] [CrossRef]
  23. Zhong, W.; Zhu, L. An iterative approach to distance correlation-based sure independence screening. J. Stat. Comput. Simul. 2015, 85, 2331–2345. [Google Scholar] [CrossRef]
  24. Liu, J.; Li, R.; Wu, R. Feature selection for varying coefficient models with ultrahigh-dimensional covariates. J. Am. Stat. Assoc. 2014, 109, 266–274. [Google Scholar] [CrossRef]
  25. Spellman, P.T.; Sherlock, G.; Zhang, M.Q.; Iyer, V.R.; Anders, K.; Eisen, M.B.; Brown, P.O.; Botstein, D.; Futcher, B. Comprehensive identification of cell cycle–regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 1998, 9, 3273–3297. [Google Scholar] [CrossRef]
  26. Lee, T.I.; Rinaldi, N.J.; Robert, F.; Odom, D.T.; Bar-Joseph, Z.; Gerber, G.K.; Hannett, N.M.; Harbison, C.T.; Thompson, C.M.; Young, R.A.; et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 2002, 298, 799–804. [Google Scholar] [CrossRef] [PubMed]
  27. Chun, H.; Keles, S. Sparse partial least squares regression for simultaneous dimension reduction and variable selection. J. R. Stat. Soc. Ser. (Stat. Methodol.) 2010, 72, 3–25. [Google Scholar] [CrossRef]
  28. Wang, L.; Chen, G.; Li, H. Group SCAD regression analysis for microarray time course gene expression data. Bioinformatics 2007, 23, 1486–1494. [Google Scholar] [CrossRef]
  29. Muthyala, M.; Sorourifar, F.; Paulson, J.A. TorchSISSO: A PyTorch-based implementation of the sure independence screening and sparsifying operator for efficient and interpretable model discovery. Digit. Chem. Eng. 2024, 13, 100198. [Google Scholar] [CrossRef]
  30. Xie, W.; Yu, Y.; Huang, Q.; Yan, X.; Yang, Y.; Xiong, C.; Liu, Z.; Wan, J.; Gong, S.; Wang, L.; et al. Epidemiology and management patterns of chronic thromboembolic pulmonary hypertension in China: A systematic literature review and meta-analysis. Chin. Med J. 2025, 138, 1000–1002. [Google Scholar] [CrossRef]
  31. Song, Y.; Zhou, W.; Zhou, W.X. Large-scale inference of multivariate regression for heavy-tailed and asymmetric data. Stat. Sin. 2023, 33, 1831–1852. [Google Scholar] [CrossRef]
  32. Serfling, R.J. Approximation Theorems of Mathematical Statistics; John Wiley & Sons: Hoboken, NJ, USA, 1980. [Google Scholar]
Figure 1. The number of confirmed TFs selected by different screening methods, obtained from 100 random data partitions.
Figure 2. The mean squared errors of prediction based on 100 random partitions of the data.
Table 1. The average rank R_j of the true predictor X_j for Examples 1–4.

Example 1
Method       R_1       R_2      R_3       R_4       R_5
Naive-SIS1   4.108     2.48     2.002     2.488     4.056
Naive-ISIS1  3.856     2.742    2.202     2.342     3.878
Naive-SIS2   3.71      2.726    2.408     2.722     3.634
Naive-ISIS2  4.236     3.056    2.408     2.546     2.886
PS           2.94      2.998    2.99      2.988     3.084
IPS          3.656     1.016    1.986     3.258     3.798
DC-SIS       4.53      2.442    1.862     2.464     4.154
DC-ISIS      4.38      2.347    2.036     2.898     3.956
GCPS         4.653     2.462    1.362     2.474     4.468
IGCPS        4.038     2.847    1.362     2.998     4.854
RVC-SIS      4.156     2.45     1.898     2.494     4.114
RVC-ISIS     3.832     2.55     2.134     2.256     4.234

Example 2
Method       R_1       R_2      R_3       R_4       R_5
Naive-SIS1   1006.022  617.534  1709.544  1774.618  -
Naive-ISIS1  927.04    465.648  107.872   48.554    -
Naive-SIS2   1078.528  654.678  1737.142  1800.202  -
Naive-ISIS2  1032.746  509.788  149.616   41.562    -
PS           1076.362  865.076  734.618   570.108   -
IPS          1000.356  156.824  9.482     5.864     -
DC-SIS       125.704   385.02   2.012     1         -
DC-ISIS      19.254    14.596   2.316     1         -
GCPS         1.035     677.842  345.654   887.356   -
IGCPS        1.035     28.647   77.336    96.256    -
RVC-SIS      8.618     8.12     2.232     1         -
RVC-ISIS     7.756     4.61     2.536     1         -

Example 3
Method       R_1       R_2      R_3       R_4       R_5
Naive-SIS1   2.746     2.248    1.028     1002.92   -
Naive-ISIS1  4.232     2.756    1.028     5.376     -
Naive-SIS2   2.646     2.358    1.052     1003.12   -
Naive-ISIS2  3.836     3.112    1.052     5.332     -
PS           1.556     2.86     1.584     1014.08   -
IPS          1.556     3.948    1.596     5.386     -
DC-SIS       2.706     2.284    1.028     982.866   -
DC-ISIS      4.112     2.878    1.028     5         -
GCPS         2.732     2.342    1.126     856.562   -
IGCPS        4.126     2.964    1.126     5.051     -
RVC-SIS      2.734     2.236    1.032     965.084   -
RVC-ISIS     4.216     2.758    1.032     5         -

Example 4
Method       R_1       R_2      R_3       R_4       R_5
Naive-SIS1   2.908     2.306    1.026     1010.838  12.572
Naive-ISIS1  4.208     2.806    1.026     41.296    6.024
Naive-SIS2   2.668     2.452    1.036     1011.006  12.306
Naive-ISIS2  3.796     3.224    1.036     56.808    6.044
PS           1.552     2.862    1.586     977.962   33.456
IPS          1.552     3.904    1.586     72.238    5.638
DC-SIS       2.894     2.316    1.032     968.662   22.386
DC-ISIS      4.068     2.926    1.032     3.000     6.016
GCPS         2.825     2.212    1.262     1126.124  36.746
IGCPS        4.534     2.647    1.262     3.898     8.956
RVC-SIS      3.054     2.264    1.044     989.44    22.552
RVC-ISIS     4.134     2.838    1.044     3.000     6.106

NOTE: "-" denotes that there is no corresponding entry because the set of active predictors M differs across examples.
Table 2. The selecting rate P_j of the true predictors and P_a for Examples 1–4.

Example 1
Method       P_1    P_2    P_3    P_4    P_5  P_a
Naive-SIS1   1      1      1      1      1    1
Naive-ISIS1  1      1      1      1      1    1
Naive-SIS2   1      1      1      1      1    1
Naive-ISIS2  1      1      1      1      1    1
PS           1      1      1      1      1    1
IPS          1      1      1      1      1    1
DC-SIS       1      1      1      1      1    1
DC-ISIS      1      1      1      1      1    1
GCPS         1      1      1      1      1    1
IGCPS        1      1      1      1      1    1
RVC-SIS      1      1      1      1      1    1
RVC-ISIS     1      1      1      1      1    1

Example 2
Method       P_1    P_2    P_3    P_4    P_5  P_a
Naive-SIS1   0.026  0.168  0      0      -    0
Naive-ISIS1  0.054  0.234  0.798  0.916  -    0.015
Naive-SIS2   0.028  0.176  0      0      -    0
Naive-ISIS2  0.055  0.220  0.702  0.916  -    0.02
PS           0      0.070  0.110  0.130  -    0
IPS          0.022  0.752  1      1      -    0.016
DC-SIS       0.628  0.268  1      1      -    0.158
DC-ISIS      0.986  0.998  1      1      -    0.984
GCPS         1      0.156  0.278  0.002  -    0
IGCPS        1      0.996  0.883  0.824  -    0.648
RVC-SIS      0.964  0.982  1      1      -    0.966
RVC-ISIS     0.988  0.996  1      1      -    0.986

Example 3
Method       P_1    P_2    P_3    P_4    P_5  P_a
Naive-SIS1   1      1      1      0      -    0
Naive-ISIS1  1      1      1      1      -    1
Naive-SIS2   1      1      1      0      -    0
Naive-ISIS2  1      1      1      1      -    1
PS           1      1      1      0      -    0
IPS          1      1      1      1      -    1
DC-SIS       1      1      1      0      -    0
DC-ISIS      1      1      1      1      -    1
GCPS         1      1      1      0      -    0
IGCPS        1      1      1      1      -    1
RVC-SIS      1      1      1      0      -    0
RVC-ISIS     1      1      1      1      -    1

Example 4
Method       P_1    P_2    P_3    P_4    P_5  P_a
Naive-SIS1   1      1      1      0      1    0
Naive-ISIS1  1      1      1      1      1    1
Naive-SIS2   1      1      1      0      1    0
Naive-ISIS2  1      1      1      1      1    1
PS           1      1      1      0      1    0
IPS          1      1      1      1      1    1
DC-SIS       1      1      1      0      1    0
DC-ISIS      1      1      1      1      1    1
GCPS         1      1      1      0      1    0
IGCPS        1      1      1      1      1    1
RVC-SIS      1      1      1      0      1    0
RVC-ISIS     1      1      1      1      1    1
Table 3. The quantiles of M = max_{j∈A} R_j for Examples 1–4.

Example 1
Method       5%       25%      50%      75%      95%
Naive-SIS1   5        5        5        5        6
Naive-ISIS1  5        5        5        5        6
Naive-SIS2   5        5        5        5        6
Naive-ISIS2  5        5        5        5        6
PS           5        5        5        5        5
IPS          5        5        5        5        5
DC-SIS       5        5        5        5        5
DC-ISIS      5        5        5        5        5
GCPS         5        5        5        5        5
IGCPS        5        5        5        5        5
RVC-SIS      5        5        5        5        6
RVC-ISIS     5        5        5        5        6

Example 2
Method       5%       25%      50%      75%      95%
Naive-SIS1   1668.45  1990.75  1969     1994     2000
Naive-ISIS1  277.9    669      1015     1407.25  1887.05
Naive-SIS2   1713.4   1913     1975     1995     2000
Naive-ISIS2  261.25   724.95   1069     1466.5   1846.1
PS           811.95   1215.25  1488.5   1739.25  1957.15
IPS          122      512.75   982      1476.5   1940.4
DC-SIS       22.95    122.75   315.5    653.5    1317.2
DC-ISIS      5        5        9        12       40.1
GCPS         280.56   634.85   948.32   1203.46  1568.12
IGCPS        20.86    38.95    57.78    286.96   500.85
RVC-SIS      4        4        4        7        46
RVC-ISIS     4        4        4        7        38

Example 3
Method       5%       25%      50%      75%      95%
Naive-SIS1   430.65   628.5    887.5    1326.25  1879.55
Naive-ISIS1  5        5        5        5        5
Naive-SIS2   444.25   639      903.5    1343.5   1878.15
Naive-ISIS2  5        5        5        5        5
PS           458.85   682.75   939      1325.75  1829.3
IPS          4        4        4        5        5
DC-SIS       337.75   590      871      1389.25  1828.3
DC-ISIS      5        5        5        5        5
GCPS         268.65   457.54   671.88   999.58   1509.98
IGCPS        5        5        5        5        5
RVC-SIS      284.65   596.75   880      1299.25  1811.35
RVC-ISIS     5        5        5        5        5

Example 4
Method       5%       25%      50%      75%      95%
Naive-SIS1   426.8    639.5    903      1394     1840.2
Naive-ISIS1  6        6        6        6        6
Naive-SIS2   421.9    641.25   890.5    1368.25  1835.35
Naive-ISIS2  6        6        6        6        6
PS           373.85   676.75   935      1253.25  1809.25
IPS          5        5        5.5      6        7
DC-SIS       304.9    609.75   922      1324.5   1809.05
DC-ISIS      6        6        6        6        6
GCPS         298.25   807.75   1130.42  1628.75  1811.50
IGCPS        6        6        6        6        6
RVC-SIS      258.95   572.5    934.5    1424.25  1876.15
RVC-ISIS     6        6        6        6        6
Table 4. Averaged mean squared errors (AMSEs) of the prediction over 100 random data partitions and the corresponding standard deviations of the screening procedures.

Method       AMSE    Standard Deviation
Naive-SIS1   0.1997  0.0281
Naive-ISIS1  0.1959  0.0286
Naive-SIS2   0.1991  0.0278
Naive-ISIS2  0.1878  0.0288
PS           0.1898  0.0652
IPS          0.1749  0.0604
DC-SIS       0.1822  0.0124
DC-ISIS      0.1772  0.0126
GCPS         0.1668  0.0223
IGCPS        0.1629  0.0221
RVC-SIS      0.1601  0.0136
RVC-ISIS     0.1422  0.0134