Abstract
High-dimensional regression with multivariate responses poses significant challenges when data are collected across multiple platforms, each with potentially correlated outcomes. In this paper, we introduce a multi-platform multivariate high-dimensional linear regression (MM-HLR) model that simultaneously models within-platform correlation and fuses information across platforms. Our approach incorporates a mixture of Lasso and group Lasso penalties to promote both individual predictor sparsity and cross-platform group sparsity, thereby enhancing interpretability and estimation stability. We develop an efficient computational algorithm based on iteratively reweighted least squares and block coordinate descent to solve the resulting regularized optimization problem. We establish theoretical guarantees for our estimator, including oracle bounds on prediction error, estimation accuracy, and support recovery under mild conditions. Our simulation studies confirm the method's strong empirical performance, demonstrating low bias, small variance, and robustness across various sample sizes and dimensions. The analysis of real financial data further validates the performance gains achieved by incorporating multivariate responses and integrating data across multiple platforms.
1. Introduction
As data complexity continues to expand in the big-data era, modern datasets increasingly display heterogeneity, high dimensionality, and multiple sources. A common example arises when experiments with the same scientific objective are conducted across different platforms or environments [1,2,3,4]. Modeling each dataset individually may result in failing to capture their intrinsic connections, which stem from their shared goal. This motivates the development of methods capable of integrating multi-platform data for unified, simultaneous analysis.
Data integration provides a timely solution by leveraging multiple data sources to enable more robust and efficient statistical inference than relying on any single source alone [5]. In a systematic review, Ref. [5] surveyed integration methods for combining probability samples with non-probability samples and big data sources (see also the references therein). Within the regression framework, a series of studies have advanced data integration methodologies. For instance, Ref. [1] introduced a pseudolikelihood information criterion for high-dimensional multi-experiment data with mixed response types and varying predictor measurements, establishing selection consistency even with unbounded model size and demonstrating through simulations that data integration substantially outperforms single-source analysis. Building on this work, Ref. [6] implemented the FusionLearn R package, which provides a fusion learning algorithm for cross-platform data analysis. Ref. [7] extended the framework to multi-platform data with sub-Gaussian or sub-exponential errors, developing a consistent model selection criterion based on composite likelihood and Bayesian posterior probabilities to recover the union support of predictors under diverging model dimensions. Meanwhile, Ref. [4] addressed multi-task feature learning with mixed continuous and discrete responses using a mixed $\ell_{2,1}$-regularized composite quasi-likelihood function. In a related vein, Ref. [8] proposed a quantile regression approach for high-dimensional multi-source data exhibiting heterogeneity and heavy-tailed error distributions, providing both theoretical guarantees and practical advantages in model recovery.
However, much of the existing literature focuses primarily on univariate response modeling within a single laboratory or experiment. In many modern applications, response variables are multivariate and correlated, yet share the same set of high-dimensional covariates or predictors [9]. For instance, in the UK Biobank population-based cohort study, researchers face large-scale, ultrahigh-dimensional features alongside a wide array of correlated phenotypic outcomes, including lifestyle measures, biomarkers, and disease diagnoses [10]. Such data structures have motivated extensive methodological developments in high-dimensional multivariate regression. Seminal work includes remMap by [11], designed for multivariate response regression in high-dimension–low-sample-size settings. Ref. [9] proposed a blockwise descent algorithm for group-penalized multi-response regression. Further advancing the field, Ref. [12] introduced a regularization method that enhances variable selection by efficiently eliminating irrelevant blocks of regression coefficients. More recently, Ref. [10] developed a scalable sparse reduced-rank regression method for high-dimensional multi-task learning with correlated outcomes. Ref. [13] proposed a novel framework suited for settings with large numbers of responses, response categories, and predictors. Additionally, several quantile regression approaches have been proposed to handle multiple responses under non-Gaussian or heterogeneous error settings, like [14,15], among others.
Nevertheless, relatively few works have simultaneously addressed the complexities of multivariate responses across multiple data sources. We thus propose a multi-platform multivariate high-dimensional linear regression (MM-HLR) method, designed to jointly model within-platform correlation while promoting cross-platform group sparsity. Structured or group sparsity has been well studied recently. Zhang et al. [16] presented a probabilistic framework for subset selection under partition constraints, which aligns with the group-sparsity and cross-platform fusion objectives of our study. In addition, Li et al. [12] proposed a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. Similarly, the mixed $\ell_{2,1}$ penalty has been employed for multitask feature learning and selection [4,17]. Kawano et al. [18] proposed multivariate regression modeling for integrative analysis that performs group selection via group Lasso estimation. In this article, we extend the mixture of $\ell_1$ and Frobenius-norm penalties from [12] to simultaneously enforce individual sparsity and cross-platform group sparsity in our multi-platform setting. An efficient optimization framework, combining block coordinate descent with a proxy approximation strategy, is introduced to solve the resulting non-convex regularized problem. Theoretically, we establish non-asymptotic bounds on the prediction error, estimation accuracy, and support recovery of the MM-HLR estimator. Empirically, we evaluate the proposed method under varying sample sizes and predictor dimensionalities, benchmarking it against two alternative approaches: (i) FusionLearn [6], which is designed for multi-platform integration but treats responses as univariate, thereby ignoring within-platform correlation structures, and (ii) multivariate Group Lasso [19], which is designed for multivariate regression within a single platform and thus overlooks cross-platform grouping. Comprehensive simulation studies are conducted to demonstrate the method's performance across diverse data-generating scenarios. Finally, we validate the proposed method through a real-world financial data analysis to assess the gains of jointly modeling multivariate responses and integrating data across multiple platforms.
The remainder of this article is organized as follows: Section 2 introduces the model framework and parameter estimation. Section 3 establishes the theoretical guarantees of the proposed estimator. Section 4 presents simulation studies to evaluate the performance of the method. We provide a real data analysis in Section 5. Finally, Section 6 concludes this article.
2. Methodology
Let $A$ and $B$ be two matrices of the same dimensions. $A \succeq B$ denotes that $A - B$ is positive semidefinite. $\|A\|_F$ denotes the Frobenius norm of $A$. In addition, $\|A\|_1$ and $\|A\|_\infty$ denote the sum and the maximum of the absolute values of all entries of $A$, respectively. Let $a$ be a vector. We denote its $\ell_1$ and $\ell_2$ norms by $\|a\|_1$ and $\|a\|_2$, respectively.
2.1. Model Setup
Assume we have $n$ objects and aim to model the linear relationship between $m$ responses and $p$ predictors for each object. To enhance measurement accuracy, each object is sent to $k$ different platforms, each of which returns two matrices, a response matrix $Y_i \in \mathbb{R}^{n \times m}$ and a design matrix $X_i \in \mathbb{R}^{n \times p}$, $i = 1, \dots, k$. Due to variations in equipment and measurement protocols across platforms, the scale and continuity of the results may differ [1,6]. Our goal is to identify the common set of influential predictors that affect the responses across all platforms. Moreover, because multivariate responses are measured on the same objects, there exist unknown correlations between responses within each platform and across platforms. Key terminology is summarized in Table 1.
Table 1.
Data structure of response matrix $Y_i$, design matrix $X_i$, and parameter matrix $B_i$ for platform $i$, where $y^{(i)}_{js}$ and $x^{(i)}_{jr}$ denote the $j$-th observed value of response $s$ and predictor $r$ on platform $i$, respectively. The corresponding regression coefficient of predictor $r$ on response $s$ is $b^{(i)}_{rs}$, $r = 1, \dots, p$, $s = 1, \dots, m$.
From the linear relationship between responses and predictors, we have the following:
$$Y_i = X_i B_i + E_i, \quad i = 1, \dots, k, \qquad (1)$$
where $Y_i \in \mathbb{R}^{n \times m}$, $X_i \in \mathbb{R}^{n \times p}$, and $B_i \in \mathbb{R}^{p \times m}$ are, respectively, the response matrix, design matrix, and regression parameter matrix from the $i$th platform. $E_i \in \mathbb{R}^{n \times m}$ is the error matrix, whose rows are independent and identically distributed as $\mathcal{N}_m(\mathbf{0}, \Sigma_i)$, with $\Sigma_i$ being the within-platform error covariance matrix.
To simplify the modeling complexity, we treat each platform independently within the likelihood function. Accordingly, we adopt a marginal composite likelihood approach to integrate information across platforms [7,20,21], i.e., the following is calculated:
$$L_c(\mathcal{B}) = \prod_{i=1}^{k} \prod_{j=1}^{n} f_i\big(y^{(i)}_{j}\big),$$
where $\mathcal{B} = \{B_1, \dots, B_k\}$, $y^{(i)}_{j}$ is the $j$th row of $Y_i$, and $f_i$ denotes the multivariate normal density function of $\mathcal{N}_m(B_i^\top x^{(i)}_{j}, \Sigma_i)$, that is,
$$f_i\big(y^{(i)}_{j}\big) = (2\pi)^{-m/2} |\Sigma_i|^{-1/2} \exp\Big\{-\tfrac{1}{2}\big(y^{(i)}_{j} - B_i^\top x^{(i)}_{j}\big)^\top \Sigma_i^{-1} \big(y^{(i)}_{j} - B_i^\top x^{(i)}_{j}\big)\Big\},$$
where $x^{(i)}_{j}$ is the $j$th row of $X_i$.
The log-likelihood function can be given by the following (excluding the constant):
$$\ell(\mathcal{B}) = -\frac{1}{2} \sum_{i=1}^{k} \Big[ n \log |\Sigma_i| + \operatorname{tr}\big\{ (Y_i - X_i B_i) \Sigma_i^{-1} (Y_i - X_i B_i)^\top \big\} \Big].$$
Following [1], the objective of this study is to recover a union subset of predictors, where each selected predictor is associated with at least one outcome across the platforms. To identify these commonly influential predictors, we enforce the constraint that the set of nonzero rows of $B_i$ is identical for all platforms $i = 1, \dots, k$. Stacking the $r$-th row vectors of all $B_i$s, we obtain the regression coefficient matrix for predictor $r$ across all $m$ responses and $k$ platforms, i.e., written as follows:
$$B^{(r)} = \big(B_1^{(r)\top}, \dots, B_k^{(r)\top}\big)^\top \in \mathbb{R}^{k \times m},$$
where $B_i^{(r)}$ represents the $r$-th row of $B_i$. We assume that the sparsity among the $k$ platforms is the same with respect to the predictors, that is, $B_1^{(r)}, \dots, B_k^{(r)}$ are either all zero or all nonzero for each $r = 1, \dots, p$. To select the influential predictors among all platforms, we add the group Lasso penalty to the log-likelihood function, i.e., written as follows:
$$\lambda_2 \sum_{r=1}^{p} \big\| B^{(r)} \big\|_F,$$
where $\|\cdot\|_F$ is the Frobenius norm (square root of the sum of squares of all entries). Assuming the penalty function is the mixture of the $\ell_1$ and Frobenius norms, a sparse estimate of $\mathcal{B}$ (denoted as $\widehat{\mathcal{B}}$) can be obtained by solving the minimization problem as follows:
$$\widehat{\mathcal{B}} = \arg\min_{\mathcal{B}} \big\{ -\ell(\mathcal{B}) + P_1(\mathcal{B}) + P_2(\mathcal{B}) \big\}. \qquad (6)$$
Denote $P_1(\mathcal{B}) = \lambda_1 \sum_{i=1}^{k} \|B_i\|_1$ and $P_2(\mathcal{B}) = \lambda_2 \sum_{r=1}^{p} \|B^{(r)}\|_F$. The first penalty encourages sparsity (shrinking individual coefficients to zero), and the second penalty encourages group sparsity, forcing each predictor's coefficients to be zero across all platforms simultaneously.
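To make the structure of problem (6) concrete, the following minimal sketch evaluates the penalized objective for given platform data; the function and variable names are ours for illustration and are not part of any released implementation.

```python
import numpy as np

def mm_hlr_objective(Y_list, X_list, B_list, Omega_list, lam1, lam2):
    """Penalized objective of problem (6): the weighted least-squares part of
    the negative composite log-likelihood plus the mixed l1/group penalty."""
    loss = 0.0
    for Y, X, B, Omega in zip(Y_list, X_list, B_list, Omega_list):
        R = Y - X @ B                                 # n x m residuals, one platform
        loss += 0.5 * np.trace(R @ Omega @ R.T)       # tr{(Y - XB) Omega (Y - XB)^T}
    p1 = lam1 * sum(np.abs(B).sum() for B in B_list)  # P1: Lasso on all entries
    p = B_list[0].shape[0]
    p2 = lam2 * sum(                                  # P2: Frobenius norm of the
        np.sqrt(sum((B[r] ** 2).sum() for B in B_list))  # r-th rows stacked
        for r in range(p)                                # across platforms
    )
    return loss + p1 + p2
```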
2.2. Parameter Estimation
We use an iteratively reweighted least squares approach with block coordinate descent per predictor $r = 1, \dots, p$. For simplicity, we assume that the element-wise regularization parameters for the Lasso penalty are identical, i.e., equal to a common $\lambda_1$. In practice, the covariance matrix $\Sigma_i$ among responses is typically unknown for each platform $i$. A simple initial estimate, such as the empirical covariance of the model residuals, can be employed. The following algorithm is then implemented conditional on these estimated $\Sigma_i$s. Denote $\Omega_i = \Sigma_i^{-1}$. Then the optimization problem (6) becomes the following:
$$\min_{\mathcal{B}} \ \frac{1}{2} \sum_{i=1}^{k} \operatorname{tr}\big\{ (Y_i - X_i B_i) \Omega_i (Y_i - X_i B_i)^\top \big\} + \lambda_1 \sum_{i=1}^{k} \|B_i\|_1 + \lambda_2 \sum_{r=1}^{p} \big\|B^{(r)}\big\|_F. \qquad (7)$$
Rewrite $f(\mathcal{B}) = \frac{1}{2} \sum_{i=1}^{k} \operatorname{tr}\{(Y_i - X_i B_i) \Omega_i (Y_i - X_i B_i)^\top\}$ for the smooth part of (7). For predictor $r$, by holding $B^{(l)}$ fixed for $l \neq r$, we define the residuals without predictor $r$ for platform $i$ as $R_i^{(-r)} = Y_i - \sum_{l \neq r} x^{(i)}_{(l)} B_i^{(l)}$, where $x^{(i)}_{(j)}$ is the column $j$ of $X_i$. Then the first gradient of $f$ with respect to $B_i^{(r)}$ is $-x^{(i)\top}_{(r)} \big(R_i^{(-r)} - x^{(i)}_{(r)} B_i^{(r)}\big) \Omega_i$. Denote the following:
$$Z_i^{(r)} = x^{(i)\top}_{(r)} R_i^{(-r)} \Omega_i, \qquad Z^{(r)} = \big(Z_1^{(r)\top}, \dots, Z_k^{(r)\top}\big)^\top.$$
By the coordinate descent algorithm, we estimate $B^{(r)}$ by fixing the rest. Let $B^{(l)}$ be fixed for $l \neq r$; then problem (7) can be transformed into a minimization sub-problem for predictor $r$ across the $k$ platforms, i.e., the following is calculated:
$$\min_{B^{(r)}} \ \frac{1}{2} \sum_{i=1}^{k} \operatorname{tr}\big\{ \big(R_i^{(-r)} - x^{(i)}_{(r)} B_i^{(r)}\big) \Omega_i \big(R_i^{(-r)} - x^{(i)}_{(r)} B_i^{(r)}\big)^\top \big\} + \lambda_1 \sum_{i=1}^{k} \big\|B_i^{(r)}\big\|_1 + \lambda_2 \big\|B^{(r)}\big\|_F. \qquad (9)$$
Next, we show how to solve the sub-problem (9). Define the element-wise soft-thresholding operator $S(a, \lambda) = \operatorname{sign}(a)(|a| - \lambda)_+$, where $(x)_+ = \max(x, 0)$, applied entrywise when the argument is a matrix. In light of the block soft-thresholding solution of the group Lasso in [22], the predictor $r$ is set to be inactive across all platforms, $\widehat{B}^{(r)} = \mathbf{0}$, if the following is met:
$$\big\| S\big(Z^{(r)}, \lambda_1\big) \big\|_F \le \lambda_2.$$
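A minimal sketch of these two operators (our own illustration; `Z_r` stands for the stacked $k \times m$ quantity $Z^{(r)}$ defined above):

```python
import numpy as np

def soft_threshold(a, lam):
    """Element-wise soft-thresholding S(a, lam) = sign(a) * (|a| - lam)_+."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def group_is_inactive(Z_r, lam1, lam2):
    """Row-wise screening rule: predictor r is set to zero on all platforms
    when the soft-thresholded stacked quantity has Frobenius norm <= lam2."""
    return np.linalg.norm(soft_threshold(Z_r, lam1)) <= lam2
```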
Otherwise, predictor $r$ is selected to be active, $\widehat{B}^{(r)} \neq \mathbf{0}$. In this case, we can update $B^{(r)}$ via a block soft-thresholding operator, as shown in [23]. Given our objective of selecting a union subset of active predictors across all platforms, an active predictor $r$ may be relevant to platform $i$ or to other platforms. We therefore consider the following two scenarios for updating $B_i^{(r)}$.
Assume that predictor $r$ is inactive on all platforms except platform $i$, that is, $B_l^{(r)} = \mathbf{0}$ for $l \neq i$. Then the group Lasso norm reduces to $\|B^{(r)}\|_F = \|B_i^{(r)}\|_2$, and the sub-problem (9) simplifies to become
$$\min_{B_i^{(r)}} \ \frac{1}{2} \operatorname{tr}\big\{\big(R_i^{(-r)} - x^{(i)}_{(r)} B_i^{(r)}\big) \Omega_i \big(R_i^{(-r)} - x^{(i)}_{(r)} B_i^{(r)}\big)^\top\big\} + \lambda_1 \big\|B_i^{(r)}\big\|_1 + \lambda_2 \big\|B_i^{(r)}\big\|_2,$$
where the objective function is a sum of a smooth quadratic term and a composite nonsmooth penalty $\lambda_1 \|\cdot\|_1 + \lambda_2 \|\cdot\|_2$. Its proximal mapping consists of sequential soft-thresholding followed by block shrinkage. The minimizer of the row-wise objective with respect to $B_i^{(r)}$ is then given by the following:
$$\widehat{B}_i^{(r)} = \Big(1 - \frac{\lambda_2}{\big\|S\big(V_i^{(r)}, \lambda_1\big)\big\|_2}\Big)_{+} S\big(V_i^{(r)}, \lambda_1\big),$$
where $V_i^{(r)}$ denotes the unpenalized least-squares update of the $r$-th row on platform $i$.
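This two-stage proximal mapping can be written compactly; a sketch (reusing `soft_threshold` from the previous snippet; the unit step size is an illustrative simplification):

```python
import numpy as np

def prox_sparse_group(V, lam1, lam2):
    """Proximal map of lam1*||.||_1 + lam2*||.||_F at V: element-wise
    soft-thresholding followed by block (Frobenius-norm) shrinkage."""
    U = soft_threshold(V, lam1)        # stage 1: entrywise soft-thresholding
    norm_U = np.linalg.norm(U)         # Frobenius norm of the thresholded block
    if norm_U <= lam2:                 # whole block is shrunk to zero
        return np.zeros_like(U)
    return (1.0 - lam2 / norm_U) * U   # stage 2: block shrinkage
```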
Assuming that predictor $r$ is active across all platforms, we conduct the closed-form update for the nonzero group $r$ as follows. A solution to problem (9) is a minimizer of the following:
$$g\big(B_i^{(r)}\big) = \frac{1}{2} \operatorname{tr}\big\{\big(R_i^{(-r)} - x^{(i)}_{(r)} B_i^{(r)}\big) \Omega_i \big(R_i^{(-r)} - x^{(i)}_{(r)} B_i^{(r)}\big)^\top\big\} + \lambda_1 \big\|B_i^{(r)}\big\|_1 + \lambda_2 \big\|B^{(r)}\big\|_F.$$
Take the first derivative of $g$ with respect to $B_i^{(r)}$ and set it to zero, i.e., as follows:
$$-x^{(i)\top}_{(r)} \big(R_i^{(-r)} - x^{(i)}_{(r)} B_i^{(r)}\big) \Omega_i + \lambda_1 s_i^{(r)} + \lambda_2 \frac{B_i^{(r)}}{\|B^{(r)}\|_F} = \mathbf{0},$$
where $s_i^{(r)}$ is a subgradient of $\|B_i^{(r)}\|_1$. Treating the group norm $\|B^{(r)}\|_F$ as fixed at its current value (the proxy approximation), we rewrite the above equation to group the terms linear in $B_i^{(r)}$, written as follows:
$$B_i^{(r)} \Big( \big\|x^{(i)}_{(r)}\big\|_2^2\, \Omega_i + \frac{\lambda_2}{\|B^{(r)}\|_F} I_m \Big) = Z_i^{(r)} - \lambda_1 s_i^{(r)},$$
which yields a closed form of $\widehat{B}_i^{(r)}$, i.e.,
$$\widehat{B}_i^{(r)} = \big( Z_i^{(r)} - \lambda_1 s_i^{(r)} \big) \Big( \big\|x^{(i)}_{(r)}\big\|_2^2\, \Omega_i + \frac{\lambda_2}{\|B^{(r)}\|_F} I_m \Big)^{-1}.$$
2.3. Algorithm
We estimate $\mathcal{B}$ via a nested alternating minimization procedure that consists of an outer loop updating the covariance matrices and an inner loop performing row-wise block coordinate descent on the regression coefficients with sparse group penalties. For the $t$-th outer loop, $t = 1, 2, \dots$, we perform a fixed number of inner loops to update the regression coefficients while keeping the covariance matrices fixed. When the inner loop reaches the maximum iteration, we update $\widehat{\Sigma}_i$, followed by the updating of $\widehat{\Omega}_i = \widehat{\Sigma}_i^{-1}$ and the recording of the residuals. Convergence of the algorithm is then assessed at the outer-iteration level based on changes in $\widehat{\mathcal{B}}$ and the $\widehat{\Sigma}_i$s. Each outer iteration, therefore, consists of a full inner loop for coefficient updates, followed by updating the error covariance and evaluating convergence. The details are as follows:
- Step 1:
- Initialization: set $B_i^{(0)}$ and $\Sigma_i^{(0)}$ for each platform $i = 1, \dots, k$.
- Step 2:
- (Outer loop) Update $\mathcal{B}^{(t)}$ across all platforms given the $\Sigma_i^{(t-1)}$s at the $t$-th iteration.
  - Substep 2.1: (Inner loop) Initialization: for each predictor $r = 1, \dots, p$, set $B_i^{(r)}$ to its current value, where $B_i^{(r)}$ is the $r$-th row of $B_i$.
  - Substep 2.2: Update $B^{(r)}$ across all platforms given $\{B^{(l)} : l \neq r\}$ and the $\Omega_i$s. Perform row-wise group screening: let $\widehat{B}^{(r)} = \mathbf{0}$ if $\|S(Z^{(r)}, \lambda_1)\|_F \le \lambda_2$. Otherwise, predictor $r$ is added to the active row set. For each active predictor $r$, conduct the row-wise coefficient updating: use the single-platform proximal update if predictor $r$ is active only on platform $i$; otherwise, use the closed-form group update.
  - Substep 2.3: For each platform $i$, update the residuals $R_i = Y_i - X_i \widehat{B}_i$.
  - Substep 2.4: Repeat substeps 2.2–2.3 until the maximum number of inner iterations $T_{\text{in}}$ is reached.
- Step 3:
- Update $\Sigma_i^{(t)}$ via the empirical covariance of the current residuals, i.e., $\widehat{\Sigma}_i^{(t)} = \frac{1}{n}\big(Y_i - X_i \widehat{B}_i^{(t)}\big)^\top \big(Y_i - X_i \widehat{B}_i^{(t)}\big)$, $i = 1, \dots, k$.
- Step 4:
- Repeat steps 2–3 until the termination condition is met, i.e., the following:
$$\max_{1 \le i \le k} \Big\{ \big\|\widehat{B}_i^{(t)} - \widehat{B}_i^{(t-1)}\big\|_F,\ \big\|\widehat{\Sigma}_i^{(t)} - \widehat{\Sigma}_i^{(t-1)}\big\|_F \Big\} < \epsilon,$$
where $\epsilon$ is a small number, say, $1 \times 10^{-4}$.
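The following Python skeleton summarizes Steps 1–4. It is an illustrative sketch reusing `prox_sparse_group` above, not the authors' released code: it assumes standardized columns as in Assumption 1 and, in the spirit of the proxy approximation, drops the $\Omega_i$ weighting inside the row update for brevity.

```python
import numpy as np

def fit_mm_hlr(Y_list, X_list, lam1, lam2, max_outer=50, max_inner=10, eps=1e-4):
    """Nested alternating minimization: inner row-wise block coordinate descent
    on B_1,...,B_k, outer update of the error covariance matrices Sigma_i."""
    k = len(Y_list)
    n, p = X_list[0].shape
    m = Y_list[0].shape[1]
    B_list = [np.zeros((p, m)) for _ in range(k)]            # Step 1: B_i^(0) = 0
    Sigma_list = [np.cov(Y, rowvar=False) for Y in Y_list]   # residual cov. at B = 0
    for _ in range(max_outer):
        B_old = [B.copy() for B in B_list]
        Sigma_old = [S.copy() for S in Sigma_list]
        for _ in range(max_inner):                           # Step 2: inner loop
            for r in range(p):
                # k x m trial point: x_r^T R^(-r) / n per platform (cols standardized)
                V = np.vstack([
                    X[:, r] @ (Y - X @ B + np.outer(X[:, r], B[r])) / n
                    for Y, X, B in zip(Y_list, X_list, B_list)
                ])
                U = prox_sparse_group(V, lam1, lam2)         # screening + shrinkage
                for i in range(k):
                    B_list[i][r] = U[i]
        Sigma_list = [np.cov(Y - X @ B, rowvar=False)        # Step 3: update Sigma_i
                      for Y, X, B in zip(Y_list, X_list, B_list)]
        dB = max(np.linalg.norm(B - Bo) for B, Bo in zip(B_list, B_old))
        dS = max(np.linalg.norm(S - So) for S, So in zip(Sigma_list, Sigma_old))
        if max(dB, dS) < eps:                                # Step 4: termination
            break
    return B_list, Sigma_list
```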
We remark that, for initialization, we set $B_i^{(0)} = \mathbf{0}$ and $\Sigma_i^{(0)} = Y_i^\top Y_i / n$ for $i = 1, \dots, k$. Tuning parameters are typically selected by evaluating a candidate set of values via cross-validation or an information criterion such as the adjusted BIC. To enhance computational efficiency and avoid a costly two-dimensional grid search, we adopt a practical simplification by fixing the relationship between $\lambda_1$ and $\lambda_2$, for instance, by holding the ratio $\lambda_2 / \lambda_1$ at a fixed constant. Consequently, we select the optimal $\lambda_1$ via two-fold cross-validation from a common empirical candidate set, such as $\{0.1, 0.2, \dots, 2.0\}$ with an increment of 0.1 or smaller. The range and granularity of the candidate set can be adapted to the specific application.
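A sketch of this tuning scheme, with the ratio $\lambda_2/\lambda_1$ held fixed and $\lambda_1$ chosen by two-fold cross-validation (a hypothetical helper built on `fit_mm_hlr` above):

```python
import numpy as np

def select_lambda(Y_list, X_list, grid=np.arange(0.1, 2.01, 0.1), ratio=1.0):
    """Two-fold CV over a one-dimensional lambda grid with lam2 = ratio * lam1."""
    n = X_list[0].shape[0]
    idx = np.random.permutation(n)
    folds = [idx[: n // 2], idx[n // 2:]]
    best_lam, best_err = grid[0], np.inf
    for lam in grid:
        err = 0.0
        for a, b in [(0, 1), (1, 0)]:        # train on one fold, test on the other
            tr, te = folds[a], folds[b]
            B_hat, _ = fit_mm_hlr([Y[tr] for Y in Y_list],
                                  [X[tr] for X in X_list],
                                  lam1=lam, lam2=ratio * lam)
            err += sum(np.mean((Y[te] - X[te] @ B) ** 2)
                       for Y, X, B in zip(Y_list, X_list, B_hat))
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```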
3. Theoretical Properties
The theoretical properties are developed in light of [12], where the theory is built under the framework of multivariate linear regression; herein, we extend it to multiple platforms. For platform $i$, denote by $J_i$ the index set of nonzero elements in $B_i$, and by $G_i$ the index set of nonzero rows in $B_i$. We assume that the sparsity of all $B_i$s is the same. Define $s = |J_i|$ and $g = |G_i|$. For any matrix $A$ and index set $J$, denote $A_J$ as the projection of $A$ on the index set $J$, which is a matrix with the same elements as $A$ on the coordinates in $J$ and zeros on the complementary coordinates $J^c$. Denote $A_G = \sum_{r \in G} A^{[r]}$, where $A^{[r]}$ is a matrix with the $r$th row the same as that of $A$ and zeros on the other rows.
Let $B_i^*$ be the true regression coefficient matrices in model (1), and $\widehat{B}_i$ be their estimated counterparts for $i = 1, \dots, k$. Assume each column of the random error matrix $E_i$ follows a multivariate normal distribution. Denote $\Delta_i = \widehat{B}_i - B_i^*$ and $\Delta^{(r)} = \widehat{B}^{(r)} - B^{*(r)}$. The theoretical framework is built on a given covariance matrix $\Sigma_i$ for each platform $i$. In practice, $\Sigma_i$ is usually unknown, so we instead use an estimator $\widehat{\Sigma}_i$. Before proceeding, we impose mild conditions on both the design matrices and the covariance matrices for all platforms.
Assumption 1.
Assume that the columns of $X_i$ are centered and standardized such that the diagonal elements of the matrix $X_i^\top X_i / n$ are equal to 1 for all $i = 1, \dots, k$. Let $\phi_{\max}$ be the largest eigenvalue of $X_i^\top X_i / n$ over all $i = 1, \dots, k$.
Assumption 2.
For a given constant $\alpha \ge 1$, assume that there exists an estimator $\widehat{\Sigma}_i$ such that $\alpha^{-1} \Sigma_i \preceq \widehat{\Sigma}_i \preceq \alpha \Sigma_i$ for all $i = 1, \dots, k$. There exist positive constants $c_{\min} \le c_{\max}$ such that the eigenvalues of $\Sigma_i$ are less than $c_{\max}$ and larger than $c_{\min}$ for all $i = 1, \dots, k$.
Assumption 3.
Let $J$ and $G$ be any index sets that satisfy $|J| \le s$ and $|G| \le g$. Let $\{\gamma_1, \gamma_2\}$ be a set of positive numbers. For any nontrivial matrices $\Delta_1, \dots, \Delta_k$, if the following cone condition is satisfied:
where $\Delta^{(r)}$ is built from the $r$th rows of $\Delta_i$ for all $i = 1, \dots, k$, then the following minimums (restricted eigenvalue constants) exist and are positive, i.e., the following:
We remark that the constants involved in these assumptions represent standard regularity conditions in statistical theory. While their specific values are derived from the proofs and are not tuned in practice, their existence is required to establish the desired theoretical guarantees. Furthermore, we set the regularization levels $\lambda_1$ and $\lambda_2$ at the usual high-dimensional rates of order $\sqrt{\log p / n}$, up to some constant $c > 0$; these choices enter the bounds of Theorem 1 below.
Theorem 1.
Under Assumptions 1–3, the following oracle bounds for the prediction error, the estimation error, and the order of sparsity hold with probability at least $1 - \delta$ for a small $\delta \in (0, 1)$, i.e., written as follows:
The proof of Theorem 1 is given in Appendix A. The three inequalities (25)–(27) guarantee the performance of the estimator in terms of prediction accuracy, estimation error, and sparsity. The first inequality (25) bounds the weighted mean squared prediction error across all $k$ platforms; the bound scales with the regularization levels and depends on the sparsity and grouping structures. The second inequality (26) controls the cumulative $\ell_1$-norm of the estimation error $\widehat{B}_i - B_i^*$ across all platforms. This bound grows slowly with the dimension $p$, typical of high-dimensional settings, and depends similarly on the same sparsity and grouping structures. The third inequality (27) bounds the size of the estimated support, a measure of the model complexity or effective sparsity of the estimator. This ensures that the estimated model is not overly dense; the bound depends on the maximum eigenvalue $\phi_{\max}$.
4. Simulations
To test our method, we simulated data from model (1) under various combinations of the sample size $n$ and the predictor dimension $p$, with the numbers of responses $m$ and platforms $k$ held fixed. For each platform $i$, the design matrix $X_i$ was generated with rows drawn independently from a $p$-variate normal distribution with mean zero and identity covariance. To introduce high correlation within each row of the error matrix $E_i$, we set the diagonal elements of its covariance matrix to 1 and the off-diagonal elements to 0.8. The coefficient matrix $B_i$ was constructed with 10 active rows; entries in these rows were sampled uniformly from fixed intervals bounded away from zero (one negative and one positive).
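For concreteness, a sketch of this data-generating mechanism is given below (the coefficient magnitudes `0.5`–`1.5` are illustrative placeholders, since the exact sampling intervals above are specific to the paper):

```python
import numpy as np

def simulate_platform_data(n, p, m, k, n_active=10, rho=0.8, seed=0):
    """Generate (Y_i, X_i, B_i) for k platforms from model (1): standard normal
    design, compound-symmetric error covariance (1 on the diagonal, rho off it),
    and a common set of n_active nonzero coefficient rows."""
    rng = np.random.default_rng(seed)
    Sigma = np.full((m, m), rho) + (1.0 - rho) * np.eye(m)  # error covariance
    L = np.linalg.cholesky(Sigma)
    active = rng.choice(p, size=n_active, replace=False)    # shared support
    Y_list, X_list, B_list = [], [], []
    for _ in range(k):
        X = rng.standard_normal((n, p))
        B = np.zeros((p, m))
        signs = rng.choice([-1.0, 1.0], size=(n_active, m))
        B[active] = signs * rng.uniform(0.5, 1.5, size=(n_active, m))
        E = rng.standard_normal((n, m)) @ L.T               # correlated errors
        X_list.append(X); B_list.append(B); Y_list.append(X @ B + E)
    return Y_list, X_list, B_list, active
```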
The evaluation metrics encompass three primary aspects: parameter estimation accuracy across all platforms, prediction accuracy, and feature selection accuracy. For each platform $i = 1, \dots, k$, we measure the Frobenius norm of the deviation between the true regression coefficient matrix and its estimate,
$$D_{B,i} = \big\|\widehat{B}_i - B_i\big\|_F,$$
the deviation between the true covariance matrix and its estimate,
$$D_{\Sigma,i} = \big\|\widehat{\Sigma}_i - \Sigma_i\big\|_F,$$
and the root mean squared error (RMSE), i.e.,
$$\mathrm{RMSE} = \sqrt{\frac{1}{k\, n_{\mathrm{test}}\, m} \sum_{i=1}^{k} \big\|Y_i^{\mathrm{test}} - X_i^{\mathrm{test}} \widehat{B}_i\big\|_F^2},$$
where $n_{\mathrm{test}}$ is the number of out-of-sample observations for each platform. The overall estimation accuracy of $\widehat{B}_i$ and $\widehat{\Sigma}_i$ can be summarized by the averages across all $k$ platforms, written as follows:
$$\bar{D}_B = \frac{1}{k} \sum_{i=1}^{k} D_{B,i}, \qquad \bar{D}_\Sigma = \frac{1}{k} \sum_{i=1}^{k} D_{\Sigma,i}.$$
Meanwhile, feature selection performance is evaluated using sensitivity, the proportion of truly active predictors that are correctly identified, written as follows:
$$\mathrm{Sensitivity} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}},$$
and specificity, the proportion of truly inactive predictors that are correctly excluded,
$$\mathrm{Specificity} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}},$$
where TP and TN represent the number of predictors correctly identified as active (nonzero true coefficients with nonzero estimates) and inactive (zero true coefficients with zero estimates), respectively. Conversely, FP and FN denote the number of predictors incorrectly identified as active (zero true coefficients with nonzero estimates) and inactive (nonzero true coefficients with zero estimates), respectively. These metrics are computed for each platform, and the results are aggregated across all $k$ platforms to provide a comprehensive assessment of feature selection performance.
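These metrics can be computed directly from the estimated coefficient matrices; a minimal sketch (row-wise activity is judged by whether any coefficient in the row is nonzero):

```python
import numpy as np

def selection_metrics(B_true_list, B_hat_list):
    """Per-platform sensitivity TP/(TP+FN) and specificity TN/(TN+FP),
    averaged across platforms; a predictor is 'active' if its row is nonzero."""
    sens, spec = [], []
    for B, Bh in zip(B_true_list, B_hat_list):
        truth = np.any(B != 0, axis=1)      # truly active rows
        est = np.any(Bh != 0, axis=1)       # estimated active rows
        tp = np.sum(truth & est);  fn = np.sum(truth & ~est)
        tn = np.sum(~truth & ~est); fp = np.sum(~truth & est)
        sens.append(tp / (tp + fn))
        spec.append(tn / (tn + fp))
    return float(np.mean(sens)), float(np.mean(spec))
```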
We compare our proposed method with two alternative approaches, FusionLearn and multivariate Group Lasso. The FusionLearn method, implemented in the R package FusionLearn [6], is designed for multi-platform data integration but treats responses as univariate. To adapt it to our multivariate setting, we apply it separately to each of the m responses across platforms, which ignores the covariance structure among responses. Conversely, the multivariate Group Lasso method, available via the glmnet package [19], is designed for multivariate regression within a single platform. We therefore apply it independently to the data from each platform, which does not leverage information shared across platforms.
The simulation results are given in Table 2 and Table 3. As seen in Table 2, the comparison of MM-HLR, FusionLearn, and GroupLasso reveals distinct performance in parameter estimation and prediction. In terms of the parameter estimation of $B_i$ and $\Sigma_i$, the proposed MM-HLR consistently provides estimates closest to the true values, reflected by the smallest deviation measures. In contrast, both FusionLearn and GroupLasso exhibit systematically larger estimation bias. The deviation measures $\bar{D}_B$ and $\bar{D}_\Sigma$ for GroupLasso are often excessively high, especially for smaller $n$ and larger $p$. In terms of prediction performance, MM-HLR demonstrates a strong advantage, achieving the lowest RMSE in every setting. Its RMSE values range from 0.38 to 0.58, which are much lower than those of the other two methods. These consistent superiorities confirm MM-HLR's excellent parameter estimation and predictive capability.
Table 2.
Comparison of averaged parameter estimation accuracy ($\bar{D}_B$ and $\bar{D}_\Sigma$) and prediction accuracy (RMSE) across different methods under various configurations (standard errors in parentheses).
Table 3.
Comparison of averaged feature selection accuracy (sensitivity and specificity) across different methods under various configurations (standard errors in parentheses).
Table 3 presents the comparison of averaged feature selection accuracy (sensitivity and specificity) across different methods under various configurations. All three methods achieve perfect sensitivity (1.0) across all pairs of $(n, p)$, indicating that each reliably identifies the true predictors. However, the specificity values vary significantly among the three methods. MM-HLR achieves near-perfect or perfect specificity (0.9996 to 1.0) in all scenarios. GroupLasso has the lowest specificity overall, ranging from 0.7362 to 0.9396, indicating a weaker ability to exclude irrelevant predictors than the other two methods, while FusionLearn sits in the middle, with specificity values from 0.9166 to 0.9524.
5. Real Data Analysis
Zhang et al. [7] integrated three financial indices (S&P 500, Dow Jones, and VIX) as distinct platforms. Similarly, we consider a two-platform analysis ($k = 2$) of U.S. stock market data. Our first platform comprises the S&P 500 index and its volatility index (VIX), while the second consists of the NASDAQ-100 index and its volatility index (VXN). Thus, each platform provides a two-dimensional response vector ($m = 2$). We consider stocks that are actively traded in either the S&P 500 or NASDAQ-100 indices as predictors. The analysis uses weekly data from October 2022 to September 2025. To achieve approximately independent samples, log returns are calculated for all series. After data preprocessing and the removal of missing values, the final dataset contains 156 weekly observations and 510 predictors, which are common to both platforms. The correlations between the two responses within each platform are −0.79 for Platform 1 (S&P 500 and VIX) and −0.66 for Platform 2 (NASDAQ-100 and VXN). The dataset is randomly partitioned into a training set of 100 samples ($n = 100$) and a test set comprising the remaining 56 observations ($n_{\mathrm{test}} = 56$). Our objective is to identify a common set of predictors that are relevant to at least one of the responses on either platform.
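A one-line preprocessing sketch for the return transformation used here (assuming a hypothetical `prices` array of weekly closing levels for each series):

```python
import numpy as np

def weekly_log_returns(prices):
    """Log returns r_t = log(p_t) - log(p_{t-1}) from a (T x q) price array,
    yielding approximately independent weekly samples."""
    return np.diff(np.log(prices), axis=0)
```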
Table 4 presents the prediction errors $\mathrm{PE}_i$ on the test samples for Platform $i$, $i = 1, 2$, and their average $\mathrm{PE}$. We also report the number of selected predictors (support) to quantify the sparsity of the estimated model. The proposed MM-HLR method demonstrates superior overall performance compared to the benchmark methods, FusionLearn and GroupLasso, in terms of both prediction accuracy and model sparsity. Specifically, MM-HLR achieves the lowest average prediction error, which is much lower than that of FusionLearn and approximately 20% lower than that of GroupLasso. This advantage is consistent across both individual platforms. Furthermore, MM-HLR yields the most parsimonious model, selecting only 6 predictors. In contrast, FusionLearn and GroupLasso select 115 and 75 predictors, respectively, suggesting a higher risk of overfitting. This effective balance between prediction accuracy and model simplicity underscores the capability of the MM-HLR method to leverage both multivariate dependence and multi-platform structures within high-dimensional regression.
Table 4.
The platform-specific prediction errors $\mathrm{PE}_i$, the average prediction error (PE), and the number of selected predictors (support).
6. Conclusions and Discussion
In this paper, we introduce the multi-platform high-dimensional multivariate linear regression model, designed to simultaneously model correlated multivariate responses across multiple platforms. The proposed framework explicitly accounts for within-platform response correlation while incorporating a group Lasso penalty to fuse information across platforms, thereby promoting both individual sparsity and structured grouping of predictors. An optimization algorithm combining iteratively reweighted least squares with block coordinate descent is developed to solve the resulting regularized problem efficiently. Theoretical guarantees are established for the estimator $\widehat{\mathcal{B}}$, covering prediction accuracy, estimation error bounds, and support recovery (sparsity) under regularity conditions. Simulation studies under various scenarios demonstrate that MM-HLR outperforms competing methods in parameter estimation, prediction, and variable selection, showing minimal bias, low variance, and strong robustness across all tested conditions. The superior performance of our approach is further validated through an analysis of real financial data, which confirms its effectiveness in leveraging both multivariate dependence and multi-platform structures within high-dimensional regression.
In this article, we impose a shared sparsity structure across platforms, and the method's performance may degrade if this common sparsity pattern is violated. Extending the framework to handle scenarios with only partial support overlap, thereby formally relaxing this assumption, is an important avenue for future research.
Author Contributions
Conceptualization, X.G., Y.W. and S.Q.; methodology, X.G., Y.W. and S.Q.; software, S.Q. and G.Z.; validation, S.Q.; formal analysis, S.Q.; investigation, S.Q.; resources, S.Q. and G.Z.; data curation, S.Q.; writing—original draft preparation, S.Q. and G.Z.; writing—review and editing, Y.W. and S.Q.; supervision, X.G., Y.W. and S.Q.; project administration, S.Q. All authors have read and agreed to the published version of the manuscript.
Funding
Shanshan Qin is supported by the National Natural Science Foundation of China (No. 12201454) and the China Scholarship Council (No. 202408120083). Xin Gao is supported by the Natural Sciences and Engineering Research Council of Canada (No. RGPIN-2024-06202), and Yuehua Wu is supported by the Natural Sciences and Engineering Research Council of Canada (No. RGPIN-2023-05655).
Institutional Review Board Statement
Not applicable.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A. Theoretical Proofs
We provide the proof of Theorem 1. Before proceeding, we present Lemmas A1 and A2, which are required in the proof of Theorem 1.
Lemma A1.
Define a random matrix $W_i$ and a random variable $V_i$, where $w^{(i)}_{(s)}$ is the $s$th column of $W_i$, $s = 1, \dots, m$. Define an event $\mathcal{A}$ and its complementary event $\mathcal{A}^c$, respectively, as follows:
By Assumptions 1 and 2, the following holds:
for some constant $c_0 > 0$.
Proof of Lemma A1.
Since the rows of $E_i$ are independent and identically distributed (iid), the rows of $W_i$ are iid random vectors. For their covariance matrix, by Assumption 2, we have the following:
which means that the diagonal elements of this covariance matrix are bounded. Let $\sigma^{(i)}_{ss}$ denote the $s$th diagonal element. By Assumption 1, after standardization the resulting variables are standard normal random variables for all $s = 1, \dots, m$. We have the following:
where the last step follows from a standard Gaussian tail bound. □
Lemma A2.
Under Assumptions 1–3, for any $\delta \in (0, 1)$, with probability at least $1 - \delta$, the following holds:
Proof of Lemma A2.
By the definition of the minimizer $\widehat{\mathcal{B}}$, its objective value is no larger than that of any $\mathcal{B}$. Plugging the true $\mathcal{B}^*$ into this inequality results in the following:
By Lemma A1, on the event $\mathcal{A}$, the following is calculated:
This completes the proof of the first inequality (A1) in Lemma A2.
To prove the second inequality (A2) in Lemma A2, we use the KKT conditions. For each platform $i$, the following stationarity condition is written:
where $x^{(i)}_{(s)}$ is the $s$th column of $X_i$, which implies that
On the other hand, on the event $\mathcal{A}$, we have the following:
Combining Equations (A5) and (A6), we can obtain the following:
Therefore, for any platform $i$, the following holds:
This completes the proof of Lemma A2. □
Now, we detail the proof of Theorem 1.
Proof of Theorem 1.
By the first inequality (A1) in Lemma A2 and letting $\mathcal{B} = \mathcal{B}^*$, we have, on the event $\mathcal{A}$, the following:
which is derived by the Cauchy–Schwarz inequality. On the event $\mathcal{A}$, we also have the following:
which, together with the fact that
yields the following inequality,
Thus the inequality (22) in Assumption 3 holds with the index sets $J$ and $G$ specified above. Therefore, we obtain the following:
Plugging Equations (A13) and (A14) into (A8), we have the following:
which yields, by the Cauchy–Schwarz inequality, the following:
Next, we prove the second inequality (26) in Theorem 1. Define $\Delta_i = \widehat{B}_i - B_i^*$. Hence, the following is calculated:
Then we have the following:
By (A9), we obtain the following:
Therefore, we obtain the following:
Finally, we prove the third inequality (27) in Theorem 1. By (A2) in Lemma A2, we can obtain the following:
Since a common sparsity structure is shared across all platforms, the following holds:
This completes the proof of Theorem 1. □
References
- Gao, X.; Carroll, R.J. Data integration with high dimensionality. Biometrika 2017, 104, 251–272.
- Liu, Q.; Xu, Q.; Zheng, V.W.; Xue, H.; Cao, Z.; Yang, Q. Multi-task learning for cross-platform siRNA efficacy prediction: An in-silico study. BMC Bioinform. 2010, 11, 1–16.
- Zhang, K.; Gray, J.W.; Parvin, B. Sparse multitask regression for identifying common mechanism of response to therapeutic targets. Bioinformatics 2010, 26, i97–i105.
- Zhong, Y.; Xu, W.; Gao, X. Heterogeneous multi-task feature learning with mixed ℓ2,1 regularization. Mach. Learn. 2024, 113, 891–932.
- Yang, S.; Kim, J.K. Statistical data integration in survey sampling: A review. Jpn. J. Stat. Data Sci. 2020, 3, 625–650.
- Gao, X.; Zhong, Y. FusionLearn: A biomarker selection algorithm on cross-platform data. Bioinformatics 2019, 35, 4465–4468.
- Zhang, G.; Wu, Y.; Gao, X. Bayesian model selection via composite likelihood for high-dimensional data integration. Can. J. Stat. 2024, 52, 924–938.
- Dai, G.; Müller, U.U.; Carroll, R.J. Data integration in high dimension with multiple quantiles. Stat. Sin. 2023, 33, 169–191.
- Simon, N.; Friedman, J.; Hastie, T. A blockwise descent algorithm for group-penalized multiresponse and multinomial regression. arXiv 2013, arXiv:1311.6529.
- Qian, J.; Tanigawa, Y.; Li, R.; Tibshirani, R.; Rivas, M.A.; Hastie, T. Large-scale multivariate sparse regression with applications to UK Biobank. Ann. Appl. Stat. 2022, 16, 1891–1918.
- Peng, J.; Zhu, J.; Bergamaschi, A.; Han, W.; Noh, D.Y.; Pollack, J.R.; Wang, P. Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer. Ann. Appl. Stat. 2010, 4, 53–77.
- Li, Y.; Nan, B.; Zhu, J. Multivariate sparse group lasso for the multivariate multiple linear regression with an arbitrary group structure. Biometrics 2015, 71, 354–363.
- Molstad, A.J.; Zhang, X. Conditional probability tensor decompositions for multivariate categorical response regression. J. Am. Stat. Assoc. 2025, 1–25.
- Chen, B.; Chen, C. Fast optimization methods for high-dimensional row-sparse multivariate quantile linear regression. J. Stat. Comput. Simul. 2024, 94, 69–102.
- Petrella, L.; Raponi, V. Joint estimation of conditional quantiles in multivariate linear regression models with an application to financial distress. J. Multivar. Anal. 2019, 173, 70–84.
- Zhang, Q.; Huang, W.; Jin, C.; Zhao, P.; Shu, Y.; Shen, L.; Tao, D. Multinoulli extension: A lossless yet effective probabilistic framework for subset selection over partition constraints. In Proceedings of the 42nd International Conference on Machine Learning, Vancouver, BC, Canada, 13–19 July 2025.
- Zhong, Y.; Gao, X.; Xu, W. Robust multitask feature learning with adaptive Huber regressions. Can. J. Stat. 2025, 53, e70022.
- Kawano, S.; Fukushima, T.; Nakagawa, J.; Oshiki, M. Multivariate regression modeling in integrative analysis via sparse regularization. Jpn. J. Stat. Data Sci. 2025, 1–28.
- Friedman, J.; Hastie, T.; Tibshirani, R.; Narasimhan, B.; Tay, J.K.; Simon, N.; Yang, J. glmnet: Lasso and Elastic-Net Regularized Generalized Linear Models. R package version 4.1-7, 2023. Available online: https://CRAN.R-project.org/package=glmnet (accessed on 14 January 2026).
- Cox, D.R.; Reid, N. A note on pseudolikelihood constructed from marginal densities. Biometrika 2004, 91, 729–737.
- Gao, X.; Song, P.X.K. Composite likelihood EM algorithm with applications to multivariate hidden Markov model. Stat. Sin. 2011, 21, 165–185.
- Yuan, M.; Lin, Y. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 2006, 68, 49–67.
- Parikh, N.; Boyd, S. Proximal algorithms. Found. Trends Optim. 2014, 1, 127–239.