Transfer Learning for Moderate–Dimensional Ridge-Regularized Robust Linear Regression
Abstract
1. Introduction
2. Methodology
2.1. Problem Setup
2.2. Trans-RR Algorithm
| Algorithm 1: Trans-RR algorithm |
|---|
| Input: target data and source data |
| Output: the estimated coefficient vector |
| Step 1. Compute for some constant . |
| Step 2. Compute for some constant . |
| Step 3. Let |
| Step 4. Output . |
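
As a rough illustration of the two-stage structure in Algorithm 1, the following Python sketch fits a ridge-regularized robust regression on the source sample, fits a second one to the target residuals, and adds the two fits. It is a minimal sketch under stated assumptions, not the authors' implementation. The Pseudo-Huber loss stands in for both stage losses, the objective is assumed to be the average loss plus (tau/2) times the squared norm of the coefficients, Step 2 is assumed to act on target residuals (by analogy with the Trans-Lasso description in Section 4.3), and the solver is plain iteratively reweighted least squares. The names `robust_ridge`, `trans_rr`, `tau_s`, and `tau_t` are hypothetical.

```python
import numpy as np


def robust_ridge(X, y, tau, delta=1.0, n_iter=200, tol=1e-10):
    """Ridge-regularized robust regression via iteratively reweighted least squares.

    Assumed objective: mean Pseudo-Huber loss of residuals + (tau / 2) * ||b||^2.
    """
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        r = y - X @ b
        # IRLS weight psi(r) / r for the Pseudo-Huber loss; large residuals get small weight.
        w = 1.0 / np.sqrt(1.0 + (r / delta) ** 2)
        XtW = X.T * w  # scales column i of X.T (observation i) by w[i]
        b_new = np.linalg.solve(XtW @ X / n + tau * np.eye(p), XtW @ y / n)
        if np.linalg.norm(b_new - b) <= tol * (1.0 + np.linalg.norm(b)):
            return b_new
        b = b_new
    return b


def trans_rr(X_t, y_t, X_s, y_s, tau_s, tau_t, delta=1.0):
    """Two-stage transfer estimator (assumed form): source fit plus target correction."""
    w_hat = robust_ridge(X_s, y_s, tau_s, delta)                # Step 1: source-stage fit
    d_hat = robust_ridge(X_t, y_t - X_t @ w_hat, tau_t, delta)  # Step 2: fit target residuals
    return w_hat + d_hat                                        # Step 3: combine
```

In practice `tau_s` and `tau_t` would be tuned, for example by cross-validation as in Algorithm 2; the sketch leaves them as plain arguments.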
3. Theoretical Results
3.1. Technical Assumptions
Assumption 1.
- (a) .
- (b) Suppose ρ is an even and convex function. Assume that is bounded and is Lipschitz and bounded. Moreover, we assume that and that for all .
- (c) Assume that there exist independent variables ’s and ’s such that . Suppose that ’s are i.i.d. with independent entries, and they have mean and . Suppose there exist and that vary with n, where and is bounded in n, such that for any convex 1-Lipschitz function G of , holds for all , where is the median of . We require the same assumption to hold for the columns of the design matrix . Additionally, we assume that the coordinates of have moments of all orders, and the k-th moment of the entries of is assumed to be uniformly bounded independently of n and p for all k.
- (d) Suppose that ’s are independent, with , being bounded, and growing at most like for some k. ’s may have finitely many possible distributions.
- (e) Suppose that ’s are independent and are also independent of ’s and ’s. They may have finitely many possible distributions, each with a density that is differentiable, symmetric, and unimodal. If is the density of one such distribution, we assume that .
- (f) The fraction of occurrences for each possible combination of distributions for has a limit as .
- (g) There exist constants and such that and .
Assumption 2.
- (a) .
- (b) and satisfy Assumption 1(b).
- (c) , ’s and ’s satisfy Assumption 1(c).
- (d) ’s satisfy Assumption 1(d).
- (e) ’s, ’s and ’s satisfy Assumption 1(e).
- (f) ’s and ’s satisfy Assumption 1(f).
- (g) remains bounded. Furthermore, , where .
3.2. Asymptotic Characterization of Estimation Error
3.3. Adaptive Aggregation Against Negative Transfer
| Algorithm 2: Adaptive Trans-RR algorithm |
|---|
| Input: target data , source data , fold count K, candidate set , validation loss L |
| Output: the adaptive estimator and the selected weight |
| Step 1. Tune , , and by cross-validation on the corresponding samples. Compute from Algorithm 1 and from (11). |
| Step 2. Draw a K-fold partition of the target indices, independent of the partitions used in Step 1. |
| Step 3. For each , let be the output of Algorithm 1 applied to the full source data and at the tuned . Let be the solution of (11) on at the tuned . |
| Step 4. Compute |
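
The cross-validated choice of the mixing weight in Algorithm 2 can be sketched as follows. Step 4 and Equations (12) and (13) are not reproduced in this section, so the sketch makes three assumptions: the aggregate is a convex combination w * (Trans-RR) + (1 - w) * (single-task), the weight minimizes the held-out loss summed over K target folds, and the loss is absolute error (the loss reported for Trans-RR-Ada in Section 4.3). The callables `fit_trans` and `fit_single` are hypothetical placeholders that map a target training subset to a coefficient vector; `fit_trans` is expected to close over the full source data.

```python
import numpy as np


def select_weight(X_t, y_t, fit_trans, fit_single, weights, K=5, seed=0):
    """Choose the mixing weight by K-fold cross-validation on the target sample.

    Returns the selected weight and the average held-out absolute error per candidate.
    """
    n = X_t.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)  # random K-fold partition of target indices
    cv_loss = np.zeros(len(weights))
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        # Refit both estimators without the held-out fold.
        pred_trans = X_t[held_out] @ fit_trans(X_t[train], y_t[train])
        pred_single = X_t[held_out] @ fit_single(X_t[train], y_t[train])
        for j, w in enumerate(weights):
            pred = w * pred_trans + (1.0 - w) * pred_single
            cv_loss[j] += np.abs(y_t[held_out] - pred).sum()
    best = int(np.argmin(cv_loss))
    return weights[best], cv_loss / n


def aggregate(beta_trans, beta_single, w):
    """Assumed form of the adaptive estimator: convex combination of the two fits."""
    return w * beta_trans + (1.0 - w) * beta_single
```

A grid that contains both 0 and 1 lets the procedure fall back entirely on the single-task fit when the source is unhelpful, which is the protection against negative transfer that Section 3.3 targets.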
3.4. Applicability and Limitations
4. Simulation
- Case I: for and for . The target errors are i.i.d. , and the source errors are i.i.d. .
- Case II: The variables and are i.i.d. Unif, while and are i.i.d. and , respectively.
- Case III: In both the target and source studies, half of the observations are generated as in Case I and the other half are generated as in Case II.
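
For orientation, the three error regimes can be mimicked as below. The table captions in Appendix B label the cases as Gaussian, Cauchy, and mixture errors. The scale parameters and the predictor distributions are not recoverable from this section, so the unit scale, the function name `draw_errors`, and the convention that the first half of a Case III sample is Gaussian are assumptions made purely for illustration. The sketch covers the errors only, not the predictors.

```python
import numpy as np


def draw_errors(n, case, rng, scale=1.0):
    """Draw n regression errors for a simulation case (illustrative scales only)."""
    if case == "I":    # Gaussian errors
        return scale * rng.standard_normal(n)
    if case == "II":   # Cauchy errors (heavy-tailed, no finite mean)
        return scale * rng.standard_cauchy(n)
    if case == "III":  # mixture: first half Gaussian, second half Cauchy
        half = n // 2
        return np.concatenate([
            scale * rng.standard_normal(half),
            scale * rng.standard_cauchy(n - half),
        ])
    raise ValueError(f"unknown case: {case!r}")
```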
4.1. Validity of Theoretical Results
4.2. Theoretical Estimation Error Curves
4.3. Comparison with Existing Methods
- Single-RR: The single-task estimator in (11), fit to the target sample alone.
- Trans-RR: The two-stage estimator in (5), computed by Algorithm 1.
- Trans-RR-Ada: The adaptive aggregate in (12), computed by Algorithm 2 with , absolute-error loss , and weight grid .
- Pooled-RR: The same robust ridge fit applied to the concatenation of the source and target samples.
- Single-Lasso: The lasso on the target sample, with its regularization parameter chosen by -fold cross-validation.
- Trans-Lasso: The two-stage transfer-lasso of [8], in which a cross-validated source-stage lasso estimates and a cross-validated target-stage lasso fits the residual on the target sample.
4.4. Robustness to Non-Identity Covariance
4.5. Sensitivity to Tuning Choices
4.5.1. Choice of
4.5.2. Choice of Cross-Validation Criterion
4.5.3. Choice of Robust Loss
4.5.4. Choice of Ridge Penalty Grid
5. Real Data Analysis
6. Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Notation
| Symbol | Meaning |
|---|---|
| Dimensions and asymptotic regime | |
| p, n, | predictor dimension; target and source sample sizes |
| | limits of and |
| Target study (1) () and source study (2) () | |
| , | i-th target/source predictor, with components and scales (Assumptions 1(c) and 2(c)) |
| , | target/source response and error ( may be heavy-tailed) |
| | target/source regression coefficient (non-sparse) |
| , | target-stage/source-stage loss and its derivative |
| | target-stage ridge for Trans-RR and for , source-stage ridge for |
| , , , | source-stage, Step 2, Trans-RR (5), and single-task (11) estimators |
| Source–target discrepancy | |
| , | discrepancy vector and its magnitude |
| Adaptive aggregation (Section 3.3) | |
| , | adaptive Trans-RR estimator and its instance at , (12) |
| , , | mixing weight, its cross-validation choice, and candidate set |
| K, , | fold count, target-sample fold partition, validation loss in (13) |
| Smoothed Huber family (choices for and ) | |
| , | Huber transition and smoothing parameters (scalar distinct from vector ) |
| , , | Huber, smoothed Huber (6), and Pseudo-Huber (8) losses |
| Asymptotic quantities (Theorem 1) | |
| | asymptotic value of and companion scalar in the fixed-point system |
| , | standard normal (independent of ) and auxiliary random variable |
| | proximal mapping: |
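
The Huber and Pseudo-Huber losses and the proximal mapping listed in the table have standard textbook forms, sketched here in Python. The paper's own parameterizations in (6) and (8) are not reproduced in this section, so the formulas below are the conventional ones and may differ from the authors' by constants; the smoothed Huber loss (6) is omitted for that reason. The function names, and the symbol `c` for the proximal scale, are hypothetical.

```python
import numpy as np


def huber(x, delta):
    """Huber loss: quadratic for |x| <= delta, linear beyond."""
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * a**2, delta * a - 0.5 * delta**2)


def pseudo_huber(x, delta):
    """Pseudo-Huber loss: smooth everywhere, about x^2 / 2 near 0 and delta * |x| in the tails."""
    return delta**2 * (np.sqrt(1.0 + (np.asarray(x) / delta) ** 2) - 1.0)


def prox_huber(x, c, delta):
    """Proximal mapping argmin_y { c * huber(y, delta) + (x - y)^2 / 2 } in closed form."""
    x = np.asarray(x, dtype=float)
    inner = x / (1.0 + c)                  # minimizer lies in the quadratic zone
    outer = x - c * delta * np.sign(x)     # minimizer lies in the linear zone
    return np.where(np.abs(x) <= delta * (1.0 + c), inner, outer)
```

The closed form follows from the first-order condition c * psi(y) + y - x = 0, solved separately on the quadratic and linear pieces of the Huber loss.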
Appendix B. Additional (δ, η) Heatmaps
| Case I (Gaussian Errors) | |||
| 0.653/0.560/0.561/0.604 | 0.654/0.561/0.562/0.604 | 0.656/0.562/0.563/0.606 | |
| 0.644/0.549/0.550/0.600 | 0.645/0.550/0.550/0.600 | 0.646/0.551/0.551/0.601 | |
| 0.641/0.537/0.538/0.594 | 0.640/0.537/0.539/0.594 | 0.641/0.538/0.539/0.593 | |
| Case II (Cauchy Errors) | |||
| 0.884/0.819/0.821/0.809 | 0.884/0.819/0.821/0.808 | 0.884/0.819/0.821/0.808 | |
| 0.883/0.817/0.819/0.812 | 0.883/0.816/0.819/0.811 | 0.883/0.816/0.818/0.811 | |
| 0.890/0.821/0.823/0.822 | 0.889/0.821/0.823/0.821 | 0.889/0.821/0.823/0.821 | |
| Case III (Mixture Errors) | |||
| 0.779/0.695/0.694/0.706 | 0.780/0.695/0.695/0.706 | 0.780/0.695/0.695/0.706 | |
| 0.781/0.693/0.692/0.705 | 0.781/0.693/0.692/0.706 | 0.781/0.692/0.692/0.705 | |
| 0.794/0.701/0.701/0.714 | 0.793/0.700/0.701/0.714 | 0.792/0.700/0.700/0.713 | |
| Case I (Gaussian Errors) | |||
| 0.653/0.576/0.577/0.634 | 0.654/0.577/0.578/0.635 | 0.656/0.578/0.579/0.635 | |
| 0.644/0.565/0.565/0.628 | 0.645/0.565/0.566/0.628 | 0.646/0.566/0.567/0.629 | |
| 0.641/0.554/0.556/0.624 | 0.640/0.555/0.556/0.624 | 0.641/0.555/0.556/0.624 | |
| Case II (Cauchy Errors) | |||
| 0.884/0.828/0.830/0.824 | 0.884/0.828/0.830/0.825 | 0.884/0.828/0.831/0.825 | |
| 0.883/0.826/0.829/0.829 | 0.883/0.826/0.829/0.829 | 0.883/0.825/0.828/0.828 | |
| 0.890/0.829/0.832/0.838 | 0.889/0.830/0.832/0.837 | 0.889/0.829/0.831/0.836 | |
| Case III (Mixture Errors) | |||
| 0.779/0.709/0.709/0.729 | 0.780/0.709/0.709/0.729 | 0.780/0.710/0.709/0.730 | |
| 0.781/0.708/0.707/0.728 | 0.781/0.707/0.707/0.729 | 0.781/0.707/0.706/0.728 | |
| 0.794/0.715/0.715/0.737 | 0.793/0.715/0.715/0.737 | 0.792/0.714/0.714/0.736 | |
| Case I (Gaussian Errors) | |||
| 0.653/0.598/0.600/0.683 | 0.654/0.599/0.601/0.684 | 0.656/0.601/0.602/0.686 | |
| 0.644/0.588/0.589/0.678 | 0.645/0.588/0.589/0.678 | 0.646/0.589/0.591/0.678 | |
| 0.641/0.579/0.580/0.677 | 0.640/0.579/0.580/0.677 | 0.641/0.579/0.580/0.677 | |
| Case II (Cauchy Errors) | |||
| 0.884/0.842/0.844/0.854 | 0.884/0.842/0.844/0.854 | 0.884/0.843/0.846/0.853 | |
| 0.883/0.840/0.843/0.855 | 0.883/0.840/0.843/0.855 | 0.883/0.839/0.842/0.856 | |
| 0.890/0.843/0.847/0.864 | 0.889/0.843/0.847/0.864 | 0.889/0.842/0.846/0.863 | |
| Case III (Mixture Errors) | |||
| 0.779/0.729/0.729/0.769 | 0.780/0.729/0.730/0.769 | 0.780/0.729/0.730/0.770 | |
| 0.781/0.728/0.728/0.770 | 0.781/0.727/0.727/0.770 | 0.781/0.727/0.727/0.770 | |
| 0.794/0.737/0.738/0.777 | 0.793/0.736/0.737/0.778 | 0.792/0.735/0.736/0.777 | |
| Case I (Gaussian Errors) | |||
| 0.653/0.625/0.626/0.767 | 0.654/0.626/0.627/0.767 | 0.656/0.627/0.629/0.768 | |
| 0.644/0.615/0.616/0.764 | 0.645/0.616/0.617/0.764 | 0.646/0.617/0.618/0.764 | |
| 0.641/0.607/0.608/0.764 | 0.640/0.607/0.608/0.764 | 0.641/0.607/0.609/0.764 | |
| Case II (Cauchy Errors) | |||
| 0.884/0.862/0.864/0.897 | 0.884/0.862/0.864/0.897 | 0.884/0.863/0.865/0.897 | |
| 0.883/0.860/0.863/0.898 | 0.883/0.860/0.862/0.898 | 0.883/0.859/0.862/0.898 | |
| 0.890/0.865/0.868/0.903 | 0.889/0.864/0.868/0.903 | 0.889/0.864/0.868/0.902 | |
| Case III (Mixture Errors) | |||
| 0.779/0.755/0.755/0.835 | 0.780/0.755/0.755/0.835 | 0.780/0.755/0.755/0.836 | |
| 0.781/0.754/0.754/0.834 | 0.781/0.754/0.754/0.834 | 0.781/0.753/0.754/0.834 | |
| 0.794/0.764/0.764/0.841 | 0.793/0.763/0.764/0.840 | 0.792/0.762/0.762/0.840 | |
| Case I (Gaussian Errors) | |||
| 0.653/0.797/0.653/0.987 | 0.654/0.795/0.654/0.987 | 0.656/0.796/0.657/0.987 | |
| 0.644/0.792/0.645/0.990 | 0.645/0.791/0.645/0.990 | 0.646/0.792/0.646/0.990 | |
| 0.641/0.788/0.641/0.999 | 0.640/0.789/0.641/0.999 | 0.641/0.787/0.641/0.998 | |
| Case II (Cauchy Errors) | |||
| 0.884/1.024/0.888/1.005 | 0.884/1.023/0.888/1.005 | 0.884/1.023/0.888/1.005 | |
| 0.883/1.021/0.888/1.005 | 0.883/1.021/0.887/1.005 | 0.883/1.020/0.887/1.005 | |
| 0.890/1.029/0.894/1.006 | 0.889/1.027/0.893/1.006 | 0.889/1.026/0.893/1.006 | |
| Case III (Mixture Errors) | |||
| 0.779/0.934/0.781/0.997 | 0.780/0.935/0.781/0.996 | 0.780/0.936/0.781/0.996 | |
| 0.781/0.943/0.783/0.998 | 0.781/0.941/0.783/0.998 | 0.781/0.940/0.783/0.998 | |
| 0.794/0.955/0.796/1.003 | 0.793/0.954/0.795/1.003 | 0.792/0.952/0.794/1.002 | |
| Case I (Gaussian Errors) | |||
| 0.653/1.617/0.653/1.231 | 0.654/1.618/0.654/1.228 | 0.656/1.620/0.656/1.228 | |
| 0.644/1.637/0.644/1.262 | 0.645/1.639/0.645/1.261 | 0.646/1.635/0.646/1.254 | |
| 0.641/1.617/0.641/1.312 | 0.640/1.613/0.640/1.309 | 0.641/1.611/0.641/1.304 | |
| Case II (Cauchy Errors) | |||
| 0.884/1.882/0.884/1.118 | 0.884/1.882/0.884/1.119 | 0.884/1.890/0.884/1.118 | |
| 0.883/1.902/0.883/1.122 | 0.883/1.898/0.883/1.123 | 0.883/1.900/0.883/1.121 | |
| 0.890/1.873/0.890/1.137 | 0.889/1.869/0.889/1.139 | 0.889/1.864/0.889/1.136 | |
| Case III (Mixture Errors) | |||
| 0.779/1.975/0.779/1.169 | 0.780/1.971/0.780/1.171 | 0.780/1.993/0.780/1.169 | |
| 0.781/1.992/0.781/1.183 | 0.781/1.985/0.781/1.181 | 0.781/1.971/0.781/1.178 | |
| 0.794/1.999/0.794/1.209 | 0.793/2.001/0.793/1.208 | 0.792/2.009/0.792/1.205 | |
Appendix C. Robustness Checks for the Real-Data Analysis
| Direction A: Target = X | ||||
| Method | Offset 0 | Offset 1 | Offset 2 | Offset 3 |
| Trans-RR | 4.6230 (0.1732) | 4.5555 (0.1619) | 4.7143 (0.1660) | 4.5511 (0.1149) |
| Trans-RR-Ada | 4.6294 (0.1861) | 4.5569 (0.1703) | 4.7155 (0.1602) | 4.5669 (0.1202) |
| Pooled-RR | 5.0952 (0.0812) | 4.9644 (0.1462) | 5.0982 (0.0971) | 4.9367 (0.0769) |
| Trans-Lasso | 5.5668 (0.3650) | 5.5799 (0.5213) | 5.5239 (0.2604) | 5.3689 (0.3004) |
| Single-RR | 6.2666 (2.2628) | 6.3039 (2.0647) | 6.4540 (2.4723) | 6.4131 (2.3509) |
| Single-Lasso | 8.0672 (2.8904) | 8.1289 (3.5097) | 8.5205 (4.3280) | 8.7014 (3.3436) |
| Direction B: Target = | ||||
| Method | Offset 0 | Offset 1 | Offset 2 | Offset 3 |
| Trans-RR | 4.7933 (0.2736) | 4.7063 (0.1320) | 4.6329 (0.2049) | 4.6457 (0.1895) |
| Trans-RR-Ada | 4.8211 (0.3757) | 4.7405 (0.1403) | 4.6222 (0.1583) | 4.6447 (0.1779) |
| Pooled-RR | 5.4909 (0.1335) | 5.2180 (0.1444) | 5.4916 (0.1550) | 5.1584 (0.1728) |
| Trans-Lasso | 5.6803 (0.3131) | 5.9048 (0.5452) | 5.5392 (0.4612) | 5.6729 (0.3458) |
| Single-RR | 6.8272 (2.2674) | 6.9625 (2.5627) | 6.6887 (2.2085) | 6.6730 (2.4256) |
| Single-Lasso | 8.5607 (3.3739) | 9.0675 (3.7103) | 7.8504 (2.7262) | 7.9018 (2.8987) |
| | Direction A | | Direction B | |
|---|---|---|---|---|
| Method | Whitened | Unwhitened | Whitened | Unwhitened |
| Trans-RR | 4.6230 (0.1732) | 4.6281 (0.2237) | 4.7933 (0.2736) | 4.9253 (0.2064) |
| Trans-RR-Ada | 4.6294 (0.1861) | 4.6225 (0.2182) | 4.8211 (0.3757) | 4.9386 (0.1926) |
| Pooled-RR | 5.0952 (0.0812) | 5.0615 (0.0959) | 5.4909 (0.1335) | 5.0178 (0.1401) |
| Trans-Lasso | 5.5668 (0.3650) | 4.8776 (0.3020) | 5.6803 (0.3131) | 5.0792 (0.2623) |
| Single-RR | 6.2666 (2.2628) | 4.8845 (0.3038) | 6.8272 (2.2674) | 5.2476 (0.3971) |
| Single-Lasso | 8.0672 (2.8904) | 4.9933 (0.3261) | 8.5607 (3.3739) | 5.2374 (0.3037) |
Appendix D. Assumptions
- Assumptions under which the whole proof goes through
- M1. .
- M2. There exist constants and such that and .
- M3. Suppose is an even function. Assume that is bounded and is Lipschitz and bounded. Moreover, we assume that and that for all .
- M4. Assume that there exist independent variables ’s and ’s such that . Suppose that ’s are i.i.d. with independent entries, and they have mean and . Suppose there exist and that vary with n, where and is bounded in n, such that for any convex 1-Lipschitz function G of , holds for all , where is the median of . We require the same assumption to hold for the columns of the design matrix . Additionally, we assume that the coordinates of have moments of all orders, and the k-th moment of the entries of is assumed to be uniformly bounded independently of n and p for all k. Also, for any , the vectors in satisfy: for any 1-Lipschitz (with respect to Euclidean norm) convex function G, if is a median of , for any , and can vary with n. As above, we assume that .
- M5. ’s are independent, with , being bounded, and growing at most like for some k. ’s may have finitely many possible distributions.
- M6. Suppose that ’s are independent and independent of ’s and ’s. They may have finitely many possible distributions, each with a density that is differentiable, symmetric, and unimodal. Furthermore, for any , if , independent of , has a differentiable density which is increasing on and decreasing on . .
- M7. ’s can have different distributions. Similarly, ’s can have different distributions. The fraction of occurrences for each possible combination of distributions for has a limit as .
- First part of the proof (Appendix E.3)
- O1. .
- O2. is twice differentiable, convex, and non-linear. . Note that since is convex. We assume that and .
- O3. and for some constant C. Furthermore, is assumed to be -Lipschitz with . We also assume that .
- O4. Assume that there exist independent variables ’s and ’s such that . Suppose that ’s are i.i.d. with independent entries, and they have mean and . Suppose there exist and that vary with n, where and is bounded in n, such that for any convex 1-Lipschitz function G of , holds for all , where is the median of . We require the same assumption to hold for the columns of the design matrix . Additionally, we assume that the coordinates of have moments of all orders, and the k-th moment of the entries of is assumed to be uniformly bounded independently of n and p for all k.
- O5. and are independent of . ’s are independent of each other.
- O6. and ’s are independent. Moreover, .
- O7. and .
- Second part of the proof (Appendix E.4)
- P1. ’s have independent entries. Furthermore, for any , the vectors in satisfy: for any 1-Lipschitz (with respect to Euclidean norm) convex function G, if is a median of , for any , and can vary with n. As above, we assume that .
- P2. .
- P3. , where . Furthermore, , where C is a constant independent of p and n. satisfies .
- P4. and . The latter implies that
- Last part of the proof (Appendix E.5)
- F1. ’s may have different distributions; however, they may only come from finitely many distributions. Furthermore, for any , if , independent of , has a differentiable density which is increasing on and decreasing on . .
- F2. . has Lipschitz constant . Furthermore, .
- F3. and .
- F4. There exists a constant C such that .
- F5. ’s may have different distributions. The fraction of occurrences for each possible combination of distributions for has a limit as .
Appendix E. Proof for Theorem 1
Appendix E.1. Preliminaries
Appendix E.2. On and
Appendix E.3. Leave-One-Observation-Out
Appendix E.3.1. Deterministic Bounds
- i.
- On
- ii.
- On and related quantities
Appendix E.3.2. Stochastic Aspects
- i.
- On
- Consequences
- On the limiting variance of and
Appendix E.4. Leaving Out a Predictor
- Approximation to via leave-one-predictor-out
Appendix E.4.1. Deterministic Aspects
Appendix E.4.2. Stochastic Aspects
- On
- On
- About
- Controlling
- Control of
- Further results on and
- On
Appendix E.4.3. Final Conclusions
- On and
Appendix E.5. Last Steps of the Proof
Appendix E.5.1. On the Asymptotic Behavior of $\tilde{r}_{i,(i)}$
Appendix E.5.2. On the Asymptotic Behavior of $c_\tau$
- Proof of the fact that is such that
- Final details
References
- Chen, M. Analysis on transfer learning models and applications in natural language processing. Highlights Sci. Eng. Technol. 2022, 16, 446–452.
- Ma, Y.; Chen, S.; Ermon, S.; Lobell, D.B. Transfer learning in environmental remote sensing. Remote Sens. Environ. 2024, 301, 113924.
- Gopalakrishnan, K.; Khaitan, S.K.; Choudhary, A.; Agrawal, A. Deep convolutional neural networks with transfer learning for computer vision-based data-driven pavement distress detection. Constr. Build. Mater. 2017, 157, 322–330.
- Liu, D.; Luo, J.; Johnson, B.; Chew, H.; Blais, J.; Deik, A.; Paul, F.; Hanson, R.L.; Crandall, J.P.; Sun, Y.; et al. Modeling blood metabolite homeostatic levels reduces sample heterogeneity across cohorts. Proc. Natl. Acad. Sci. USA 2024, 121, e2307430121.
- Chen, A.; Owen, A.B.; Shi, M. Data enriched linear regression. Electron. J. Stat. 2015, 9, 1078–1112.
- Tripuraneni, N.; Jin, C.; Jordan, M. Provable meta-learning of linear representations. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; Volume 139, pp. 10434–10443.
- Bastani, H. Predicting with proxies: Transfer learning in high dimension. Manag. Sci. 2021, 67, 2964–2984.
- Li, S.; Cai, T.T.; Li, H. Transfer learning for high-dimensional linear regression: Prediction, estimation and minimax optimality. J. R. Stat. Soc. Ser. B Stat. Methodol. 2022, 84, 149–173.
- Tian, Y.; Feng, Y. Transfer learning under high-dimensional generalized linear models. J. Am. Stat. Assoc. 2023, 118, 2684–2697.
- Li, S.; Zhang, L.; Cai, T.T.; Li, H. Estimation and inference for high-dimensional generalized linear models with knowledge transfer. J. Am. Stat. Assoc. 2024, 119, 1274–1285.
- Cai, T.T.; Wei, H. Transfer learning for nonparametric classification: Minimax rate and adaptive classifier. Ann. Stat. 2021, 49, 100–128.
- Cai, T.T.; Pu, H. Transfer learning for nonparametric regression: Non-asymptotic minimax analysis and adaptive procedure. arXiv 2024, arXiv:2401.12272.
- Fan, J.; Gao, C.; Klusowski, J.M. Robust transfer learning with unreliable source data. arXiv 2023, arXiv:2310.04606.
- Huber, P.J. Robust regression: Asymptotics, conjectures and Monte Carlo. Ann. Stat. 1973, 1, 799–821.
- Portnoy, S. Asymptotic behavior of M-estimators of p regression parameters when p²/n is large. I. Consistency. Ann. Stat. 1984, 12, 1298–1309.
- Portnoy, S. Asymptotic behavior of M-estimators of p regression parameters when p²/n is large. II. Normal approximation. Ann. Stat. 1985, 13, 1403–1417.
- Portnoy, S. Asymptotic behavior of the empiric distribution of M-estimated residuals from a regression model with many parameters. Ann. Stat. 1986, 14, 1152–1170.
- Portnoy, S. A central limit theorem applicable to robust regression estimators. J. Multivar. Anal. 1987, 22, 24–50.
- Mammen, E. Asymptotics with increasing dimension for robust regression with applications to the bootstrap. Ann. Stat. 1989, 17, 382–400.
- El Karoui, N.; Bean, D.; Bickel, P.J.; Lim, C.; Yu, B. On robust regression with high-dimensional predictors. Proc. Natl. Acad. Sci. USA 2013, 110, 14557–14562.
- El Karoui, N. On the impact of predictor geometry on the performance on high-dimensional ridge-regularized generalized robust regression estimators. Probab. Theory Relat. Fields 2018, 170, 95–175.
- Charbonnier, P.; Blanc-Feraud, L.; Aubert, G.; Barlaud, M. Deterministic edge-preserving regularization in computed imaging. IEEE Trans. Image Process. 1997, 6, 298–311.
- Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003.
- Yohai, V.J.; Maronna, R.A. Asymptotic behavior of M-estimators for the linear model. Ann. Stat. 1979, 7, 258–268.
- El Karoui, N. Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators: Rigorous results. arXiv 2013, arXiv:1311.2445.
- El Karoui, N. Concentration of measure and spectra of random matrices: Applications to correlation matrices, elliptical distributions and beyond. Ann. Appl. Probab. 2009, 19, 2362–2405.
- Ledoux, M. The Concentration of Measure Phenomenon; American Mathematical Society: Providence, RI, USA, 2001; Volume 89.
- Huber, P.J.; Ronchetti, E.M. Robust Statistics; John Wiley & Sons: Hoboken, NJ, USA, 2009.
- Karlin, S. Total Positivity; Stanford University Press: Stanford, CA, USA, 1968.
- Ibragimov, I.A. On the composition of unimodal distributions. Teor. Veroyatnost. Primen. 1956, 1, 283–288.
- Dharmadhikari, S.; Joag-Dev, K. Unimodality, Convexity, and Applications; Probability and Mathematical Statistics; Academic Press, Inc.: Boston, MA, USA, 1988.
- Moreau, J.J. Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. Fr. 1965, 93, 273–299.
- Beck, A.; Teboulle, M. Gradient-based algorithms with applications to signal-recovery problems. In Convex Optimization in Signal Processing and Communications; Palomar, D.P., Eldar, Y.C., Eds.; Cambridge University Press: Cambridge, UK, 2010; pp. 42–88.
- Bean, D.; Bickel, P.J.; El Karoui, N.; Yu, B. Optimal M-estimation in high-dimensional regression. Proc. Natl. Acad. Sci. USA 2013, 110, 14563–14568.
- Efron, B.; Stein, C. The jackknife estimate of variance. Ann. Stat. 1981, 9, 586–596.
- Bhatia, R. Matrix Analysis; Springer: New York, NY, USA, 1997.
- Johnson, C.R.; Horn, R.A. Matrix Analysis; Cambridge University Press: Cambridge, UK, 1985.
- Stroock, D.W. Probability Theory: An Analytic View; Cambridge University Press: Cambridge, UK, 2010.
- van der Vaart, A.W. Asymptotic Statistics; Cambridge University Press: Cambridge, UK, 2000.





| () | Case I | | Case II | | Case III | |
|---|---|---|---|---|---|---|
| (200, 200) | 0.3653 (0.0318) | 0.3649 | 0.7163 (0.0738) | 0.7204 | 0.4683 (0.0478) | 0.4685 |
| (400, 400) | 0.3472 (0.0208) | 0.3477 | 0.6970 (0.0549) | 0.6923 | 0.5076 (0.0368) | 0.5065 |
| (800, 800) | 0.3603 (0.0151) | 0.3598 | 0.7206 (0.0374) | 0.7212 | 0.4989 (0.0238) | 0.4986 |
| (200, 50) | 1.3415 (0.0734) | 1.3419 | 2.8222 (0.3311) | 2.8216 | 2.2410 (0.2407) | 2.2415 |
| (400, 100) | 1.3565 (0.0531) | 1.3544 | 2.4896 (0.2363) | 2.4996 | 1.8087 (0.1590) | 1.8102 |
| (800, 200) | 1.5261 (0.0427) | 1.5247 | 2.7226 (0.1693) | 2.7219 | 2.0342 (0.1153) | 2.0301 |
| Case I (Gaussian Errors) | |||
| 0.653/0.655/0.651/0.883 | 0.654/0.656/0.652/0.883 | 0.656/0.658/0.654/0.883 | |
| 0.644/0.647/0.642/0.882 | 0.645/0.647/0.642/0.882 | 0.646/0.648/0.643/0.882 | |
| 0.641/0.642/0.637/0.885 | 0.640/0.642/0.637/0.885 | 0.641/0.643/0.637/0.885 | |
| Case II (Cauchy Errors) | |||
| 0.884/0.893/0.884/0.952 | 0.884/0.892/0.884/0.952 | 0.884/0.893/0.884/0.952 | |
| 0.883/0.893/0.884/0.951 | 0.883/0.892/0.883/0.951 | 0.883/0.892/0.884/0.951 | |
| 0.890/0.899/0.891/0.951 | 0.889/0.899/0.890/0.951 | 0.889/0.899/0.890/0.951 | |
| Case III (Mixture Errors) | |||
| 0.779/0.786/0.779/0.921 | 0.780/0.787/0.779/0.921 | 0.780/0.787/0.779/0.921 | |
| 0.781/0.788/0.781/0.920 | 0.781/0.787/0.780/0.920 | 0.781/0.787/0.780/0.921 | |
| 0.794/0.801/0.793/0.924 | 0.793/0.800/0.792/0.924 | 0.792/0.800/0.791/0.923 | |
| Case | h | Single-RR | Trans-RR | Trans-RR-Ada | Pooled-RR |
|---|---|---|---|---|---|
| I | | 0.645/0.644 | 0.550/0.546 | 0.550/0.546 | 0.600/0.601 |
| I | | 0.645/0.644 | 0.565/0.561 | 0.566/0.561 | 0.628/0.629 |
| I | | 0.645/0.644 | 0.588/0.585 | 0.589/0.585 | 0.678/0.681 |
| I | | 0.645/0.644 | 0.616/0.614 | 0.617/0.615 | 0.764/0.770 |
| I | | 0.645/0.644 | 0.647/0.645 | 0.642/0.641 | 0.882/0.888 |
| I | | 0.645/0.644 | 0.791/0.793 | 0.645/0.644 | 0.990/0.991 |
| I | | 0.645/0.644 | 1.639/1.646 | 0.645/0.644 | 1.261/1.479 |
| II | | 0.883/1.772 | 0.816/2.582 | 0.819/1.895 | 0.811/1.324 |
| II | | 0.883/1.772 | 0.826/2.597 | 0.829/1.904 | 0.829/1.326 |
| II | | 0.883/1.772 | 0.840/2.618 | 0.843/1.915 | 0.855/1.353 |
| II | | 0.883/1.772 | 0.860/2.640 | 0.862/1.920 | 0.898/1.401 |
| II | | 0.883/1.772 | 0.892/2.745 | 0.883/1.963 | 0.951/1.497 |
| II | | 0.883/1.772 | 1.021/2.994 | 0.887/2.020 | 1.005/1.679 |
| II | | 0.883/1.772 | 1.898/3.827 | 0.883/2.108 | 1.123/2.223 |
| III | | 0.781/1.108 | 0.693/1.319 | 0.692/1.094 | 0.706/0.944 |
| III | | 0.781/1.108 | 0.707/1.334 | 0.707/1.102 | 0.729/0.966 |
| III | | 0.781/1.108 | 0.727/1.358 | 0.727/1.114 | 0.770/0.993 |
| III | | 0.781/1.108 | 0.754/1.395 | 0.754/1.138 | 0.834/1.054 |
| III | | 0.781/1.108 | 0.787/1.473 | 0.780/1.161 | 0.920/1.155 |
| III | | 0.781/1.108 | 0.941/1.749 | 0.783/1.194 | 0.998/1.368 |
| III | | 0.781/1.108 | 1.985/2.646 | 0.781/1.274 | 1.181/1.961 |
| Case | h | Single-RR | Trans-RR | Trans-RR-Ada | Pooled-RR |
|---|---|---|---|---|---|
| I | | 0.645/0.645 | 0.550/0.545 | 0.550/0.546 | 0.600/0.595 |
| I | | 0.645/0.645 | 0.565/0.561 | 0.566/0.562 | 0.628/0.625 |
| I | | 0.645/0.645 | 0.588/0.585 | 0.589/0.586 | 0.678/0.675 |
| I | | 0.645/0.645 | 0.616/0.615 | 0.617/0.616 | 0.764/0.763 |
| I | | 0.645/0.645 | 0.647/0.647 | 0.642/0.643 | 0.882/0.883 |
| I | | 0.645/0.645 | 0.791/0.797 | 0.645/0.645 | 0.990/0.995 |
| I | | 0.645/0.645 | 1.639/1.642 | 0.645/0.645 | 1.261/1.299 |
| II | | 0.883/0.891 | 0.816/0.828 | 0.819/0.830 | 0.811/0.823 |
| II | | 0.883/0.891 | 0.826/0.836 | 0.829/0.839 | 0.829/0.839 |
| II | | 0.883/0.891 | 0.840/0.850 | 0.843/0.852 | 0.855/0.865 |
| II | | 0.883/0.891 | 0.860/0.869 | 0.862/0.872 | 0.898/0.905 |
| II | | 0.883/0.891 | 0.892/0.900 | 0.883/0.891 | 0.951/0.954 |
| II | | 0.883/0.891 | 1.021/1.023 | 0.887/0.895 | 1.005/1.007 |
| II | | 0.883/0.891 | 1.898/1.871 | 0.883/0.891 | 1.123/1.129 |
| III | | 0.781/0.793 | 0.693/0.703 | 0.692/0.703 | 0.706/0.713 |
| III | | 0.781/0.793 | 0.707/0.717 | 0.707/0.718 | 0.729/0.738 |
| III | | 0.781/0.793 | 0.727/0.740 | 0.727/0.740 | 0.770/0.777 |
| III | | 0.781/0.793 | 0.754/0.765 | 0.754/0.765 | 0.834/0.842 |
| III | | 0.781/0.793 | 0.787/0.798 | 0.780/0.791 | 0.920/0.925 |
| III | | 0.781/0.793 | 0.941/0.950 | 0.783/0.794 | 0.998/1.002 |
| III | | 0.781/0.793 | 1.985/1.972 | 0.781/0.793 | 1.181/1.199 |
| Case | h | Default Grid (9 pts, [1/9, 9]) | Wide Grid (13 pts, [1/27, 27]) |
|---|---|---|---|
| I | | 0.645/0.550/0.550/0.600 | 0.645/0.550/0.550/0.600 |
| I | | 0.645/0.565/0.566/0.628 | 0.645/0.565/0.566/0.628 |
| I | | 0.645/0.588/0.589/0.678 | 0.645/0.589/0.590/0.678 |
| I | | 0.645/0.616/0.617/0.764 | 0.645/0.618/0.619/0.764 |
| I | | 0.645/0.647/0.642/0.882 | 0.645/0.648/0.642/0.883 |
| I | | 0.645/0.791/0.645/0.990 | 0.645/0.791/0.645/0.990 |
| I | | 0.645/1.639/0.645/1.261 | 0.645/1.639/0.645/1.261 |
| II | | 0.883/0.816/0.819/0.811 | 0.887/0.824/0.826/0.811 |
| II | | 0.883/0.826/0.829/0.829 | 0.887/0.835/0.838/0.829 |
| II | | 0.883/0.840/0.843/0.855 | 0.887/0.850/0.852/0.856 |
| II | | 0.883/0.860/0.862/0.898 | 0.887/0.869/0.871/0.900 |
| II | | 0.883/0.892/0.883/0.951 | 0.887/0.896/0.888/0.955 |
| II | | 0.883/1.021/0.887/1.005 | 0.887/1.021/0.891/1.006 |
| II | | 0.883/1.898/0.883/1.123 | 0.887/1.899/0.887/1.120 |
| III | | 0.781/0.693/0.692/0.706 | 0.782/0.694/0.694/0.706 |
| III | | 0.781/0.707/0.707/0.729 | 0.782/0.709/0.709/0.729 |
| III | | 0.781/0.727/0.727/0.770 | 0.782/0.731/0.731/0.770 |
| III | | 0.781/0.754/0.754/0.834 | 0.782/0.758/0.758/0.834 |
| III | | 0.781/0.787/0.780/0.920 | 0.782/0.789/0.781/0.922 |
| III | | 0.781/0.941/0.783/0.998 | 0.782/0.941/0.783/0.999 |
| III | | 0.781/1.985/0.781/1.181 | 0.782/1.985/0.782/1.180 |
| Case | h | τ = 1/3 | τ = 1 | τ = 3 | τ = 9 |
|---|---|---|---|---|---|
| I | | 0.742/0.769/0.687/0.619 | 0.628/0.523/0.525/0.605 | 0.742/0.620/0.620/0.773 | 0.881/0.814/0.814/0.906 |
| I | | 0.742/0.781/0.696/0.645 | 0.628/0.538/0.538/0.625 | 0.742/0.631/0.632/0.783 | 0.881/0.820/0.820/0.910 |
| I | | 0.742/0.804/0.709/0.694 | 0.628/0.565/0.561/0.661 | 0.742/0.651/0.652/0.801 | 0.881/0.829/0.829/0.917 |
| I | | 0.742/0.851/0.728/0.790 | 0.628/0.616/0.596/0.726 | 0.742/0.687/0.690/0.833 | 0.881/0.846/0.846/0.929 |
| I | | 0.742/0.953/0.745/0.990 | 0.628/0.714/0.629/0.848 | 0.742/0.752/0.740/0.890 | 0.881/0.876/0.876/0.951 |
| I | | 0.742/1.185/0.743/1.423 | 0.628/0.913/0.628/1.074 | 0.742/0.866/0.742/0.986 | 0.881/0.924/0.881/0.986 |
| I | | 0.742/1.739/0.742/2.321 | 0.628/1.287/0.628/1.436 | 0.742/1.033/0.742/1.113 | 0.881/0.987/0.881/1.030 |
| II | | 2.026/2.443/2.014/1.143 | 0.978/0.964/0.930/0.789 | 0.861/0.778/0.784/0.850 | 0.924/0.877/0.878/0.936 |
| II | | 2.026/2.459/2.019/1.167 | 0.978/0.979/0.938/0.804 | 0.861/0.787/0.793/0.857 | 0.924/0.881/0.882/0.939 |
| II | | 2.026/2.490/2.027/1.211 | 0.978/1.005/0.953/0.833 | 0.861/0.803/0.810/0.870 | 0.924/0.888/0.889/0.943 |
| II | | 2.026/2.551/2.037/1.297 | 0.978/1.055/0.972/0.885 | 0.861/0.832/0.836/0.894 | 0.924/0.900/0.903/0.952 |
| II | | 2.026/2.681/2.045/1.475 | 0.978/1.152/0.988/0.983 | 0.861/0.885/0.865/0.937 | 0.924/0.923/0.922/0.968 |
| II | | 2.026/2.971/2.042/1.848 | 0.978/1.342/0.984/1.161 | 0.861/0.977/0.864/1.008 | 0.924/0.959/0.927/0.994 |
| II | | 2.026/3.615/2.034/2.569 | 0.978/1.663/0.978/1.420 | 0.861/1.096/0.861/1.093 | 0.924/1.000/0.925/1.022 |
| III | | 1.286/1.442/1.246/0.827 | 0.784/0.713/0.704/0.686 | 0.798/0.693/0.694/0.808 | 0.902/0.844/0.844/0.920 |
| III | | 1.286/1.456/1.253/0.852 | 0.784/0.729/0.715/0.705 | 0.798/0.703/0.704/0.817 | 0.902/0.849/0.849/0.923 |
| III | | 1.286/1.484/1.265/0.900 | 0.784/0.756/0.735/0.737 | 0.798/0.721/0.724/0.833 | 0.902/0.857/0.857/0.929 |
| III | | 1.286/1.541/1.282/0.994 | 0.784/0.808/0.763/0.797 | 0.798/0.754/0.759/0.861 | 0.902/0.872/0.873/0.940 |
| III | | 1.286/1.661/1.294/1.187 | 0.784/0.909/0.789/0.910 | 0.798/0.815/0.799/0.912 | 0.902/0.898/0.899/0.959 |
| III | | 1.286/1.933/1.290/1.600 | 0.784/1.109/0.786/1.115 | 0.798/0.920/0.799/0.996 | 0.902/0.941/0.903/0.990 |
| III | | 1.286/2.559/1.286/2.420 | 0.784/1.466/0.784/1.426 | 0.798/1.064/0.798/1.103 | 0.902/0.993/0.902/1.026 |
| Method | Direction A | Direction B |
|---|---|---|
| Trans-RR | 4.6230 (0.1732) | 4.7933 (0.2736) |
| Trans-RR-Ada | 4.6294 (0.1861) | 4.8211 (0.3757) |
| Pooled-RR | 5.0952 (0.0812) | 5.4909 (0.1335) |
| Trans-Lasso | 5.5668 (0.3650) | 5.6803 (0.3131) |
| Single-RR | 6.2666 (2.2628) | 6.8272 (2.2674) |
| Single-Lasso | 8.0672 (2.8904) | 8.5607 (3.3739) |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Lyu, L.; Guo, X.; Liu, Z. Transfer Learning for Moderate–Dimensional Ridge-Regularized Robust Linear Regression. Entropy 2026, 28, 543. https://doi.org/10.3390/e28050543

