# A High-Dimensional Counterpart for the Ridge Estimator in Multicollinear Situations


## Abstract


## 1. Introduction

#### An Existing Two-Parameter Biased Estimator

## 2. The Proposed Estimator

**(A1)**- $1/{k}_{p}=o\left(1\right)$. There exists a constant $0\le \delta <0.5$ such that each component of $\mathbf{X}$ is $O\left({k}_{p}^{\delta}\right)$.
**(A2)**- ${d}_{p}=o\left(1\right)$. There exists a constant $0\le \eta <0.5$ such that each component of ${\mathbf{X}}^{+}$ is $O\left({d}_{p}^{-\eta}\right)$.
**(A3)**- For sufficiently large $p$, there is a vector ${\mathbf{b}}_{p\times 1}$ such that $\mathit{\beta}={\mathbf{X}}^{\top}\mathbf{X}\mathbf{b}$, and there exists a constant $\epsilon >0$ such that each component of ${\mathbf{b}}_{p\times 1}$ is $O(1/{p}^{\epsilon +1.5})$ and ${k}_{p}=o\left({p}^{\epsilon}{a}_{p}\right)$, with ${a}_{p}=o\left(1\right)$. (An example of such a choice is ${k}_{p}=\sqrt{p}$ and $\epsilon =0.5+\delta $.)
**(A4)**- For sufficiently large $p$, there exists a constant $\delta >0$ such that each component of $\mathit{\beta}$ is $O\left({p}^{-2-\delta}\right)$ and $1/{d}_{p}=o\left({p}^{\delta}\right)$. Further, ${k}_{p}^{\delta -1}=o\left({d}_{p}\right)$.

**(A3)** is adopted from Luo [11]. Let ${\widehat{\mathit{\beta}}}^{\mathrm{HD}}={\left({\widehat{\mathit{\beta}}}_{1}^{\mathrm{HD}},\dots ,{\widehat{\mathit{\beta}}}_{p}^{\mathrm{HD}}\right)}^{\top}$.
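The exact closed form of the proposed estimator is not fully recoverable from this extract. As a hedged illustration only, the NumPy sketch below computes the two quantities that the assumptions control, the scaled regularized Gram matrix ${\mathbf{X}}^{\top}\mathbf{X}/{k}_{p}+{\mathbf{I}}_{p}$ and the scaled Moore–Penrose pseudoinverse ${d}_{p}{\mathbf{X}}^{+}$, together with an ordinary ridge fit for reference; the choices of ${k}_{p}$ and ${d}_{p}$ are illustrative, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 100                      # high-dimensional setting: p > n
X = rng.standard_normal((n, p))
# weak (near-sparse) signal plus noise
y = X @ rng.standard_normal(p) * 0.01 + rng.standard_normal(n)

k_p = np.sqrt(p)                    # tuning sequence, as in the example under (A3)
d_p = 1.0 / p                       # illustrative choice satisfying d_p = o(1)

# Scaled regularized Gram matrix: X'X/k_p + I_p -> I_p under (A1)
G = X.T @ X / k_p + np.eye(p)

# Scaled Moore-Penrose pseudoinverse: d_p * X+ vanishes under (A2)
Xplus = np.linalg.pinv(X)
scaled_pinv = d_p * Xplus

# Classical ridge fit for comparison (not the proposed estimator itself)
beta_ridge = np.linalg.solve(X.T @ X + k_p * np.eye(p), X.T @ y)
```

Even though ${\mathbf{X}}^{\top}\mathbf{X}$ is singular when $p>n$, the added ${k}_{p}{\mathbf{I}}_{p}$ term makes the ridge system solvable.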

**Theorem 1.** Suppose **(A1)** and **(A2)** hold. Then, $\mathrm{var}\left({\widehat{\mathit{\beta}}}_{i}^{\mathrm{HD}}\right)\to 0$ as $p\to \infty $, for all $i=1,\dots ,p$.

**Proof.**

**Theorem 2.** Suppose **(A1)**–**(A3)** hold. Further, suppose ${\lambda}_{ip}=O\left({k}_{p}\right)$, where ${\lambda}_{ip}>0$ is the $i$th eigenvalue of ${\mathbf{X}}^{\top}\mathbf{X}$. Then, $\mathrm{bias}\left({\widehat{\mathit{\beta}}}_{i}^{\mathrm{HD}}\right)\to 0$ as $p\to \infty $, for all $i=1,2,\dots ,p$.

**Proof.**

**Theorem 3.**

**Proof.**

## 3. Generalized Cross-Validation

**Theorem 4.**

**Proof.**
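The standard generalized cross-validation criterion for a ridge-type fit chooses the tuning constant $k$ to minimize $\mathrm{GCV}(k)=n{\Vert \mathbf{y}-{\mathbf{H}}_{k}\mathbf{y}\Vert}^{2}/{\left(n-\mathrm{tr}\,{\mathbf{H}}_{k}\right)}^{2}$, where ${\mathbf{H}}_{k}=\mathbf{X}{\left({\mathbf{X}}^{\top}\mathbf{X}+k{\mathbf{I}}_{p}\right)}^{-1}{\mathbf{X}}^{\top}$ is the hat matrix. The sketch below implements this textbook criterion; it is not necessarily the exact criterion derived in this section:

```python
import numpy as np

def gcv_ridge(X, y, ks):
    """Return the k in `ks` minimizing the standard ridge GCV criterion."""
    n, p = X.shape
    best_k, best_score = None, np.inf
    for k in ks:
        # Hat matrix H_k = X (X'X + k I)^{-1} X'
        H = X @ np.linalg.solve(X.T @ X + k * np.eye(p), X.T)
        resid = y - H @ y
        score = n * np.sum(resid**2) / (n - np.trace(H)) ** 2
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score

rng = np.random.default_rng(1)
n, p = 50, 20
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)
k_hat, _ = gcv_ridge(X, y, ks=np.logspace(-3, 3, 25))
```

A grid search over a log-spaced range is the simplest selector; a scalar minimizer could replace the loop.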

## 4. Numerical Investigations

#### 4.1. Simulation
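A common device for generating predictors with a prescribed degree of multicollinearity $\rho$ is the McDonald–Galarneau scheme [14], ${x}_{ij}=\sqrt{1-{\rho}^{2}}\,{z}_{ij}+\rho {z}_{i,p+1}$ with i.i.d. standard normal ${z}_{ij}$, giving pairwise column correlation about ${\rho}^{2}$. Whether the paper's simulation uses exactly this scheme is not recoverable from the extract, so the sketch below is illustrative:

```python
import numpy as np

def mcdonald_galarneau(n, p, rho, rng):
    """Design matrix whose columns have pairwise correlation about rho**2."""
    Z = rng.standard_normal((n, p + 1))
    # Each column mixes its own noise with a shared common factor Z[:, p]
    return np.sqrt(1 - rho**2) * Z[:, :p] + rho * Z[:, [p]]

rng = np.random.default_rng(2)
X = mcdonald_galarneau(n=30, p=256, rho=0.95, rng=rng)

# Empirical average off-diagonal correlation should be near rho**2 = 0.9025
C = np.corrcoef(X, rowvar=False)
avg_offdiag = (C.sum() - np.trace(C)) / (C.size - len(C))
```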

#### 4.2. Review of Results

- (1) The performance of the estimators is affected by the number of observations ($n$), the number of variables ($p$), the signal-to-noise ratio ($c$), and the degree of multicollinearity ($\rho $).
- (2) As the degree of multicollinearity $\rho $ increases, the QB of the proposed estimator increases for $c=0.5$ and 1 under both error distributions, but its MSE decreases dramatically relative to the ridge, as the increasing RMSE shows.
- (3) The signal-to-noise ratio reflects the effect of $\mathit{\beta}$ in the model. Values below 1 indicate model sparsity, and when $c$ is small the proposed estimator outperforms the ridge; this is evidence that our estimator is a better alternative in sparse models in the MSE sense. However, the QB increases for large values of $c$, which forces the model to overestimate the parameters.
- (4) As $p$ increases, the proposed estimator remains superior to the ridge in sparse models (small $c$ values), but its efficiency decreases; this is more evident as the ratio $p/n$ grows. Although this may read as poor performance, our estimator is still preferred in high dimensions for sparse models.
- (5) As $n$ increases, so does the RMSE; however, the QB becomes very large, owing to the complicated form of the proposed estimator. It must be noted that this does not contradict Theorem 2, since the simulation scheme does not obey its regularity conditions.
- (6) For sparse models there is evidence of robustness to the distribution tail, i.e., the QB and RMSE are essentially the same under the normal and $t$ distributions. However, as $c$ increases, the QB of the proposed estimator explodes under the heavier-tailed distribution. This may seem a disadvantage, but even for large values of $c$ the RMSE stays the same, evidence of relatively small variance under the heavier-tailed distribution.

#### 4.3. AML Data Analysis

- (1) Using the GCV, the proposed estimator is consistently superior to the ridge estimator with respect to the RMSE and RMPE criteria.
- (2) As in the simulations, the MSE of the proposed estimator grows relative to the ridge estimator as $p$ increases. However, the mean prediction error becomes smaller as $p$ grows, which shows its superiority for prediction purposes.

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A. Proof of the Main Results

**Proof of Theorem 1.** By **(A1)**, $\mathbf{X}/{k}_{p}=O\left(1\right){k}_{p}^{\delta -1}=o\left(1\right)$ and ${\mathbf{X}}^{\top}\mathbf{X}/{k}_{p}+{\mathbf{I}}_{p}\to {\mathbf{I}}_{p}$. By **(A2)**, ${d}_{p}{\mathbf{X}}^{+}=O\left(1\right){d}_{p}^{1-\eta}=o\left(1\right)$. Hence, $\mathrm{var}\left({\widehat{\mathit{\beta}}}_{i}^{\mathrm{HD}}\right)\to 0$ as $p\to \infty $, and the proof is complete. □

**Proof of Theorem 2.** By **(A2)**, ${d}_{p}{\mathbf{X}}^{+}\mathbf{X}=o\left(1\right)$. The proof is complete using Theorem 2 of Luo [11]. □

**Proof of Theorem 3.** By **(A1)**, ${\mathbf{X}}^{\top}\mathbf{X}/{k}_{p}+{\mathbf{I}}_{p}\to {\mathbf{I}}_{p}$; by **(A2)**, ${d}_{p}{\mathbf{X}}^{+}\mathbf{X}=o\left(1\right)$; and by **(A4)**, $\mathbf{X}/{k}_{p}{d}_{p}=o\left(1\right)$. Hence, the result follows. □

**Proof of Theorem 4.**

## References

1. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for non-orthogonal problems. *Technometrics* **1970**, 12, 55–67.
2. Tikhonov, A.N. Solution of incorrectly formulated problems and the regularization method. *Sov. Math. Dokl.* **1963**, 4, 1035–1038.
3. Saleh, A.K.M.E.; Arashi, M.; Kibria, B.M.G. *Theory of Ridge Regression Estimation with Applications*; John Wiley: Hoboken, NJ, USA, 2019.
4. Wang, X.; Dunson, D.; Leng, C. No penalty no tears: Least squares in high-dimensional models. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; pp. 1814–1822.
5. Bühlmann, P. Statistical significance in high-dimensional linear models. *Bernoulli* **2013**, 19, 1212–1242.
6. Shao, J.; Deng, X. Estimation in high-dimensional linear models with deterministic design matrices. *Ann. Stat.* **2012**, 40, 812–831.
7. Dicker, L.H. Ridge regression and asymptotic minimum estimation over spheres of growing dimension. *Bernoulli* **2016**, 22, 1–37.
8. Liu, K. A new class of biased estimate in linear regression. *Commun. Stat. Theory Methods* **1993**, 22, 393–402.
9. Ozkale, M.R.; Kaciranlar, S. The restricted and unrestricted two-parameter estimators. *Commun. Stat. Theory Methods* **2007**, 36, 2707–2725.
10. Wang, X.; Leng, C. High dimensional ordinary least squares projection for screening variables. *J. R. Stat. Soc. Ser. B* **2015**.
11. Luo, J. The discovery of mean square error consistency of ridge estimator. *Stat. Probab. Lett.* **2010**, 80, 343–347.
12. Amini, M.; Roozbeh, M. Optimal partial ridge estimation in restricted semiparametric regression models. *J. Multivar. Anal.* **2015**, 136, 26–40.
13. Akdeniz, F.; Roozbeh, M. Generalized difference-based weighted mixed almost unbiased ridge estimator in partially linear models. *Stat. Pap.* **2019**, 60, 1717–1739.
14. McDonald, G.C.; Galarneau, D.I. A Monte Carlo of Some Ridge-Type Estimators. *J. Am. Stat. Assoc.* **1975**, 70, 407–416.
15. Zhu, L.P.; Li, L.; Li, R.; Zhu, L.X. Model-free feature screening for ultrahigh dimensional data. *J. Am. Stat. Assoc.* **2011**, 106, 1464–1475.
16. Metzeler, K.H.; Hummel, M.; Bloomfield, C.D.; Spiekermann, K.; Braess, J.; Sauerl, M.C.; Heinecke, A.; Radmacher, M.; Marcucci, G.; Whitman, S.P.; et al. An 86 Probe Set Gene Expression Signature Predicts Survival in Cytogenetically Normal Acute Myeloid Leukemia. *Blood* **2008**, 112, 4193–4201.
17. Sill, M.; Hielscher, T.; Becker, N.; Zucknick, M. c060: Extended Inference for Lasso and Elastic-Net Regularized Cox and Generalized Linear Models; R Package Version 0.2-4; 2014. Available online: http://CRAN.R-project.org/package=c060 (accessed on 1 January 2021).
18. Montgomery, D.C.; Peck, E.A.; Vining, G.G. *Introduction to Linear Regression Analysis*, 5th ed.; Wiley: Hoboken, NJ, USA, 2012.

| $\mathit{p}$ | $\mathit{c}$ | $\mathit{n}$ | diff, $\mathcal{N}(\mathbf{0},{\mathbf{I}}_{n})$, $\rho =0.8$ | diff, ${t}_{n}(\mathbf{0},{\mathbf{I}}_{n},5)$, $\rho =0.8$ | diff, $\mathcal{N}(\mathbf{0},{\mathbf{I}}_{n})$, $\rho =0.95$ | diff, ${t}_{n}(\mathbf{0},{\mathbf{I}}_{n},5)$, $\rho =0.95$ |
|---|---|---|---|---|---|---|
| 256 | 0.5 | 30 | 5.7657 | 5.7643 | 10.0535 | 10.1134 |
| | | 50 | 6.4911 | 6.4941 | 11.4722 | 11.5088 |
| | | 100 | 17.8314 | 17.8493 | 30.1008 | 30.4137 |
| | 1 | 30 | 22.9671 | 487.4169 | 39.6556 | 459.3473 |
| | | 50 | 25.8621 | 522.6298 | 45.1138 | 480.5501 |
| | | 100 | 70.8693 | 798.6551 | 118.1326 | 676.1919 |
| | 2 | 30 | 91.7526 | 2413.9746 | 158.0026 | 2256.1664 |
| | | 50 | 103.3996 | 2587.7382 | 179.8509 | 2357.4111 |
| | | 100 | 283.0549 | 3922.0057 | 470.4114 | 3259.2211 |
| 512 | 0.5 | 30 | 3.1943 | 3.2012 | 6.5528 | 6.6001 |
| | | 50 | 4.4800 | 4.4781 | 9.5861 | 9.6151 |
| | | 100 | 10.2121 | 10.2489 | 20.1828 | 20.3366 |
| | 1 | 30 | 12.7657 | 926.7540 | 26.0911 | 916.2663 |
| | | 50 | 17.8861 | 1009.3595 | 38.0549 | 969.4353 |
| | | 100 | 40.7254 | 1192.4455 | 79.9094 | 1095.9628 |
| | 2 | 30 | 51.0605 | 4621.0862 | 104.2892 | 4555.0569 |
| | | 50 | 71.5157 | 5029.3107 | 151.9461 | 4809.2595 |
| | | 100 | 162.7616 | 5920.6337 | 318.7343 | 5397.7878 |
| 1024 | 0.5 | 30 | 1.7594 | 1.7584 | 3.7384 | 3.7410 |
| | | 50 | 3.9188 | 3.9345 | 9.2523 | 9.3437 |
| | | 100 | 5.1236 | 5.1189 | 12.6469 | 12.6455 |
| | 1 | 30 | 7.0318 | 1637.6798 | 14.8960 | 1636.5664 |
| | | 50 | 15.6758 | 1804.8548 | 36.9649 | 1763.7468 |
| | | 100 | 20.4564 | 1940.6091 | 50.2993 | 1856.0197 |
| | 2 | 30 | 28.1221 | 8181.4255 | 59.5312 | 8167.9835 |
| | | 50 | 62.7157 | 9008.4246 | 147.8715 | 8781.1968 |
| | | 100 | 81.7756 | 9682.7404 | 147.8715 | 9229.7803 |

| $\mathit{p}$ | $\mathit{c}$ | $\mathit{n}$ | RMSE, $\mathcal{N}(\mathbf{0},{\mathbf{I}}_{n})$, $\rho =0.8$ | RMSE, ${t}_{n}(\mathbf{0},{\mathbf{I}}_{n},5)$, $\rho =0.8$ | RMSE, $\mathcal{N}(\mathbf{0},{\mathbf{I}}_{n})$, $\rho =0.95$ | RMSE, ${t}_{n}(\mathbf{0},{\mathbf{I}}_{n},5)$, $\rho =0.95$ |
|---|---|---|---|---|---|---|
| 256 | 0.5 | 30 | 1.0050 | 1.0050 | 1.0140 | 1.0139 |
| | | 50 | 1.0058 | 1.0058 | 1.0161 | 1.0160 |
| | | 100 | 1.0222 | 1.0222 | 1.0543 | 1.0539 |
| | 1 | 30 | 1.0032 | 1.0032 | 1.0179 | 1.0178 |
| | | 50 | 1.0039 | 1.0039 | 1.0209 | 1.0209 |
| | | 100 | 1.0221 | 1.0220 | 1.0883 | 1.0876 |
| | 2 | 30 | 0.9816 | 0.9816 | 0.9852 | 0.9851 |
| | | 50 | 0.9793 | 0.9793 | 0.9829 | 0.9829 |
| | | 100 | 0.9434 | 0.9435 | 0.9587 | 0.9584 |
| 512 | 0.5 | 30 | 1.0011 | 1.0011 | 1.0031 | 1.0031 |
| | | 50 | 1.0016 | 1.0016 | 1.0048 | 1.0048 |
| | | 100 | 1.0041 | 1.0041 | 1.0119 | 1.0119 |
| | 1 | 30 | 1.0004 | 1.0004 | 1.0029 | 1.0029 |
| | | 50 | 1.0007 | 1.0007 | 1.0048 | 1.0048 |
| | | 100 | 1.0023 | 1.0023 | 1.0139 | 1.0139 |
| | 2 | 30 | 0.9948 | 0.9948 | 0.9924 | 0.9924 |
| | | 50 | 0.9929 | 0.9929 | 0.9895 | 0.9895 |
| | | 100 | 0.9843 | 0.9843 | 0.9810 | 0.9809 |
| 1024 | 0.5 | 30 | 1.0003 | 1.0003 | 1.0009 | 1.0009 |
| | | 50 | 1.0007 | 1.0007 | 1.0022 | 1.0022 |
| | | 100 | 1.0009 | 1.0009 | 1.0031 | 1.0031 |
| | 1 | 30 | 1.0001 | 1.0001 | 1.0006 | 1.0006 |
| | | 50 | 1.0002 | 1.0002 | 1.0017 | 1.0017 |
| | | 100 | 1.0003 | 1.0003 | 1.0025 | 1.0025 |
| | 2 | 30 | 0.9984 | 0.9984 | 0.9973 | 0.9973 |
| | | 50 | 0.9964 | 0.9964 | 0.9933 | 0.9933 |
| | | 100 | 0.9954 | 0.9954 | 0.9910 | 0.9911 |

| Criterion | $\mathit{p}=1000$ | $\mathit{p}=2000$ |
|---|---|---|
| ${\mathrm{RMPE}}_{\mathrm{boot}}$ | 1.001981 | 1.002278 |
| ${\mathrm{RMSE}}_{\mathrm{boot}}$ | 1.046073 | 1.039997 |


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Arashi, M.; Norouzirad, M.; Roozbeh, M.; Khan, N.M.
A High-Dimensional Counterpart for the Ridge Estimator in Multicollinear Situations. *Mathematics* **2021**, *9*, 3057.
https://doi.org/10.3390/math9233057
