Sufficient Dimension Reduction: An Information-Theoretic Viewpoint
Abstract
1. Introduction
- (a) We can view sufficient dimension reduction (SDR) as a means of preserving the information in the predictors that is relevant to a response variable. It can be interpreted as performing the information bottleneck in two directions (see the objective displayed after this list).
- (b) Conversely, we will see that the information bottleneck performs sufficient dimension reduction in a certain sense.
- (c) By moving to mutual information, we can relax some of the distributional assumptions needed for sufficient dimension reduction, in a manner different from that of [12,13,14,15,16]. This direction departs from the viewpoint that SDR serves to estimate a target parameter, typically the span of basis vectors of the central subspace.
- (d) In the case of Gaussian variables, we can develop a method for identifying ‘phase transitions’ in the structural dimension of central subspaces by extending the work of [17] to sufficient dimension reduction procedures.
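To fix ideas for points (a) and (b), recall the information bottleneck objective of [4], written here in its standard Lagrangian form: a compressed representation $T$ of $X$ is obtained by minimizing, over stochastic encoders $p(t \mid x)$,

$$\mathcal{L}\left[p(t \mid x)\right] = I(X; T) - \beta \, I(T; Y), \qquad \beta > 0,$$

which trades compression of $X$ (small $I(X;T)$) against retention of information about $Y$ (large $I(T;Y)$). In the SDR reading of point (a), the linear reduction $B^{\top}X$ plays the role of $T$.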
2. Background and Preliminaries
2.1. Data Structures and Review of Dimension Reduction Methods
- (a) ‘Slice’ the response variable $Y$ into $J$ slices, denoted $I_1, \ldots, I_J$;
- (b) Standardize the predictor observations as $Z_i = \hat{\Sigma}_X^{-1/2}(X_i - \bar{X})$, $i = 1, \ldots, n$, where $\bar{X}$ and $\hat{\Sigma}_X$ denote the sample mean and sample covariance matrix of the $X_i$;
- (c) Calculate sample mean estimates within slices: $\bar{Z}_j = n_j^{-1} \sum_{i : Y_i \in I_j} Z_i$, where $n_j$ is the number of observations in slice $j$, $j = 1, \ldots, J$;
- (d) Estimate the population covariance matrix of $E(Z \mid Y)$ by $\hat{M} = \sum_{j=1}^{J} (n_j / n) \, \bar{Z}_j \bar{Z}_j^{\top}$;
- (e) Compute the eigenvectors of $\hat{M}$ corresponding to its leading eigenvalues; back-transformed by $\hat{\Sigma}_X^{-1/2}$, these are the estimates of the basis vectors for the central subspace (a minimal code sketch of these steps follows this list).
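The steps above translate directly into code. The following is a minimal sketch of sliced inverse regression under the notation just introduced (our illustration, not code from the article; it assumes a continuous response and a well-conditioned predictor covariance):

```python
import numpy as np

def sir(X, y, n_slices=10, n_directions=2):
    """Sliced inverse regression: estimate central-subspace basis vectors.

    A minimal sketch of steps (a)-(e); assumes the sample covariance
    of X is well-conditioned.
    """
    n, p = X.shape
    # (a) Slice the response into roughly equal-sized slices.
    slices = np.array_split(np.argsort(y), n_slices)
    # (b) Standardize the predictors: Z_i = Sigma^{-1/2} (X_i - Xbar).
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ Sigma_inv_sqrt
    # (c)-(d) Covariance of E(Z | Y): weighted outer products of slice means.
    M = np.zeros((p, p))
    for idx in slices:
        zbar = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(zbar, zbar)
    # (e) Leading eigenvectors of M span the central subspace on the
    # Z scale; map back to the X scale with Sigma^{-1/2}.
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:n_directions]]
    return Sigma_inv_sqrt @ top
```

For a single-index model such as $Y = (\beta^{\top}X)^3 + \varepsilon$ with elliptically distributed $X$, `sir(X, y, n_directions=1)` returns a vector approximately proportional to $\beta$ (up to sign), consistent with the theory in [28].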
2.2. Limitations of Sufficient Dimension Reduction
3. Graphical Models, Connections and Information Theoretic Results
- (Five numbered information-theoretic results are stated at this point in the article; their displayed equations are not recoverable from the extracted text.)
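To make the connection concrete, the defining condition of SDR can be restated in information-theoretic terms (a standard identity, supplied here since the original displays are unavailable): for a reduction matrix $B$,

$$Y \perp\!\!\!\perp X \mid B^{\top}X \quad \Longleftrightarrow \quad I(Y; X \mid B^{\top}X) = 0 \quad \Longleftrightarrow \quad I(Y; B^{\top}X) = I(Y; X),$$

where the second equivalence follows from the chain rule $I(Y; X) = I(Y; B^{\top}X) + I(Y; X \mid B^{\top}X)$, valid because $B^{\top}X$ is a function of $X$. In this form, SDR seeks the smallest-rank $B$ that loses no mutual information about $Y$.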
4. The Case of Gaussian Variables
5. Numerical Illustration
6. Discussion
- We can move away from viewing the goal of SDR as estimating a parameter, namely a basis of the central subspace, and view it instead as a means of information compression that simultaneously preserves association with an outcome variable. This information-theoretic view allows one to relax distributional assumptions in a way that differs from the σ-field approach described in [16].
- By recognizing that the Gaussian information bottleneck theorem (Theorem 3.1 of [17]) amounts to solving a generalized eigenvalue problem, we can extend the results of [17] to a variety of sufficient dimension reduction methods. From this, we see that the goals of information compression and central subspace dimension estimation are dual to each other, as sketched below.
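As a small illustration of this duality (our sketch, with a hypothetical function name, following the setup of [17]): the eigenvalues $\lambda_i$ of $\Sigma_{x \mid y}\Sigma_x^{-1}$ yield critical tradeoff values $\beta_i^c = (1 - \lambda_i)^{-1}$ at which the optimal Gaussian bottleneck projection acquires an additional dimension, so the ‘phase transitions’ can be read off from an eigendecomposition:

```python
import numpy as np

def gaussian_ib_transitions(Sigma_x, Sigma_xy, Sigma_y):
    """Phase transitions of the Gaussian information bottleneck.

    A sketch following Chechik et al. [17]: the optimal projections are
    eigenvectors of Sigma_{x|y} Sigma_x^{-1}; each eigenvalue lambda < 1
    contributes a critical beta = 1 / (1 - lambda) at which the optimal
    bottleneck representation gains one dimension.
    """
    # Conditional covariance: Sigma_{x|y} = Sigma_x - Sigma_xy Sigma_y^{-1} Sigma_yx.
    Sigma_x_given_y = Sigma_x - Sigma_xy @ np.linalg.solve(Sigma_y, Sigma_xy.T)
    # Eigenvalues of Sigma_{x|y} Sigma_x^{-1} are real and lie in (0, 1].
    lam = np.sort(np.linalg.eigvals(Sigma_x_given_y @ np.linalg.inv(Sigma_x)).real)
    betas = 1.0 / (1.0 - lam[lam < 1.0])
    return lam, betas
```

Counting the eigenvalues that fall strictly below one, i.e., the number of finite critical values $\beta_i^c$, corresponds to the structural dimension of the central subspace in this Gaussian setting.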
Funding
Institutional Review Board Statement
Informed Consent Statement
Acknowledgments
Conflicts of Interest
References
- Li, B. Sufficient Dimension Reduction: Methods and Applications with R; CRC Press: Boca Raton, FL, USA, 2018.
- Brillinger, D.R. A generalized linear model with “Gaussian” regressor variables. In Selected Works of David Brillinger; Springer: Berlin/Heidelberg, Germany, 2012; pp. 589–606.
- Li, K.C.; Duan, N. Regression analysis under link violation. Ann. Stat. 1989, 17, 1009–1052.
- Tishby, N.; Pereira, F.; Bialek, W. The information bottleneck method. In Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, USA, 1999. Available online: https://www.bibsonomy.org/bibtex/15bd5efbf394791da00b09839b9a5757 (accessed on 11 December 2021).
- Blahut, R. Computation of channel capacity and rate-distortion functions. IEEE Trans. Inf. Theory 1972, 18, 460–473.
- Arimoto, S. An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Trans. Inf. Theory 1972, 18, 14–20.
- Slonim, N.; Tishby, N. Agglomerative Information Bottleneck; ACM: New York, NY, USA, 1999; Volume 4.
- Slonim, N.; Tishby, N. Document clustering using word clusters via the information bottleneck method. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Athens, Greece, 24–28 July 2000; pp. 208–215.
- Slonim, N.; Friedman, N.; Tishby, N. Multivariate information bottleneck. Neural Comput. 2006, 18, 1739–1789.
- Tishby, N.; Zaslavsky, N. Deep learning and the information bottleneck principle. In Proceedings of the 2015 IEEE Information Theory Workshop (ITW), Jerusalem, Israel, 26 April–1 May 2015; pp. 1–5.
- Saxe, A.M.; Bansal, Y.; Dapello, J.; Advani, M.; Kolchinsky, A.; Tracey, B.D.; Cox, D.D. On the information bottleneck theory of deep learning. J. Stat. Mech. Theory Exp. 2019, 2019, 124020.
- Xia, Y.; Tong, H.; Li, W.K.; Zhu, L.X. An adaptive estimation of dimension reduction space. J. R. Stat. Soc. Ser. B 2002, 64, 299–346.
- Fukumizu, K.; Bach, F.R.; Jordan, M.I. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. J. Mach. Learn. Res. 2004, 5, 73–99.
- Fukumizu, K.; Bach, F.R.; Jordan, M.I. Kernel dimension reduction in regression. Ann. Stat. 2009, 37, 1871–1905.
- Li, B.; Artemiou, A.; Li, L. Principal support vector machines for linear and nonlinear sufficient dimension reduction. Ann. Stat. 2011, 39, 3182–3210.
- Lee, K.Y.; Li, B.; Chiaromonte, F. A general theory for nonlinear sufficient dimension reduction: Formulation and estimation. Ann. Stat. 2013, 41, 221–249.
- Chechik, G.; Globerson, A.; Tishby, N.; Weiss, Y. Information bottleneck for Gaussian variables. J. Mach. Learn. Res. 2005, 6, 165–188.
- Wang, Q.; Yin, X.; Critchley, F. Dimension reduction based on the Hellinger integral. Biometrika 2015, 102, 95–106.
- Liese, F.; Vajda, I. On divergences and informations in statistics and information theory. IEEE Trans. Inf. Theory 2006, 52, 4394–4412.
- Yin, X. Canonical correlation analysis based on information theory. J. Multivar. Anal. 2004, 91, 161–176.
- Iaci, R.; Yin, X.; Sriram, T.; Klingenberg, C.P. An informational measure of association and dimension reduction for multiple sets and groups with applications in morphometric analysis. J. Am. Stat. Assoc. 2008, 103, 1166–1176.
- Yin, X.; Sriram, T. Common canonical variates for independent groups using information theory. Stat. Sin. 2008, 18, 335–353.
- Xue, Y.; Wang, Q.; Yin, X. A unified approach to sufficient dimension reduction. J. Stat. Plan. Inference 2018, 197, 168–179.
- Cook, R.D.; Ni, L. Sufficient dimension reduction via inverse regression: A minimum discrepancy approach. J. Am. Stat. Assoc. 2005, 100, 410–428.
- Yao, W.; Nandy, D.; Lindsay, B.G.; Chiaromonte, F. Covariate information matrix for sufficient dimension reduction. J. Am. Stat. Assoc. 2019, 114, 1752–1764.
- Lauritzen, S.L. Graphical Models; Clarendon Press: Oxford, UK, 1996; Volume 17.
- Ichimura, H. Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J. Econom. 1993, 58, 71–120.
- Li, K.C. Sliced inverse regression for dimension reduction. J. Am. Stat. Assoc. 1991, 86, 316–327.
- Cook, R.D. Regression Graphics: Ideas for Studying Regressions through Graphics; John Wiley & Sons: Hoboken, NJ, USA, 2009; Volume 482.
- Yin, X.; Li, B.; Cook, R.D. Successive direction extraction for estimating the central subspace in a multiple-index regression. J. Multivar. Anal. 2008, 99, 1733–1757.
- Li, K.C. On principal Hessian directions for data visualization and dimension reduction: Another application of Stein’s lemma. J. Am. Stat. Assoc. 1992, 87, 1025–1039.
- Wu, Q.; Liang, F.; Mukherjee, S. Kernel sliced inverse regression: Regularization and consistency. In Abstract and Applied Analysis; Hindawi: London, UK, 2013; Volume 2013.
- Hall, P.; Li, K.C. On almost linearity of low dimensional projections from high dimensional data. Ann. Stat. 1993, 21, 867–889.
- Chiaromonte, F.; Cook, R.D.; Li, B. Sufficient dimension reduction in regressions with categorical predictors. Ann. Stat. 2002, 30, 475–497.
- Pearl, J. Causality; Cambridge University Press: Cambridge, UK, 2009.
- Berge, C. Hypergraphs: Combinatorics of Finite Sets; Elsevier: Amsterdam, The Netherlands, 1984; Volume 45.
- Cover, T.M.; Thomas, J. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006.
- Berk, R.; Buja, A.; Brown, L.; George, E.; Kuchibhotla, A.K.; Su, W.; Zhao, L. Assumption lean regression. Am. Stat. 2019, 75, 76–84.
- Ye, Z.; Weiss, R.E. Using the bootstrap to select one of a new class of dimension reduction methods. J. Am. Stat. Assoc. 2003, 98, 968–979.
- Luo, W.; Li, B. Combining eigenvalues and variation of eigenvectors for order determination. Biometrika 2016, 103, 875–887.
- Luo, W.; Li, B. On order determination by predictor augmentation. Biometrika 2021, 108, 557–574.
- Barber, R.F.; Candès, E.J. Controlling the false discovery rate via knockoffs. Ann. Stat. 2015, 43, 2055–2085.
- Substance Abuse and Mental Health Services Administration. Opioid Treatment Program (OTP) Guidance; Substance Abuse and Mental Health Services Administration: Rockville, MD, USA, 2020.
- Saxon, A.J.; Ling, W.; Hillhouse, M.; Thomas, C.; Hasson, A.; Ang, A.; Doraimani, G.; Tasissa, G.; Lokhnygina, Y.; Leimberger, J.; et al. Buprenorphine/naloxone and methadone effects on laboratory indices of liver health: A randomized trial. Drug Alcohol Depend. 2013, 128, 71–76.
- Naik, P.; Tsai, C.L. Partial least squares estimator for single-index models. J. R. Stat. Soc. Ser. B 2000, 62, 763–771.
- Li, K.C.; Shedden, K. Identification of shared components in large ensembles of time series using dimension reduction. J. Am. Stat. Assoc. 2002, 97, 759–765.
- Cai, Z.; Li, R.; Zhu, L. Online sufficient dimension reduction through sliced inverse regression. J. Mach. Learn. Res. 2020, 21, 1–25.
- Artemiou, A.; Dong, Y.; Shin, S.J. Real-time sufficient dimension reduction through principal least squares support vector machines. Pattern Recognit. 2021, 112, 107768.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).