# General Hyperplane Prior Distributions Based on Geometric Invariances for Bayesian Multivariate Linear Regression

## Abstract


## 1. Introduction


## 2. Problem Statement

The regression model considered is ${\mathbf{z}}_{i}={\mathbf{y}}_{i}+{\epsilon}_{i}$ with ${\mathbf{y}}_{i}=\mathbf{A}{\mathbf{x}}_{i}+\mathbf{t}$, where ${\mathbf{z}}_{i}$ is the response vector, ${\mathbf{y}}_{i}$ the model value vector, ${\mathbf{x}}_{i}$ the vector of the L covariates for observation i, $\mathbf{t}$ the intercept vector and $\mathbf{A}$ the M × L-dimensional matrix of regression coefficients. The observation noise ${\epsilon}_{i}$ of each data point is often considered as Gaussian distributed, ${\epsilon}_{i}\sim \mathrm{N}(0,\Sigma)$. This regression model can also be considered as estimating the “best” L-dimensional hyperplane in an N-dimensional space, because in an N-dimensional space, an L-dimensional hyperplane is given by $\mathbf{y}=\mathbf{A}\mathbf{x}+\mathbf{t}$.

The sought prior distribution is $F(\mathbf{A})=F({a}_{11},\cdots ,{a}_{ML},{t}_{1},\cdots ,{t}_{M}|I)$ for the coefficients ${a}_{11},\cdots ,{a}_{ML},{t}_{1},\cdots ,{t}_{M}$, which remains invariant under translations and rotations of the coordinate system.
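As a concrete illustration of the regression model, the following sketch generates synthetic data from ${\mathbf{z}}_{i}=\mathbf{A}{\mathbf{x}}_{i}+\mathbf{t}+{\epsilon}_{i}$ with numpy; the dimensions and variable names are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

M, L, n_obs = 2, 3, 100           # responses, covariates, observations (assumed values)
A = rng.normal(size=(M, L))       # M x L matrix of regression coefficients
t = rng.normal(size=M)            # intercept vector
Sigma = 0.1 * np.eye(M)           # noise covariance of eps_i

X = rng.normal(size=(n_obs, L))   # covariate vectors x_i as rows
Y = X @ A.T + t                   # model values y_i = A x_i + t
Z = Y + rng.multivariate_normal(np.zeros(M), Sigma, size=n_obs)  # z_i = y_i + eps_i

print(Z.shape)  # (100, 2)
```

Each row of `Z` is one noisy observation of the L-dimensional hyperplane embedded in the N = M + L dimensional space.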

## 3. Derivation

#### 3.1. Invariance under Translations

Consider a translation of the coordinate system along the ${x}_{i}$-axis, i.e., a shift vector $\overrightarrow{g}$ with ${g}_{i}=1$, ${g}_{j,j\ne i}=0$, so that ${{x}^{\prime}}_{k}={x}_{k}+{g}_{k}$. Then, the equation in the primed variables reads $\mathbf{y}=\mathbf{A}{\mathbf{x}}^{\prime}+{\mathbf{t}}^{\prime}$ with ${\mathbf{t}}^{\prime}=\mathbf{t}-\mathbf{A}\overrightarrow{g}$: the translation is absorbed entirely by the intercept. Invariance under this translation therefore implies that F(**A**, **t**) can be a function of $\overrightarrow{a}$ only. Since F(**A**|I) does not depend on $\overrightarrow{t}$, the prior distribution is improper (not normalizable in **t**) as long as there are no limits on the magnitude of **t**. A translation along a ${y}_{i}$-axis results in the same conclusion.
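The translation argument can be checked numerically: a shift of the covariates is absorbed entirely by the intercept, ${\mathbf{t}}^{\prime}=\mathbf{t}-\mathbf{A}\overrightarrow{g}$, leaving the slope matrix **A** untouched. A minimal sketch (illustrative dimensions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)
M, L = 2, 3
A = rng.normal(size=(M, L))
t = rng.normal(size=M)
x = rng.normal(size=L)

# translation along the i-th x-axis: g_i = 1, all other components zero
i = 0
g = np.zeros(L)
g[i] = 1.0

x_prime = x + g              # translated coordinates
t_prime = t - A @ g          # shifted intercept absorbs the translation

y  = A @ x + t               # original hyperplane equation
y2 = A @ x_prime + t_prime   # same equation in primed variables

print(np.allclose(y, y2))   # True: only t changes, A is untouched
```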

#### 3.2. Invariance under Rotations

Every rotation can be decomposed into successive rotations involving only two basis vectors ${e}_{i}$ and ${e}_{j}$, i.e., rotations in a single coordinate plane such as the ${x}_{i}{x}_{j}$-plane; the three possible cases are treated in turn below.

#### 3.2.1. Rotation in the x_{i}x_{j}-Plane

Consider a rotation involving the basis vectors ${e}_{i}$ and ${e}_{j}$ of the covariate space, preserving all other coordinates: ${{x}^{\prime}}_{k}={x}_{k}\phantom{\rule{0.2em}{0ex}}\forall \phantom{\rule{0.2em}{0ex}}k\ne (j,i)$, while ${{x}^{\prime}}_{i}={x}_{i}\mathrm{cos}\phi +{x}_{j}\mathrm{sin}\phi$ and ${{x}^{\prime}}_{j}=-{x}_{i}\mathrm{sin}\phi +{x}_{j}\mathrm{cos}\phi$.
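Numerically, such a rotation of the covariate axes, ${\mathbf{x}}^{\prime}=R\mathbf{x}$, leaves the hyperplane equation form-invariant with the transformed coefficient matrix ${\mathbf{A}}^{\prime}=\mathbf{A}{R}^{\mathrm{T}}$. A minimal check with numpy (dimensions and names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
M, L = 2, 3
A = rng.normal(size=(M, L))
t = rng.normal(size=M)
x = rng.normal(size=L)

# Givens rotation by phi in the x_i x_j-plane (all other coordinates fixed)
i, j, phi = 0, 2, 0.7
R = np.eye(L)
R[i, i] = R[j, j] = np.cos(phi)
R[i, j] = np.sin(phi)
R[j, i] = -np.sin(phi)

x_prime = R @ x          # rotated covariates
A_prime = A @ R.T        # coefficient matrix in the rotated frame

y  = A @ x + t
y2 = A_prime @ x_prime + t   # identical model values, since R is orthogonal

print(np.allclose(y, y2))
```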

#### 3.2.2. Rotation in the y_{i}y_{j}-Plane

The rotation now involves the basis vectors ${e}_{i}$ and ${e}_{j}$ of the response space; thus ${{y}^{\prime}}_{k}={y}_{k}\phantom{\rule{0.2em}{0ex}}\forall \phantom{\rule{0.2em}{0ex}}k\ne (j,i)$, while ${{y}^{\prime}}_{i}={y}_{i}\mathrm{cos}\phi +{y}_{j}\mathrm{sin}\phi$ and ${{y}^{\prime}}_{j}=-{y}_{i}\mathrm{sin}\phi +{y}_{j}\mathrm{cos}\phi$.

#### 3.2.3. Rotation in a Plane Spanned by x_{i}y_{j}-Axes

## 4. The PDE System

Translation invariance removes the dependence on ${t}_{1},\cdots ,{t}_{M}$, so F is of the form $F({a}_{11},\cdots ,{a}_{ML}|I)$. Rotation invariance with respect to the y-axes requires F to fulfill the homogeneous, linear system of first-order partial differential equations ($i,j\in [1,M]$, $i\ne j$) given in Equation (21).

## 5. Solution

Here, ${M}^{n}$ denotes a submatrix (minor) of size n × n (this notation is used at various places throughout the paper and should not be confused with the power of a matrix, which does not occur in this paper), and $P=\mathrm{min}(M,L)$. Equation (34) does not appear unreasonable at the outset as a prior density, because it preserves the underlying symmetry of the problem (permutation invariance of the parameters) and it is non-negative.

## 6. Proof

#### 6.1. Preliminaries

The minor ${M}_{ij}^{n}$ is obtained from ${M}^{n}$ by deletion of the i-th row and the j-th column (by definition, ${M}^{0}:=1$). The cofactor matrix ${A}_{ij}^{n-1}$ is defined by the same row and column deletion; deleting row k and column i of ${M}^{n+1}$ results in the minor ${M}_{ki}^{n}$.
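The minor machinery used throughout the proof can be made concrete with numpy; the following sketch verifies the Laplace expansion $\mathrm{det}\,{M}^{n}={\sum }_{k}{(-1)}^{i+k}{m}_{ik}\,\mathrm{det}\,{M}_{ik}^{n-1}$ along row i (0-based indices; the helper name `minor` is an illustrative assumption):

```python
import numpy as np

def minor(mat, i, j):
    """Submatrix obtained by deleting row i and column j (0-based)."""
    return np.delete(np.delete(mat, i, axis=0), j, axis=1)

rng = np.random.default_rng(3)
n = 4
Mn = rng.normal(size=(n, n))

i = 1  # expand along row i
laplace = sum((-1) ** (i + k) * Mn[i, k] * np.linalg.det(minor(Mn, i, k))
              for k in range(n))

print(np.isclose(laplace, np.linalg.det(Mn)))
```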

#### 6.2. x_{i}x_{j}- and y_{i}y_{j}-Rotations

The rotation invariance condition translates into a condition on F(**A**). Conveniently, F(**A**) has a very simple form: it is given by a sum of positive terms. This almost decouples the problem, and we can largely proceed on a term-by-term basis.

#### 6.3. (x_{i}y_{j})-Rotations

#### 6.3.1. Matrices with Either Row j or Column i

Each submatrix ${A}^{m,r}$ of size m × m, $m\in (1,2,\cdots ,P-1)$, with label $r=1,2,\cdots ,\left(\begin{array}{c}M\\ m\end{array}\right)\left(\begin{array}{c}L\\ m\end{array}\right)-\left(\begin{array}{c}M-1\\ m\end{array}\right)\left(\begin{array}{c}L-1\\ m\end{array}\right)$, containing either row j or column i is treated using the Laplace expansion (here, the expansion with respect to row j is shown). The resulting terms reproduce F(**A**) in the last term of Equation (48).

#### 6.3.2. Matrices with Neither Row j nor Column i

This leaves ${H}_{ji}$ containing only determinants with neither row j nor column i. If we now consider only the relevant term of ${H}_{ji}$, we replace $\mathrm{det}({A}^{m+1,r})$ by its Laplace expansion and multiply by ${(-1)}^{(i+j)}$. Since ${(-1)}^{2i}$ equals one in the second term, the third term cancels with the second term for k = j, and the remaining expression reproduces the contribution (${H}_{ji}$) of the second term in Equation (48). This scheme can be repeated down to n = 1, and the last step (n = 0) is easily calculated explicitly. This finishes our derivation.

## 7. Relation to Previously-Derived Special Cases

Setting ${t}_{i}=0$ in the general result recovers the previously derived special cases.

## 8. Practical Hints

## 9. Conclusions

## Conflicts of Interest

## Appendix

Here, the transformation of the coefficient ${a}_{nm}$ is derived. A rotation perpendicular to the ${x}_{i}{y}_{j}$-plane relates ${x}_{i},{y}_{j}$ with ${{x}^{\prime}}_{i},{{y}^{\prime}}_{j}$ by ${{x}^{\prime}}_{i}={x}_{i}\mathrm{cos}\phi +{y}_{j}\mathrm{sin}\phi$ and ${{y}^{\prime}}_{j}=-{x}_{i}\mathrm{sin}\phi +{y}_{j}\mathrm{cos}\phi$. Inserting these relations into the hyperplane equation and solving for ${{y}^{\prime}}_{j}$, we obtain the transformed hyperplane equation, from which the transformed coefficient ${a}_{nm}$ can be read off by comparison of terms.

## References

- Gosling, J.P.; Oakley, J.E.; O’Hagan, A. Nonparametric elicitation for heavy-tailed prior distributions. Bayesian Anal. **2007**, 2, 693–718.
- Jaynes, E.T. Prior Probabilities. IEEE Trans. Syst. Sci. Cybern. **1968**, SSC-4, 227–241.
- Kendall, M.; Moran, P. Geometrical Probability; Griffin: London, UK, 1963.
- Von der Linden, W.; Dose, V.; von Toussaint, U. Bayesian Probability Theory: Application to the Physical Sciences, 1st ed.; Cambridge University Press: Cambridge, UK, 2014.
- Von Toussaint, U.; Gori, S.; Dose, V. Bayesian Neural-Networks-Based Evaluation of Binary Speckle Data. Appl. Opt. **2004**, 43, 5356–5363.
- Hinton, G.; Salakhutdinov, R. Reducing the Dimensionality of Data with Neural Networks. Science **2006**, 313, 504–507.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature **2015**, 518, 529–533.
- Dose, V. Hyperplane Priors. In Bayesian Inference and Maximum Entropy Methods in Science and Engineering; AIP Conference Proceedings 659; Williams, C.J., Ed.; American Institute of Physics: Melville, NY, USA, 2003; pp. 350–357.
- Box, G.E.P.; Tiao, G.C. Bayesian Inference in Statistical Analysis; Wiley: New York, NY, USA, 1992; reprint of the 1973 edition.
- Zellner, A. An Introduction to Bayesian Inference in Econometrics; Wiley: New York, NY, USA, 1971.
- West, M. Outlier Models and Prior Distributions in Bayesian Linear Regression. J. R. Stat. Soc. B **1984**, 46, 431–439.
- O’Hagan, A. Kendall’s Advanced Theory of Statistics, Bayesian Inference, 1st ed.; Arnold Publishers: New York, NY, USA, 1994; Volume 2B.
- Landau, L.; Lifschitz, E. Lehrbuch der Theoretischen Physik I, 1st ed.; Akademie Verlag: Berlin, Germany, 1962.

**Figure 1.** Comparison of two different priors. (**a**) 15 random samples drawn from p(a|I) = 1/50, i.e., a uniform distribution in the slope with 0 ≤ a ≤ 50. (**b**) The density $p(a|I)\sim {(1+{a}^{2})}^{-3/2}$, corresponding to a distribution uniform in the angle, is visualized by 15 samples.
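Samples as in panel (**b**) can be drawn by inverse-transform sampling: for $p(a|I)=\frac{1}{2}{(1+{a}^{2})}^{-3/2}$ the substitution $s=a/\sqrt{1+{a}^{2}}$ is uniform on (−1, 1), so one draws s uniformly and maps back. A sketch (the function name is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_slope_prior(n, rng):
    """Draw n samples from p(a|I) = (1/2) * (1 + a^2)^(-3/2) by inverse transform."""
    s = rng.uniform(-1.0, 1.0, size=n)   # s = a / sqrt(1 + a^2) is uniform on (-1, 1)
    return s / np.sqrt(1.0 - s**2)

a = sample_slope_prior(200_000, rng)

# Under this prior, P(|a| < 1) = 1/sqrt(2), roughly 0.707
print(np.mean(np.abs(a) < 1.0))
```

The heavy tails are evident: unlike the uniform-in-slope prior of panel (**a**), arbitrarily steep lines still receive non-negligible probability.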

**Figure 2.** Probability density of $p({a}_{11},{a}_{21}|{a}_{12},{a}_{22},I)$ for ${a}_{12}=3$ and ${a}_{22}=5$ for the case N = 4, L = 2. The probability density exhibits the typical “Cauchy”-like shape with heavy tails compared to a binormal distribution. Due to the symmetry of the prior distribution, slices with respect to the other parameters display the same basic features.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Von Toussaint, U.
General Hyperplane Prior Distributions Based on Geometric Invariances for Bayesian Multivariate Linear Regression. *Entropy* **2015**, *17*, 3898-3912.
https://doi.org/10.3390/e17063898
