# Statistical Information: A Bayesian Perspective

Rafael B. Stern and Carlos A. de B. Pereira

## Abstract


## 1. Introduction

“Information is what it does for you: it changes your opinion.”

- Information about what?

- Where is the information?

- How is information extracted?

- How much information is extracted?

## 2. Definitions

## 3. Information after an Experiment is Performed

#### 3.1. Statistical Principles and Information

**Theorem 1.** The Sufficiency and the Conditionality Principles hold if and only if the Likelihood Principle holds.
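The force of the Likelihood Principle is easiest to see in the classic binomial versus negative binomial setting (cf. Lindley and Phillips in the references): the same data, say 9 successes in 12 Bernoulli trials, yield proportional likelihoods whether the design fixed n = 12 or sampled until the 3rd failure, so any measure of information respecting the principle must treat the two outcomes identically. A minimal numerical check (the specific counts are illustrative, not from the paper):

```python
from math import comb

def lik_binomial(p, n=12, k=9):
    # Fixed design: n = 12 trials, k = 9 successes observed.
    return comb(n, k) * p**k * (1 - p)**(n - k)

def lik_negative_binomial(p, r=3, k=9):
    # Inverse design: sample until the r-th failure; k successes seen on the way.
    return comb(k + r - 1, k) * p**k * (1 - p)**r

# The two likelihoods differ only by a constant factor in p (here 220/55 = 4),
# so they carry the same information about p.
ratios = [lik_binomial(p) / lik_negative_binomial(p) for p in (0.1, 0.5, 0.9)]
```

Since the ratio does not depend on p, the posterior for p is the same under either stopping rule, which is exactly what Theorem 1 demands of an information function.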

#### 3.2. Information in the Observation

| i/j | 0 | 1 | 2 |
|---|---|---|---|
| 1 | 0.33 | 0.67 | - |
| 2 | 0.20 | 0.50 | 0.80 |
| 3 | 0 | 0.50 | 1 |

- The Euclidean distance: $Inf_E(X_i, x_i, \theta) = \sqrt{\sum_j \left[ P(\theta = j) - P(\theta = j \mid X_i = x_i) \right]^2}$.
- The squared change in expectation: $Inf_V(X_i, x_i, \theta) = \left[ E(\theta \mid X_i = x_i) - E(\theta) \right]^2$.
- The Kullback–Leibler divergence: $Inf_{KL}(X_i, x_i, \theta) = \sum_j P(\theta = j \mid X_i = x_i) \log\left( \frac{P(\theta = j \mid X_i = x_i)}{P(\theta = j)} \right)$.
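All three measures are straightforward to compute once the prior and the posterior are in hand; a minimal sketch (the distributions used below are illustrative, not the paper's urn example):

```python
import math

def inf_E(prior, posterior):
    # Euclidean distance between prior and posterior over the support of theta.
    return math.sqrt(sum((q - p) ** 2 for p, q in zip(prior, posterior)))

def inf_V(prior, posterior, support):
    # Squared change in the expectation of theta.
    mean = lambda dist: sum(j * p for j, p in zip(support, dist))
    return (mean(posterior) - mean(prior)) ** 2

def inf_KL(prior, posterior):
    # Kullback-Leibler divergence of the posterior from the prior.
    return sum(q * math.log(q / p) for p, q in zip(prior, posterior) if q > 0)

# Uniform prior on theta in {0, 1, 2} and an illustrative posterior.
support = [0, 1, 2]
prior = [1/3, 1/3, 1/3]
posterior = [0.2, 0.5, 0.3]
```

All three vanish when the posterior equals the prior, i.e., when the observation extracts no information, and are positive otherwise.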

**Table 2.** From left to right, tables for $Inf_E(X_i, j, \theta)$, $Inf_V(X_i, j, \theta)$ and $Inf_{KL}(X_i, j, \theta)$.

| i/j | 0 | 1 | 2 | i/j | 0 | 1 | 2 | i/j | 0 | 1 | 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.23 | 0.23 | - | 1 | 0.03 | 0.03 | - | 1 | 0.02 | 0.02 | - |
| 2 | 0.42 | 0 | 0.42 | 2 | 0.09 | 0 | 0.09 | 2 | 0.08 | 0 | 0.08 |
| 3 | 0.7 | 0 | 0.7 | 3 | 0.25 | 0 | 0.25 | 3 | 0.30 | 0 | 0.30 |

## 4. Information before an Experiment is Performed

#### 4.1. Blackwell Sufficiency

- For any $y \in \mathcal{X}(Y)$, $H(\cdot, y)$ is measurable on the σ-algebra induced by X, $\Im_{|X}$.
- For any $x \in \mathcal{X}(X)$, $H(x, \cdot)$ is a probability (density) function defined on $(\mathcal{X}(Y), \Im_{|Y})$.
- For any $y \in \mathcal{X}(Y)$, $p(y \mid \theta) = E(H(X, y) \mid \theta)$, the conditional expectation of $H(X, y)$ given θ.

**Example 1.** Let X and Y be two experiments, π a quantity of interest in $[0,1]$, and q and p known constants in $[0,1]$. Representing the Bernoulli distribution with parameter p by $\mathrm{Ber}(p)$, consider also that the conditional distributions of X and Y given π are, respectively:

**Example 2.** Next, we generalize the example of Section 3.2. Consider an urn with N balls, θ of which are black and $N - \theta$ of which are white. $n \ (\le N)$ balls are drawn from the urn.

- Conditionally on θ, ${X}_{1}\sim \mathrm{Ber}\left(\frac{\theta}{N}\right)$;
- Conditionally on θ, ${X}_{1},\dots ,{X}_{n}$ are identically distributed;
- ${X}_{i+1}$ is conditionally independent of $({X}_{i},\dots ,{X}_{1})$ given θ, $\forall i\in \{1,\dots ,n-1\}$.

- Conditionally on θ, ${Y}_{1}\sim \mathrm{Ber}\left(\frac{\theta}{N}\right)$;
- ${Y}_{i+1}|({y}_{i},\dots ,{y}_{1},\theta )\sim \mathrm{Ber}\left(\frac{\theta -{\sum}_{j=1}^{i}{y}_{j}}{N-i}\right)$,$\forall i\in \{1,\dots ,n-1\}$, $\forall ({y}_{i},\dots ,{y}_{1})\in {\{0,1\}}^{i}$.

- ${A}_{i+1}\sim \mathrm{Ber}\left(\frac{N-i}{N}\right)$, and is independent of all other variables;
- ${B}_{i+1}|{T}_{i}={t}_{i}\sim \mathrm{Ber}\left(\frac{{t}_{i}}{i}\right)$;
- $\forall i\in \{1,\dots ,n\}$, conditionally on ${T}_{i}={t}_{i}$, ${B}_{i}$ is jointly independent of $({A}_{1},\dots ,{A}_{n}),({B}_{1},\dots ,{B}_{i-1}),({Y}_{i+1},\dots ,{Y}_{n})$ and θ.
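These ingredients combine in the usual coupling $X_{i+1} = A_{i+1} Y_{i+1} + (1 - A_{i+1}) B_{i+1}$, with $T_i = \sum_{j=1}^{i} Y_j$: the with-replacement draws X can be generated from the without-replacement draws Y plus randomization independent of θ, which is precisely Blackwell sufficiency of Y for X. A simulation sketch of this construction (the combining rule stated above is my reading of the construction; the display itself is not in this excerpt):

```python
import random

def x_from_y(N, theta, n, rng):
    """Generate with-replacement draws X from without-replacement draws Y,
    using only Y and randomization independent of theta."""
    urn = [1] * theta + [0] * (N - theta)
    y = rng.sample(urn, n)              # Y: n draws without replacement
    x = [y[0]]                          # X_1 = Y_1
    for i in range(1, n):
        t = sum(y[:i])                  # T_i = Y_1 + ... + Y_i
        a = rng.random() < (N - i) / N  # A_{i+1} ~ Ber((N-i)/N)
        b = rng.random() < t / i        # B_{i+1} | T_i = t ~ Ber(t/i)
        x.append(y[i] if a else int(b)) # X_{i+1} = A Y_{i+1} + (1-A) B_{i+1}
    return x

# Check: each X_i should be marginally Ber(theta/N), as with replacement,
# since (N-i)/N * (theta - t)/(N - i) + (i/N) * (t/i) = theta/N for any history t.
rng = random.Random(42)
N, theta, n, reps = 4, 2, 3, 40000
draws = [x_from_y(N, theta, n, rng) for _ in range(reps)]
freqs = [sum(d[i] for d in draws) / reps for i in range(n)]
```

The key point is that the randomization $(A, B)$ never looks at θ directly, only at Y, so Y alone suffices to reproduce X's sampling distribution.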

#### 4.2. Equivalence Relation in Experiment Information

**Theorem 2.** Let $X \in \mathcal{X}$ and $Y \in \mathcal{Y}$ be two experiments. $X \approx Y$ if and only if, for every likelihood function $L(\cdot)$,

**Lemma 1.** Consider a Markov chain on a countable space $\mathcal{X}$ with transition matrix M and no transient states. Let M have irreducible components $C(1), \dots, C(n), \dots$. Then, there exists a unique set of probability functions $\{p_j(\cdot) : j \in \mathbb{N}\}$, with $p_j(x)$ defined on $\{1, \dots, |C(j)|\}$, such that every invariant measure (μ) of M can be written as the following:
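Concretely, the lemma says every invariant measure of such a chain is a mixture of the irreducible components' stationary distributions, one $p_j$ per component, weighted by the mass placed on each component. A numerical sketch with two components (the $2 \times 2$ blocks below are illustrative):

```python
def step(mu, P):
    # One application of mu -> mu P.
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

# Block-diagonal transition matrix: components C(1) = {0, 1} and C(2) = {2, 3},
# no transient states.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.9, 0.1],
    [0.0, 0.0, 0.3, 0.7],
]

# Power iteration from an initial measure with mass 0.5 in each component.
mu = [0.25, 0.25, 0.25, 0.25]
for _ in range(500):
    mu = step(mu, P)

# Limit: 0.5 * p_1 on C(1) and 0.5 * p_2 on C(2), where p_1 = (2/7, 5/7)
# and p_2 = (3/4, 1/4) are the components' stationary distributions.
```

Changing the initial split of mass between the components changes only the mixture weights, never the shapes $p_1$ and $p_2$, which is the lemma's representation.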

For any information function $Inf$ satisfying the Likelihood Principle (that is, if x and y yield proportional likelihood functions, then $Inf(X, x, \theta) = Inf(Y, y, \theta)$), X is Blackwell equivalent to Y if and only if the distributions of $(Inf, \theta)$ for X and Y are the same.

#### 4.3. Experiment Information Function

## 5. Conclusions

## Acknowledgments

## References

- Basu, D. A Note on Likelihood. In Statistical Information and Likelihood: A Collection of Critical Essays; Ghosh, J.K., Ed.; Springer: Berlin, Germany, 1988.
- Torgersen, E.N. Comparison of Experiments; Cambridge University Press: Cambridge, UK, 1991.
- Birnbaum, A. On the foundations of statistical inference. J. Am. Stat. Assoc. **1962**, 57, 269–326.
- Basu, D. Statistical Information and Likelihood: A Collection of Critical Essays; Ghosh, J.K., Ed.; Springer: Berlin, Germany, 1988.
- Lindley, D.V.; Phillips, L.D. Inference for a Bernoulli process (a Bayesian view). Am. Stat. **1976**, 30, 112–119.
- Pereira, C.; Lindley, D.V. Examples questioning the use of partial likelihood. Statistician **1987**, 36, 15–20.
- Wechsler, S.; Pereira, C.; Marques, P. Birnbaum's theorem redux. AIP Conf. Proc. **2008**, 1073, 96–100.
- Blackwell, D. Comparison of experiments. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 31 July–12 August 1950.
- Basu, D.; Pereira, C. Blackwell sufficiency and Bernoulli experiments. Braz. J. Prob. Stat. **1990**, 4, 137–145.
- Goel, P.K.; Ginebra, J. When is one experiment "always better than" another? Statistician **2003**, 52, 515–537.
- Basu, D.; Pereira, C. A note on Blackwell sufficiency and Skibinsky characterization of distributions. Sankhya A **1983**, 45, 99–104.
- Ferrari, P.; Galves, A. Coupling and Regeneration for Stochastic Processes; Sociedad Venezolana de Matemáticas: Caracas, Venezuela, 2000.
- DeGroot, M.H. Optimal Statistical Decisions; Wiley: New York, NY, USA, 1970.
- DeGroot, M.H. Uncertainty, information, and sequential experiments. Ann. Math. Stat. **1962**, 33, 404–419.

© 2012 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Cite

Stern, R.B.; Pereira, C.A.d.B. Statistical Information: A Bayesian Perspective. *Entropy* **2012**, *14*, 2254–2264. https://doi.org/10.3390/e14112254