# Information-Theoretic Analysis of Memoryless Deterministic Systems


## Abstract


## 1. Introduction

#### Related Work

## 2. Definition and Elementary Properties of Information Loss and Relative Information Loss

#### 2.1. Notation and Preliminaries

#### 2.2. Information Loss

**Definition 1** (Information Loss)**.** The information loss induced by $Y=g(X)$ is $L(X\to Y):=\lim_{n\to \infty }\left(I({\widehat{X}}^{(n)};X)-I({\widehat{X}}^{(n)};Y)\right)$, where ${\widehat{X}}^{(n)}$ is the output of the quantizer $Q$ with partition ${\mathcal{P}}_{n}$ in Figure 2a.

**Proposition 1** (Information Loss of a Cascade)**.** For a cascade of two systems $Y=g(X)$ and $Z=h(Y)$, the information loss is additive: $L(X\to Z)=L(X\to Y)+L(Y\to Z)$ (cf. Figure 2b).

**Proof.**
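The additivity over a cascade can be checked numerically. The sketch below is not from the paper: it assumes the cascade result $L(X\to Z)=L(X\to Y)+L(Y\to Z)$ for $Y=g(X)$, $Z=h(Y)$ (as Figure 2b indicates), and uses hypothetical toy maps $g$, $h$ that each discard one bit of a uniform discrete input:

```python
import math
from collections import defaultdict

def cond_entropy_bits(joint):
    """H(A|B) in bits from a dict {(a, b): p} of joint probabilities."""
    pb = defaultdict(float)
    for (_, b), p in joint.items():
        pb[b] += p
    return -sum(p * math.log2(p / pb[b]) for (a, b), p in joint.items() if p > 0)

# X uniform on {0,...,7}; g drops the least significant bit, h drops the next one
g = lambda x: x // 2
h = lambda y: y // 2

p_xy = {(x, g(x)): 1 / 8 for x in range(8)}
p_xz = {(x, h(g(x))): 1 / 8 for x in range(8)}
p_yz = defaultdict(float)
for x in range(8):
    p_yz[(g(x), h(g(x)))] += 1 / 8

L_xy = cond_entropy_bits(p_xy)        # H(X|Y) = 1 bit lost in g
L_yz = cond_entropy_bits(dict(p_yz))  # H(Y|Z) = 1 bit lost in h
L_xz = cond_entropy_bits(p_xz)        # H(X|Z) = 2 bits = L_xy + L_yz
print(L_xy, L_yz, L_xz)
```

For deterministic maps the additivity follows from $H(X|Z)=H(X,Y|Z)=H(Y|Z)+H(X|Y,Z)$ together with $H(X|Y,Z)=H(X|Y)$, since $Z$ is a function of $Y$.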

**Proposition 2.** Let $g:\mathcal{X}\to \mathcal{Y}$, $\mathcal{X}\subseteq {\mathbb{R}}^{N}$, and let the input RV $X$ be such that its probability measure ${P}_{X}$ has an absolutely continuous component ${P}_{X}^{ac}\ll {\lambda }^{N}$ supported on $\mathcal{X}$. If there exists a set $B\subseteq \mathcal{Y}$ of positive ${P}_{Y}$-measure such that the preimage ${g}^{-1}(y)$ is uncountable for every $y\in B$, then $L(X\to Y)=\infty $.

#### 2.3. Relative Information Loss

**Definition 2** (Relative Information Loss)**.** The relative information loss induced by $Y=g(X)$ is $l(X\to Y):=\lim_{n\to \infty }\frac{H({\widehat{X}}^{(n)}|Y)}{H({\widehat{X}}^{(n)})}$, provided the limit exists.

**Definition 3.** The information dimension of an RV $X$ is $d(X):=\lim_{n\to \infty }\frac{H({\widehat{X}}^{(n)})}{n}$, provided the limit exists, where ${\widehat{X}}^{(n)}=\lfloor {2}^{n}X\rfloor /{2}^{n}$ denotes $X$ quantized with resolution ${2}^{-n}$.
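For intuition, the information dimension can be estimated empirically. The sketch below is not from the paper; it assumes ${\widehat{X}}^{(n)}=\lfloor {2}^{n}X\rfloor /{2}^{n}$ and approximates $d(X)\approx H({\widehat{X}}^{(n)})/n$ from samples. A continuous RV should give $d(X)=1$, a discrete one $d(X)=0$:

```python
import numpy as np

def entropy_bits(samples):
    """Empirical entropy (bits) of a discrete sample."""
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def info_dim_estimate(x, n):
    # d(X) ~= H(X-hat^(n)) / n, with X-hat^(n) = floor(2^n X) / 2^n
    return entropy_bits(np.floor(2.0 ** n * x)) / n

rng = np.random.default_rng(0)
n = 10
d_cont = info_dim_estimate(rng.random(1_000_000), n)                # U(0,1): d = 1
d_disc = info_dim_estimate(rng.choice([0.25, 0.75], 1_000_000), n)  # two atoms: 1/n -> 0
print(d_cont, d_disc)
```

A half-continuous, half-discrete mixture would interpolate toward $d=1/2$, although the convergence in $n$ is slow and the estimate is biased once ${2}^{n}$ approaches the sample size.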

**Proposition 3.** Let $X$ be an $N$-dimensional RV with positive information dimension $d(X)$ and finite $H({\widehat{X}}^{(0)})$. If $H({\widehat{X}}^{(0)}|Y=y)<\infty $ and $d(X|Y=y)$ exists ${P}_{Y}$-a.s., then the relative information loss equals $l(X\to Y)=\frac{{\int }_{\mathcal{Y}}d(X|Y=y)\,d{P}_{Y}(y)}{d(X)}$.

#### 2.4. Interplay between Information Loss and Relative Information Loss

**Proposition 4.** Let $X$ be such that $H(X)=\infty $ and let $l(X\to Y)>0$. Then, $L(X\to Y)=\infty $.

## 3. Information Loss for Piecewise Bijective Functions

**Definition 4** (Piecewise Bijective Function)**.** A function $g:\mathcal{X}\to \mathcal{Y}$ is piecewise bijective if there is a countable partition $\{{\mathcal{X}}_{i}\}$ of $\mathcal{X}$ such that the restriction of $g$ to each ${\mathcal{X}}_{i}$ is bijective and differentiable.

#### 3.1. Information Loss in PBFs

**Proposition 5.** The information loss induced by a PBF is given as $L(X\to Y)=H(X|Y)=h(X)-h(Y)+\mathbb{E}\left(\log |{g}^{\prime }(X)|\right)$.
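This expression can be evaluated in closed form for the square-law device of Section 5.4. The check below is not from the paper; it assumes the PBF loss can be written as $L(X\to Y)=h(X)-h(Y)+\mathbb{E}\left(\log |{g}^{\prime }(X)|\right)$, and for $X$ standard Gaussian with $g(x)={x}^{2}$ it recovers exactly the one lost sign bit reported in Table 2:

```python
import math

# Closed-form terms (in nats) for X ~ N(0,1), g(x) = x^2, Y = X^2 ~ chi-square(1)
gamma = 0.5772156649015329                      # Euler-Mascheroni constant

h_X = 0.5 * math.log(2 * math.pi * math.e)      # differential entropy of N(0,1)
psi_half = -gamma - 2 * math.log(2)             # digamma(1/2)
h_Y = 0.5 + math.log(2 * math.gamma(0.5)) + 0.5 * psi_half  # h of chi-square(1)
E_log_gprime = math.log(2) - (gamma + math.log(2)) / 2      # E[ln|2X|] for N(0,1)

L_bits = (h_X - h_Y + E_log_gprime) / math.log(2)
print(L_bits)   # -> 1.0: exactly the sign of X is lost
```

The result is exactly one bit because, given $Y={X}^{2}$, the two preimages $\pm \sqrt{y}$ are equiprobable for any symmetric input density.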

#### 3.2. Upper Bounds on the Information Loss

**Proposition 6.** The information loss induced by a PBF can be upper bounded by the following ordered set of inequalities:

#### 3.3. Reconstruction and Reconstruction Error Probability

**Definition 5** (Reconstructor & Reconstruction Error)**.**

**Proposition 7** (MAP Reconstructor)**.**

**Proof.**

**Definition 6** (Bijective Part)**.**

**Proposition 8** (Fano-Type Bound)**.**

**Proof.**
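The statement of Proposition 8 did not survive extraction. Assuming it takes the usual Fano form $L(X\to Y)\le {H}_{b}({P}_{e})+{P}_{e}\log (K-1)$, with $K$ an upper bound on the preimage cardinality and ${H}_{b}$ the binary entropy function, a discrete toy stand-in (not from the paper) shows that such a bound can even hold with equality:

```python
import math
from collections import defaultdict

# Toy discrete stand-in for a PBF: X uniform on {0,...,7}, g(x) = x mod 4.
# g has two "bijective branches" ({0..3} and {4..7}), so every output has
# K = 2 equiprobable preimages (X uniform => uniform posterior).
pre = defaultdict(list)
for x in range(8):
    pre[x % 4].append(x)

# information loss L = H(X|Y): log2 of the preimage size, averaged over Y
L = sum(len(v) / 8 * math.log2(len(v)) for v in pre.values())
# a MAP reconstructor keeps one preimage per output value
P_e = sum((len(v) - 1) / 8 for v in pre.values())
K = max(len(v) for v in pre.values())

H_b = -P_e * math.log2(P_e) - (1 - P_e) * math.log2(1 - P_e)
bound = H_b + P_e * math.log2(K - 1)   # assumed Fano-type form; log2(1) = 0 here
print(L, P_e, bound)
```

Here $L=1$ bit, ${P}_{e}=1/2$, and the bound also evaluates to 1 bit, so the inequality is tight for this example.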

**Proposition 9.** The information loss $L(X\to Y)$ in a PBF is lower bounded by the error probability ${P}_{e}$ of a MAP reconstructor via

## 4. Information Loss for Systems that Reduce Dimensionality

#### 4.1. Relative Information Loss for Continuous Input RVs

**Proposition 10** (Relative Information Loss in Dimensionality Reduction)**.**

**Proof.**

**Corollary 1.**

**Corollary 2.**

#### 4.2. Reconstruction and Reconstruction Error Probability

**Proposition 11.**

**Proof.**

## 5. Some Examples from Signal Processing and Communications

#### 5.1. Quantizer

#### 5.2. Center Clipper

#### 5.3. Adding Two RVs

#### 5.4. Square-Law Device and Gaussian Input

#### 5.5. Polynomials

#### 5.6. Energy Detection of Communication Signals

#### 5.7. Principal Components Analysis and Dimensionality Reduction

## 6. Discussion and Outlook

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| MSRE | mean squared reconstruction error |
| RV | random variable |
| PDF | probability density function |
| MMSE | minimum mean squared error |
| PBF | piecewise bijective function |
| MAP | maximum a posteriori |
| PCA | principal components analysis |

## Appendix A. Proof of Proposition 8

## Appendix B. Proof of Proposition 10

## Appendix C. Proof of Proposition 11


**Figure 1.** Two different outputs of the rectifier, a (**left**) and b (**right**), with $a>b$. Both outputs lead to the same uncertainty about the input (and to the same reconstruction error probability), but to different mean squared reconstruction errors (MSREs): assuming both possible inputs are equally probable, the MSREs are $2{a}^{2}>2{b}^{2}$. Energy and information behave differently.

**Figure 2.** Definition and properties of information loss. (**a**) Model for computing the information loss in a memoryless input-output system $g$; $Q$ is a quantizer with partition ${\mathcal{P}}_{n}$. (**b**) The information loss of the cascade equals the sum of the individual information losses of the constituent systems.

**Figure 3.** The center clipper: an example of a system with infinite information loss and infinite information transfer.

**Figure 4.** Third-order polynomial of Section 5.5. (**a**) The function and its MAP reconstructor, indicated with a thick red line. (**b**) Information loss as a function of the input variance ${\sigma }^{2}$.

**Figure 5.** Constellation diagrams used in the example in Section 5.6: (**a**) 16-PSK; (**b**) 16-QAM; and (**c**) circular 16-QAM.

**Figure 6.** Mutual information between the constellation points of Figure 5 (normalized to unit energy) and the noisy output of the energy detector ${Y}_{1}={\int }_{0}^{{T}_{I}}{(R(t)+N(t))}^{2}dt$, where $N(t)$ is a Gaussian noise signal with standard deviation $\sigma $ (see text). Note that the maximum mutual information in the noiseless case is bounded by $4-L((A,B)\to {Y}_{1})=4-L((A,B)\to Y)$ according to Table 1. The mutual information for 16-PSK with ${T}_{I}=1$ is zero and hence not depicted.

**Table 1.** Information loss $L((A,B)\to ({Y}_{1},\cdots ,{Y}_{1/{T}_{I}}))$ in the energy detector as a function of the constellation and the integration time ${T}_{I}$.

| ${T}_{I}$ | 1, 1/2, 1/4 | 1/3 |
|---|---|---|
| 16-PSK | 4 | 1.75 |
| 16-QAM | 2.5 | 2 |
| Circular 16-QAM | 2 | 1.5 |

**Table 2.** Comparison of results for some examples from Section 5. While there is a close connection between information loss and the reconstruction error probability (cf. Propositions 8 and 11), there is no apparent connection between information loss and the MSRE: energy and information behave inherently differently.

| Example | MSRE | $L(X\to Y)$ | $l(X\to Y)$ | ${P}_{e}$ |
|---|---|---|---|---|
| $Y={\widehat{X}}^{(n)}$, ${P}_{X}\ll \lambda $ | $\approx {2}^{-2n}/12$ | ∞ | 1 | 100% |
| Center clipper, ${P}_{X}\ll \lambda $ | ${P}_{X}(\mathcal{C})\,\mathbb{E}\left({X}^{2}\mid X\in \mathcal{C}\right)$ | ∞ | ${P}_{X}(\mathcal{C})$ | ${P}_{X}(\mathcal{C})$ |
| $Y={X}_{1}+{X}_{2}$, ${P}_{{X}_{1},{X}_{2}}\ll {\lambda }^{2N}$ | – | ∞ | $1/2$ | 100% |
| $Y={X}^{2}$, ${f}_{X}(x)={f}_{X}(-x)$ | $\mathbb{E}\left({X}^{2}\right)$ | 1 | 0 | 50% |
| $Y={X}^{3}-100X$, $X$ Gaussian | – | Figure 4b | 0 | $2Q\left(\frac{10}{\sqrt{3}\sigma }\right)-2Q\left(\frac{20}{\sqrt{3}\sigma }\right)$ |
| PCA, ${P}_{X}\ll {\lambda }^{N}$ | min. | ∞ | $\frac{N-M}{N}$ | 100% |
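The first row of Table 2 is easy to verify numerically. The sketch below is not from the paper; it assumes $X$ uniform on $[0,1)$ (any ${P}_{X}\ll \lambda $ with a smooth density behaves similarly for large $n$) and an $n$-bit uniform quantizer with midpoint reconstruction, for which the MSRE is $\approx {2}^{-2n}/12$ even though $L(X\to Y)=\infty $ and ${P}_{e}=100\%$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(1_000_000)                     # X ~ U(0,1), P_X << lambda
n = 6                                         # quantizer resolution: 2^n cells
delta = 2.0 ** (-n)
x_hat = (np.floor(x / delta) + 0.5) * delta   # midpoint reconstruction
msre = float(np.mean((x - x_hat) ** 2))
print(msre / (delta ** 2 / 12))               # -> close to 1: MSRE ~= 2^(-2n)/12
```

The midpoint is the MMSE reconstructor for a locally flat density; with floor reconstruction instead, the error would be uniform on $[0,\Delta )$ and the MSRE would be ${\Delta }^{2}/3$ rather than ${\Delta }^{2}/12$.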

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

Geiger, B.C.; Kubin, G. Information-Theoretic Analysis of Memoryless Deterministic Systems. *Entropy* **2016**, *18*, 410. https://doi.org/10.3390/e18110410