# Divergence and Sufficiency for Convex Optimization

## Abstract


## 1. Introduction

## 2. Structure of the State Space

## 3. Optimization

**Definition 1.**

**Proposition 1.**

- ${D}_{F}\left(s,a\right)\ge 0$ with equality if $a$ is optimal for $s$.
- $s\mapsto {D}_{F}\left(s,a\right)$ is a convex function.
- If $\overline{a}$ is optimal for the state $\overline{s}=\sum {t}_{i}\cdot {s}_{i}$, where $\left({t}_{1},{t}_{2},\dots ,{t}_{\ell}\right)$ is a probability vector, then $$\sum {t}_{i}\cdot {D}_{F}\left({s}_{i},a\right)=\sum {t}_{i}\cdot {D}_{F}\left({s}_{i},\overline{a}\right)+{D}_{F}\left(\overline{s},a\right).$$
- $\sum {t}_{i}\cdot {D}_{F}\left({s}_{i},a\right)$ is minimal if $a$ is optimal for $\overline{s}=\sum {t}_{i}\cdot {s}_{i}$.
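The compensation identity in the third item can be checked numerically. The sketch below is an illustration under an extra assumption not stated in this list: the regret is taken in Bregman form for a differentiable convex $F$, and the optimal action for a state is identified with the state itself (as for proper scoring rules); the helper names `F`, `grad_F`, and `D_F` are mine.

```python
import numpy as np

# Assumed Bregman form of the regret for a differentiable convex F:
#   D_F(s, a) = F(s) - F(a) - grad F(a) . (s - a).
# We take F(x) = sum(x log x) (minus the Shannon entropy) on the simplex,
# so D_F is the Kullback-Leibler divergence.

def F(x):
    return np.sum(x * np.log(x))

def grad_F(x):
    return np.log(x) + 1.0

def D_F(s, a):
    return F(s) - F(a) - grad_F(a) @ (s - a)

# Compensation identity: with s_bar = sum_i t_i s_i optimal action a_bar = s_bar,
#   sum_i t_i D_F(s_i, a) = sum_i t_i D_F(s_i, s_bar) + D_F(s_bar, a).
rng = np.random.default_rng(0)
s = rng.dirichlet(np.ones(4), size=3)   # states s_1, s_2, s_3
t = rng.dirichlet(np.ones(3))           # probability vector (t_1, t_2, t_3)
a = rng.dirichlet(np.ones(4))           # an arbitrary competing action
s_bar = t @ s

lhs = sum(ti * D_F(si, a) for ti, si in zip(t, s))
rhs = sum(ti * D_F(si, s_bar) for ti, si in zip(t, s)) + D_F(s_bar, a)
assert abs(lhs - rhs) < 1e-10
```

The identity holds exactly (up to rounding) for any Bregman divergence, since the terms $\sum t_i F(s_i)$ and the affine parts cancel on both sides.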

**Definition 2.**

**Proposition 2.**

**Proposition 3.**

- ${D}_{F}\left({s}_{1},{s}_{0}\right)\ge 0$ with equality if there exists an action $a$ that is optimal for both ${s}_{1}$ and ${s}_{0}$.
- ${s}_{1}\mapsto {D}_{F}\left({s}_{1},{s}_{0}\right)$ is a convex function.

- ${D}_{F}\left({s}_{1},{s}_{0}\right)=0$ implies ${s}_{1}={s}_{0}$.
- The function F is strictly convex.
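The equivalence of the last two items can be illustrated with a small check. This is a hedged sketch with functions of my own choosing: the strictly convex $F(x)={\Vert x\Vert}^{2}$ yields a divergence that separates points, while an affine $F$ (convex but not strictly so) yields a divergence that is identically zero.

```python
import numpy as np

# Bregman divergence D_F(s1, s0) = F(s1) - F(s0) - grad F(s0) . (s1 - s0).
def bregman(F, grad_F, s1, s0):
    return F(s1) - F(s0) - grad_F(s0) @ (s1 - s0)

# Strictly convex: F(x) = ||x||^2, divergence = ||s1 - s0||^2.
F_strict = lambda x: float(x @ x)
grad_strict = lambda x: 2.0 * x

# Affine (convex but not strictly): divergence vanishes everywhere.
F_affine = lambda x: float(np.sum(x))
grad_affine = lambda x: np.ones_like(x)

s0 = np.array([0.2, 0.8])
s1 = np.array([0.5, 0.5])

assert bregman(F_strict, grad_strict, s1, s0) > 0       # separates distinct points
assert abs(bregman(F_strict, grad_strict, s0, s0)) < 1e-12
assert abs(bregman(F_affine, grad_affine, s1, s0)) < 1e-12  # D = 0 for s1 != s0
```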

**Example 1.**

**Proposition 4.**

- The function F is differentiable in the interior of any face of $\mathcal{S}$.
- The regret ${D}_{F}$ is a Bregman divergence.
- The Bregman identity (5) is always satisfied.
- For any probability vector $\left({t}_{1},{t}_{2},\dots ,{t}_{n}\right)$, the sum $\sum {t}_{i}\cdot {D}_{F}\left({s}_{i},s\right)$ is always minimal when $s=\sum {t}_{i}\cdot {s}_{i}$.
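The last item, that the mean minimizes the expected divergence, can be probed numerically. A minimal sketch, with $F$ chosen by me as minus the Shannon entropy so that the Bregman divergence reduces to information divergence on the simplex:

```python
import numpy as np

# KL divergence = Bregman divergence of F(x) = sum(x log x) on the simplex.
def D_F(s, a):
    return np.sum(s * np.log(s / a))

rng = np.random.default_rng(1)
s = rng.dirichlet(np.ones(3), size=4)   # states s_1, ..., s_4
t = rng.dirichlet(np.ones(4))           # probability vector (t_1, ..., t_4)
mean = t @ s

def objective(a):
    return sum(ti * D_F(si, a) for ti, si in zip(t, s))

# The mean should beat any other candidate point on the simplex.
for _ in range(100):
    a = rng.dirichlet(np.ones(3))
    assert objective(mean) <= objective(a) + 1e-12
```

The inequality is an instance of the compensation identity: the objective at any $a$ exceeds the objective at the mean by exactly ${D}_{F}\left(\overline{s},a\right)\ge 0$.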

## 4. Examples

#### 4.1. Information Theory

#### 4.2. Scoring Rules

**Example 2.**

#### 4.3. Statistical Mechanics

#### 4.4. Portfolio Theory

**Example 3.**

**Definition 3.**

**Example 4.**

**Lemma 1.**

**Proof.**

**Theorem 1.**

**Proof.**

## 5. Sufficiency Conditions

**Theorem 2.**

- The function F equals entropy times a negative constant plus an affine function.
- The regret ${D}_{F}$ is proportional to information divergence.
- The regret is monotone.
- The regret satisfies sufficiency.
- The regret is local.
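The link between the first two items can be verified directly: if $F$ is entropy times a negative constant plus an affine function, the resulting regret is proportional to information divergence. A numerical sketch with constants of my own choosing:

```python
import numpy as np

# F(p) = c * sum(p log p) + b . p : entropy times the negative constant -c,
# plus an affine function b . p.  Here c and b are arbitrary test values.
c, b = 2.5, np.array([0.3, -1.0, 0.7])

def F(p):
    return c * np.sum(p * np.log(p)) + b @ p

def grad_F(p):
    return c * (np.log(p) + 1.0) + b

def D_F(s, a):
    return F(s) - F(a) - grad_F(a) @ (s - a)

def kl(s, a):
    return np.sum(s * np.log(s / a))

rng = np.random.default_rng(2)
s, a = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))

# The affine part of F cancels in the Bregman divergence, leaving c * KL.
assert abs(D_F(s, a) - c * kl(s, a)) < 1e-10
```

The affine term contributes $b\cdot s - b\cdot a - b\cdot (s-a)=0$, which is why only the entropy part of $F$ survives in the regret.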

#### 5.1. Entropy and Information Divergence

**Definition 4.**

**Definition 5.**

#### 5.2. Monotonicity

**Proposition 5** (The principle of lost opportunities).

**Proof.**

**Corollary 1** (Semi-monotonicity).

**Proof.**

**Definition 6.**

**Proposition 6.**

**Proof.**

**Theorem 3.**

**Proof.**

#### 5.3. Sufficiency

**Definition 7.**

**Proposition 7.**

**Proof.**

**Definition 8.**

**Proposition 8.**

**Proof.**

#### 5.4. Locality

**Definition 9.**

**Example 5.**

**Proposition 9.**

**Proof.**

**Theorem 4.**

**Proof.**

## 6. Applications

#### 6.1. Information Theory

**Theorem 5.**

**Proof.**

#### 6.2. Statistics

**Definition 10.**

**Theorem 6.**

**Proof.**

**Corollary 2.**

**Example 6.**

**Example 7.**

#### 6.3. Statistical Mechanics

#### 6.4. Monotone Regret for Portfolios

**Theorem 7.**

**Proof.**

**Example 8.**

**Corollary 3.**

**Proof.**

## 7. Concluding Remarks

## Acknowledgments

## Conflicts of Interest

## References


© 2017 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Harremoës, P. Divergence and Sufficiency for Convex Optimization. *Entropy* **2017**, *19*, 206.
https://doi.org/10.3390/e19050206
